Challenge: Comparing different hash functions


Problem overview

This will be a hands-on version of the discussion in Dr. Plank's notes. Just as in class on Tuesday, we'll be using separate chaining, so your hash table should be a vector of lists of strings (vector<list<string> >).

Instead of submitting code, submit a brief report (report.txt) on Canvas answering these questions.
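The separate-chaining layout described above can be sketched as follows. This is a minimal illustration, not Dr. Plank's code: the HashFn type and the chain_insert and chain_contains helpers are hypothetical names, and any hash function with the same signature as bad_hash can be plugged in.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

using namespace std;

// Any hash function with the same signature as bad_hash fits this type.
typedef unsigned int (*HashFn)(const string &);

// Hypothetical helper: append s to the chain in bucket hf(s) % table size.
// Returns true if that bucket was already non-empty, i.e. this insertion
// is a collision under the skeleton's counting rule.
bool chain_insert(vector<list<string> > &table, HashFn hf, const string &s)
{
  assert(!table.empty());               // avoid modulo by zero
  size_t b = hf(s) % table.size();
  bool collided = !table[b].empty();
  table[b].push_back(s);
  return collided;
}

// Hypothetical helper: a lookup only scans the one chain that s could
// have been placed in, not the whole table.
bool chain_contains(const vector<list<string> > &table, HashFn hf, const string &s)
{
  assert(!table.empty());
  const list<string> &chain = table[hf(s) % table.size()];
  return find(chain.begin(), chain.end(), s) != chain.end();
}
```

Longer chains mean longer scans in chain_contains, which is why a hash function that spreads strings evenly across the buckets matters.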

Inspiration

This exercise takes a practical look at collisions, load factors, and the value of better hash functions, using string data as input.
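As a small illustration of a collision: because bad_hash (in the skeleton below) only sums character codes, any two anagrams such as "listen" and "silent" get the same hash value and therefore land in the same bucket. The snippet copies bad_hash and adds a hypothetical same_bucket helper to check this.

```cpp
#include <cassert>
#include <string>

using namespace std;

// Copy of bad_hash from the skeleton: it sums the character codes,
// so the order of the characters does not affect the result.
unsigned int bad_hash(const string &s)
{
  unsigned int h = 0;
  for (size_t i = 0; i < s.size(); i++) {
    h += s[i];
  }
  return h;
}

// Hypothetical helper: true if a and b map to the same bucket in a table
// of table_size buckets, so storing both would cause a collision.
bool same_bucket(const string &a, const string &b, unsigned int table_size)
{
  assert(table_size > 0);
  return bad_hash(a) % table_size == bad_hash(b) % table_size;
}
```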

Input / Output

You will write a series of short programs (10-15 lines each) built on this skeleton:

#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <fstream>
#include <string>

using namespace std;

// From Dr. Plank's lecture notes on hashing: the simplest hash function you can think of.
unsigned int bad_hash(const string &s)
{
  size_t i;
  unsigned int h;

  h = 0;

  for (i = 0; i < s.size(); i++) {
    h += s[i];
  }
  return h;
}

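int main() {

  string line;
  vector<list<string> > data;    // separate chaining: one list of strings per bucket
  data.resize(200000);           // table size: 200,000 buckets

  unsigned int h;                // bucket index for the current line
  int collisions = 0;            // insertions into an already non-empty bucket

  while (getline(cin, line)) {

    h = bad_hash(line) % 200000;
    data[h].push_back(line);

    if (data[h].size() > 1)
      collisions++;

  }

  // compute load factor of hash table here

  return 0;
}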
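One way to fill in the "compute load factor here" step, as a sketch: the load factor of a separate-chaining table is usually defined as the number of stored entries divided by the number of buckets. The hypothetical TableStats struct and table_stats function below (not from Dr. Plank's notes) also count non-empty buckets and the longest chain, since those are what change when you swap hash functions.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

using namespace std;

// Hypothetical summary of a separate-chaining table.
struct TableStats {
  size_t entries;        // total strings stored across all chains
  size_t nonempty;       // buckets holding at least one string
  size_t longest_chain;  // length of the longest list
  double load_factor;    // entries / number of buckets
};

TableStats table_stats(const vector<list<string> > &table)
{
  assert(!table.empty());               // avoid dividing by zero buckets
  TableStats st = {0, 0, 0, 0.0};
  for (size_t b = 0; b < table.size(); b++) {
    size_t len = table[b].size();
    st.entries += len;
    if (len > 0) st.nonempty++;
    if (len > st.longest_chain) st.longest_chain = len;
  }
  st.load_factor = (double) st.entries / table.size();
  return st;
}
```

Call it as TableStats st = table_stats(data); after the while loop. With the skeleton's counting rule, collisions should equal st.entries - st.nonempty, which is a handy cross-check. Note that entries / buckets depends only on how many names you insert and on the table size, so it is the non-empty count and the longest chain that reveal how well a hash function spreads the names.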

In your groups, complete the following tasks:

  1. Use the same list of 100k names from Dr. Plank that we used earlier in lecture.
  2. Compute the number of collisions (counted as in the code above) and the load factor of your hash table using the bad hash function, and include both in the report you will upload on Friday.
  3. Also compute the minimum and maximum values of h in your code and include them in your report.
  4. Swap out the bad hash function for DJB and repeat steps 2 and 3. Which hash function do you think is better, and why?
  5. Swap out the DJB hash function for ACM_hash. Repeat steps 2 and 3 and include the values in the report you will upload on Canvas.
  6. In your own words, is ACM_hash better, worse, or roughly the same as DJB? Why?
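Steps 4 and 5 need two more hash functions with the same signature as bad_hash. DJB below is the widely used djb2 formulation (start at 5381, multiply by 33, add each character). The ACM_hash body is an assumption: it is a rotate-and-XOR hash written to illustrate the idea, so if Dr. Plank's notes give a different ACM_hash, use that version instead.

```cpp
#include <cassert>
#include <climits>
#include <string>

using namespace std;

static_assert(sizeof(unsigned int) * CHAR_BIT == 32,
              "the rotation in acm_hash assumes a 32-bit unsigned int");

// djb2 (Dan Bernstein): h = h * 33 + c, starting from 5381.
// (h << 5) + h is h * 32 + h, i.e. h * 33; unsigned overflow wraps mod 2^32.
unsigned int djb_hash(const string &s)
{
  unsigned int h = 5381;
  for (size_t i = 0; i < s.size(); i++) {
    h = (h << 5) + h + s[i];
  }
  return h;
}

// ASSUMED formulation of ACM_hash for illustration: rotate h left by
// 7 bits, then XOR in the next character. Check it against the version
// in Dr. Plank's notes before relying on it.
unsigned int acm_hash(const string &s)
{
  unsigned int h = 0;
  for (size_t i = 0; i < s.size(); i++) {
    h = (h << 7) ^ s[i] ^ (h >> 25);
  }
  return h;
}
```

Unlike bad_hash, both functions depend on character order: "ab" and "ba" get different hash values, whereas bad_hash gives them the same sum.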

Rubric

Your report will be graded pass/fail using the following rubric:

+1  Questions answered in your report 

Testing your code prior to submission

I'm not going to run your code, so there's no need to use git. Copy and paste the skeleton above if you want to, but it's not required.

Submission

To submit your report, upload report.txt to Canvas before the deadline. We highly recommend that every member of the group upload a copy before the deadline.