This lab is a hands-on version of the discussion in Dr. Plank's notes. Just as in class on Tuesday, we'll use separate chaining, so your hash table should be a vector of lists of strings.
Instead of submitting actual code, submit a brief report (report.txt) on Canvas giving the runtime of each of the small toy programs below. Again, I've benchmarked these on tesla1 and have a sense of how long each should take.
The lab gives a practical look at collisions, load factors, and the value of better hash functions, using string data as input.
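To make the collision point concrete, here is a minimal sketch that compares the additive hash from the skeleton with a djb2-style hash. Treat it as an illustration under assumptions: djb_hash is one commonly used alternative and not necessarily the "better" function this lab has in mind, and count_collisions is a hypothetical helper that counts collisions the same way the skeleton does (an insertion into a bucket that already holds a key).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Additive hash from the skeleton: it only sums character codes, so any two
// strings made of the same letters in a different order get the same value.
unsigned int bad_hash(const std::string &s) {
    unsigned int h = 0;
    for (std::size_t i = 0; i < s.size(); i++) {
        h += s[i];
    }
    return h;
}

// djb2-style hash (assumption: shown only as one possible improvement).
// Multiplying by 33 at every step makes the result depend on letter order.
unsigned int djb_hash(const std::string &s) {
    unsigned int h = 5381;
    for (std::size_t i = 0; i < s.size(); i++) {
        h = (h << 5) + h + s[i];
    }
    return h;
}

// Hypothetical helper: counts insertions that land in a bucket that is
// already occupied, which is how the skeleton counts collisions.
template <typename Hash>
int count_collisions(const std::vector<std::string> &keys,
                     std::size_t buckets, Hash hash) {
    std::vector<int> used(buckets, 0);
    int collisions = 0;
    for (std::size_t i = 0; i < keys.size(); i++) {
        std::size_t b = hash(keys[i]) % buckets;
        if (used[b] > 0) collisions++;
        used[b]++;
    }
    return collisions;
}
```

Calling count_collisions(keys, 200000, bad_hash) and count_collisions(keys, 200000, djb_hash) on the same keys gives a direct comparison. Anagrams such as "ab" and "ba" always collide under bad_hash, no matter how large the table is.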
You will write a series of 10-15 line programs that have this skeleton:
#include <iostream>
#include <string>
#include <vector>
#include <list>
#include <algorithm>
#include <fstream>
using namespace std;

// from Dr. Plank's lecture notes on hashing: the simplest hash function you can think of
unsigned int bad_hash(const string &s)
{
  size_t i;
  unsigned int h;

  h = 0;
  for (i = 0; i < s.size(); i++) {
    h += s[i];
  }
  return h;
}

int main()
{
  string line;
  int cnt;
  vector<list<string> > data;
  data.resize(200000);
  int h;
  int collisions = 0;

  // read one key per line from standard input and chain it into its bucket
  while (getline(cin, line)) {
    h = bad_hash(line) % 200000;
    data[h].push_back(line);
    // a bucket that already held a key counts as a collision
    if (data[h].size() > 1) collisions++;
  }

  // compute load factor of hash table here
}
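For the load-factor step at the end of the skeleton, recall that the load factor is the number of stored keys divided by the number of buckets. The sketch below is one possible way to compute it, assuming the same vector<list<string> > table as the skeleton. The names load_factor and longest_chain are hypothetical, and longest_chain is an optional extra the lab does not ask for.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Load factor = (number of stored keys) / (number of buckets).
// Returns 0 for a table with no buckets to avoid dividing by zero.
double load_factor(const std::vector<std::list<std::string> > &table) {
    if (table.empty()) return 0.0;
    std::size_t entries = 0;
    for (std::size_t i = 0; i < table.size(); i++) {
        entries += table[i].size();
    }
    return static_cast<double>(entries) / table.size();
}

// Optional extra (hypothetical): length of the longest chain, which shows
// how unevenly a hash function spreads the keys across buckets.
std::size_t longest_chain(const std::vector<std::list<std::string> > &table) {
    std::size_t longest = 0;
    for (std::size_t i = 0; i < table.size(); i++) {
        if (table[i].size() > longest) longest = table[i].size();
    }
    return longest;
}
```

In the skeleton this would be called as load_factor(data) after the read loop. Counting lines in the loop (for example with the unused cnt variable) and dividing by 200000.0 gives the same number without a second pass over the table.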
In your groups, complete the following tasks:
We will grade your report pass/fail using the following rubric:
+0.5 Test values and runtimes reported
+0.5 Questions above also answered in your report
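For the runtimes in the first rubric item, one option is to time the work from inside the program. This is only a sketch, assuming a compiler with C++11 <chrono> support; elapsed_ms is a hypothetical helper, and timing the whole run from the shell is an equally reasonable choice.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double elapsed_ms(Work work) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    work();
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the read-and-insert loop in a lambda and passing it to elapsed_ms gives a number you can paste into report.txt. Times vary between runs and machines, so it helps to note which machine you used.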
I'm not going to run your code, so don't worry about using git. Copy and paste the skeleton above if you want to, but it's not required.
To submit your report, you must upload a report.txt on Canvas prior
to the deadline. We highly recommend that all members of a group upload
a version prior to the deadline as these likely will be on the final exam.