That means that if you iterate through them, the elements come out in an arbitrary order. What you gain in exchange for giving up ordering is speed:
Operation | Set/Map | Unordered Set/Map |
insert() | O(log n) | O(1) on average |
find() | O(log n) | O(1) on average |
erase() | O(log n) | O(1) on average |
begin() | O(1) | O(1) |
end() | O(1) | O(1) |
Traversal | O(n) | O(n) |
Each program takes two file names and a "Y" or "N" print flag on the command line. It reads the words in the first file into a set or an unordered set. Then it reads the words in the second file and tries to find each one in the container. If you said "Y" for printing, it prints the data and the result of every lookup. Otherwise, it prints only how many words it found.
The code is straightforward, and you should have no trouble reading it:
#include <set>
#include <string>
#include <iostream>
#include <fstream>
using namespace std;

int main(int argc, char **argv)
{
  ifstream data_file, to_find_file;
  bool print;
  set <string> data;
  set <string>::const_iterator f;
  string s;
  int found;

  /* Parse the command line. */

  try {
    if (argc != 4) throw (string) "usage: store_find_set data_file to_find_file print(Y/N)\n";
    data_file.open(argv[1]);
    if (data_file.fail()) throw (string) "can't open " + argv[1];
    to_find_file.open(argv[2]);
    if (to_find_file.fail()) throw (string) "can't open " + argv[2];
    print = (argv[3][0] == 'Y');
  } catch (const string &s) {
    cerr << s << endl;
    return 1;
  }

  /* Read the data file. */

  while (data_file >> s) data.insert(s);

  if (print) {
    cout << "Data:" << endl;
    for (f = data.begin(); f != data.end(); f++) cout << *f << endl;
    cout << endl;
  }
  data_file.close();

  /* Read the to_find_file, and try to find each word in the data file */

  found = 0;
  while (to_find_file >> s) {
    f = data.find(s);
    if (f != data.end()) found++;
    if (print) cout << s << ": " << ((f == data.end()) ? "Not found" : "Found") << endl;
  }

  if (print) cout << endl;
  cout << "Found " << found << endl;
  return 0;
}
First, let's run it and look at output. I have the following files:
UNIX> bin/store_find_set txt/phones-small.txt txt/pfind-small.txt Y
Data:
009-759-6084      # The only difference in the two outputs is that this is sorted.
062-707-0682
161-804-8876
276-780-5793
366-672-5281
392-698-1589
639-049-9982
874-615-3750
927-211-9485
943-433-6132

067-449-4119: Not found
634-692-2465: Not found
087-310-7338: Not found
062-707-0682: Found
750-158-1494: Not found
927-211-9485: Found
639-049-9982: Found
366-672-5281: Found
158-103-1526: Not found
276-780-5793: Found

Found 5
UNIX> bin/store_find_unordered txt/phones-small.txt txt/pfind-small.txt Y
Data:
927-211-9485      # And this is not sorted.
639-049-9982
366-672-5281
161-804-8876
943-433-6132
276-780-5793
009-759-6084
062-707-0682
874-615-3750
392-698-1589

067-449-4119: Not found
634-692-2465: Not found
087-310-7338: Not found
062-707-0682: Found
750-158-1494: Not found
927-211-9485: Found
639-049-9982: Found
366-672-5281: Found
158-103-1526: Not found
276-780-5793: Found

Found 5
UNIX>
Now, let's time them on the big files:
UNIX> time bin/store_find_set txt/phones-big.txt txt/pfind-big.txt N
Found 50000

real	0m0.522s
user	0m0.516s
sys	0m0.005s
UNIX> time bin/store_find_unordered txt/phones-big.txt txt/pfind-big.txt N
Found 50000

real	0m0.180s
user	0m0.174s
sys	0m0.005s
UNIX>
The difference is significant! Even though log n is pretty small (in this example, it is about 17), it is enough to make the programs run at significantly different speeds. If anything, these timings understate the difference between the two data structures, because in both programs, reading the files takes up a lot of the running time.
Regardless, I hope this convinces you to pay attention to these data structures, and to use them when you don't need sorting.