When you traverse a set, you use an iterator, just as you do with lists. Thus, the simple program simple_set.cpp employs a set to sort the lines of standard input:
#include <set>
#include <string>
#include <iostream>
using namespace std;

int main()
{
  string s;
  set <string> names;
  set <string>::iterator nit;

  while (getline(cin, s)) names.insert(s);

  for (nit = names.begin(); nit != names.end(); nit++) {
    cout << *nit << endl;
  }
  return 0;
}
To repeat, instead of using push_back(), like you do with lists or vectors, you use insert(), which puts the string into the right place. The traversal is exactly like traversing a list.
UNIX> cat input-1.txt
Jack Journey
Mackenzie Olympia
James Splotch
Dylan Ache
UNIX> simple_set < input-1.txt
Dylan Ache
Jack Journey
James Splotch
Mackenzie Olympia
UNIX>

The first question you should have is: "What about duplicate entries?" For example, let's try input-2.txt, which has two duplicate entries:
UNIX> cat input-2.txt
John Bevy
Xavier Ornately
Nicholas Wyatt Fecund
Max Inadvertent III
John Bevy
Max Inadvertent III
UNIX> simple_set < input-2.txt
John Bevy
Max Inadvertent III
Nicholas Wyatt Fecund
Xavier Ornately
UNIX>

As you can see, it does not insert duplicates. If you want to allow duplicates, you use a multiset, as in simple_multiset.cpp. The only difference with this program is the declaration of names and nit:
multiset <string> names;
multiset <string>::iterator nit;
Everything else is the same, and the duplicate entries each get their own entry in the multiset:
UNIX> simple_multiset < input-2.txt
John Bevy
John Bevy
Max Inadvertent III
Max Inadvertent III
Nicholas Wyatt Fecund
Xavier Ornately
UNIX>

We can use the find() method of a set or multiset to see if an element is in the set or multiset. This takes O(log n) time for a set of n elements, which is very fast, and much faster than traversing every element of the set to look for it. find() returns an iterator to the element in the set if it is found. If the element is not in the set, find() returns an iterator equal to the one returned by end().
Here's an example in simple_set_find.cpp:
#include <set>
#include <string>
#include <fstream>
#include <iostream>
#include <cstdio>
#include <cstdlib>
using namespace std;

int main(int argc, char **argv)
{
  string s;
  ifstream f;
  set <string> names;

  if (argc != 2) {
    cerr << "usage: simple_set_find file\n";
    exit(1);
  }

  f.open(argv[1]);
  if (f.fail()) { perror(argv[1]); exit(1); }

  while (getline(f, s)) names.insert(s);
  f.close();

  while (1) {
    cout << "Enter a name: ";
    cout.flush();    // Don't worry about this too much -- I do this
                     // to make sure that the string is printed to the screen.
                     // Sometimes, partial lines aren't printed immediately,
                     // and cout.flush() forces the partial line to be printed.
    if (!getline(cin, s)) exit(0);
    if (names.find(s) == names.end()) {
      cout << s << " is not in the set.\n";
    } else {
      cout << s << " is in the set.\n";
    }
  }
}
The program reads a file and puts each line into a set. It then reads lines from standard input and prints whether the line is in the set. For example:
UNIX> cat input-3.txt
Madelyn Psychotic
Joseph Halverson
Aidan Pooh
Bailey Cycad
Wyatt Advantageous
UNIX> simple_set_find input-3.txt
Enter a name: Aidan Pooh
Aidan Pooh is in the set.
Enter a name: Jim Plank
Jim Plank is not in the set.
Enter a name: <CNTL-D>
UNIX>
map <string, int> names;
map <string, int>::iterator nit;
We'll write a simple example. This example assumes that input is as in Roster.txt: it is composed of first and last names of people. (Our example is all the NFL players in 2009 whose last names begin with "A", in random order.) We'll use a map as declared above to keep track of the last names and of how many players have each last name. The program for this is in simple_map.cpp:
#include <stdio.h>
#include <iostream>
#include <string>
#include <map>
using namespace std;

int main()
{
  map <string, int> names;
  map <string, int>::iterator nit;
  string fn, ln;

  while (cin >> fn >> ln) {
    nit = names.find(ln);
    if (nit == names.end()) {
      names.insert(make_pair(ln, 1));   // First player with this last name.
    } else {
      nit->second++;                    // Seen before: bump the count.
    }
  }

  for (nit = names.begin(); nit != names.end(); nit++) {
    cout << "Last name: " << nit->first << ". Number of players: " << nit->second << endl;
  }
  return 0;
}
When you insert into a map, since you are inserting two things (a key and value), you must combine them into a pair with the make_pair() procedure. The types of the arguments must match the types specified in the declaration -- in this case, they must be a string and an integer.
The iterator for a map is different, too. Instead of simply specifying it with pointer indirection, you can grab the key from an iterator with "->first" and the value with "->second". Yes, I wish they were called key and val, but that is life. When we run it on Roster.txt, we get:
UNIX> simple_map < Roster.txt
Last name: Abdallah. Number of players: 1
Last name: Abdullah. Number of players: 2
Last name: Abiamiri. Number of players: 1
Last name: Abraham. Number of players: 1
Last name: Adams. Number of players: 7
.....

We can check for correctness with grep:
UNIX> grep Abdallah Roster.txt
Nader Abdallah
UNIX> grep Adams Roster.txt
Gaines Adams
Jamar Adams
Anthony Adams
Michael Adams
Titus Adams
Flozell Adams
Mike Adams
UNIX> grep Adams Roster.txt | wc
      7      14      90
UNIX>

Like sets, you traverse the maps in ascending order, and you can't insert duplicate keys. Since simple_map.cpp calls find() and only performs insert() when the key is not found, the limitation on duplicate keys is not a problem. If you need duplicate keys, use a multimap.
#include <cstdio>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main()
{
  multiset <string> names;
  multiset <string>::iterator nit;
  string fn, ln, pn;    // pn is the previous last name seen in the traversal.
  int count;

  while (cin >> fn >> ln) names.insert(ln);

  for (nit = names.begin(); nit != names.end(); nit++) {
    if (nit != names.begin()) {
      if (*nit == pn) {
        count++;
      } else {
        printf("%-20s %d\n", pn.c_str(), count);
        pn = *nit;
        count = 1;
      }
    } else {
      pn = *nit;
      count = 1;
    }
  }
  if (names.size() > 0) printf("%-20s %d\n", pn.c_str(), count);
  return 0;
}
Compared to the map, that's a pretty convoluted piece of code. However, make sure that you can step through it and convince yourself that it works.
UNIX> head -n 10 Roster.txt
Russell Allen
Gaines Adams
Aundrae Allison
David Anderson
Adrian Arrington
Hamza Abdullah
Tim Anderson
Devin Aromashodu
Asher Allen
Eric Alexander
UNIX> head -n 10 Roster.txt | nnames_multiset
Abdullah             1
Adams                1
Alexander            1
Allen                2
Allison              1
Anderson             2
Aromashodu           1
Arrington            1
UNIX>
#include <stdio.h>
#include <iostream>
#include <string>
#include <set>
#include <map>
using namespace std;

typedef set <string> fnset;

int main()
{
  map <string, fnset *> lnames;
  map <string, fnset *>::iterator lnit;
  fnset *fnames;
  fnset::iterator fnit;
  string fn, ln;

  while (cin >> fn >> ln) {
    lnit = lnames.find(ln);
    if (lnit == lnames.end()) {
      fnames = new fnset;
      lnames.insert(make_pair(ln, fnames));
    } else {
      fnames = lnit->second;
    }
    fnames->insert(fn);
  }

  for (lnit = lnames.begin(); lnit != lnames.end(); lnit++) {
    fnames = lnit->second;
    for (fnit = fnames->begin(); fnit != fnames->end(); fnit++) {
      cout << *fnit << " " << lnit->first << endl;
    }
  }
  return 0;
}
The program uses a map to sort the last names. The "second" field of each map entry is a pointer to a set, which sorts the first names that belong to that last name. When the program reads a name, it checks whether the last name is in the map. If so, it sets fnames to the set of first names with that last name. If not, it creates a new fnames set and inserts it, along with the last name, into the map. Finally, it inserts the first name into the set.
When it's done reading input, it does a nested traversal to print out all of the names.
Note the typedef statement to make the program read more easily.
This program will not print out duplicate names, because sets don't hold duplicate entries. If you wanted it to print out duplicate names, you would have to use a multiset.
UNIX> sort_names_1 < Roster.txt | head
Nader Abdallah
Hamza Abdullah
Husain Abdullah
Victor Abiamiri
John Abraham
Anthony Adams
Flozell Adams
Gaines Adams
Jamar Adams
Michael Adams
UNIX>
#include <stdio.h>
#include <iostream>
#include <string>
#include <set>
#include <map>
using namespace std;

typedef set <string> fnset;

int main()
{
  map <string, fnset> lnames;
  map <string, fnset>::iterator lnit;
  fnset fnames;
  fnset::iterator fnit;
  string fn, ln;

  while (cin >> fn >> ln) {
    lnit = lnames.find(ln);
    if (lnit == lnames.end()) {
      lnames.insert(make_pair(ln, fnames));
    } else {
      fnames = lnit->second;
    }
    fnames.insert(fn);
  }

  for (lnit = lnames.begin(); lnit != lnames.end(); lnit++) {
    fnames = lnit->second;
    for (fnit = fnames.begin(); fnit != fnames.end(); fnit++) {
      cout << *fnit << " " << lnit->first << endl;
    }
  }
  return 0;
}
This program is very buggy. Take a simple example:
UNIX> head -n 2 Roster.txt
Adam Anderson
Andy Alleman
UNIX> head -n 2 Roster.txt | sort_names_1
Andy Alleman
Adam Anderson
UNIX> head -n 2 Roster.txt | sort_names_bad
Adam Alleman
UNIX>

Yuck. What's going on? Well, two things. Let's concentrate on the most egregious: the program uses the same fnames variable both to insert first names into a set and as the set that gets inserted into the map with each new last name. That's wrong, because whatever first names have accumulated in fnames get copied into the map along with the new last name. We don't need a variable to create a new set when we insert a last name into the map. Instead, we can simply call the constructor for the set using fnset(), as in sort_names_bad2.cpp:
#include <stdio.h>
#include <iostream>
#include <string>
#include <set>
#include <map>
using namespace std;

typedef set <string> fnset;

int main()
{
  map <string, fnset> lnames;
  map <string, fnset>::iterator lnit;
  fnset fnames;
  fnset::iterator fnit;
  string fn, ln;

  while (cin >> fn >> ln) {
    lnit = lnames.find(ln);
    if (lnit == lnames.end()) {
      lnames.insert(make_pair(ln, fnset()));
      lnit = lnames.find(ln);
    }
    fnames = lnit->second;
    fnames.insert(fn);
  }

  for (lnit = lnames.begin(); lnit != lnames.end(); lnit++) {
    fnames = lnit->second;
    for (fnit = fnames.begin(); fnit != fnames.end(); fnit++) {
      cout << *fnit << " " << lnit->first << endl;
    }
  }
  return 0;
}
This one still doesn't work:
UNIX> head -n 2 Roster.txt | sort_names_bad2
UNIX>

Why? The culprit lies in these two lines:
fnames = lnit->second;
fnames.insert(fn);
The first of these lines makes a copy of lnit->second. You insert the first name into the copy, which does not modify the fnset that is actually stored in lnit->second. To fix this, you need to either insert directly into lnit->second by writing:

lnit->second.insert(fn);

or store a reference to the fnset in lnit->second by writing:

fnset &names = lnit->second;
names.insert(fn);
This is the first time that you have seen a reference variable used outside a function declaration. A reference variable is much like a pointer in that it refers to an existing variable or object rather than holding its own copy (under the hood, it typically holds that object's address; in this case, the address of an fnset object). However, unlike a pointer, you use the . operator to access an object's fields via a reference variable. Also, unlike a pointer variable, you cannot change the object to which the reference variable refers. This means that you must initialize the reference variable when you declare it, as I have done above.

Unfortunately, reference variables are confusing because they act like pointer variables but use different syntax. Nonetheless, much C++ code is written using reference variables, and hence you should be introduced to them, even though we don't want you to use them in your code for this course. Reference variables tend to be safer than pointers, because when used in conjunction with stack-allocated objects, they can be used to avoid the memory problems associated with heap-allocated objects. Unfortunately, when you use stack-allocated objects, it is often easy to inadvertently copy these objects.

I have fixed the copy problem mentioned above in the following code. It still performs one copy of an fnset that should be avoided. Can you spot it? sort_names_bad3.cpp:
#include <stdio.h>
#include <iostream>
#include <string>
#include <set>
#include <map>
using namespace std;

typedef set <string> fnset;

int main()
{
  map <string, fnset> lnames;
  map <string, fnset>::iterator lnit;
  fnset fnames;
  fnset::iterator fnit;
  string fn, ln;

  while (cin >> fn >> ln) {
    lnit = lnames.find(ln);
    if (lnit == lnames.end()) {
      lnames.insert(make_pair(ln, fnset()));
      lnit = lnames.find(ln);
    }
    fnset &names = lnit->second;
    names.insert(fn);
  }

  for (lnit = lnames.begin(); lnit != lnames.end(); lnit++) {
    fnames = lnit->second;
    for (fnit = fnames.begin(); fnit != fnames.end(); fnit++) {
      cout << *fnit << " " << lnit->first << endl;
    }
  }
  return 0;
}
At least this code works as it should:
UNIX> sort_names_1 < Roster.txt > out1.txt
UNIX> sort_names_bad3 < Roster.txt > out2.txt
UNIX> diff out1.txt out2.txt
UNIX>

Have you spotted the inadvertent copy? If you haven't, look at the for loop that prints out the names:
for (lnit = lnames.begin(); lnit != lnames.end(); lnit++) {
  fnames = lnit->second;
  for (fnit = fnames.begin(); fnit != fnames.end(); fnit++) {
    cout << *fnit << " " << lnit->first << endl;
  }
}
It is making copies of lnit->second. Even though it's not a bug, it's extremely inefficient in terms of both time and memory. We can fix it by declaring a second reference variable named printnames:
for (lnit = lnames.begin(); lnit != lnames.end(); lnit++) {
  fnset &printnames = lnit->second;
  for (fnit = printnames.begin(); fnit != printnames.end(); fnit++) {
    cout << *fnit << " " << lnit->first << endl;
  }
}
So, now you say, "Ok, it works. Why can't I do this?" When you get a job, it is quite possible that you will be allowed to do so. However, in this class we want you to use the pointer approach instead.

The reason is threefold. First, since reference variables cannot be re-used (i.e., you cannot make them refer to different objects), you typically need to litter your code with declarations for reference variables, as I had to do above. Second, you'll find yourself forgetting to declare reference variables. Instead, you will set ordinary variables to lnit->second and make copies when you don't mean to. In the worst case, this will lead to logic errors in your program. In a somewhat better case, you will create inadvertent copies of an object, which will slow down your program, since each copy must be created, and will use excessive memory. Finally, reference variables are confusing and will almost certainly lead to problems with your code that you will find difficult to resolve. Leave them for when you are a more experienced C++ programmer. For the time being, get into the habit of using pointers in the second field of your maps.
pair<iterator, bool> set::insert(const TYPE& val);
The "(const TYPE& val)" simply means that it works with the type that you specify when you define the set.
The return value is a pair, much like what you pass to the insert() call of a map. Its first field is an iterator for the set, and its second field is a boolean. If the element is inserted, then the iterator points to the newly inserted element. Otherwise, you tried to insert a duplicate, and the iterator points to the value already in the set. The second field reports whether the item was inserted or not.
To see usage, take a look at setreturn.cpp:
#include <set>
#include <string>
#include <iostream>
using namespace std;

typedef set <string> string_set;

int main()
{
  string s;
  string_set names;
  string_set::iterator nit;
  pair <string_set::iterator, bool> retval;

  while (getline(cin, s)) {
    retval = names.insert(s);
    if (retval.second) {
      cout << s << ": Successfully inserted.\n";
    } else {
      cout << s << ": Duplicate not inserted.\n";
    }
  }
  return 0;
}
Note how it returns a pair, whose fields you access with dots rather than arrows. Why then do you use arrows in iterators on maps? Because those iterators point to pairs -- they are not pairs themselves.
UNIX> cat input-2.txt
John Bevy
Xavier Ornately
Nicholas Wyatt Fecund
Max Inadvertent III
John Bevy
Max Inadvertent III
UNIX> setreturn < input-2.txt
John Bevy: Successfully inserted.
Xavier Ornately: Successfully inserted.
Nicholas Wyatt Fecund: Successfully inserted.
Max Inadvertent III: Successfully inserted.
John Bevy: Duplicate not inserted.
Max Inadvertent III: Duplicate not inserted.
UNIX>