CS202 Lecture Notes

CS202 Lecture Notes - STL Sets and Maps

James S. Plank
Original Notes: September 3, 2009
Last revision: Mon Oct 4 12:38:25 EDT 2021
Directory: /home/jplank/cs202/Notes/SetMap

Sets and Maps are two very powerful parts of the STL. Each insertion and each search takes log time, so you can search in log time and sort n items in n log n time. That gives you optimal performance (for comparison-based methods) with two extremely important functionalities: searching and sorting.

Topcoder Problems to help with Sets and Maps

SRM 685, D1, 250-Pointer: MultiplicationTable2. This is a straightforward usage of sets.
SRM 696, D2, 250-Pointer: Ropestring. This is a straightforward problem to let you practice multisets.
SRM 718, D2, 250-Pointer: RelativeHeights. Maps, sets and some string processing. This is a good practice problem.
SRM 615, D1, 250-Pointer: AmebaDiv1. This is another relatively straightforward set problem.
SRM 635, D1, 500-Pointer: StoryFromTCO. This is a straightforward multiset problem when you program it. Figuring out the solution is harder, but in the notes, I walk you through it. I recommend that you do this problem, as it is a good one to make you think algorithmically, using the multiset as a tool.

Sets

A set is an ordered collection of data, such as ints or strings. You may insert elements into the set, and then you may find them, or traverse the set in order. You do insertion just like calling push_back() or push_front() on a list. The difference is that the item goes into its proper place in the set, rather than on the back or front of a list.

When you traverse a set, you use an iterator, just as you do with lists. Thus, the simple program src/simple_set.cpp employs a set to sort the lines of standard input:

/* This program reads lines from standard input, and inserts each line into a set.
   It then traverses the set and prints the lines.  This has the effect of sorting
   standard input (while stripping duplicates). */

#include <set>
#include <string>
#include <iostream>
using namespace std;

int main()
{
  string s;
  set <string> names;
  set <string>::const_iterator nit;

  while(getline(cin, s)) names.insert(s);

  for (nit = names.begin(); nit != names.end(); nit++) {
    cout << *nit << endl;
  }

  return 0;
}

To repeat, instead of using push_back(), like you do with lists or vectors, you use insert(), which puts the string into the right place. The traversal is exactly like traversing a list.

UNIX> cat files/input_1.txt
Jack Journey
Mackenzie Olympia
James Splotch
Dylan Ache
UNIX> bin/simple_set < files/input_1.txt
Dylan Ache
Jack Journey
James Splotch
Mackenzie Olympia
UNIX>

The first question you should have is: "What about duplicate entries?" For example, let's try files/input_2.txt, which has two duplicate entries:

UNIX> cat files/input_2.txt
John Bevy
Xavier Ornately
Nicholas Wyatt Fecund
Max Inadvertent III
John Bevy
Max Inadvertent III
UNIX> bin/simple_set < files/input_2.txt
John Bevy
Max Inadvertent III
Nicholas Wyatt Fecund
Xavier Ornately
UNIX>

As you can see, it does not insert duplicates. If you want to allow duplicates, you use a multiset, as in simple_multiset.cpp. The only difference with this program is the declaration of names and nit:

multiset <string> names;
multiset <string>::const_iterator nit;

Everything else is the same, and the duplicate entries each get their own entry in the multiset:

UNIX> bin/simple_multiset < files/input_2.txt 
John Bevy
John Bevy
Max Inadvertent III
Max Inadvertent III
Nicholas Wyatt Fecund
Xavier Ornately
UNIX>

We can use the find() method of a set or multiset to see if an element is in the set or multiset. This is done in log time, which means very fast. I'll define "log time" more precisely below, but for now you should know that it is much faster than traversing all elements of the set to find it. If the element is in the set, find() returns an iterator to it. If it is not in the set, find() returns the same iterator that the end() method returns.

Here's an example in src/simple_set_find.cpp

/* This program reads lines from a file and stores them in a set.
   It then queries the user for names, and uses the find() method to find them in the set.
   It returns whether or not it was successful. */

#include <set>
#include <string>
#include <fstream>
#include <iostream>
#include <cstdlib>
using namespace std;

int main(int argc, char **argv)
{
  string s, filename;
  ifstream f;
  set <string> names;

  /* Error check the command line. */

  try {
    if (argc != 2) throw((string) "usage: simple_set_find file");
    filename = argv[1];
    f.open(filename.c_str());
    if (f.fail()) throw((string) ("could not open " + filename));
  } catch (const string &err) {     // Catch by reference; err doesn't shadow s
    cerr << err << endl;
    return 1;                       // A nonzero exit status signals an error
  }

  /* Read the lines and insert them into the set */

  while(getline(f, s)) names.insert(s);
  f.close();

  /* Now query the user, and try to find the name.  Print out whether it was successful. */

  while (1) {
    cout << "Enter a name: ";
    cout.flush();                    // Don't worry about this too much -- I do this
                                     // to make sure that the string is printed to the screen.
                                     // Sometimes, partial lines aren't printed immediately,
                                     // and cout.flush() forces the partial line to be printed.
    if (!getline(cin, s)) return 0;
    if (names.find(s) == names.end()) {
      cout << s << " is not in the set.\n";
    } else {
      cout << s << " is in the set.\n";
    }
  }
}

The program reads a file and puts each line into a set. It then reads lines from standard input and prints whether the line is in the set. For example:

UNIX> cat files/input_3.txt
Madelyn Psychotic
Joseph Halverson
Aidan Pooh
Bailey Cycad
Wyatt Advantageous
UNIX> bin/simple_set_find files/input_3.txt
Enter a name: Aidan Pooh
Aidan Pooh is in the set.
Enter a name: Jim Plank
Jim Plank is not in the set.
Enter a name: <CNTL-D>
UNIX>

A digression: "Log" time

I've used the term "log time" above. What does that mean? It means that if there are n items in a set, then performing each find() operation takes roughly log₂n operations. This is nice, because log₂n is incremented by 1 when n doubles. To wit:

n	log₂n
2	1
4	2
8	3
16	4
32	5
64	6
128	7
256	8
512	9
1,024	10
2,048	11
4,096	12
8,192	13
16,384	14
32,768	15
65,536	16
131,072	17
262,144	18
524,288	19
1,048,576	20
2,097,152	21
4,194,304	22
8,388,608	23
16,777,216	24
33,554,432	25
67,108,864	26
134,217,728	27
268,435,456	28
536,870,912	29
1,073,741,824	30

As you can see, "log time" means fast. If my set has over a million entries, then it takes roughly 20 operations to find something. That's fast.

Maps

Although sets are nice, they are a little limited. Often we want to store key-value pairs, where we search on the key and retrieve the value associated with it. For that, we use a map. When you declare a map, you specify the types of the key and the value. For example, the following declaration is for a map whose keys are strings and whose values are integers. I also include the declaration for the map iterator.

map <string, int> names;
map <string, int>::iterator nit; // I didn't declare this as a const_iterator -- you'll see why.

We'll write a simple example. This example assumes that input is as in files/Roster.txt: it is composed of first and last names of people. (Our example is all the NFL players in 2021 whose last names begin with "A", in random order). We'll use a map as declared above, and what we are going to do is keep track of the last names, and how many players have each last name. The program for this is in src/simple_map.cpp:

/* This program reads first name / last name pairs, and keeps track of the last names
   in a map.  It uses an integer value in the map, and increments that value whenever
   it encounters a last name.  In other words, it keeps track of the number of people
   with each last name. */

#include <iostream>
#include <string>
#include <map>
using namespace std;

int main()
{
  map <string, int> names;
  map <string, int>::iterator nit;
  string fn, ln;
  
  /* Read in first name / Last names */

  while (cin >> fn >> ln) {

    /* Look up the last name in the map.  If we don't find the last name in the map,
       we insert it there with a value of 1.  Otherwise, we increment the value. */

    nit = names.find(ln);
    if (nit == names.end()) {
      names.insert(make_pair(ln, 1));
    } else {
      nit->second++;                // This statement is why nit cannot be a const_iterator.
    }
  }

  /* Print out the last names and the number of players */

  for (nit = names.begin(); nit != names.end(); nit++) {
    cout << "Last name: " << nit->first << ". Number of players: " << nit->second << endl;
  }

  return 0;
}

When you insert into a map, since you are inserting two things (a key and a value), you combine them into a pair, here with the make_pair() procedure. The types in the pair must match (or be convertible to) the types specified in the declaration: in this case, a string and an integer.

The iterator for a map is different, too. Dereferencing it gives you a key-value pair rather than a single item, so you grab the key from an iterator with "->first" and the value with "->second". Yes, I wish they were called key and val, but that is life. When we run it on files/Roster.txt, we get:

UNIX> bin/simple_map < files/Roster.txt | head
Last name: Abdullah. Number of players: 1
Last name: Aboushi. Number of players: 1
Last name: Abram. Number of players: 1
Last name: Abrams. Number of players: 1
Last name: Acy. Number of players: 1
Last name: Adams. Number of players: 13
Last name: Addae. Number of players: 1
Last name: Adderley. Number of players: 1
Last name: Addison. Number of players: 1
Last name: Adebo. Number of players: 1
UNIX>

We can check for correctness with grep:

UNIX> grep Aboushi files/Roster.txt
Oday Aboushi
UNIX> grep Adams files/Roster.txt
Matthew Adams
Josh Adams
Tyrell Adams
Myles Adams
Trey Adams
Andrew Adams
Montravius Adams
Davante Adams
Jonathan Adams
Rodney Adams
Paul Adams
Jamal Adams
Jerell Adams
UNIX> grep Adams files/Roster.txt | wc
      13      26     169
UNIX>

As with sets, a map is traversed in ascending order of its keys, and you can't insert duplicate keys. Since src/simple_map.cpp calls find() and only performs insert() when the key is not found, the limitation on duplicate keys is not a problem. If you need duplicate keys, use a multimap.

A final word about the iterator. Since I am updating the contents of the map when I say "nit->second++", I cannot use a const_iterator. Were I to try to use a const_iterator for nit, I would get a compiler error.

Writing that last program with a multiset

As observed in class, we could have written that last program with a multiset or even a vector. Let's consider the multiset. Suppose we insert all the last names into the multiset. We then traverse the multiset, maintaining a string pn that holds the string in the previous element of the multiset, plus a count of the number of times that we have seen that string. If the current string equals the previous string, then we simply increment the count. Otherwise, we print the previous string and its count, and then set pn to the current string and reset the count to one. At the end of the traversal, we print out the last name and its count. The code is in src/nnames_multiset.cpp:

/* This program outputs the same thing as src/simple_map.cpp, except it puts the
   last names into a multiset.  It then counts the duplicate names while traversing
   the multiset. */

#include <cstdio>
#include <iostream>
#include <string>
#include <set>
using namespace std;

int main()
{
  multiset <string> names;
  multiset <string>::const_iterator nit;
  string fn, ln, pn;
  int count;
  
  /* Read the last names into the multiset */

  while (cin >> fn >> ln) names.insert(ln);

  /* Traverse the multiset, keeping track of the previous name in the variable pn.
     When the current name is different from pn, print out the previous name and
     its count.  Otherwise, increment the count.  You need special code for the 
     first name in the multiset */

  for (nit = names.begin(); nit != names.end(); nit++) {
    if (nit == names.begin()) {                           // First name
      pn = *nit;
      count = 1;
    } else if (*nit == pn) {                              // Current name equals previous name
      count++;
    } else {                                              // Current name doesn't equal previous name
      cout << "Last name: " << pn << ". Number of players: " << count << endl;
      pn = *nit;
      count = 1;
    } 
  }

  /* You have to print the last name after the set traversal */

  if (names.size() > 0) {
    cout << "Last name: " << pn << ". Number of players: " << count << endl;
  }
  return 0;
}

Compared to the map, that's a pretty convoluted piece of code. However, make sure that you can step through it and understand how it works. Let's verify that bin/simple_map and bin/nnames_multiset produce the same output on files/Roster.txt by using MD5 hashes. If the outputs are identical, then the hashes will be the same. Otherwise, they will be different with an excruciatingly high probability:

UNIX> bin/simple_map < files/Roster.txt | openssl md5
(stdin)= f03386addee22995dec3d99c26329ad6
UNIX> bin/nnames_multiset < files/Roster.txt | openssl md5
(stdin)= f03386addee22995dec3d99c26329ad6
UNIX>

Nested Data Structures, Good Program Structure, Associative Arrays

The next program is a more detailed example of the type of program that you end up writing quite a bit. You have some data, and you want to process it in a variety of ways. The example that we'll use here continues the football theme. It's in the file files/2018-QB-Stats.txt, and it contains some statistics about NFL quarterbacks in the 2018 football season.

Let's take a look:

UNIX> head -n 5 < files/2018-QB-Stats.txt 
Jared Goff            LA       4688       101.1
Carson Wentz          PHI      3074       102.2
Mike Glennon          ARI       174       112.0
Matt Cassel           DET        59       26.3
Brock Osweiler        MIA      1247       86.0
UNIX>

The format of each line of the file is:

First-Name Last-Name Team Passing-Yards QB-Rating

There is one line for each quarterback who played in 2018, and the lines are in no particular order. Now, let's suppose that you are preparing for your 2019 fantasy football draft, and you want to crunch this data a little bit. In particular, suppose you want to be able to show:

The quarterbacks and their stats, sorted by last name, then first name.
The quarterbacks and their stats, sorted by rating.
The quarterbacks and their stats, sorted by team, and within each team sorted by yards.
Info on a particular quarterback when you ask for the quarterback by name.

Maybe you want more stuff. In the sections below, I show how I would organize this program, highlighting some things about maps along the way.

Step 1: Define the Quarterback class and read the QBs into a vector of pointers

My first step is to define a Quarterback class, which has all of the info for each Quarterback. It will also have a Print() method, which is what you call when you want to print each quarterback.

My next step is to define a vector of pointers to Quarterbacks. While this vector won't be sorted, it will be a convenient data structure that you can use when you want to do things that involve all of the quarterbacks.

We'll write this code and test it before moving on. It is in src/qb_1_read_input.cpp, and I won't put all of the code here, but instead, I'll show you the important parts. First is the Quarterback class, and its very simple Print() method.

/* The Quarterback class is quite simple -- data and a Print method. 
   We will construct the Name from the Firstname and Lastname. */

class Quarterback {
  public:
    string Name;
    string Firstname;
    string Lastname;
    string Team;
    int    Yards;
    double Rating;
    void Print() const;
};

void Quarterback::Print() const
{
  printf("%-25s %3s   Y: %4d    R: %5.1lf\n", Name.c_str(), Team.c_str(), Yards, Rating);
}

In the main() we open a file f and then read quarterback entries from it. For each line, we create a new instance of the Quarterback class, and then put a pointer to it into a vector qbs. Here are the relevant variable declarations (there are more than these, but these are the ones that involve quarterbacks):

Quarterback *q;
vector <Quarterback *> qbs;

And here is the code that reads in the quarterbacks, then prints them out:

  /* Read the quarterbacks and put their pointers into qbs*/

  while(f >> fn >> ln >> team >> yards >> rating) {
    q = new Quarterback;
    q->Firstname = fn;
    q->Lastname = ln;
    q->Team = team;
    q->Yards = yards;
    q->Rating = rating;
    q->Name = q->Firstname + " " + q->Lastname;
    qbs.push_back(q);
  }

  /* Print out the quarterbacks to test the code. */

  for (i = 0; i < qbs.size(); i++) qbs[i]->Print();
  return 0;
}

To test it, we make sure that things look ok, and do some spot checking:

UNIX> bin/qb_1_read_input files/2018-QB-Stats.txt | head -n 5    # Make sure that the output looks right
Jared Goff                 LA   Y: 4688    R: 101.1
Carson Wentz              PHI   Y: 3074    R: 102.2
Mike Glennon              ARI   Y:  174    R: 112.0
Matt Cassel               DET   Y:   59    R:  26.3
Brock Osweiler            MIA   Y: 1247    R:  86.0
UNIX> bin/qb_1_read_input files/2018-QB-Stats.txt | grep Ben     # Spot check that Ben Roethlisberger's line is correct
Ben Roethlisberger        PIT   Y: 5129    R:  96.5
UNIX> grep Ben files/2018-QB-Stats.txt
Ben Roethlisberger    PIT      5129       96.5
UNIX> bin/qb_1_read_input files/2018-QB-Stats.txt | grep Mahomes # Spot check that Patrick Mahomes' line is correct
Patrick Mahomes            KC   Y: 5097    R: 113.8
UNIX> grep Mahomes files/2018-QB-Stats.txt         
Patrick Mahomes       KC       5097       113.8
UNIX> bin/qb_1_read_input files/2018-QB-Stats.txt | wc           # Make sure that it is producing the correct number of lines.
      69     483    3588
UNIX> wc files/2018-QB-Stats.txt 
      69     345    3256 files/2018-QB-Stats.txt
UNIX>

Step 2: Define a QBS class to manage the Quarterbacks, and add a map to it

The next step is one that none of us like to do, but I can tell you that it makes your life easier, especially when you do it at this point in your program. You're going to be manipulating your collection of quarterbacks a lot, so it's a good idea to create a class to manage quarterbacks. I do that in src/qb_2_qbs_class.cpp. Here's the class definition. Read the header comment for some more information.

/* The QBS class is to manage my quarterback data.  I use a default
   constructor, and then implement methods to read from a filename,
   and to find a Quarterback by name.  In the protected data, I have
   two data structures: QV, which is a vector of pointers, and QM,
   which is a map of quarterbacks keyed by name. */

class QBS {
  public:
    bool Read(const string &filename);
    const Quarterback *Find(const string &name) const;
  protected:
    vector <Quarterback *> QV;
    map <string, Quarterback *> QM;
};

Note, I didn't define a constructor, so it's very easy for me to simply declare an instance of the class as a local variable in main(). Instead of having a constructor read the file, I have a Read() method which reads from the file and creates the vector of pointers QV, just like in the main() above. At the end, it runs through the vector and inserts the quarterbacks into the map. Here's the important code:

bool QBS::Read(const string &filename)
{
  /* Read in the quarterbacks from the file, putting the pointers into the QV vector. */

  /* ....  I am omitting that code, because it's just like the main() above. */

  /* Now create the map QM, treating it like an associative array. */

  for (i = 0; i < QV.size(); i++) {
    q = QV[i];
    QM[q->Name] = q;
  }

  /* Close the file and return success. */
  f.close();
  return true;
}

One of the really convenient (and sometimes dangerous) features of a map is that you can treat it like an associative array. In this case, the string is the "index". This is more operator overloading from the standard template library: map overloads operator[]. What is really going on here is the following:

QM[q->Name] looks up q->Name in the map, much like find() does.
If it's not there, a new element is inserted with q->Name as its key and a default-constructed val (NULL for a pointer).
Either way, operator[] returns a reference to the val, and the assignment sets that val to q.

You can only do this on a map and not a multimap (because the map has unique keys). You'll find yourself using this feature a lot. It often helps with readability, but again, it can get you in trouble, so you need to pay attention to what you're doing when you use it.

The Find() method is really straightforward:

/* Find() simply calls find() on the map, returning NULL if it can't find the quarterback. */

const Quarterback *QBS::Find(const string &name) const
{
  map <string, Quarterback *>::const_iterator qit;

  qit = QM.find(name);
  if (qit == QM.end()) return NULL;
  return qit->second;
}

Let's talk about the three const declarations here:

It returns a (const Quarterback *). That means that whoever uses the return value cannot modify what it points to. That's a good feature to use when you can, because it helps you find bugs, by restricting how you use the pointer.
The argument is (const string &name). This is nothing new.
The procedure is declared const. That means that Find() doesn't modify the QBS class. Again, the compiler enforces this.

There's nothing exciting in the main(). It simply processes lines of standard input in a way that we're used to: breaking the line up into words and putting the words into a vector of strings named sv. For now, we process three commands: "F" for Find, "Q" for Quit and "?" to print the commands. I'm only going to highlight two things about the main(). First, it declares an instance of QBS as a local variable, and it also declares a (const Quarterback *q), which will hold the return value of qbs.Find():

QBS qbs;
const Quarterback *q;

Second, since Find() returns a (const Quarterback *), we can't modify the quarterback that q points to. Because we declared the Print() method to be const, the compiler knows that calling q->Print() won't violate that restriction:

    } else if (sv[0] == "F") {
      if (sv.size() != 3) {
        printf("usage: F firstname lastname\n");
      } else {
        name = sv[1] + " " + sv[2];
        q = qbs.Find(name);                   // Find the quarterback
        if (q == NULL) {
          printf("Not there.\n");
        } else {
          q->Print();                      // Since Print() is const, this is ok
        }
      }
    }

Let's run it, and make sure that it finds the quarterbacks correctly:

UNIX> bin/qb_2_qbs_class files/2018-QB-Stats.txt
QB> ?
F name -- Find the quarterback with the given name.
Q      -- Quit.
?      -- Print the commands.
QB> F Patrick Mahomes
Patrick Mahomes            KC   Y: 5097    R: 113.8
QB> F Carson Wentz
Carson Wentz              PHI   Y: 3074    R: 102.2
QB> F Jim Plank
Not there.
QB> Q
UNIX>

An aside - more array associativity, and more const

Above we treated the map like an associative array so that we could insert/set a value. You can also use the associative array syntax to look up a value. We try to use that feature when we implement Find() in src/qb_2a_bad_find.cpp:

const Quarterback *QBS::Find(const string &name) const
{
  return QM[name];
}

There are two problems here, both related. The first is that the compiler won't allow this. When we try to compile it, it will fail:

UNIX> make bin/qb_2a_bad_find
g++ -Wall -Wextra -o bin/qb_2a_bad_find src/qb_2a_bad_find.cpp
src/qb_2a_bad_find.cpp:90:12: error: no viable overloaded operator[] for type 'const map<string,
      Quarterback *>' (aka 'const map<basic_string<char, char_traits<char>, allocator<char> >,
      Quarterback *>')
  return QM[name];
........

The reason it fails is the second problem: when name isn't in the map, operator[] inserts it into the map with a default value of NULL, and then returns NULL. Since that modifies the map, operator[] can't be called on a const map, and inside a const method like Find(), QM is const. If you remove the const from Find(), the code compiles, but now whenever you look up a name that is not in the map, that name gets inserted into the map. That would mess things up here, so I'm glad that by using const correctly, we've avoided this bug!

Step 3: Sorting the quarterbacks by rating -- the buggy version

We're going to add three methods to our class to do the various printouts:

/* I'm adding the methods Print_By_Rating(), Print_By_Name() and Print_By_Team(). */

class QBS {
  public:
    bool Read(const string &filename);
    const Quarterback *Find(const string &name) const;
    void Print_By_Rating() const;
    void Print_By_Name() const;
    void Print_By_Team() const;
  protected:
    vector <Quarterback *> QV;
    map <string, Quarterback *> QM;
};

In src/qb_3_sort_by_rating_bad.cpp, we only implement Print_By_Rating(), and we love the convenience of associative arrays so much that we use them here. We'll use a temporary map keyed by double, into which we'll insert the negation of the quarterback rating. That will have the map store the ratings from high to low. We then print the map, and since it is a local variable, when the method returns, the map is deallocated. That's nice.

void QBS::Print_By_Rating() const
{
  Quarterback *q;
  size_t i;
  map <double, Quarterback *> m;
  map <double, Quarterback *>::const_iterator mit;

  for (i = 0; i < QV.size(); i++) {
    q = QV[i];
    m[-q->Rating] = q;
  }

  for (mit = m.begin(); mit != m.end(); mit++) {
    mit->second->Print();
  }
}

We run it, and it appears to be working OK. (I added extra commands for printing to the main(), and I made the prompt optional, so that I can call the program in non-interactive mode.)

UNIX> bin/qb_3_sort_by_rating_bad                                           # I changed the command line processing, so test that.
usage: qb_2_qbs_class file [prompt]
UNIX> bin/qb_3_sort_by_rating_bad files/2018-QB-Stats.txt "QB>"             # Test setting the prompt, and the printing of new commands
QB> ?
F name -- Find the quarterback with the given name.
R      -- Print the quarterbacks by rating.
N      -- Print the quarterbacks by name (last, first).
T      -- Print the quarterbacks by team, then by yardage.
Q      -- Quit.
?      -- Print the commands.
QB> Q
UNIX> echo R | bin/qb_3_sort_by_rating_bad files/2018-QB-Stats.txt | head   # Take a look at the printout by rating -- it looks good!
Nate Sudfeld              PHI   Y:   22    R: 129.2
Matt Barkley              BUF   Y:  232    R: 117.4
Drew Brees                 NO   Y: 3992    R: 115.7
Patrick Mahomes            KC   Y: 5097    R: 113.8
Kyle Allen                CAR   Y:  266    R: 113.1
UNIX>

Let's do a spot check, though, to make sure that all of the quarterbacks are being printed. Something's wrong:

UNIX> echo R | bin/qb_3_sort_by_rating_bad files/2018-QB-Stats.txt | wc
      68     476    3536
UNIX> wc files/2018-QB-Stats.txt
      69     345    3256 files/2018-QB-Stats.txt
UNIX>

We're missing a quarterback, because we're only getting 68 lines of output, and we should be getting 69. Who are we missing? Here's how I'd find out -- I'm going to isolate the names of the output, and of the original file, and sort them. The difference between the two files will give me the answer:

UNIX> echo R | bin/qb_3_sort_by_rating_bad files/2018-QB-Stats.txt | sed 's/  .*//' | sort | head -n 5
AJ McCarron
Aaron Rodgers
Alex Smith
Andrew Luck
Baker Mayfield
UNIX> echo R | bin/qb_3_sort_by_rating_bad files/2018-QB-Stats.txt | sed 's/  .*//' | sort > tmp-1.txt
UNIX> sed 's/  .*//' files/2018-QB-Stats.txt | sort | head -n 5
AJ McCarron
Aaron Rodgers
Alex Smith
Andrew Luck
Andy Dalton
UNIX> sed 's/  .*//' files/2018-QB-Stats.txt | sort > tmp-2.txt
UNIX> diff tmp-1.txt tmp-2.txt
4a5
> Andy Dalton
UNIX>

What gives with Andy Dalton? Well, let's take a look at his quarterback rating, and what our Print_By_Rating() gives us:

UNIX> grep Dalton files/2018-QB-Stats.txt 
Andy Dalton           CIN      2566       89.6
UNIX> echo R | bin/qb_3_sort_by_rating_bad files/2018-QB-Stats.txt 
# ...... skipping a bunch of lines
Jimmy Garoppolo            SF   Y:  718    R:  90.0
Matthew Stafford          DET   Y: 3777    R:  89.9
Sean Mannion               LA   Y:   23    R:  89.6
Brock Osweiler            MIA   Y: 1247    R:  86.0
Alex Smith                WAS   Y: 2180    R:  85.7
# ...... skipping a bunch of lines
UNIX>

Do you see what happened? There's another quarterback, Sean Mannion, who has a rating of 89.6. Since I used a map and not a multimap, I can't store both Sean Mannion and Andy Dalton in the map, with their ratings of 89.6. That's a rough bug, and I'm lucky that I had two quarterbacks with the same rating to find it. (And that I did the extra testing).

Step 4: Using a multimap to implement Print_By_Rating() correctly.

We fix the bug by using a multimap instead of the map, and by using insert() rather than the associative array feature, because you can't use associative arrays with multimaps. The code is in src/qb_4_sort_by_rating.cpp. Here's Print_By_Rating():

/* I'm using a multimap now to handle duplicate ratings. */

void QBS::Print_By_Rating() const
{
  Quarterback *q;
  size_t i;
  multimap <double, Quarterback *> m;
  multimap <double, Quarterback *>::const_iterator mit;

  for (i = 0; i < QV.size(); i++) {
    q = QV[i];
    m.insert(make_pair(-q->Rating, q));     // Negate the rating so the multimap goes from high to low
  }

  for (mit = m.begin(); mit != m.end(); mit++) {
    mit->second->Print();
  }
}

Now it prints out all 69 entries:

UNIX> echo R | bin/qb_4_sort_by_rating files/2018-QB-Stats.txt | wc
      69     483    3588
UNIX>

Step 5: Implementing Print_By_Name()

With Print_By_Name(), we want to sort the quarterbacks by their last names, and then if two quarterbacks have the same last name, we sort them by their first names. I'm going to do this by having the following structure:

I will have a map keyed by last name, whose vals are also maps.
The maps in the vals are keyed by first name, and their vals are pointers to quarterbacks.
When I print out the quarterbacks, I'm going to do a nested traversal. At the first level, I traverse the last-name map. Then for each of these nodes, we traverse the first-name map.

The code is in src/qb_5_print_by_name.cpp. Here is Print_By_Name(), and you can see that it uses the associative array feature of maps. This is really convenient: if a last name is not in the map, it is created with an empty map as a val, and then the first name is inserted into that empty map with the quarterback as the val.

/* Here's the newly implemented method, with a two-level tree. */

void QBS::Print_By_Name() const
{
  map <string, map <string, Quarterback *> > m;
  map <string, map <string, Quarterback *> >::const_iterator mit;
  map <string, Quarterback *>::const_iterator qit;
  size_t i;
  Quarterback *q;

  /* Traverse the vector, and insert the quarterbacks into the map.
     As you can see, I'm using the associative array feature to do this rather easily. */

  for (i = 0; i < QV.size(); i++) {
    q = QV[i];
    m[q->Lastname][q->Firstname] = q;
  }

  /* Now do a nested traversal, and print out the quarterbacks. */

  for (mit = m.begin(); mit != m.end(); mit++) {
    for (qit = mit->second.begin(); qit != mit->second.end(); qit++) {
      qit->second->Print();
    }
  }
}

Here we see it working: there are two quarterbacks whose last names are "Allen", and they are sorted by first name.

UNIX> echo N | bin/qb_5_print_by_name files/2018-QB-Stats.txt | head 
Josh Allen                BUF   Y: 2074    R:  67.9
Kyle Allen                CAR   Y:  266    R: 113.1
Derek Anderson            BUF   Y:  465    R:  56.0
Matt Barkley              BUF   Y:  232    R: 117.4
C.J. Beathard              SF   Y: 1252    R:  81.8
Blake Bortles             JAX   Y: 2718    R:  79.8
Sam Bradford              ARI   Y:  400    R:  62.5
Tom Brady                  NE   Y: 4355    R:  97.7
Drew Brees                 NO   Y: 3992    R: 115.7
Teddy Bridgewater          NO   Y:  118    R:  70.6
UNIX> echo N | bin/qb_5_print_by_name files/2018-QB-Stats.txt | wc
      69     483    3588
UNIX>

Do you find that last for loop ugly? You may be tempted to use a temporary variable, like:

map <string, Quarterback *> tmp;

And then, do the following for that last for loop:

for (mit = m.begin(); mit != m.end(); mit++) {
  tmp = mit->second;
  for (qit = tmp.begin(); qit != tmp.end(); qit++) {
    qit->second->Print();
  }
}

The code will produce correct output, but you need to know that the assignment tmp = mit->second copies the entire inner map on every iteration of the outer loop, which is inefficient. If you want to avoid the copy, you can make tmp a pointer:

const map <string, Quarterback *> *tmp;  // This has to be const because mit is a const_iterator

.... and later:

for (mit = m.begin(); mit != m.end(); mit++) {
  tmp = &mit->second;
  for (qit = tmp->begin(); qit != tmp->end(); qit++) {
    qit->second->Print();
  }
}

You can also use a reference variable, but then you would need to declare it inside the for loop (a reference must be bound when it is declared, and it cannot be re-pointed later), and since I don't approve of that, I won't show it. You can google it...

Step 6: Implementing Print_By_Team()

Now, Print_By_Team() is similar. You want to print the quarterbacks sorted by team, and within a team, you want them sorted by yardage, highest to lowest. So, we're going to have the following structure:

We will have a map keyed by team name, with multimaps in the vals.
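The multimaps in the vals are keyed by negated yardage, so that an in-order traversal visits the highest yardage first. Their vals are pointers to quarterbacks. (They are multimaps rather than maps because two quarterbacks on the same team could have the same yardage.)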
When I print out the quarterbacks, I'm going to do a nested traversal. At the first level, I traverse the team map. Then for each of these nodes, we traverse the yardage multimap.

The code is in src/qb_6_print_by_team.cpp, and here is Print_By_Team():

void QBS::Print_By_Team() const
{
  map <string, multimap <int, Quarterback *> > m;
  map <string, multimap <int, Quarterback *> >::const_iterator mit;
  const multimap <int, Quarterback *> *tmp;
  multimap <int, Quarterback *>::const_iterator qit;
  size_t i;
  Quarterback *q;

  /* Traverse the vector, and insert all of the Quarterbacks into the map. */

  for (i = 0; i < QV.size(); i++) {
    q = QV[i];
    m[q->Team].insert(make_pair(-q->Yards, q));
  }

  /* Again we have a double-nested loop to print the map -- I'm using a
     temporary variable this time, which is a pointer, so that it doesn't make a copy. */
  
  for (mit = m.begin(); mit != m.end(); mit++) {
    tmp = &mit->second;
    for (qit = tmp->begin(); qit != tmp->end(); qit++) {
      qit->second->Print();
    }
  }
}

The code is similar to Print_By_Name(), except that we have to use insert(): a multimap has no operator[], so we cannot treat it as an associative array (a key may have more than one val). You'll note that I use a temporary pointer variable in the nested loop, as advocated in the section above.

It looks good:

UNIX> echo T | bin/qb_6_print_by_team files/2018-QB-Stats.txt | head
Josh Rosen                ARI   Y: 2278    R:  66.7
Sam Bradford              ARI   Y:  400    R:  62.5
Mike Glennon              ARI   Y:  174    R: 112.0
Matt Ryan                 ATL   Y: 4924    R: 108.1
Matt Schaub               ATL   Y:   20    R:  74.1
Joe Flacco                BAL   Y: 2465    R:  84.2
Lamar Jackson             BAL   Y: 1201    R:  84.5
Robert Griffin-III        BAL   Y:   21    R:  44.4
Josh Allen                BUF   Y: 2074    R:  67.9
Derek Anderson            BUF   Y:  465    R:  56.0
UNIX> echo T | bin/qb_6_print_by_team files/2018-QB-Stats.txt | wc
      69     483    3588
UNIX>

Step 7: On Destructors, Copy Constructors and the Assignment Overload

At this point, were I writing this program, I'd be done. You might ask, "Don't you need a destructor, to delete what you've allocated with new?" The answer is that you don't here, because all of the program's memory is reclaimed when it exits. However, suppose that you wanted other programs to use the QBS class. You would put the two class definitions into a header file, and the method implementations into a .cpp file. Then you would have to write a destructor, because you don't know how others will use the class (for example, a long-running program may create and destroy many QBS instances). The destructor should free the memory. The code is in src/qb_7_destructor_etc.cpp:

class QBS {
  public:
    ~QBS();                                     // Destructor
    QBS();                                      // Needed: declaring any other constructor (even a deleted one) suppresses the default constructor
    QBS(const QBS &qbs) = delete;               // Disable the copy constructor
    QBS& operator= (const QBS &qbs) = delete;   // Disable the assignment overload

    bool Read(const string &filename);
    const Quarterback *Find(const string &name) const;
    void Print_By_Rating() const;
    void Print_By_Name() const;
    void Print_By_Team() const;
  protected:
    vector <Quarterback *> QV;
    map <string, Quarterback *> QM;
};

QBS::QBS() {}

/* The destructor needs to delete what it allocated.
   You'll note, I don't need to clear QV or QM, because that is done automatically. */

QBS::~QBS()
{
  size_t i;

  for (i = 0; i < QV.size(); i++) delete QV[i];
}

You'll note that I disabled the copy constructor and assignment overload (and I compile with -std=c++11, because = delete is a C++11 feature). If I wanted to enable them, I would have to have them allocate new quarterbacks, copy them from the old quarterbacks, and then rebuild QM so that it points to the new quarterbacks. Here's example code for a copy constructor:

QBS::QBS(const QBS &qbs)
{
  size_t i;
  Quarterback *q;

  for (i = 0; i < qbs.QV.size(); i++) {
    q = new Quarterback;
    *q = *(qbs.QV[i]);
    QV.push_back(q);
    QM[q->Name] = q;
  }
}

You'll note, this code isn't in src/qb_7_destructor_etc.cpp, though -- I just provide it as an example, in case you care what a copy constructor would look like.

The return value of insert()

In class we looked at the prototype for the insert() method of a set (not a multiset):

    pair<iterator, bool> set::insert(const TYPE& val);

The "(const TYPE& val)" simply means that it works with the type that you specify when you define the set.

The return value is a pair, much like the pair you pass to the insert() call of a map. Its first field is an iterator into the set, and its second is a boolean. If the element is inserted, then the iterator points to the newly inserted element. Otherwise, you tried to insert a duplicate, and the iterator points to the element already in the set. The second field reports whether the item was inserted.

To see usage, take a look at src/setreturn.cpp:

/* This shows how to look at the return value of insert in a set (a map is similar) */

#include <set>
#include <string>
#include <iostream>
using namespace std;

int main()
{
  string s;
  set <string>  names;
  pair <set <string>::iterator, bool> retval;

  while(getline(cin, s)) {
    retval = names.insert(s);
    if (retval.second) {
      cout << s << ": Successfully inserted.\n";
    } else {
      cout << s << ": Duplicate not inserted.\n";
    }
  }
  return 0;
}

Note how insert() returns a pair, whose fields you access with dots rather than arrows. Why, then, do you use arrows with iterators on maps? Because those iterators point to pairs; they are not pairs themselves.

UNIX> bin/setreturn
James Plank
James Plank: Successfully inserted.
James Plank
James Plank: Duplicate not inserted.
UNIX>

Summary

Ok -- we've learned a ton in this lecture. Let's summarize:

A set is a data structure where you perform storage and retrieval. Each of those operations is log time in the number of elements in the set. That means that the operations are fast.
Sets don't store duplicate elements. Multisets store duplicate elements.
A map is a data structure like a set, but you associate a value (which I call a "val") with each element (which I call a "key"). Like sets, storage and retrieval are log time.
Maps don't store duplicate keys -- multimaps do.
You can traverse a set, multiset, map or multimap with iterators.
When you find() something in a set, multiset, map or multimap, it returns an iterator, which is equal to the return value of end() if the item is not found.
You can treat a map like an associative array, where the "indices" to the array are the keys.
You need to be careful when treating a map like an associative array, because when the key is not in the map, operator[] automatically inserts it (with a default-constructed val).
We developed a program that used maps and multimaps in a variety of ways. With all of those, we only have one copy of the data (the quarterbacks) and our data structures store pointers.
If your program just reads data, processes it and exits, you don't need to worry about destructors or copy constructors. If you want others to be able to use your classes, you do need to worry about these things.
Sets (and maps) return pairs from their insert() methods. Multisets and multimaps return iterators to the newly inserted items.