C++ Standard Template Library


Introduction

The C++ standard template library (STL) provides a number of useful data structures and algorithms that you can use in your C++ programs. A good reference for the STL is www.cppreference.com. These notes provide a brief overview of the classes that you will be using in this course and an overview of pitfalls that I've observed in students' code using these classes.


Overview

The STL provides classes and algorithms that account for many of a programmer's day-to-day programming needs. It includes container classes such as vectors, lists, sets, maps, and priority queues, along with algorithms such as sort.


Using the STL in Your Program

STL classes may be included in your program via the include statement. Unlike C include files, STL files do not have a .h extension. For example, if you want to include the list class in your program you would write:

#include <list>

After your include statements you will also typically add the following statement to your program:

using namespace std;

A namespace provides a way for you to divide your program into different named "spaces". Within each space you may re-use the names of variables, functions, etc. Classes provide another means for you to divide your program's namespace since within each class you may re-use the names of variables and functions. A namespace is a higher-level construct than a class and may be thought of as a module or library that contains the names of a number of related classes, functions, and variables. In this case the std namespace is the namespace that encapsulates all the classes and functions defined by the STL.

Ordinarily you access names in a namespace by prefixing them with the name of their namespace and two colons (::). For example, to declare a variable as a list you would write:

std::list<int> intList;

However, it can be bothersome to have to remember to always prefix these names with their namespace, so C++ allows you to import these names into the global namespace via the using statement. Hence when we write:

using namespace std;

we are telling C++ to import all the names in the std namespace into the global namespace, and we may now write:

list<int> intList;

If a name that you declare in the global namespace conflicts with one in the imported namespace, then an unqualified use of that name becomes ambiguous. For example, if you declare a class named list, then the compiler will complain that list is ambiguous when you try to declare a list without a prefix. You can still use the std list class by prefixing it with "std::". Hence the compiler will be happy if you write:

class list {
  public:
    int x;
};
std::list<int> myList;
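A minimal sketch of the qualification rules above, written without using namespace std; so that neither list name is imported. The class list and the function namespaceDemo are hypothetical names chosen for illustration.

```cpp
#include <list>

// A user-defined class that happens to share its name with std::list.
class list {
 public:
  int x;
};

// Uses both classes side by side. Qualifying each name (std:: for the STL
// list, :: for the global class) removes any ambiguity between them.
int namespaceDemo() {
  std::list<int> myList;  // the STL doubly-linked list
  myList.push_back(1);
  myList.push_back(2);

  ::list mine;            // our own class from the global namespace
  mine.x = 40;

  return mine.x + static_cast<int>(myList.size());  // 40 + 2
}
```

namespaceDemo() returns 42. If the file also contained using namespace std; then an unqualified list inside namespaceDemo would be reported as ambiguous.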

The pair Class

The STL makes extensive use of a template class named pair, which is defined in <utility>. A simplified definition for the pair class is:

template <class Type1, class Type2>
struct pair {
  Type1 first;
  Type2 second;
};

For example, the declaration:

pair<int, string> a;

creates an object whose first field is an int and whose second field is a string.

The STL defines a function named make_pair that takes two arguments of the proper type and returns a pair object (not a pointer to a pair object but an actual pair object). For example we could write:

a = make_pair(10, string("brad"));
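A small sketch of make_pair and the first/second fields. makeRecord and swapFields are hypothetical helpers, not part of the STL; both return actual pair objects by value.

```cpp
#include <string>
#include <utility>  // pair, make_pair

// Builds a pair with make_pair; the caller receives a pair object, not a pointer.
std::pair<int, std::string> makeRecord(int id, const std::string& name) {
  return std::make_pair(id, name);
}

// Reads the two fields of a pair and builds a new pair with them swapped.
std::pair<std::string, int> swapFields(const std::pair<int, std::string>& p) {
  return std::make_pair(p.second, p.first);
}
```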


Descriptions of the Container Objects

The descriptions of the container objects in this section will be limited to the objects' most frequently used operations. Further information about each container object can be found at www.cppreference.com.


Vector

Vectors behave like dynamically extendible arrays and elements of a vector can be accessed and assigned using the [] array notation. For example:

vector<string> people;
people.push_back("brad");
people.push_back("mary");
people.push_back("susan");
for (int i = 0; i < people.size(); i++)
  cout << people[i] << endl;
people[1] = "tom";
cout << people[1] << endl;

The output of this code is:

brad
mary
susan
tom

The push_back command adds an element to the end of the vector and dynamically expands the size of the vector by 1. The size method returns the current size of the vector, so you do not have to explicitly maintain a count of the size of the vector.

You can only use the [] notation to assign to previously allocated elements. For example, if I try:

vector<string> names;
names[2] = "brad";

my code has undefined behavior because names does not yet have an element 2, and there is a good chance it will segfault. You must either use push_back to create new entries in the vector or resize. The resize command takes a numeric argument representing the desired size of the vector and an optional value to which any new entries will be initialized. For example:

vector<string> names;
names.push_back("brad");
names.resize(5, "mary");
names[3] = "tom";
for (int i = 0; i < names.size(); i++)
  cout << names[i] << endl;

produces the output:

brad
mary
mary
tom
mary

Notice that once I resized the vector to 5 elements it was okay to assign "tom" to names[3] using the [] operator.

A final method that I find useful for vectors is back. It returns the last element in the vector:

cout << names.back() << endl;  // output = mary

back has the same effect as names[names.size()-1] but is more readable.
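The sketch below collects the vector operations above into testable functions. buildNames repeats the resize example, and setAt is a hypothetical helper (not an STL method) showing one way to avoid an out-of-range [] assignment: grow the vector with resize before assigning.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Reproduces the resize example: one push_back, a resize that fills the new
// slots with "mary", then an assignment through [] to a slot that now exists.
std::vector<std::string> buildNames() {
  std::vector<std::string> names;
  names.push_back("brad");   // size 1
  names.resize(5, "mary");   // size 5: brad mary mary mary mary
  names[3] = "tom";          // safe: index 3 < size()
  return names;
}

// Hypothetical helper: assigns value at index, first growing the vector with
// resize if that index does not exist yet (new slots become empty strings).
void setAt(std::vector<std::string>& v, std::size_t index, const std::string& value) {
  if (index >= v.size()) v.resize(index + 1);
  v[index] = value;
}
```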


List

STL lists are doubly-linked lists. You can insert elements at the front or end of the list using the push_front and push_back methods:

list<string> names;
names.push_back("nels");    // list = ("nels")
names.push_front("billy");  // list = ("billy", "nels")
names.push_back("brad");    // list = ("billy", "nels", "brad")

Of course you typically want to insert elements into the middle of a list, and you must do that using an iterator. An iterator can be thought of as a protected pointer to an element in the list. You can pass this pointer to various methods that the list provides in order to accomplish tasks like accessing the element's value, deleting the element, or inserting a new value before the element. The following methods and operators are important for dealing with iterators:

begin returns an iterator to the first element of the list
end returns an iterator just past the end of the list. I like to think of end as returning an iterator that points to a sentinel node, although the list may not actually be implemented using sentinel nodes.
operator++ Advances an iterator to the next element in the list
operator* returns the value of the element pointed to by the iterator

The following code illustrates these concepts by inserting "mary" into a list that is sorted in ascending order:

list<string>::iterator namesIter;
for (namesIter = names.begin(); namesIter != names.end(); namesIter++) {
  if ("mary" < *namesIter)
    break;
}
names.insert(namesIter, "mary");

insert always inserts values before the element pointed at by the iterator. Hence if the above code reaches the end of the list (i.e., namesIter equals names.end() when the loop exits), then "mary" will be inserted before names.end() but after the last element in the list.
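Wrapped as a reusable function, the sorted-insert loop might look like the sketch below. insertSorted is a hypothetical name, and the code assumes the list is already sorted in ascending order.

```cpp
#include <list>
#include <string>

// Inserts value into an ascending list, keeping it sorted: stop at the first
// element greater than value, then insert before it (or before end()).
void insertSorted(std::list<std::string>& names, const std::string& value) {
  std::list<std::string>::iterator it;
  for (it = names.begin(); it != names.end(); ++it) {
    if (value < *it) break;
  }
  names.insert(it, value);  // insert places value before *it
}
```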

You can also use iterators to delete elements from a list. For example the following code searches for a person named "mary" and removes her if she is found:

class People {
  public:
    string name;
    int age;
};

list<People *> peopleList;
list<People *>::iterator peopleIter;

... code to insert people into the list ...

for (peopleIter = peopleList.begin(); peopleIter != peopleList.end(); peopleIter++) {
  if ((*peopleIter)->name == "mary") {
    peopleList.erase(peopleIter);
    break;
  }
}

The break matters: erase invalidates peopleIter, so the loop must not increment it afterwards. Notice that I put ()'s around *peopleIter in order to ensure that it gets properly dereferenced. The following code will not work because the -> operator binds more tightly than the * operator:

if (*peopleIter->name == "mary")

Some other common list methods that you might find helpful are:

back returns a reference to the last element of a list, which must not be empty. If the list stores pointers then you can assume that back returns a pointer. For example: list<People *> peopleList; ... People *lastPerson = peopleList.back();
clear removes all elements from the list
empty returns true if the list is empty
front returns a reference to the first element of the list. If the list stores pointers, then like back you may assume that front returns a pointer.
rbegin, rend return reverse iterators that allow you to traverse a list from back to front; applying ++ to a reverse iterator moves it toward the front of the list.
operator-- moves an ordinary iterator to the previous element in the list.
size returns the number of elements in the list
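The following sketch combines deletion and backward traversal for a list of People pointers. Both functions are hypothetical helpers. removePerson assumes the list owns its People objects, which is why it calls delete before erase: erase removes only the pointer from the list, not the object it points to.

```cpp
#include <list>
#include <string>

class People {
 public:
  std::string name;
  int age;
};

// Removes the first person with the given name; returns true if one was found.
bool removePerson(std::list<People *>& peopleList, const std::string& name) {
  std::list<People *>::iterator peopleIter;
  for (peopleIter = peopleList.begin(); peopleIter != peopleList.end(); ++peopleIter) {
    if ((*peopleIter)->name == name) {  // parentheses: dereference the iterator first
      delete *peopleIter;               // free the object (assumes the list owns it)
      peopleList.erase(peopleIter);     // unlink it; peopleIter is now invalid
      return true;                      // so stop iterating right away
    }
  }
  return false;
}

// Joins the names from back to front; ++ on a reverse iterator moves toward the front.
std::string namesBackToFront(const std::list<People *>& peopleList) {
  std::string result;
  for (std::list<People *>::const_reverse_iterator it = peopleList.rbegin();
       it != peopleList.rend(); ++it) {
    if (!result.empty()) result += " ";
    result += (*it)->name;
  }
  return result;
}
```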


Balanced Trees

C++ provides two types of balanced tree classes: sets and maps. The set class only allows you to store a key, while the map class allows you to store <key,value> pairs. Both set and map store only unique keys. If you want to use duplicate keys then you must use multiset or multimap. Sets are useful when you are trying to eliminate duplicate items or when you have a multi-element key (e.g., first/last name or day/month/year) and think it would be prohibitively expensive to use a map because you would have to duplicate the key's value in both the key and value fields. Maps are useful when you have <key,value> pairs because their method interface is easier to use.

The declaration for set takes one of two forms:

set<KeyType>
set<KeyType, ComparatorClass>

You need to provide a Comparator class if your KeyType does not have the proper < operator defined. For example:

set<int> zipcodes;          // ok: < is defined for ints
set<string> names;          // ok: < is defined for strings
set<employee *> employees;  // compiles, but < compares employee
                            // pointers (addresses), which is not what you want
set<employee> employees;    // ok if operator< has been defined for
                            // employees; otherwise an error

The ComparatorClass must define the boolean operator (). This operator will take two parameters of KeyType. For example:

class CompareEmployees {
  public:
    bool operator() (employee *e1, employee *e2) const {
      return e1->age < e2->age;
    }
};
set<employee *, CompareEmployees> employees;

The set treats two keys as equal when neither compares less than the other, so with CompareEmployees two employees who have the same age count as the same key.

The insert method inserts a key into the set if the key is not yet in the set. It returns a pair object of the form pair<iterator,bool>, where the boolean indicates whether or not the insert succeeded and the iterator points to the inserted value if the insert succeeds, or to the existing element if it fails. The insert fails if the key already exists. For example:

pair<set<employee *, CompareEmployees>::iterator, bool> insertResult =
    employees.insert(newEmployee);
if (insertResult.second) {
  cout << (*insertResult.first)->salary << endl;
} else {
  cerr << "duplicate key" << endl;
}
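Here is a compilable sketch of the set example, assuming a minimal employee class with only name and age fields. addEmployee is a hypothetical helper. It also shows that, under an age-based comparator, a second employee with an age already in the set is rejected as a duplicate key.

```cpp
#include <set>
#include <string>
#include <utility>

class employee {
 public:
  std::string name;
  int age;
  employee(const std::string& n, int a) : name(n), age(a) {}
};

// Orders employee pointers by the employees' ages rather than by address.
class CompareEmployees {
 public:
  bool operator()(const employee* e1, const employee* e2) const {
    return e1->age < e2->age;
  }
};

typedef std::set<employee *, CompareEmployees> EmployeeSet;

// Tries to insert e; returns true on success, false if an equal key exists.
bool addEmployee(EmployeeSet& employees, employee* e) {
  std::pair<EmployeeSet::iterator, bool> insertResult = employees.insert(e);
  return insertResult.second;
}
```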

Some other common set methods that you might find helpful are:

begin, end return iterators that allow you to iterate through the elements of the set in sorted order; operator++ advances an iterator to the next element.
clear removes all the elements from the set
erase(KeyType key) deletes the key from the set if the key exists; returns 1 if the key exists and 0 otherwise.
find(KeyType key) returns an iterator to the key if it's found and an iterator to the end of the set otherwise
rbegin, rend return reverse iterators that allow you to iterate through the elements of the set in reverse sorted order; operator-- moves an ordinary iterator to the previous element.
size returns the number of elements in the set
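A brief sketch of find, erase, and reverse iteration on a set of strings; contains and largest are hypothetical helper names.

```cpp
#include <set>
#include <string>

// Uses find (a logarithmic tree search) rather than a linear scan.
bool contains(const std::set<std::string>& names, const std::string& name) {
  return names.find(name) != names.end();
}

// rbegin() points at the largest element. Assumes the set is non-empty.
std::string largest(const std::set<std::string>& names) {
  return *names.rbegin();
}
```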

Maps have almost the same interface as sets, with the only differences being in the declarative form of maps, the find and insert methods, and an added [] operator for ease in inserting and finding elements. The declarative forms for map are:

map<KeyType, ValueType>
map<KeyType, ValueType, ComparatorClass>  // the set class contains a description
                                          // of the ComparatorClass

For example:

map<string, int> personMap;
map<string, employee *> employeeMap;

class DateComparison {
  public:
    bool operator() (const date &d1, const date &d2) const {
      // compare years first, then months, then days
      if (d1.year != d2.year) return d1.year < d2.year;
      if (d1.month != d2.month) return d1.month < d2.month;
      return d1.day < d2.day;
    }
};
map<date, employee *, DateComparison> employeeByDateMap;

The [] operator makes it easy to insert and retrieve elements from the map:

map<string, int> personMap;
personMap["brad"] = 43;
personMap["nels"] = 41;
personMap["mary"] = 20;
cout << personMap["mary"] << endl;  // output is 20
cout << personMap["tony"] << endl;  // output is 0

In general the [] operator returns a default value if it cannot locate the key in the table. Default values would be 0 for an int, 0 (a null pointer) for a pointer type, and "" for a string. Note that [] also inserts the missing key into the map with that default value, so after the last statement above personMap contains an entry for "tony".
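The sketch below shows a date comparator that compares years first, then months, then days, so that December 31 of one year sorts before January 1 of the next. The date struct, the DateMap typedef, and the earliest helper are assumptions for illustration; the values are strings rather than employee pointers to keep it self-contained.

```cpp
#include <map>
#include <string>

struct date {
  int year, month, day;
};

// Orders dates chronologically by comparing fields from most to least significant.
class DateComparison {
 public:
  bool operator()(const date& d1, const date& d2) const {
    if (d1.year != d2.year) return d1.year < d2.year;
    if (d1.month != d2.month) return d1.month < d2.month;
    return d1.day < d2.day;
  }
};

typedef std::map<date, std::string, DateComparison> DateMap;

// Returns the value stored under the earliest date (map iteration is sorted).
// Assumes the map is non-empty.
std::string earliest(const DateMap& m) {
  return m.begin()->second;
}
```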

The [] operator will clobber the value associated with a key if a key previously existed in the map. If you want to guard against this possibility then you can use the insert method. The insert method takes a pair object and returns an <iterator, bool> pair that indicates whether or not the operation succeeded. If the operation succeeded then the iterator points to the <KeyType, ValueType> pair for the inserted element; if it failed, the iterator points to the pair for the element that already has that key. The operation fails if the key already exists in the map. For example:

map<string, int> personMap;
string name;
int age;
while (cin >> name >> age) {
  if (personMap.insert(make_pair(name, age)).second)
    printf("%s successfully inserted\n", name.c_str());
  else
    printf("%s: duplicate key\n", name.c_str());
}

You can use the find method if the default value returned by the [] operator is ambiguous. For example, if an int is the value type, then a return value of 0 could either mean that the key wasn't found or that the value was really 0. To disambiguate this case one can use the find method. The find method takes a key and returns either an iterator to the found key or an iterator to the end of the map if the key is not found. The iterator points to a <KeyType, ValueType> pair.
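A sketch of the two safer alternatives to []: insert, which never overwrites an existing value, and find, which distinguishes a stored 0 from a missing key. insertIfAbsent and lookupAge are hypothetical helper names.

```cpp
#include <map>
#include <string>
#include <utility>

// Adds name -> age only if name is absent; returns true if the pair was inserted.
bool insertIfAbsent(std::map<std::string, int>& personMap,
                    const std::string& name, int age) {
  return personMap.insert(std::make_pair(name, age)).second;
}

// Looks up name without modifying the map. Returns false for a missing key,
// so a stored value of 0 is never confused with "not found".
bool lookupAge(const std::map<std::string, int>& personMap,
               const std::string& name, int* age) {
  std::map<std::string, int>::const_iterator it = personMap.find(name);
  if (it == personMap.end()) return false;
  *age = it->second;  // it points to a <key, value> pair
  return true;
}
```

In contrast, personMap[name] = age always stores the new value, and merely reading personMap[name] for a missing name adds that name with a value of 0.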

Finally, the iterators that traverse the map point to <KeyType, ValueType> pairs. For example:

map<string, int>::iterator personMapIter;
for (personMapIter = personMap.begin(); personMapIter != personMap.end(); personMapIter++) {
  cout << personMapIter->first << " " << personMapIter->second << endl;
}
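A sketch of map traversal that collects each key and value into a string; formatPeople is a hypothetical name. Because a map is ordered by key, the output comes out in alphabetical order regardless of insertion order.

```cpp
#include <map>
#include <sstream>
#include <string>

// Formats every <key, value> pair in key order, one per line.
std::string formatPeople(const std::map<std::string, int>& personMap) {
  std::ostringstream out;
  for (std::map<std::string, int>::const_iterator it = personMap.begin();
       it != personMap.end(); ++it) {
    out << it->first << " " << it->second << "\n";  // first = key, second = value
  }
  return out.str();
}
```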

Hash Tables

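The hash_set, hash_map, hash_multiset, and hash_multimap classes implement hash tables. However, these classes were never part of the official C++ standard, so some STL libraries include them and some don't. Our GNU compiler does include them, but you must access them through an extensions library. To include them in your program you should use the following statements:

#include <ext/hash_map>
using namespace __gnu_cxx;

The C++11 standard added official hash tables under different names: unordered_set, unordered_map, unordered_multiset, and unordered_multimap, declared in <unordered_set> and <unordered_map>. The GNU hash_* extensions are deprecated in favor of these.

The rest of this section has not yet been written.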
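For comparison, a minimal sketch using std::unordered_map, the hash table standardized in C++11 (it needs a compiler in C++11 mode or later). countWords is a hypothetical helper. Unlike map, an unordered_map does not keep its keys sorted, so iteration order is unspecified.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Counts how often each word occurs. As with map, operator[] starts a
// missing key at 0 before the increment.
std::unordered_map<std::string, int> countWords(const std::vector<std::string>& words) {
  std::unordered_map<std::string, int> counts;
  for (std::size_t i = 0; i < words.size(); i++) counts[words[i]]++;
  return counts;
}
```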


Priority Queues

C++ priority queues are max queues so if you want to create a min queue you will need to provide a comparator class that appropriately orders your queue elements. The example at the end of this section shows you how to do this. You can include priority queues in your program via the following include statement:

#include <queue>

The declarative forms for priority queues that you will use in this course are:

priority_queue<Type>
priority_queue<Type, vector<Type>, comparatorClass>

For example:
  1. Simple max queue:
       priority_queue<int> maxQueue;
  2. Simple min queue. You will need to write a comparator class and use the > operator to compare the keys:
       class compareInt {
         public:
           bool operator() (int key1, int key2) const {
             return key1 > key2;
           }
       };
       priority_queue<int, vector<int>, compareInt> minQueue;
  3. Min queue with class pointers:
       class compareEmployees {
         public:
           bool operator() (Employee *e1, Employee *e2) const {
             return e1->age > e2->age;
           }
       };
       priority_queue<Employee *, vector<Employee *>, compareEmployees> empQueue;
The second argument, vector<Type>, tells C++ which data structure to use for the heap. You can provide another container that supports random access, such as deque, but a vector is a good default choice.

www.cppreference.com is a good source to check for the methods provided by the priority_queue class. The methods should be self-explanatory.
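A sketch of the most common priority_queue methods (push, top, pop, empty) using the compareInt min-queue comparator from the examples above; drainMinQueue is a hypothetical helper.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// With > as the comparison, the element the queue keeps on top (its
// "greatest" under the comparator) is the smallest int: a min queue.
class compareInt {
 public:
  bool operator()(int key1, int key2) const { return key1 > key2; }
};

// Pushes every value into a min queue, then pops them all, returning them
// in the order the queue yields them (smallest first).
std::vector<int> drainMinQueue(const std::vector<int>& values) {
  std::priority_queue<int, std::vector<int>, compareInt> minQueue;
  for (std::size_t i = 0; i < values.size(); i++) minQueue.push(values[i]);

  std::vector<int> out;
  while (!minQueue.empty()) {
    out.push_back(minQueue.top());  // top() reads the highest-priority element
    minQueue.pop();                 // pop() removes it and returns nothing
  }
  return out;
}
```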


Sort Function

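The sort function can be used to sort arrays, vectors, or any other container that provides random-access iterators; a list does not, so use the list's own sort method instead. The C++ standard does not dictate which algorithm sort uses, but typical implementations use introsort, a quicksort variant that guarantees O(n log n) running time. sort leaves the array, vector, etc. in sorted order. You must pass the sort function a comparison function if the < operator will not work. To use the sort function you must place the following include statement at the top of your code: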

#include <algorithm>

The following sections present several code examples of the sort function.


Sorting a Vector of Integers

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
  vector<int> a;
  a.push_back(5);
  a.push_back(3);
  a.push_back(7);
  a.push_back(9);
  a.push_back(1);
  sort(a.begin(), a.end());
  for (int i = 0; i < a.size(); i++)
    cout << a[i] << endl;
}

Sorting an Array of Integers

int a[5];
a[0] = 5;
a[1] = 3;
a[2] = 7;
a[3] = 9;
a[4] = 1;
sort(a, a+5);
for (int i = 0; i < 5; i++)
  cout << a[i] << endl;

Sorting a Vector of Objects

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

class employee {
  public:
    int age;
    string name;
    employee(int emp_age, string emp_name) : age(emp_age), name(emp_name) {}
};

// a function that compares two employees and returns true
// if the first employee's age is less than the second
// employee's age
bool emp_compare(const employee *e1, const employee *e2) {
  return e1->age < e2->age;
}

int main() {
  vector<employee *> a;
  a.push_back(new employee(5, "brad"));
  a.push_back(new employee(3, "ebber"));
  a.push_back(new employee(7, "smiley"));
  a.push_back(new employee(9, "lady"));
  a.push_back(new employee(1, "sunshine"));
  sort(a.begin(), a.end(), emp_compare);
  for (int i = 0; i < a.size(); i++)
    cout << a[i]->age << endl;
  for (int i = 0; i < a.size(); i++)
    delete a[i];  // free the employees allocated with new
}
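sort also accepts a comparator class in place of a comparison function, just as set and priority_queue do. The sketch below orders employee pointers by name; CompareByName and sortByName are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

class employee {
 public:
  int age;
  std::string name;
  employee(int emp_age, const std::string& emp_name) : age(emp_age), name(emp_name) {}
};

// A function object that orders employees alphabetically by name.
class CompareByName {
 public:
  bool operator()(const employee* e1, const employee* e2) const {
    return e1->name < e2->name;
  }
};

// Sorts the pointers in place; the employee objects themselves do not move.
void sortByName(std::vector<employee *>& staff) {
  std::sort(staff.begin(), staff.end(), CompareByName());
}
```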


Common STL Errors

This section covers a number of STL errors that I see continually in students' code.

Storing Objects in STL Container Classes

Students often write:

map<string, Employee> employeeMap;

rather than:

map<string, Employee *> employeeMap;

You should almost always have your container objects store pointers to objects and not the objects themselves. Hence the latter declaration is preferable. The reason is one of both efficiency and correctness. From an efficiency standpoint, every time you store an object rather than a pointer to that object in a container, you end up calling at least three methods instead of just one. Remember that a container stores its own copy of each object you give it, and the compiler creates that copy by calling the copy constructor. So let's say that you write the following code:

map<string, Employee> employeeMap;
employeeMap.insert(make_pair(string("vander zanden"),
                             Employee("vander zanden", "professor", 20000)));

What happens "under the hood" is that 1) the constructor for Employee gets called in order to create a temporary employee object, 2) the temporary employee object gets passed to the insert method, 3) the insert method invokes the copy constructor to create a copy of the temporary object and it stores this copy in the map, and 4) the temporary object created in step 1 is destroyed by calling its destructor and de-allocating its memory. The copy in step 3 is necessary because C++ destroys the temporary object at the end of the statement, and hence insert cannot store the temporary object created in step 1. Notice that at least three methods were called in order to store the Employee record (make_pair may add copies of its own). Suppose instead that you rewrite the above code so that it uses pointers:

map<string, Employee *> employeeMap;
employeeMap.insert(make_pair(string("vander zanden"),
                             new Employee("vander zanden", "professor", 20000)));

What happens "under the hood" is that 1) the new operator allocates memory for an Employee object, 2) the new operator calls the constructor for Employee, 3) the pointer returned by new is passed to the insert method, and 4) insert stores this pointer in the map.
Notice that only one method, the Employee constructor, is called in order to store the Employee record. Hence the pointer code avoids the copy constructor and destructor calls entirely, a savings that grows with the cost of copying an Employee.
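The copying described above can be observed with a sketch like this one, in which a hypothetical Employee class counts its constructor, copy-constructor, and destructor calls. The exact number of copies in the by-value case depends on the compiler and language version, so the checks only require at least one copy there and none in the pointer version.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical Employee that counts constructor, copy-constructor, and
// destructor calls. Declaring a copy constructor suppresses the implicit
// move constructor, so every copy the container makes is counted.
class Employee {
 public:
  static int constructed, copied, destroyed;
  std::string name, title;
  int salary;

  Employee(const std::string& n, const std::string& t, int s)
      : name(n), title(t), salary(s) { ++constructed; }
  Employee(const Employee& other)
      : name(other.name), title(other.title), salary(other.salary) { ++copied; }
  ~Employee() { ++destroyed; }
};
int Employee::constructed = 0;
int Employee::copied = 0;
int Employee::destroyed = 0;

void resetCounts() {
  Employee::constructed = Employee::copied = Employee::destroyed = 0;
}

// By value: a temporary Employee is built, copied into the map, then destroyed.
void storeByValue(std::map<std::string, Employee>& employeeMap) {
  employeeMap.insert(std::make_pair(std::string("vander zanden"),
                                    Employee("vander zanden", "professor", 20000)));
}

// By pointer: one Employee is built with new; only the pointer is copied.
void storeByPointer(std::map<std::string, Employee *>& employeeMap) {
  employeeMap.insert(std::make_pair(std::string("vander zanden"),
                                    new Employee("vander zanden", "professor", 20000)));
}
```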

The problem with correctness arises when students try to combine the two approaches shown above. In particular I've seen code like:

map<string, Employee> employeeMap;
string name, jobTitle;
int salary;
while (cin >> name >> jobTitle >> salary) {
  Employee *newEmployee = new Employee(name, jobTitle, salary);
  employeeMap.insert(make_pair(name, *newEmployee));
}
The call to insert does the following: 1) it calls the copy constructor for Employee, passing it the reference produced by *newEmployee, and 2) it stores the copied object in employeeMap. Note that a copy of the object pointed to by newEmployee gets stored in employeeMap, rather than a pointer to the object. This means that if we ever modify the object pointed to by newEmployee, the change will not be reflected in the object stored in employeeMap. This is probably not what the programmer intended. For example, suppose that 20000 is the integer originally assigned to the salary field for the new employee record. Now suppose we execute the following code after inserting *newEmployee into employeeMap:

newEmployee->salary = 40000;
cout << employeeMap[newEmployee->name].salary << endl;  // output = 20000

As shown by the above output, the salary field of the object that was stored in employeeMap is unchanged since it is a copy of the object pointed to by newEmployee.

An additional problem with the code that passes *newEmployee to insert is that when the program makes its next iteration through the loop, it loses the pointer to the object pointed to by newEmployee because newEmployee is assigned a pointer to a new object. Hence our program also has a memory leak.
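The pointer-based alternative to the loop above can be sketched as follows; addEmployee and deleteAll are hypothetical helpers. Because the map stores the pointer, a change made through the pointer is visible through the map, and the map becomes responsible for eventually deleting each Employee.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical Employee record with public fields.
struct Employee {
  std::string name, jobTitle;
  int salary;
  Employee(const std::string& n, const std::string& j, int s)
      : name(n), jobTitle(j), salary(s) {}
};

typedef std::map<std::string, Employee *> EmployeeMap;

// Stores the pointer itself, so the map and the returned pointer refer to the
// same object. On a duplicate name the new object is deleted (no leak) and
// the existing record is returned instead.
Employee *addEmployee(EmployeeMap& employeeMap, const std::string& name,
                      const std::string& jobTitle, int salary) {
  Employee *newEmployee = new Employee(name, jobTitle, salary);
  std::pair<EmployeeMap::iterator, bool> result =
      employeeMap.insert(std::make_pair(name, newEmployee));
  if (!result.second) delete newEmployee;
  return result.first->second;
}

// The map holds the only remaining pointers, so delete the objects before
// the map is discarded.
void deleteAll(EmployeeMap& employeeMap) {
  for (EmployeeMap::iterator it = employeeMap.begin(); it != employeeMap.end(); ++it)
    delete it->second;
  employeeMap.clear();
}
```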

Passing Containers by Value Rather Than by Reference

When passing a container as an argument to a function, pass a reference to the container rather than a copy of the container:

correct:

void read_course_file(char *golfCourseFile, map<string, Course *> &courseMap) { ... }

incorrect:

void read_course_file(char *golfCourseFile, map<string, Course *> courseMap) { ... }

Notice that the first function declaration takes a reference to courseMap while the second function declaration takes a copy of courseMap. Presumably we want read_course_file to read golf course information from a file and store it in courseMap. The version of read_course_file that declares courseMap as a reference will cause the compiler to pass a reference to (in effect, a pointer to) the caller's version of courseMap. Hence read_course_file will modify the caller's version of courseMap, which is what the programmer desires. In contrast, the second version of read_course_file will cause the compiler to create and pass a copy of the caller's courseMap to read_course_file. Hence read_course_file will modify the copy of courseMap, rather than the caller's version of courseMap. This copy will be thrown away when the function returns and all of the work performed by read_course_file will be wasted.
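A minimal sketch of the difference, using hypothetical stand-ins for read_course_file that add one fixed entry instead of reading a file, and int values instead of Course pointers.

```cpp
#include <map>
#include <string>

// Takes a reference: the entry lands in the caller's map.
void addCourseByReference(std::map<std::string, int>& courseMap) {
  courseMap["augusta"] = 72;
}

// Takes a copy: the entry lands in a local map that is discarded on return.
void addCourseByValue(std::map<std::string, int> courseMap) {
  courseMap["augusta"] = 72;
}
```

If a function only needs to read the container, a const reference (const map<string, int> &courseMap) avoids the copy while still preventing modification.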