C++ Standard Template Library
Introduction
The C++ standard template library (STL)
provides a number of useful data structures and
algorithms that you can use in your C++ programs. A good reference for the
STL is www.cppreference.com. These
notes provide a brief overview of the classes that you will be using in
this course and an overview of pitfalls that I've observed in students'
code using these classes.
Overview
The STL provides classes and algorithms that account for many of a
programmer's day-to-day programming needs. It includes classes for:
- Strings
- Vectors: behave like dynamically expandable arrays
- Lists
- Stacks
- Queues
- Double-ended queues (deques)
- Balanced trees
- Hash tables
- Priority Queues
- Quicksort: The sort function is actually a function and
not a class, but it provides a handy way to sort vectors, arrays,
and other containers that support random-access iterators.
Using the STL in Your Program
STL classes may be included in your program via the include statement.
Unlike C include files, STL files do not have a .h extension. For
example, if you want to include the list class in your program you would
write:
#include <list>
After your include statements you will also typically add the following
statement to your program:
using namespace std;
A namespace provides a way for you to divide your program into
different named "spaces". Within each space you may re-use the names of
variables, functions, etc. Classes provide another means for you to
divide your program's namespace since within each class you may re-use
the names of variables and functions. A namespace is a higher-level
construct than a class and may be thought of as a module or library
that contains the names of a number of related classes, functions, and
variables. In this case the std namespace is the namespace that
encapsulates all the classes and functions defined by the STL.
Ordinarily you access names in a namespace by prefixing them with the name
of their namespace and two colons (::). For example, to declare a variable
as a list of ints you would write:
std::list<int> intList;
However, it can be bothersome to have to remember to always prefix these
names with their namespace so C++ allows you to import these names into
the global namespace via the using statement. Hence when we
write:
using namespace std;
we are telling C++ to import all the names in the std namespace
into the global namespace and we may now write:
list<int> intList;
If a name that you declare yourself conflicts with a name imported from the
namespace, then using the unqualified name becomes ambiguous. For example, if
you declare your own class named list, the compiler will complain when you
write just list, because it cannot tell whether you mean your class or the
std list. You can still use the std
list class by
prefixing it with "std::" (and your own class by prefixing it with "::").
Hence the compiler will be happy if you write:
class list {
public:
int x;
};
std::list<int> myList;
The pair Class
The STL makes extensive use of a poorly documented template
class named pair. A simplified definition for the pair class is:
template <class Type1, class Type2>
struct pair {
Type1 first;
Type2 second;
};
For example, the declaration:
pair<int, string> a;
creates an object whose first field is
an int and whose second field is a string.
The STL defines a function named make_pair that takes two arguments
of the proper type and returns a pair object (not a pointer to a
pair object but an actual pair object). For example we could write:
a = make_pair(10, string("brad"));
Descriptions of the Container Objects
The descriptions of the container objects in this section will be limited
to the objects' most frequently used operations. Further information about
each container object can be found at
www.cppreference.com.
Vectors
Vectors behave like dynamically extendible arrays, and elements of a vector
can be accessed and assigned using the [] array notation. For example:
vector<string> people;
people.push_back("brad");
people.push_back("mary");
people.push_back("susan");
for (int i = 0; i < people.size(); i++)
cout << people[i] << endl;
people[1] = "tom";
cout << people[1] << endl;
The output of this code is:
brad
mary
susan
tom
The push_back method adds an element to the end of the vector
and dynamically expands the size of the vector by 1. The size method
returns the current size of the vector, so you do not have to explicitly
maintain a count of the size of the vector.
You can only use the [] notation to assign to previously allocated elements.
For example, if I try:
vector<string> names;
names[2] = "brad";
there is a good chance my code will segfault, because names has no element 2
and [] does not check the index. You must either use
push_back to create new entries in the vector or call resize.
The resize command takes a numeric argument representing the
desired size of the vector and an optional value to which any new entries will
be initialized. For example:
vector<string> names;
names.push_back("brad");
names.resize(5, "mary");
names[3] = "tom";
for (int i = 0; i < names.size(); i++)
cout << names[i] << endl;
produces the output:
brad
mary
mary
tom
mary
Notice that once I resized the vector to 5 elements it was
okay to assign "tom" to names using the [] operator.
A final method that I find useful for vectors is back. It returns
the last element in the vector:
cout << names.back() << endl; // output = mary
back has the same effect as names[names.size()-1] but
is more readable.
Lists
STL lists are doubly-linked lists. You can insert elements at the front or end
of the list using the push_front and push_back methods:
list<string> names;
names.push_back("nels"); // list = "nels"
names.push_front("billy"); // list = ("billy", "nels")
names.push_back("brad"); // list = ("billy", "nels", "brad")
Of course you typically want to insert elements into the middle of a list and
you must do that using an iterator. An iterator can be thought of as a
protected pointer to an element in the list. You can pass this pointer to
various methods that the list provides in order to accomplish tasks like
accessing the element's value, deleting the element, or inserting a new
value before the element. The following methods and operators are
important for dealing with iterators:
begin | returns an iterator to the first element of the list
end | returns an iterator just past the end of the list. I like to think of end as returning an iterator that points to a sentinel node, although the list may not actually be implemented using sentinel nodes.
operator++ | advances an iterator to the next element in the list
operator* | returns the value of the element pointed to by the iterator
The following code illustrates
these concepts by inserting "mary" into a list that is sorted in ascending
order:
list<string>::iterator namesIter;
for (namesIter = names.begin(); namesIter != names.end(); namesIter++) {
if ("mary" < *namesIter)
break;
}
names.insert(namesIter, "mary");
insert always inserts values before the element pointed
at by the iterator. Hence if the above code reaches the end of the list
(i.e., namesIter equals names.end() when the loop exits), then "mary" will
be inserted before names.end() but after the last element in the list.
You can also use iterators to delete elements from a list. For example the
following code searches for a person named "mary" and removes her if she
is found:
class People {
public:
string name;
int age;
};
list<People *> peopleList;
list<People *>::iterator peopleIter;
... code to insert people into the list ...
for (peopleIter = peopleList.begin(); peopleIter != peopleList.end();
peopleIter++) {
if ((*peopleIter)->name == "mary") {
peopleList.erase(peopleIter);
break;
}
}
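Notice the break right after the call to erase: erase invalidates peopleIter,
so the loop must not increment it afterward. Also notice that I put ()'s
around *peopleIter in order to ensure that
it gets properly dereferenced. The following code will not work because the
-> operator binds more tightly than the * operator: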
if (*peopleIter->name == "mary")
Some other common list methods that you might find helpful are:
back | returns a reference to the last element of a list (the list must not be empty). If the list stores pointers then you can assume that back returns a pointer. For example:
list<People *> peopleList;
... code to insert people into the list ...
People *lastPerson = peopleList.back();
clear | removes all elements from the list
empty | returns true if the list is empty
front | returns a reference to the first element of the list (again, the list must not be empty). If the list stores pointers, then like back you may assume that front returns a pointer.
rbegin, rend, operator-- | rbegin and rend return reverse iterators that allow you to traverse a list from back to front; applying ++ to a reverse iterator moves it toward the front of the list. Applying -- to an ordinary iterator moves it to the previous element in the list.
size | returns the number of elements in the list
Sets and Maps
C++ provides two types of balanced tree classes: sets and maps. The set
class only allows you to store a key, while the map class
allows you to store <key,value> pairs. Both set and map
store only unique keys. If you want to use duplicate keys then you
must use multiset or multimap. Sets are useful when
you are trying to eliminate duplicate items or when you have a multi-element
key (e.g., first/last name or day/month/year) and think it would be
prohibitively expensive to use a map because you would have to duplicate
the key's value in both the key and value fields. Maps are useful when
you have <key,value> pairs because their method interface is easier to
use.
The declaration for set takes one of two forms:
set<KeyType>
set<KeyType, ComparatorClass>
You need to provide a ComparatorClass if your KeyType
does not have the proper < operator defined. For example:
set<int> zipcodes; // ok--< is defined for ints
set<string> names; // ok--< is defined for strings
set<employee *> employees; // compiles, but < will compare employee
// pointer addresses, which is not what you want
set<employee> employees; // ok if operator< has been defined for
// employees; otherwise an error
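The ComparatorClass must define the function call operator, operator(), which
takes two parameters of KeyType and returns true if the first should be
ordered before the second. Declaring operator() const, as below, is a good
habit: newer compilers may reject a set or map whose comparator cannot be
called as const. For example:
class CompareEmployees {
public:
bool operator() (employee *e1, employee *e2) const {
return e1->age < e2->age;
}
};
set<employee *, CompareEmployees> employees;
Keep in mind that the set treats two keys as equal when neither is less than
the other, so with this comparator two employees of the same age count as
duplicate keys.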
The insert method inserts a key into the set if the key is
not yet in the set. It returns
a pair object of the form pair<iterator,bool>, where
the boolean indicates whether or not the insert succeeded, and the iterator
points to the inserted value if the insert succeeds or to the element already
holding that key if it fails. The insert fails if the key already exists. For example:
pair<set<employee *, CompareEmployees>::iterator, bool> insertResult = employees.insert(newEmployee);
if (insertResult.second) {
cout << (*insertResult.first)->salary << endl;
}
else {
cerr << "duplicate key" << endl;
}
Some other common set methods that you might find helpful are:
begin, end, operator++ | return iterators that allow you to iterate through the elements of the set in sorted order
clear | removes all the elements from the set
erase(KeyType key) | deletes the key from the set if the key exists; returns 1 if the key existed and 0 otherwise
find(KeyType key) | returns an iterator to the key if it's found and an iterator to the end of the set otherwise
rbegin, rend, operator-- | rbegin and rend return reverse iterators that allow you to iterate through the elements of the set in reverse sorted order (++ on a reverse iterator moves to the next smaller key); -- on an ordinary iterator moves to the previous key
size | returns the number of elements in the set
Maps have almost the same interface as sets, the
only differences being the declarative form of maps, the
find and insert methods, and an added [] operator
for ease in inserting and finding elements. The declarative forms
for map are:
map<KeyType, ValueType>
map<KeyType, ValueType, ComparatorClass> // the set section contains a description
// of the ComparatorClass
For example:
map<string, int> personMap;
map<string, employee *> employeeMap;
class DateComparison {
public:
// 10000 * year + 100 * month + day gives each date a distinct number
// that grows with the date
bool operator() (const date &d1, const date &d2) const {
int d1_num = 10000 * d1.year + 100 * d1.month + d1.day;
int d2_num = 10000 * d2.year + 100 * d2.month + d2.day;
return d1_num < d2_num;
}
};
map<date, employee *, DateComparison> employeeMap;
The [] operator makes it easy to insert and retrieve elements from the
map:
map<string, int> personMap;
personMap["brad"] = 43;
personMap["nels"] = 41;
personMap["mary"] = 20;
cout << personMap["mary"] << endl; // output is 20
cout << personMap["tony"] << endl; // output is 0
In general the [] operator returns a default value if it cannot locate
the key in the map, and it also inserts the key into the map with that
default value. Default values would be 0 for an int,
a null pointer for a pointer type, and "" for a string.
Assigning through the [] operator will clobber the value associated with a key if
the key previously existed in the map. If you want to guard against
this possibility then you must use the insert method. The
insert method takes a pair object and returns an
<iterator, bool> pair that indicates whether or not the operation
succeeded. The iterator refers to a <KeyType, ValueType> pair: the newly
inserted element if the operation succeeded, or the existing element with
that key if it failed.
The operation fails if the key already exists in the map. For example:
map<string, int> personMap;
string name;
int age;
while (cin >> name >> age) {
if (personMap.insert(make_pair(name, age)).second)
printf("%s successfully inserted\n", name.c_str());
else
printf("%s: duplicate key\n", name.c_str());
}
You can use the find method if the default value returned
by the [] operator is ambiguous. For example, if an int is
the value type, then a return value of 0 could either mean that the
key wasn't found or that the value was really 0. To disambiguate this
case you can use the find method, which also does not insert the key. The find method
takes a key and returns either an iterator to the found key or an
iterator to the end of the map if the key is not found. The iterator
behaves like a pointer to a <KeyType, ValueType> pair.
Finally, the iterators that traverse the map behave like pointers to
<KeyType, ValueType> pairs. For example:
map<string, int>::iterator personMapIter;
for (personMapIter = personMap.begin();
personMapIter != personMap.end();
personMapIter++) {
cout << (*personMapIter).first << " " << (*personMapIter).second << endl;
}
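Hash Tables
The hash_set, hash_map,
hash_multiset, and hash_multimap classes implement
hash tables. However, these classes never became part of the official
C++ standard, so some STL libraries include them and some don't. Our
GNU compiler does include them, but you must access them through
an extensions library (and current versions of the compiler flag these
headers as deprecated). To include them in your program you should use the
following statements:
#include <ext/hash_map> // or <ext/hash_set> for hash_set
using namespace __gnu_cxx;
Since C++11 the standard library has provided its own hash tables, named
unordered_set, unordered_map, unordered_multiset, and unordered_multimap,
which are declared in the <unordered_set> and <unordered_map> headers.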
The rest of this section has not yet been written
Priority Queues
C++ priority queues are max queues, so if you want to create a min queue
you will need to provide a comparator class that appropriately orders your
queue elements. The examples below show you how to do
this.
You can include priority queues in your program via the following include
statement:
#include <queue>
The declarative forms for priority queues that you will use in this
course are:
priority_queue<Type>
priority_queue<Type, vector<Type>, comparatorClass>
For example:
- Simple max queue
priority_queue<int> maxQueue;
- Simple min queue. You will need to write a comparator class and
use the > operator to compare the keys:
class compareInt {
public:
bool operator() (int key1, int key2) const {
return key1 > key2;
}
};
priority_queue<int, vector<int>, compareInt> minQueue;
- min queue with class pointers:
class compareEmployees {
public:
bool operator() (Employee *e1, Employee *e2) const {
return e1->age > e2->age;
}
};
priority_queue<Employee *, vector<Employee *>, compareEmployees> empQueue;
The second template argument, vector<Type>, tells C++ which data
structure to use for the heap. You can provide other data structures, but
a vector is always a good choice.
www.cppreference.com is a good
source to check for the methods provided by the priority_queue
class. The methods should be self-explanatory.
Sorting
The sort function can be used to sort arrays, vectors, or any
container whose iterators are random-access iterators (lists are not among
them, but the list class has its own sort method). The sort function is
typically implemented as a quicksort variant (GNU's library uses introsort)
and leaves the array, vector, etc. in sorted order.
You must pass the sort function a comparison function if the < operator
will not work. To use the sort function you must place the
following include statement at the top of your code:
#include <algorithm>
The following sections present several code
examples of the sort function.
Sorting a Vector of Integers
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;
int main() {
vector<int> a;
a.push_back(5);
a.push_back(3);
a.push_back(7);
a.push_back(9);
a.push_back(1);
sort(a.begin(), a.end());
for (int i = 0; i < a.size(); i++)
cout << a[i] << endl;
}
Sorting an Array of Integers
int a[5];
a[0] = 5;
a[1] = 3;
a[2] = 7;
a[3] = 9;
a[4] = 1;
sort(a, a+5);
for (int i = 0; i < 5; i++)
cout << a[i] << endl;
Sorting a Vector of Objects
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
class employee {
public:
int age;
string name;
employee(int emp_age, string emp_name) :
age(emp_age), name(emp_name) {}
};
// a function that compares two employees and returns true
// if the first employee's age is less than the second
// employee's age
bool emp_compare(const employee *e1, const employee *e2) {
return e1->age < e2->age;
}
int main() {
vector<employee *> a;
a.push_back(new employee(5, "brad"));
a.push_back(new employee(3, "ebber"));
a.push_back(new employee(7, "smiley"));
a.push_back(new employee(9, "lady"));
a.push_back(new employee(1, "sunshine"));
sort(a.begin(), a.end(), emp_compare);
for (int i = 0; i < a.size(); i++)
cout << a[i]->age << endl;
// free the employees allocated with new
for (int i = 0; i < a.size(); i++)
delete a[i];
}
Common STL Errors
This section covers a number of
STL errors that I see continually in students' code.
Storing Objects in STL Container Classes
Students often write:
map<string, Employee> employeeMap;
rather than:
map<string, Employee *> employeeMap;
You should almost always have your container objects store pointers to
objects and not the objects themselves. Hence the latter declaration is
preferable. The reasons are efficiency and correctness. From
an efficiency standpoint, every time you store an object rather than a
pointer to that object in a container you end up calling three methods
instead of just one.
Remember that whenever C++ copies an object, for example when the object is
passed by value or stored in a container, it creates that copy by calling
the copy constructor. So let's say that you write the following code:
map<string, Employee> employeeMap;
employeeMap.insert(make_pair(string("vander zanden"), Employee("vander zanden", "professor", 20000)));
What happens "under the hood" is that 1) the constructor for Employee gets
called in order to create a temporary employee object, 2) the temporary
employee object gets passed to the insert method, 3) the insert method
invokes the copy constructor (or, in C++11 and later, typically the move
constructor) to create a copy of the temporary object
and it stores this copy in the map, and 4) the temporary
object created in step 1 is destroyed by calling its destructor and
de-allocating its memory. The copy in step 3 is necessary because the
temporary object is destroyed as soon as the statement that created it
finishes, and hence insert
cannot store the temporary object created in step 1. Notice that at least three
methods were called in order to store the Employee record. Suppose instead
that you rewrite the above code so that it uses pointers:
map<string, Employee *> employeeMap;
employeeMap.insert(make_pair(string("vander zanden"), new Employee("vander zanden", "professor", 20000)));
What happens "under the hood" is that 1) the new operator allocates memory
for an Employee object, 2) the new operator calls the constructor for
Employee, 3) the pointer returned by new is passed to the insert
method, and 4) insert stores a copy of this pointer in the map. Notice that
only one method, the Employee constructor, is called in order to store the
Employee record, and copying a pointer is cheap. Hence the pointer code avoids
the extra copy constructor and destructor calls, a savings that grows as
Employee objects get larger.
The problem with correctness arises when students try to combine the two
approaches shown above. In particular I've seen code like:
map<string, Employee> employeeMap;
string name, jobTitle;
int salary;
while (cin >> name >> jobTitle >> salary) {
Employee *newEmployee = new Employee(name, jobTitle, salary);
employeeMap.insert(make_pair(name, *newEmployee));
}
The call to insert does the following: 1) it calls the copy constructor
for Employee passing it the reference given it by *newEmployee, and
2) it stores the copied object in employeeMap. Note that a copy of the object
pointed to by newEmployee
gets stored in employeeMap, rather than a pointer to the object. This means
that if we ever modify the object pointed to by newEmployee, the change
will not be reflected in the object stored in employeeMap. This is probably
not what the programmer intended. For example, suppose that 20000 is the
integer originally assigned to the salary field for the new employee record.
Now suppose we execute the following code after inserting *newEmployee
into employeeMap:
newEmployee->salary = 40000;
cout << employeeMap[newEmployee->name].salary << endl; // output = 20000
As shown by the above output, the salary field of the object that was
stored in employeeMap is unchanged since it
is a copy of the object pointed
to by newEmployee.
An additional problem with the code that passes *newEmployee
to insert is that when the program
makes its next iteration through the loop, it loses the pointer to the
object pointed to by newEmployee because newEmployee is assigned a pointer to
a new object. Hence our program also has a memory leak.
Passing Containers by Value Rather Than by Reference
When passing a container as an argument to a function, pass a reference
to the container rather than a copy of the container:
correct:
void read_course_file(char *golfCourseFile, map &courseMap) { ... }
incorrect:
void read_course_file(char *golfCoursefile, map courseMap) { ... }
Notice that the first function declaration takes a reference to
courseMap while the second function declaration takes a copy of
courseMap. Presumably we want read_course_file to read
golf course information from a file and store it in courseMap.
The version of read_course_file that declares courseMap
as a reference will cause the compiler to pass a reference to the
caller's courseMap (under the hood, usually its address). Hence read_course_file
will modify the caller's version of courseMap, which is what the
programmer desires. In contrast, the second version of read_course_file
will cause the compiler to create and pass a copy of the caller's
courseMap to read_course_file. Hence read_course_file
will modify the copy of
courseMap, rather than the caller's version of courseMap.
This copy will be thrown away when the function returns and all of the work
performed by read_course_file will be wasted.