CS302 Lecture Notes - Lists, Iterators, Bad Vector Usage, Deques

The list type is one of the very useful parts of the STL. I'll present a canonical list example: reversing the lines of standard input. To do that, we'll create a list of strings, where the list has the lines in reverse order, and we'll traverse the list and print out the lines. To create the list, we start with an empty list and insert each string at the front of the list using the push_front() method.

The code is in reverse_1.cpp:

#include <iostream>
#include <list>
#include <string>
using namespace std;

int main()
{
  list <string> lines;
  list <string>::iterator lit;
  string s;

  while (!cin.eof()) {
    getline(cin, s);
    if (!cin.eof()) lines.push_front(s);
  }
 
  for (lit = lines.begin(); lit != lines.end(); lit++) {
    cout << *lit << endl;
  }
}

A few things -- you declare an empty list just like you declare an empty vector. In fact, the code to create the list is very much like the code to create a vector, except we are using push_front() to prepend each string to the front of the list.

To traverse the list, we use an iterator, which is a special type defined by the template library. The for loop is typical -- you start with the first element of the list, obtained with the begin() method, and traverse until you are one element beyond the end of the list (signified by the end() method). To go from one element to the next, you increment the iterator. I don't like this usage of overloading, but it wasn't up to me.

Then, to access the element in the list, you use pointer indirection (the *). When you get used to seeing this code, it reads nicely. It does take a little acclimation though. Regardless, it works:

UNIX> cat input.txt
Born in the night
She would run like a leopard
That freaks at the sight
Of a mind close beside herself
UNIX> reverse_1 < input.txt
Of a mind close beside herself
That freaks at the sight
She would run like a leopard
Born in the night
UNIX> 

Reverse Iterators

Instead of creating a list in reverse order, we also could have created the list in order, and traversed it in reverse order. That code is in reverse_2.cpp:

#include <iostream>
#include <list>
#include <string>
using namespace std;

int main()
{
  list <string> lines;
  list <string>::reverse_iterator lit;
  string s;

  while (!cin.eof()) {
    getline(cin, s);
    if (!cin.eof()) lines.push_back(s);
  }
 
  for (lit = lines.rbegin(); lit != lines.rend(); lit++) {
    cout << *lit << endl;
  }
}

We've created the list with push_back(), and we change lit to be a reverse_iterator. The iteration proceeds from rbegin(), which is the last element of the list, to rend(), which is one element before the first element of the list. Note, we still increment lit -- is that natural? You be the judge.

UNIX> reverse_2 < input.txt
Of a mind close beside herself
That freaks at the sight
She would run like a leopard
Born in the night
UNIX> 

List Insertion

Lists have an insert() method, which takes an iterator and a value as arguments, and inserts the value in front of the element that the iterator points to. Thus lines.push_front(s) is equivalent to lines.insert(lines.begin(), s) and lines.push_back(s) is equivalent to lines.insert(lines.end(), s).

The program reverse_3.cpp implements reversal by inserting each element at the front and traversing the list in the forward direction:

#include <iostream>
#include <list>
#include <string>
using namespace std;

int main()
{
  list <string> lines;
  list <string>::iterator lit;
  string s;

  while (!cin.eof()) {
    getline(cin, s);
    if (!cin.eof()) lines.insert(lines.begin(), s);
  }
 
  for (lit = lines.begin(); lit != lines.end(); lit++) {
    cout << *lit << endl;
  }
}

It works like the others:

UNIX> reverse_3 < input.txt
Of a mind close beside herself
That freaks at the sight
She would run like a leopard
Born in the night
UNIX> 

You can insert into vectors too, but you should not.

You can use iterators with vectors, and they work just like they do with lists. Moreover, for some ill-judged reason, the implementors of the STL felt it ok to implement an insert() operation on vectors. This means that you can change the code of reverse_3.cpp to use vectors instead of lists. It is in reverse_4.cpp:

#include <iostream>
#include <vector>
#include <string>
using namespace std;

int main()
{
  vector <string> lines;
  vector <string>::iterator lit;
  string s;

  while (!cin.eof()) {
    getline(cin, s);
    if (!cin.eof()) lines.insert(lines.begin(), s);
  }
 
  for (lit = lines.begin(); lit != lines.end(); lit++) {
    cout << *lit << endl;
  }
}

I call this ill-judged because when you perform an insertion such as v.insert(v.begin(), x), the STL basically does the following:

  v.resize(v.size()+1);
  for (i = v.size()-1; i > 0; i--) v[i] = v[i-1];
  v[0] = x;

In other words, it copies all of the elements of the vector to make room for the new element at v[0]. This is expensive: inserting n elements this way copies 0 + 1 + ... + (n-1) elements in total, which makes reverse_4.cpp above run in O(n^2) time.

To illustrate, input-2.txt is an input file with 5000 lines, and input-3.txt is one with 10,000 lines. Look at the difference in speed between reverse_3 and reverse_4:

UNIX> wc input-2.txt
    5000    5000   40000 input-2.txt
UNIX> wc input-3.txt
   10000   10000   80000 input-3.txt
UNIX> time reverse_3 < input-2.txt > /dev/null
0.022u 0.009s 0:00.03 66.6%     0+0k 0+0io 0pf+0w
UNIX> time reverse_3 < input-3.txt > /dev/null
0.038u 0.014s 0:00.05 80.0%     0+0k 0+0io 0pf+0w
UNIX> time reverse_4 < input-2.txt > /dev/null
1.019u 0.010s 0:01.03 99.0%     0+0k 0+0io 0pf+0w
UNIX> time reverse_4 < input-3.txt > /dev/null
4.075u 0.029s 0:04.11 99.5%     0+0k 0+0io 0pf+0w
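UNIX> 

As you can see, reverse_3 is very fast (0.03 and 0.05 seconds on my MacBook Pro), while reverse_4 is painfully slow (1 and 4 seconds). Note that doubling the input roughly doubles the time for reverse_3, but quadruples it for reverse_4, which is what quadratic running time looks like. This is important, and you should take care that it doesn't happen to you.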

A good rule of thumb is to use a vector as an array and not a list. Don't use iterators -- use integer indices. Then you're ok.


Deques

The STL defines a deque, which stands for "double-ended queue." You use deques like vectors, but with the proviso that you can insert or delete from either end of the deque with high efficiency. For that reason, they have a push_front() method (which vectors do not have). We can therefore use a deque to store the lines in reverse order, and then traverse it forwards with integer indices. The code is in reverse_5.cpp:

#include <iostream>
#include <deque>
#include <string>
using namespace std;

int main()
{
  deque <string> lines;
  int i;
  string s;

  while (!cin.eof()) {
    getline(cin, s);
    if (!cin.eof()) lines.push_front(s);
  }
 
  for (i = 0; i < lines.size(); i++) {
    cout << lines[i] << endl;
  }
}

Unlike the vector version, this one runs very fast:

UNIX> reverse_5 < input.txt
Of a mind close beside herself
That freaks at the sight
She would run like a leopard
Born in the night
UNIX> time reverse_5 < input-2.txt > /dev/null
0.019u 0.008s 0:00.02 50.0%     0+0k 0+0io 0pf+0w
UNIX> time reverse_5 < input-3.txt > /dev/null
0.034u 0.015s 0:00.05 80.0%     0+0k 0+0io 0pf+0w
UNIX>

Another example -- mytail

Another easy list program is mytail, which prints the last ten lines of a file (if there are more than ten lines -- if there are fewer, then it just prints the file). To write it, we use the erase() method, which takes an iterator as a parameter, and erases the element that it points to from the list. Thus, we read in lines of text and append them to a list. Whenever the list grows bigger than ten elements, we delete the first element, so that the list always holds at most the last ten lines read. The code is straightforward, in mytail_list.cpp:

#include <iostream>
#include <list>
#include <string>
using namespace std;

int main()
{
  list <string> lines;
  list <string>::iterator lit;
  string s;

  while (!cin.eof()) {
    getline(cin, s);
    if (!cin.eof()) {
      lines.push_back(s);
      if (lines.size() > 10) lines.erase(lines.begin());
    }
  }
 
  for (lit = lines.begin(); lit != lines.end(); lit++) {
    cout << *lit << endl;
  }
}

Works fine:

UNIX> mytail_list < input-2.txt
  4991
  4992
  4993
  4994
  4995
  4996
  4997
  4998
  4999
  5000
UNIX> mytail_list < input-3.txt
  9991
  9992
  9993
  9994
  9995
  9996
  9997
  9998
  9999
 10000
UNIX> 

As with the previous example, we can port the code directly to vectors and to deques, since they both implement an erase() method. As with the other example, we see that the vector implementation performs worse, since it copies all of the remaining elements upon deletion (the shell scripts make them do more work so that you can see the difference):

UNIX> time sh big_mytail_list.sh
0.411u 0.013s 0:00.42 100.0%    0+0k 0+0io 0pf+0w
UNIX> time sh big_mytail_deque.sh
0.370u 0.012s 0:00.38 100.0%    0+0k 0+0io 0pf+0w
UNIX> time sh big_mytail_vector.sh
0.507u 0.012s 0:00.52 98.0%     0+0k 0+1io 0pf+0w
UNIX> 

The difference isn't huge, because the vector never holds more than eleven elements, but it is there. Were we to keep the last 100 lines instead of the last 10, the difference would be much more pronounced (we did this in class).

Nested List Traversal, and Why We Use Pointers

As an example in building lists of lists, we wrote the program list_o_list_1.cpp, which creates a list of lists. The top-level list is a list of ten lists of integers. These bottom-level lists contain ten integers each. The first bottom-level list contains the integers from 0 to 9. The next contains the integers from 10 to 19, etc.

After creating the list of lists, we traverse it and print out each bottom-level list on one line:

#include <iostream>
#include <list>
using namespace std;

typedef list <int> intlist;

int main()
{
  list <intlist *> numlists;
  list <intlist *>::iterator nlit;
  intlist *il;
  intlist::iterator ilit;
  int i, j;

  for (j = 0; j < 100; j += 10) {
    il = new intlist;
    numlists.push_back(il);
    for (i = 0; i < 10; i++) {
      il->push_back(i+j);
    }
  }

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    il = *nlit;
    for (ilit = il->begin(); ilit != il->end(); ilit++) cout << *ilit << " " ;
    cout << endl;
  }
}

The typedef makes the code cleaner, so that you don't have nested list declarations.

This code runs nicely, and as you'd expect:

UNIX> list_o_list_1
0 1 2 3 4 5 6 7 8 9 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99 
UNIX> 

You have undoubtedly noticed that each element of the top-level list is a pointer to a list, which we create using new. Why do we do this? The answer is that if we don't, we expose ourselves to problems with making copies of things. Let's see what happens if we try to avoid pointers.

A first straightforward try is in list_o_list_2.cpp, which just takes out the new and changes pointers to non-pointers:

#include <iostream>
#include <list>
using namespace std;

typedef list <int> intlist;

int main()
{
  list <intlist> numlists;
  list <intlist>::iterator nlit;
  intlist il;
  intlist::iterator ilit;
  int i, j;

  for (j = 0; j < 100; j += 10) {
    numlists.push_back(il);
    for (i = 0; i < 10; i++) {
      il.push_back(i+j);
    }
  }

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    il = *nlit;
    for (ilit = il.begin(); ilit != il.end(); ilit++) cout << *ilit << " " ;
    cout << endl;
  }
}

When we run it, we get some icky results:

UNIX> list_o_list_2

0 1 2 3 4 5 6 7 8 9 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
UNIX> 

What's going on? Each call to numlists.push_back(il) puts a copy of il onto the top-level list. The integers are then inserted into il, not into the copy. This is why il keeps growing, and why the first line is blank -- the first copy was made while il was still empty. It is also why 90 through 99 never appear: they are added to il after the last copy has been made.

We can fix this by getting rid of il and accessing the list elements directly. A solution is in list_o_list_3.cpp:

#include <iostream>
#include <list>
using namespace std;

typedef list <int> intlist;

int main()
{
  list <intlist> numlists;
  list <intlist>::iterator nlit;
  intlist il;
  intlist::iterator ilit;
  int i, j;

  for (j = 0; j < 100; j += 10) {
    numlists.resize(numlists.size()+1);
    for (i = 0; i < 10; i++) {
      numlists.back().push_back(i+j);   /* Yuck */
    }
  }

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    il = *nlit;
    for (ilit = il.begin(); ilit != il.end(); ilit++) cout << *ilit << " " ;
    cout << endl;
  }
}

That's an awful line of code, isn't it? Spend some time reading it to make sure you understand it. It seems to work fine:

UNIX> list_o_list_3
0 1 2 3 4 5 6 7 8 9 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99 
UNIX> 

However, there is a bug. It doesn't change the output, but it does hurt performance. It is the line

    il = *nlit;

This line makes a copy of *nlit, which means that it copies an entire bottom-level list just so that we can print it. As I said in class, it makes one pine for C, which doesn't let you make copies so wantonly. To fix this, remove il completely (list_o_list_4.cpp):

#include <iostream>
#include <list>
using namespace std;

typedef list <int> intlist;

int main()
{
  list <intlist> numlists;
  list <intlist>::iterator nlit;
  intlist::iterator ilit;
  intlist::iterator ilit;
  int i, j;

  for (j = 0; j < 100; j += 10) {
    numlists.resize(numlists.size()+1);
    for (i = 0; i < 10; i++) {
      numlists.back().push_back(i+j);
    }
  }

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    for (ilit = nlit->begin(); ilit != nlit->end(); ilit++) cout << *ilit << " " ;
    cout << endl;
  }
}

Again, I find that code unreadable -- in fact, this code is so ugly, you may as well put it all on one line (list_o_list_5.cpp):

#include <iostream>
#include <list>
using namespace std; typedef list <int> intlist; int main() { list <intlist> numlists; list <intlist>::iterator nlit; intlist::iterator ilit; int i, j; for (j = 0; j < 100; j += 10) { numlists.resize(numlists.size()+1); for (i = 0; i < 10; i++) { numlists.back().push_back(i+j); } } for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) { for (ilit = nlit->begin(); ilit != nlit->end(); ilit++) cout << *ilit << " " ; cout << endl; } }

For the record, I don't advocate doing this -- it's just that list_o_list_4.cpp is so unreadable it may as well be on one line.


Bottom Line

The bottom line is that you should not be afraid to use pointers in your STL structures. It will simplify your code and make it much more efficient.