CS140 Lecture Notes - Lists, Iterators, Bad Vector Usage, Deques


The list type is one of the most useful parts of the STL. I'll present a canonical list example: reversing the lines of standard input. To do that, we'll create a list of strings, where the list has the lines in reverse order, and we'll traverse the list and print out the lines. To create the list, we start with an empty list and insert each string at the front of the list using the push_front() method.

The code is in reverse_1.cpp:

#include <iostream>
#include <list>
#include <string>
using namespace std;

int main()
{
  list <string> lines;
  list <string>::iterator lit;
  string s;

  while (getline(cin, s)) lines.push_front(s);
 
  for (lit = lines.begin(); lit != lines.end(); lit++) {
    cout << *lit << endl;
  }
}

A few things -- you declare an empty list just like you declare an empty vector. In fact, the code to create the list is very much like the code to create a vector, except we are using push_front() to prepend each string to the front of the list.

To traverse the list, we use an iterator, which is a special type defined by the template library. The for loop is typical -- you start with the first element of the list, obtained with the begin() method, and traverse until you are one element beyond the end of the list (signified by the end() method). To go from one element to the next, you increment the iterator. I don't like this usage of overloading, but it wasn't up to me.

Then, to access the element in the list, you use pointer indirection (the asterisk). When you get used to seeing this code, it reads nicely. It does take a little acclimation though. Regardless, it works:

UNIX> cat input.txt
Born in the night
She would run like a leopard
That freaks at the sight
Of a mind close beside herself
UNIX> reverse_1 < input.txt
Of a mind close beside herself
That freaks at the sight
She would run like a leopard
Born in the night
UNIX> 
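
To make the iterator mechanics concrete, here is a small sketch that is separate from reverse_1.cpp. The helper names prepend_all() and join() are made up for this illustration. The first builds a list with push_front(), and the second walks the list with an iterator and dereferences it at each step:

```cpp
#include <list>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: build a list by prepending each string,
// the same way that reverse_1.cpp builds its list of lines.
list <string> prepend_all(const vector <string> &in)
{
  list <string> l;
  size_t i;

  for (i = 0; i < in.size(); i++) l.push_front(in[i]);
  return l;
}

// Hypothetical helper: traverse with an iterator and join the
// elements, separated by single spaces.
string join(const list <string> &l)
{
  list <string>::const_iterator lit;
  string rv;

  for (lit = l.begin(); lit != l.end(); lit++) {
    if (lit != l.begin()) rv += " ";
    rv += *lit;                  // *lit is the string that lit points to.
  }
  return rv;
}
```

If you call prepend_all() on a vector holding "a", "b" and "c", and pass the result to join(), you get the string "c b a".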

Reverse Iterators

Instead of creating a list in reverse order, we also could have created the list in order, and traversed it in reverse order. That code is in reverse_2.cpp:

#include <iostream>
#include <list>
#include <string>
using namespace std;

int main()
{
  list <string> lines;
  list <string>::reverse_iterator lit;
  string s;

  while (getline(cin, s)) lines.push_back(s);
 
  for (lit = lines.rbegin(); lit != lines.rend(); lit++) {
    cout << *lit << endl;
  }
}

We've created the list with push_back(), and we change lit to be a reverse_iterator. The iteration proceeds from rbegin(), which is the last element of the list, to rend(), which is one element before the first element of the list. Note, we still increment lit -- is that natural? You be the judge.

UNIX> reverse_2 < input.txt
Of a mind close beside herself
That freaks at the sight
She would run like a leopard
Born in the night
UNIX> 
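
Here is a minimal sketch of the same idea on a list of integers. The function name backwards() is hypothetical. It copies a list into a vector while walking from rbegin() to rend():

```cpp
#include <list>
#include <vector>
using namespace std;

// Hypothetical helper: copy a list into a vector, back to front.
vector <int> backwards(const list <int> &l)
{
  list <int>::const_reverse_iterator rit;
  vector <int> rv;

  // rbegin() is the last element.  Incrementing a reverse iterator
  // moves it toward the front of the list.
  for (rit = l.rbegin(); rit != l.rend(); rit++) rv.push_back(*rit);
  return rv;
}
```

Calling backwards() on a list that holds 1, 2, 3 gives a vector that holds 3, 2, 1.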

List Insertion

Lists have an insert() method, which takes an iterator and a value as arguments, and inserts the value in front of the element that the iterator points to. Thus lines.push_front(s) is equivalent to lines.insert(lines.begin(), s) and lines.push_back(s) is equivalent to lines.insert(lines.end(), s).
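
Insertion is not limited to the two ends of the list. As a sketch (the function sorted_insert() is made up for this example), here is how you might use insert() to keep a list of integers in sorted order:

```cpp
#include <list>
using namespace std;

// Hypothetical helper: insert val in front of the first element that
// is greater than it, so that an already-sorted list stays sorted.
void sorted_insert(list <int> &l, int val)
{
  list <int>::iterator lit;

  for (lit = l.begin(); lit != l.end(); lit++) {
    if (*lit > val) break;
  }
  l.insert(lit, val);     // If lit == l.end(), this appends val.
}
```

Inserting 5, 1, 3 and 9 in that order leaves the list holding 1, 3, 5, 9. Finding the spot still takes a traversal, but the insertion itself does not move any other element.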

The program reverse_3.cpp implements reversal by inserting each element at the front and traversing the list in the forward direction:

#include <iostream>
#include <list>
#include <string>
using namespace std;

int main()
{
  list <string> lines;
  list <string>::iterator lit;
  string s;

  while (getline(cin, s)) lines.insert(lines.begin(), s);
 
  for (lit = lines.begin(); lit != lines.end(); lit++) {
    cout << *lit << endl;
  }
}

It works like the others:

UNIX> reverse_3 < input.txt
Of a mind close beside herself
That freaks at the sight
She would run like a leopard
Born in the night
UNIX> 

You can insert into vectors too, but you should not.

You can use iterators with vectors, and they work just as they do with lists. Moreover, for some ill-judged reason, the implementors of the STL felt it ok to implement an insert() operation on vectors. This means that you can change the code of reverse_3.cpp to use vectors instead of lists. It is in reverse_4.cpp:

#include <iostream>
#include <vector>
#include <string>
using namespace std;

int main()
{
  vector <string> lines;
  vector <string>::iterator lit;
  string s;

  while (getline(cin, s)) lines.insert(lines.begin(), s);
 
  for (lit = lines.begin(); lit != lines.end(); lit++) {
    cout << *lit << endl;
  }
}

I call this ill-judged because when you perform an insertion such as v.insert(v.begin(), x), the STL basically does the following:

  v.resize(v.size()+1);
  for (i = v.size()-1; i > 0; i--) v[i] = v[i-1];
  v[0] = x;

In other words, it moves every element of the vector up by one slot to make room for the new element at v[0]. Each insertion therefore takes time proportional to the current size of the vector, which makes reverse_4.cpp above run in O(n^2) time.
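
To put a number on that cost, here is a sketch that spells out the shifting by hand and counts the element moves. The function front_insert_moves() is hypothetical, and it only models what an implementation does; a real STL may move elements in bulk rather than one at a time:

```cpp
#include <cstddef>
#include <vector>
using namespace std;

// Hypothetical helper: insert n values at the front of a vector by
// hand, and return how many element moves that took.
long long front_insert_moves(int n)
{
  vector <int> v;
  long long moves = 0;
  size_t i;
  int x;

  for (x = 0; x < n; x++) {
    v.resize(v.size()+1);
    for (i = v.size()-1; i > 0; i--) {   // Shift every element up by one.
      v[i] = v[i-1];
      moves++;
    }
    v[0] = x;
  }
  return moves;
}
```

The count works out to n(n-1)/2. For n = 10,000 that is 49,995,000 moves, and for n = 40,000 it is 799,980,000, which is about 16 times as many moves for 4 times the input.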

To illustrate, input-2.txt is an input file with 10,000 lines, and input-3.txt is one with 40,000 lines. Look at the difference in speed between reverse_3 and reverse_4:

UNIX> wc input-2.txt
10000 10000 80000 input-2.txt
UNIX> wc input-3.txt
 40000  40000 320000 input-3.txt
UNIX> time reverse_3 < input-2.txt > /dev/null
0.012u 0.000s 0:00.01 100.0%  0+0k 0+0io 0pf+0w
UNIX> time reverse_3 < input-3.txt > /dev/null
0.024u 0.008s 0:00.03 66.6% 0+0k 0+0io 0pf+0w
UNIX> time reverse_4 < input-2.txt > /dev/null
0.452u 0.000s 0:00.45 100.0%  0+0k 0+0io 0pf+0w
UNIX> time reverse_4 < input-3.txt > /dev/null
7.008u 0.012s 0:07.04 99.5% 0+0k 0+0io 0pf+0w
UNIX> 
As you can see, reverse_3 is very fast (0.012 and 0.024 seconds on one of our hydra machines in 2014), while reverse_4 is painfully slow (0.45 and 7 seconds). Quadrupling the input makes reverse_4 roughly 16 times slower, which is what O(n^2) predicts. This is important, and you should take care that it doesn't happen to you.

A good rule of thumb is to use a vector as an array and not a list. Don't use iterators -- use integer indices. Then you're ok.


Deques

The STL defines a deque, which stands for "double-ended queue." You use a deque like a vector, with the added benefit that you can insert or delete at either end of the deque with high efficiency. For that reason, deques have a push_front() method (which vectors do not have). We can therefore use a deque to build the sequence of lines in reverse order, and then traverse it forwards with integer indices. The code is in reverse_5.cpp:

#include <iostream>
#include <deque>
#include <string>
using namespace std;

int main()
{
  deque <string> lines;
  size_t i;
  string s;

  while (getline(cin, s)) lines.push_front(s);

  for (i = 0; i < lines.size(); i++) cout << lines[i] << endl;
}

Unlike the vector version, this one runs very fast:

UNIX> reverse_5 < input.txt
Of a mind close beside herself
That freaks at the sight
She would run like a leopard
Born in the night
UNIX> time reverse_5 < input-2.txt > /dev/null
0.004u 0.004s 0:00.01 0.0%  0+0k 0+0io 0pf+0w
UNIX> time reverse_5 < input-3.txt > /dev/null
0.032u 0.000s 0:00.03 100.0%  0+0k 0+0io 0pf+0w
UNIX>
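
Here is a short sketch that uses both ends of a deque and then indexes into it like a vector. The function name evens_front_odds_back() is made up for this example:

```cpp
#include <deque>
using namespace std;

// Hypothetical helper: for i from 0 to n-1, put the even values at
// the front of the deque and the odd values at the back.
deque <int> evens_front_odds_back(int n)
{
  deque <int> d;
  int i;

  for (i = 0; i < n; i++) {
    if (i % 2 == 0) {
      d.push_front(i);
    } else {
      d.push_back(i);
    }
  }
  return d;
}
```

With n = 6, the deque ends up holding 4, 2, 0, 1, 3, 5, so d[0] is 4 and d[5] is 5.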

Another example -- mytail

Another easy list program is mytail, which prints the last ten lines of a file (if there are more than ten lines -- if there are fewer, then it just prints the file). To write it, we use the erase() method, which takes an iterator as a parameter, and erases the element that it points to from the list. Thus, we read in lines of text and append them to a list. Whenever the list grows bigger than ten elements, we delete the first element, which brings it back down to the ten most recent lines. The code is straightforward, in mytail_list.cpp:

#include <iostream>
#include <list>
#include <string>
using namespace std;

int main()
{
  list <string> lines;
  list <string>::iterator lit;
  string s;

  while (getline(cin, s)) {
    lines.push_back(s);
    if (lines.size() > 10) lines.erase(lines.begin());
  }
 
  for (lit = lines.begin(); lit != lines.end(); lit++) {
    cout << *lit << endl;
  }
}

Works fine:

UNIX> mytail_list < input-2.txt
  9991  
  9992  
  9993  
  9994  
  9995  
  9996  
  9997  
  9998  
  9999  
 10000  
UNIX> mytail_list < input-3.txt
 39991  
 39992  
 39993  
 39994  
 39995  
 39996  
 39997  
 39998  
 39999  
 40000  
UNIX> 
As with the previous example, we can port the code directly to vectors and to deques, since they both implement an erase() method. As with the other example, we see that the vector implementation performs worse, since it copies all of the remaining elements upon deletion (the shell scripts make them do more work so that you can see the difference):
UNIX> time sh big_mytail_list.sh
0.411u 0.013s 0:00.42 100.0%    0+0k 0+0io 0pf+0w
UNIX> time sh big_mytail_deque.sh
0.370u 0.012s 0:00.38 100.0%    0+0k 0+0io 0pf+0w
UNIX> time sh big_mytail_vector.sh
0.507u 0.012s 0:00.52 98.0%     0+0k 0+1io 0pf+0w
UNIX> 
The difference isn't huge, but it is there. Were we to keep the last 100 lines instead of the last 10, the difference would be much more pronounced (we did this in class).

The Topcoder DiamondHunt Example

This is from Topcoder SRM 346 D2, 250-pointer. Here's their problem description. The bottom line is that you have a string s, composed of less-than and greater-than signs. Your job is to look for "diamonds" which are '<>' substrings. If you find a diamond, you remove it from the string, and continue to look for more diamonds. You return the number of diamonds that you find.

They give a few examples:

String                                    Number of diamonds
"><<><>>><"                               3
">>>><<"                                  0
"<<<<<<<<<>>>>>>>>>"                      9
"><<><><<>>>><<>><<><<>><<<>>>>>><<<"     14

I've hacked up two solutions to this problem. The first is in DiamondHunt1.cpp. I've added a main() so that you can enter strings on standard input, and it will print countDiamonds() for each string.

This implementation works directly from the problem statement, using the find() method of strings to find a diamond, and then using substr() to remove the diamond:

#include <iostream>
#include <string>
#include <cstdio>
#include <cstdlib>
using namespace std;

class DiamondHunt {
  public:
    int countDiamonds(string mine);
};

int DiamondHunt::countDiamonds(string mine)
{
  int nd;
  size_t i;       // find() returns a size_t, and string::npos does not fit in an int.

  nd = 0;
  while (1) {
    i = mine.find("<>");
    if (i == string::npos) return nd;
    nd++;
    mine = mine.substr(0, i) + mine.substr(i+2);
  }
}

int main()
{
  DiamondHunt d;
  string s;

  while (cin >> s) {
    cout << d.countDiamonds(s) << endl;
  }
  exit(0);
}

When we test it out, it works fine:

UNIX> g++ -o DiamondHunt1 DiamondHunt1.cpp
UNIX>  DiamondHunt1
<>
1
><
0
><<><>>><
3
>>>><<
0
<<<<<<<<<>>>>>>>>>
9
><<><><<>>>><<>><<><<>><<<>>>>>><<<
14
UNIX> 
Although this solution works, think about its running time. In particular, think about the "<<<<<<<<<>>>>>>>>>" input. It has to scan nine characters before finding the diamond. Then the next time it has to scan 8, then 7, etc. In other words, if you have a string of n less-than signs followed by n greater-than signs, the n calls to find() scan on the order of n^2 characters in total, and each call to substr() copies what is left of the string besides. When n is small (25 in the topcoder constraints), that doesn't make a difference. However, it can matter. The program make_bad_diamond.cpp is a very simple C++ program that takes n on the command line and produces a string with n less-than signs followed by n greater-than signs. See what happens when we call it with successively larger values and time the output:
UNIX> time sh -c "make_bad_diamond 10 | DiamondHunt1"
10
0.000u 0.000s 0:00.00 0.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "make_bad_diamond 100 | DiamondHunt1"
100
0.010u 0.000s 0:00.00 0.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "make_bad_diamond 1000 | DiamondHunt1"
1000
0.010u 0.000s 0:00.01 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "make_bad_diamond 10000 | DiamondHunt1"
10000
0.800u 0.000s 0:00.79 101.2%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "make_bad_diamond 100000 | DiamondHunt1"
100000
79.350u 0.010s 1:19.66 99.6%	0+0k 0+0io 0pf+0w
UNIX> 
When the input size is increased by a factor of 10, the running time is increased by a factor of 100. That's not good.
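
To see where the quadratic growth comes from, here is a sketch that builds the same kind of bad input and adds up how far find() has to look each time. Both function names are hypothetical, and bad_diamond() is only my guess at the string that make_bad_diamond prints, based on the description above:

```cpp
#include <cstddef>
#include <string>
using namespace std;

// Hypothetical stand-in for the output of make_bad_diamond:
// n less-than signs followed by n greater-than signs.
string bad_diamond(size_t n)
{
  return string(n, '<') + string(n, '>');
}

// Run the same find()/substr() loop as DiamondHunt1.cpp, and add up
// the positions at which each diamond was found.
long long total_scan(string mine)
{
  long long total = 0;
  size_t i;

  while ((i = mine.find("<>")) != string::npos) {
    total += i;
    mine = mine.substr(0, i) + mine.substr(i+2);
  }
  return total;
}
```

On bad_diamond(n), the diamonds are found at positions n-1, n-2, and so on down to 0, so total_scan() returns n(n-1)/2. That is 36 for n = 9 and 4950 for n = 100.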

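Instead, DiamondHunt2.cpp uses a list. It copies the elements of mine to a list, and then uses three iterators on the list: left, which points to the character that may start a diamond; right, which points to the character after left; and newleft, which points to the place where left should resume once a diamond has been erased. That place is the character before left, or, if left is at the beginning of the list, the character after right.

After we erase left and right, we set left to be newleft: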

#include <iostream>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <list>
using namespace std;

class DiamondHunt {
  public:
    int countDiamonds(string mine);
};

int DiamondHunt::countDiamonds(string mine)
{
  int nd, i;
  list <char> l;
  list <char>::iterator left, right, newleft;

  for (i = 0; i < mine.size(); i++) l.push_back(mine[i]);
  
  nd = 0;
  left = l.begin();
  while (left != l.end()) {
    if (*left == '>') {
      left++;             // If left is not the beginning of a diamond, move on.
    } else {
      right = left;
      right++;
      if (right == l.end()) return nd;

      if (*right == '<') {   // If right is not the end of a diamond, move on
        left++;
      } else {            // Otherwise, we've found a diamond.  We need to
        nd++;             // increment nd, and set newleft to point to the previous
                          // char, or if left is at the beginning, to the next one.

        if (left == l.begin()) {
          newleft = right;
          newleft++;
        } else {
          newleft = left;
          newleft--;
        }

        l.erase(left);      // Now erase left and right, and set left to newleft.
        l.erase(right);
        left = newleft;
      }
    }
  }
  return nd;
}

int main()
{
  DiamondHunt d;
  string s;

  while (cin >> s) {
    cout << d.countDiamonds(s) << endl;
  }
  exit(0);
}

It works on the examples as before:

UNIX> g++ -o DiamondHunt2 DiamondHunt2.cpp
UNIX>  DiamondHunt2
<>
1
><
0
><<><>>><
3
>>>><<
0
<<<<<<<<<>>>>>>>>>
9
><<><><<>>>><<>><<><<>><<<>>>>>><<<
14
UNIX> 
However, it is much faster than the previous version because we don't rescan the characters from the beginning on each iteration, as we did with mine.find():
UNIX> time sh -c "make_bad_diamond 1000 | DiamondHunt2"
1000
0.020u 0.000s 0:00.00 0.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "make_bad_diamond 10000 | DiamondHunt2"
10000
0.020u 0.000s 0:00.00 0.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "make_bad_diamond 100000 | DiamondHunt2"
100000
0.040u 0.010s 0:00.04 125.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "make_bad_diamond 1000000 | DiamondHunt2"
1000000
0.470u 0.000s 0:00.44 106.8%	0+0k 0+0io 0pf+0w
UNIX> 
As we increase the string by a factor of 10, we increase the running time by a factor of ten. That's much better than DiamondHunt1.

It's important for you to understand the code in DiamondHunt2.cpp. To help you, here's an example where we call it on the string "<<>><<>". I will draw every iteration of the while() loop. Here are the list and the iterators in the first iteration:

I'm drawing the list with a sentinel node at each end. Before the first node is a sentinel node for l.rend(), and after the last node is a sentinel node for l.end(). We start with left equaling l.begin(), and since it points to a less-than character, we set right to be the next node. Since right also points to a less-than node, there is no diamond -- we increment left and go to the next iteration of the while() loop:

Now left points to a less-than and right points to a greater-than. So, we increment nd and then set newleft to be the node before left. That is pictured below:

We then erase left and right, and set left to newleft before going back to the top of the while() loop. Here's what happens in the next iteration:

The two erased nodes are gone from the picture, and left and right point to a diamond. Thus, nd is incremented, and since left is equal to l.begin(), we set newleft to be the node after right. That is the state pictured above. We then erase left and right, and set left to newleft before going back to the top of the while() loop. Here's what happens in the fourth iteration:

This is the same case as the first iteration -- no diamond. We increment left and move on:

We have a diamond. We first increment nd. Next, since left is not equal to l.begin(), we set newleft to point to the node before left. That is depicted above. We then erase, set left to newleft and reach the last iteration of the while() loop:

Since right equals l.end(), we return 3, and we're done. It's important that you step through this example until you understand it. You may even want to step through what happens when we call it on the string "<<>>><>". The execution is very similar, except the fourth and sixth iterations look a little different.
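
When you step through examples by hand, it helps to have an independent way to check the answer. The sketch below is not the method of DiamondHunt2.cpp. It is a different technique that I am swapping in: keep a count of unmatched less-than signs, and pair each greater-than sign with one of them when possible. The function name count_by_matching() is made up for this example:

```cpp
#include <cstddef>
#include <string>
using namespace std;

// Hypothetical cross-check, not the method of DiamondHunt2.cpp:
// count unmatched '<' characters, and pair each '>' with one of them.
int count_by_matching(const string &mine)
{
  int open = 0, nd = 0;
  size_t i;

  for (i = 0; i < mine.size(); i++) {
    if (mine[i] == '<') {
      open++;
    } else if (open > 0) {   // A '>' closes the most recent unmatched '<'.
      open--;
      nd++;
    }
  }
  return nd;
}
```

It returns 3 for both "<<>><<>" and "<<>>><>", which agrees with the walkthrough above.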


When do I use each data structure?

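We've covered quite a bit in this lecture. One of the bottom lines that I want you to take from this lecture is that there are three data structures: vectors, deques and lists, and you should use each appropriately. Use a vector when you access elements by index and only add or remove elements at the back. Use a deque when you also need to add or remove elements at the front. Use a list when you need to insert or erase elements in the middle through iterators.

I did not go over the following material this year, because I've gone over similar material. However, I keep the notes here in case you want to reinforce your understanding of lists and pointers.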

Nested List Traversal, and Why We Use Pointers

As an example in building lists of lists, we wrote the program list_o_list_1.cpp, which creates a list of lists. The top-level list is a list of ten lists of integers. These bottom-level lists contain ten integers each. The first bottom-level list contains the integers from 0 to 9. The next contains the integers from 10 to 19, etc.

After creating the list of lists, we traverse it and print out each bottom-level list on one line:

#include <iostream>
#include <list>
using namespace std;

typedef list <int> intlist;

int main()
{
  list <intlist *> numlists;
  list <intlist *>::iterator nlit;
  intlist *il;
  intlist::iterator ilit;
  int i, j;

  for (j = 0; j < 100; j += 10) {
    il = new intlist;
    numlists.push_back(il);
    for (i = 0; i < 10; i++) {
      il->push_back(i+j);
    }
  }

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    il = *nlit;
    for (ilit = il->begin(); ilit != il->end(); ilit++) cout << *ilit << " " ;
    cout << endl;
  }
}

The typedef makes the code cleaner, so that you don't have nested list declarations.

This code runs nicely, and as you'd expect:

UNIX> list_o_list_1
0 1 2 3 4 5 6 7 8 9 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99 
UNIX> 
You have undoubtedly noticed the fact that our bottom level list is a pointer to a list, which we create using new. Why do we do this? The answer is that if we don't use pointers, we expose ourselves to problems with making copies of things. Let's see what happens if we try to avoid pointers.

A first straightforward try is in list_o_list_2.cpp, which just takes out the new and changes pointers to non-pointers:

#include <iostream>
#include <list>
using namespace std;

typedef list <int> intlist;

int main()
{
  list <intlist> numlists;
  list <intlist>::iterator nlit;
  intlist il;
  intlist::iterator ilit;
  int i, j;

  for (j = 0; j < 100; j += 10) {
    numlists.push_back(il);
    for (i = 0; i < 10; i++) {
      il.push_back(i+j);
    }
  }

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    il = *nlit;
    for (ilit = il.begin(); ilit != il.end(); ilit++) cout << *ilit << " " ;
    cout << endl;
  }
}

When we run it, we get some icky results:

UNIX> list_o_list_2

0 1 2 3 4 5 6 7 8 9 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 
UNIX> 
What's going on? Well, every call to numlists.push_back(il) makes a copy of il and stores that copy in the top-level list. The inner loop then inserts the integers into il, and not into the copy. This is why il keeps growing, and why the first line is blank -- we are printing an empty list.
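
Here is a small sketch that isolates the copying behavior. Both function names are hypothetical. The first stores a list by value and then changes the original. The second stores a pointer, as list_o_list_1.cpp does:

```cpp
#include <cstddef>
#include <list>
using namespace std;

typedef list <int> intlist;

// Hypothetical helper: push il onto a list of lists, then change il,
// and return the size of the element that numlists is holding.
size_t size_of_stored_copy()
{
  list <intlist> numlists;
  intlist il;

  numlists.push_back(il);     // This stores a copy of the (empty) il.
  il.push_back(1);            // These change il, not the stored copy.
  il.push_back(2);
  return numlists.back().size();
}

// Hypothetical helper: the same steps, but storing a pointer.
size_t size_through_pointer()
{
  list <intlist *> numlists;
  intlist *il = new intlist;
  size_t rv;

  numlists.push_back(il);     // This copies the pointer, not the list.
  il->push_back(1);
  il->push_back(2);
  rv = numlists.back()->size();
  delete il;
  return rv;
}
```

size_of_stored_copy() returns 0, because the stored copy never sees the two integers. size_through_pointer() returns 2, because both pointers refer to the same list.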

We can fix this by not using il when we build the lists, and instead accessing the newly added element directly through numlists.back(). A solution is in list_o_list_3.cpp:

#include <iostream>
#include <list>
using namespace std;

typedef list <int> intlist;

int main()
{
  list <intlist> numlists;
  list <intlist>::iterator nlit;
  intlist il;
  intlist::iterator ilit;
  int i, j;

  for (j = 0; j < 100; j += 10) {
    numlists.resize(numlists.size()+1);
    for (i = 0; i < 10; i++) {
      numlists.back().push_back(i+j);   /* Yuck */
    }
  }

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    il = *nlit;
    for (ilit = il.begin(); ilit != il.end(); ilit++) cout << *ilit << " " ;
    cout << endl;
  }
}

That's an awful line of code, isn't it? Spend some time reading it to make sure you understand it. It seems to work fine:

UNIX> list_o_list_3
0 1 2 3 4 5 6 7 8 9 
10 11 12 13 14 15 16 17 18 19 
20 21 22 23 24 25 26 27 28 29 
30 31 32 33 34 35 36 37 38 39 
40 41 42 43 44 45 46 47 48 49 
50 51 52 53 54 55 56 57 58 59 
60 61 62 63 64 65 66 67 68 69 
70 71 72 73 74 75 76 77 78 79 
80 81 82 83 84 85 86 87 88 89 
90 91 92 93 94 95 96 97 98 99 
UNIX> 
However, there is still a performance bug. It is the line

    il = *nlit;

This line makes a copy of *nlit, which makes a copy of the list. As I said in class, it makes one pine for C, which doesn't let you make copies so wantonly. To fix this, remove il completely (list_o_list_4.cpp):

#include <iostream>
#include <list>
using namespace std;

typedef list <int> intlist;

int main()
{
  list <intlist> numlists;
  list <intlist>::iterator nlit;
  intlist::iterator ilit;
  int i, j;

  for (j = 0; j < 100; j += 10) {
    numlists.resize(numlists.size()+1);
    for (i = 0; i < 10; i++) {
      numlists.back().push_back(i+j);
    }
  }

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    for (ilit = nlit->begin(); ilit != nlit->end(); ilit++) cout << *ilit << " " ;
    cout << endl;
  }
}

Again, I find that code unreadable -- in fact, this code is so ugly, you may as well put it all on one line (list_o_list_5.cpp):

#include <iostream>
#include <list>
using namespace std; typedef list <int> intlist; int main() { list <intlist> numlists; list <intlist>::iterator nlit; intlist::iterator ilit; int i, j; for (j = 0; j < 100; j += 10) { numlists.resize(numlists.size()+1); for (i = 0; i < 10; i++) { numlists.back().push_back(i+j); } } for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) { for (ilit = nlit->begin(); ilit != nlit->end(); ilit++) cout << *ilit << " " ; cout << endl; } }

For the record, I don't advocate doing this -- it's just that list_o_list_4.cpp is so unreadable it may as well be on one line.
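
If you want a readable name for the inner list without paying for a copy, one option that the programs above do not use is a C++ reference. The following is only a sketch under that assumption, and the function append_to_each() is made up for illustration:

```cpp
#include <list>
using namespace std;

typedef list <int> intlist;

// Hypothetical helper: add val to the end of every bottom-level list,
// reaching each one through a reference instead of a copy.
void append_to_each(list <intlist> &numlists, int val)
{
  list <intlist>::iterator nlit;

  for (nlit = numlists.begin(); nlit != numlists.end(); nlit++) {
    intlist &il = *nlit;      // A reference: il is another name for *nlit.
    il.push_back(val);        // So this changes the list stored in numlists.
  }
}
```

Had il been declared as a plain intlist, the push_back() would have changed a temporary copy and left numlists untouched.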