CS140 Lecture notes -- The Basics of Strings and Vectors (plus getline)

Directory: ~plank/cs140/notes/SV-Basics

Lecture notes: http://www.cs.utk.edu/~plank/plank/classes/cs140/Notes/SV-Basics

Notes created in 2011

Last modification date: Tue Jan 14 09:20:22 EST 2014

String Basics

Strings are a fundamental class supported by C++ to do text processing. Many installations pull in <string> indirectly through headers such as <iostream>, so your programs may compile without it. The standard does not guarantee that, though, so including <string> explicitly is a good habit to acquire.

Strings in C++ are very natural, so you can often write programs with them without thinking about it too much. For example, the following program (string-basic.cpp) illustrates many functionalities with strings:

Assigning them from string literals.
Using size() to determine their size.
Changing their contents by treating them like an array.
Testing equality using "=="
Comparing them using "<", etc.
Concatenating them with "+".

I'm assuming that all of this is review from CS102.

#include <iostream>
#include <string>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main()
{
  string a, b, c;
  int i;

  a = "LIGHTNING";
  b = "Lightning";
  c = "Strikes";

  cout << "a: " << a << " -- " << a.size() << " characters." << endl;
  cout << "b: " << b << " -- " << b.size() << " characters." << endl;
  cout << "c: " << c << " -- " << c.size() << " characters." << endl;
  cout << endl;

  printf("Changing all but the first character of a to lower case:\n\n");

  for (i = 1; i < a.size(); i++) a[i] += ('a' - 'A');
  cout << "Changed a to: " << a << endl << endl;

  printf("Testing equality: (a == b): %d.  (a == c): %d.  (b == c): %d\n", (a == b), (a == c), (b == c));

  printf("Comparison:       (a >= b): %d.  (a >= c): %d.  (b >= c): %d\n", (a >= b), (a >= c), (b >= c));
  printf("Comparison:       (a <= b): %d.  (a <= c): %d.  (b <= c): %d\n", (a <= b), (a <= c), (b <= c));
  printf("Comparison:       (a <  b): %d.  (a <  c): %d.  (b <  c): %d\n", (a <  b), (a <  c), (b <  c));
  printf("Comparison:       (a >  b): %d.  (a >  c): %d.  (b >  c): %d\n", (a >  b), (a >  c), (b >  c));
  cout << endl;

  a = b + c;
  cout << "a = b + c: a is now: " << a << endl;

  return 0;
}

Nothing surprising happens when we run this:

UNIX> g++ -o string-basic string-basic.cpp
UNIX> string-basic
a: LIGHTNING -- 9 characters.
b: Lightning -- 9 characters.
c: Strikes -- 7 characters.

Changing all but the first character of a to lower case:

Changed a to: Lightning

Testing equality: (a == b): 1.  (a == c): 0.  (b == c): 0
Comparison:       (a >= b): 1.  (a >= c): 0.  (b >= c): 0
Comparison:       (a <= b): 1.  (a <= c): 1.  (b <= c): 1
Comparison:       (a <  b): 0.  (a <  c): 1.  (b <  c): 1
Comparison:       (a >  b): 0.  (a >  c): 0.  (b >  c): 0

a = b + c: a is now: LightningStrikes
UNIX>

Those last three functionalities (equality testing, comparison and concatenation) exploit a feature of C++ that you may learn more about someday. It's called "operator overloading," and it allows you to redefine basic operators in the language to work on user-defined objects. In this case, '==', '<=', '>=', '<', '>' and '+' have all been defined to work on strings. It's pretty convenient in this case, but you should know that I'm not a fan of operator overloading in general. Put that in the back of your mind.

Getline()

The procedure getline(cin, s) reads a line of input from standard input and puts it into the string s. Spaces are preserved. Later, we will use getline() and stringstreams to process input. However, here, we simply read lines of standard input with getline() (this is in getline.cpp):

#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

int main()
{
  string s;
  int ln, len;

  ln = 0;
  while (getline(cin, s)) {
    ln++;
    len = s.size();
    printf("Line %2d - Size: %3d - %s\n", ln, len, s.c_str());
  }
}

Running it, we see that it works as promised:

UNIX> g++ -o getline getline.cpp
UNIX> cat input.txt
Give me a weapon of power, which no one else may hold,
Defend the Gods with honor, To lead the BRAVE and BOLD

    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN

Bequeathed to me by Odin, Molded by the Dwarfs
MINE! This shimmering mallet, The Symbol of the Norse

    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
UNIX> getline < input.txt
Line  1 - Size:  54 - Give me a weapon of power, which no one else may hold,
Line  2 - Size:  54 - Defend the Gods with honor, To lead the BRAVE and BOLD
Line  3 - Size:   0 - 
Line  4 - Size:  45 -     LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
Line  5 - Size:  45 -     LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
Line  6 - Size:   0 - 
Line  7 - Size:  46 - Bequeathed to me by Odin, Molded by the Dwarfs
Line  8 - Size:  53 - MINE! This shimmering mallet, The Symbol of the Norse
Line  9 - Size:   0 - 
Line 10 - Size:  45 -     LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
Line 11 - Size:  45 -     LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
UNIX>

Vector Basics

Vectors are the simplest part of the Standard Template Library (STL). There's a lot of linguistic hocus pocus that has to go on for the STL to work as it does. We won't concern ourselves with that in this class, but we will teach you how to make use of the library effectively. Vectors are a general-purpose array class. They differ from C arrays in many important ways, and they are easier to use. To use a vector, you include <vector>, and then you declare a vector as follows:

vector <TYPE> variable-name(s)

The TYPE can be a basic type like int or double, or it can be more complex, like a C++ class or even another vector. You can declare a vector as starting with a certain number of elements, or you can declare it to be empty. In either case, you can dynamically modify the vector's size with the resize() method. You can use the size() method to get the vector's current size.

A very simple example program is in vec1.cpp:

#include <cstdio>
#include <vector>
#include <iostream>
using namespace std;

int main()
{
  vector <int> v1;
  vector <double> v2(10);
  int i;

  printf("V1's size: %ld.  V2's size: %ld\n", v1.size(), v2.size());
      // size() does not return an int.  It returns a size_t, which is an unsigned
      // 64-bit quantity on most machines, so you must use "%ld" rather than "%d".
      // We'll talk about it in class

  v1.resize(5);
  v2.resize(8);

  printf("V1's size: %ld.  V2's size: %ld\n", v1.size(), v2.size());

  for (i = 0; i < v1.size(); i++) v1[i] = 10 + i;
  for (i = 0; i < v2.size(); i++) v2[i] = 20.3 + i;

  printf("V1:"); for (i = 0; i < v1.size(); i++) printf(" %d", v1[i]); printf("\n");
  printf("V2:"); for (i = 0; i < v2.size(); i++) printf(" %.1lf", v2[i]); printf("\n");
}

This program declares an empty integer vector v1 and a ten-element double vector v2, and prints their sizes. It then resizes the vectors to five and eight and prints their sizes again. It then initializes the elements of v1 and v2 in two for loops and prints out the two vectors. Straightforward stuff:

UNIX> vec1
V1's size: 0.  V2's size: 10
V1's size: 5.  V2's size: 8
V1: 10 11 12 13 14
V2: 20.3 21.3 22.3 23.3 24.3 25.3 26.3 27.3
UNIX>

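One note -- when you print a size() with printf(), you need to specify "%ld" instead of "%d". This is because size() returns a size_t, which is a 64-bit unsigned quantity on most modern machines, while "%d" expects an int. If you use "%d", you will get a compiler warning. (The fully portable format for a size_t is "%zu".)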

When you create vector elements, default values are placed in there. For example, string vectors start with default empty strings. Numerical values are set to zero (the C++ standard guarantees this for vector elements), although it never hurts to be explicit. You can specify what the default values should be as a second parameter to the resize() method.

Take a careful look at vec2.cpp:

#include <cstdio>
#include <cstdlib>
#include <vector>
#include <iostream>
using namespace std;

int main()
{
  vector <int> v1;
  vector <double> v2;
  int i;

  v1.resize(5, 22);
  v1.resize(8, 33);

  for (i = 0; i < v1.size(); i++) printf(" %d", v1[i]); 
  printf("\n");

  v1.resize(6);
  for (i = 0; i < v1.size(); i++) printf(" %d", v1[i]); 
  printf("\n");

  v1.resize(10, 44);
  for (i = 0; i < v1.size(); i++) printf(" %d", v1[i]); 
  printf("\n");

  v2.resize(10, drand48());
  printf("\n");
  for (i = 0; i < v2.size(); i++) printf(" %4.2lf", v2[i]); 
  printf("\n");
}

We first resize v1 to hold five elements that are 22, and then we resize v1 again to hold eight elements with a default of 33. This raises a question -- will only the new elements be 33, or will all of them? Look at the output:

UNIX> vec2
 22 22 22 22 22 33 33 33
 22 22 22 22 22 33
 22 22 22 22 22 33 44 44 44 44

 0.40 0.40 0.40 0.40 0.40 0.40 0.40 0.40 0.40 0.40
UNIX>

Only the new ones are given values. The third resize() removes two elements, and then the fourth increases the size from 6 to 10, putting the value of 44 into the new elements.

The last resize() changes v2's size from 0 to ten, setting drand48() as the default. As you see above, that chooses one random value and puts it into every element. Perhaps you thought it would put random elements throughout v2 -- that doesn't happen because drand48() is only called once, and its one return value is what is passed to resize().

push_back(), reverse, tail

Two canonical vector programs are reverse.cpp and mytail.cpp. The first is really simple -- it prints out the lines of standard input in reverse order. You need a data structure like a vector to do this, because you can't print the first line of output until you've read the last line of input, and you have to store all of those lines somewhere.

Here's reverse.cpp

#include <cstdio>
#include <vector>
#include <iostream>
#include <string>
using namespace std;

int main()
{
  vector <string> lines;
  int i;
  string s;

  while (getline(cin, s)) lines.push_back(s);
  for (i = lines.size()-1; i >= 0; i--) cout << lines[i] << endl;
}

It makes use of the vector method push_back(), which appends an element to a vector. push_back() runs in constant time on average (the technical term is "amortized" constant time), and it is much more convenient than resizing the vector by hand.

To show reverse running, I first call cat -n on input.txt. That prints input.txt to the screen with line numbers. The second command uses a "pipe" -- which makes the standard output of one command the standard input of another. That turns out to be a very powerful feature of Unix -- one that you will use frequently:

UNIX> cat -n input.txt
     1	Give me a weapon of power, which no one else may hold,
     2	Defend the Gods with honor, To lead the BRAVE and BOLD
     3	
     4	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
     5	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
     6	
     7	Bequeathed to me by Odin, Molded by the Dwarfs
     8	MINE! This shimmering mallet, The Symbol of the Norse
     9	
    10	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
    11	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
UNIX> cat -n input.txt | reverse
    11	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
    10	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
     9	
     8	MINE! This shimmering mallet, The Symbol of the Norse
     7	Bequeathed to me by Odin, Molded by the Dwarfs
     6	
     5	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
     4	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
     3	
     2	Defend the Gods with honor, To lead the BRAVE and BOLD
     1	Give me a weapon of power, which no one else may hold,
UNIX>

The second program performs the same functionality as the tail command -- it prints out the last ten lines of standard input. We can write a simple version of tail that is like reverse.cpp. It reads all of the lines into a vector and then prints out just the last ten. It's in mytail1.cpp:

#include <cstdio>
#include <vector>
#include <iostream>
#include <string>
using namespace std;

int main()
{
  vector <string> lines;
  int i;
  string s;

  while (getline(cin, s)) lines.push_back(s);
  i = lines.size()-10;
  if (i < 0) i = 0;
  for ( ; i < lines.size(); i++) cout << lines[i] << endl;
}

We need the if statement to handle files that are smaller than 10 lines. If we didn't have the if statement, i would be less than zero, and our program would have a bug. Hopefully that bug would be manifested by a segmentation violation, but you never know. Since we've put in that if statement, there is no bug:

UNIX> cat -n input.txt | mytail1
     2	Defend the Gods with honor, To lead the BRAVE and BOLD
     3	
     4	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
     5	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
     6	
     7	Bequeathed to me by Odin, Molded by the Dwarfs
     8	MINE! This shimmering mallet, The Symbol of the Norse
     9	
    10	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
    11	    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
UNIX> head -n 4 input.txt | mytail1
Give me a weapon of power, which no one else may hold,
Defend the Gods with honor, To lead the BRAVE and BOLD

    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
UNIX>

I'll contend, though, that mytail1.cpp is not as good a program as it could be. Why? Consider what happens if you call it on a file with 1,000,000 lines. You are storing all 1,000,000 lines, but you are only printing the last ten. That's a big waste of memory!

This problem is fixed in mytail2.cpp:

#include <cstdio>
#include <vector>
#include <iostream>
#include <string>
using namespace std;

int main()
{
  vector <string> lines;
  int i, ln;
  string s;

  ln = 0;
  while (getline(cin, s)) {
    if (ln < 10) {
      lines.push_back(s);
    } else {
      lines[ln%10] = s;
    }    
    ln++;
  }
  i = ln-10;
  if (i < 0) i = 0;
  for ( ; i < ln; i++) cout << lines[i%10] << endl;
}

Once lines becomes ten elements long, we no longer call push_back(), but instead replace the oldest element with s. When reading is done, we have the last ten lines, but not always starting at element 0. To print them out, we need to find the array element for each of the last ten lines. Consider line l. If it is in the array, it will be in element l%10. Thus, if the file has ln total lines and ln > 10, then we want to print out lines ln-10 to ln-1. The for loop that ends the program does just that.

Put this bug in the back of your head

I'll explain this in class. I won't test you on this, but remember it, because it will happen to you someday (I'll go over it again later as well).

mytail3.cpp is identical to mytail1.cpp, except the for loop is as follows:

  for (i = (lines.size()-10 < 0) ? 0 : lines.size()-10 ; i < lines.size(); i++) {
    cout << lines[i] << endl;
  }

Take a look at the following output:

UNIX> cat input.txt | mytail3
Defend the Gods with honor, To lead the BRAVE and BOLD

    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN

Bequeathed to me by Odin, Molded by the Dwarfs
MINE! This shimmering mallet, The Symbol of the Norse

    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
    LIGHTNING STRIKES LIGHTNING STRIKES AGAIN
UNIX> head -n 5 input.txt | mytail3
UNIX>

Do you see a problem? Again, we'll go over this in class, because it is confusing. And evil.