CS302 Lecture Notes - C++ Strings & Vectors

August 21, 2008
James S. Plank
Directory: /home/plank/cs302/Notes/Strings

See these notes about simple file I/O in C++.

We live and will continue to live in a schizophrenic world regarding strings in C++. Eradicating one's desire to lean on the C-style string constructs that we know and love is difficult, and I'm not sure if it's even beneficial. Worse, before you learn the ins and outs of classes, operator overloading and the like, understanding C++ strings in true detail is impossible.

Nonetheless, we will use them and enjoy the good things that they bring to the table. The major difference between C and C++ strings is explicit memory allocation. A C string is a (char *). When you want to use one, you need to explicitly allocate memory for it, either statically, as in:

   char s[100];

or dynamically, using malloc() or strdup() (which calls malloc() for you), as in:

   s = (char *) malloc(sizeof(char)*100);

   s = strdup(argv[1]);

C++ strings, on the other hand, perform memory allocation automatically, so the analogous line to the above strdup() call would be:

   string s;

   s = argv[1];

With C++ strings, you make copies upon assignment, so the above line makes a copy of argv[1], whereas in the code below, a copy is not made. It is simply a pointer assignment:

   char *s;
   
   s = argv[1];

To hammer this home further, look at stringhelp.cpp:

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
using namespace std;

int main(int argc, char **argv)
{
  char *s;
  string str;

  if (argc != 2) {
    cerr << "usage: stringhelp word\n";
    exit(1);
  }

  s = argv[1];
  str = argv[1];

  str[0] = 'J';

  cout << "After changing str[0] to J: argv[1] = " << argv[1] << 
         ", str = " << str << " and s = " << s << endl;

  s[0] = 'P';
  cout << "After changing s[0] to P:   argv[1] = " << argv[1] << 
         ", str = " << str << " and s = " << s << endl;

  cout << endl;
  cout << "Memory pointers:\n\n";

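  /* %p is the portable way to print a pointer; %x expects an unsigned int. */
  printf("%15s: %p\n", "argv[1]", (void *) argv[1]);
  printf("%15s: %p\n", "s", (void *) s);
  printf("%15s: %p\n", "str.c_str()", (const void *) str.c_str());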
}

As stated above, the first assignment sets the (char *) pointer s to argv[1]. The second sets the C++ string to argv[1], which makes a copy of it. Thus, when we change str[0], argv[1] is unaffected. However, when we change s[0], it affects argv[1], since they both point to the same memory. The printf() statements, which print the memory locations of the C strings and of the underlying C string in str, confirm this:

UNIX> stringhelp AAAA
After changing str[0] to J: argv[1] = AAAA, str = JAAA and s = AAAA
After changing s[0] to P:   argv[1] = PAAA, str = JAAA and s = PAAA

Memory pointers:

        argv[1]: 0xbfffed67
              s: 0xbfffed67
    str.c_str(): 0x30030c
UNIX>

Something to be aware of is that when you pass around C++ strings, you make copies. So, for example, if you read a file into a string s, and then pass s to a function, then you are making a copy of s. Keep this in mind, because this will happen a lot in C++. If you are passing around big things, pass pointers, and not the things themselves.

Joys of C strings

I will use C-style strings a lot. So you will see the characters c_str() quite a bit. This is how you convert C++ strings to C strings. Why will I use C-style strings? A few reasons:

You can use them with printf().
You can pass them to functions without worrying about making copies.
You can use strcmp(), strchr() and strstr().

C++ has equivalent functionalities, but in my opinion, C does it better, so that's why you'll be seeing C.

Useful C++ string methods

C++ strings are a class, which means that they have many predefined methods, many of which are extremely useful. You can see all of them in http://www.cppreference.com/cppstring/index.html or http://www.cplusplus.com/reference/string/string/. Here are a few of the more useful ones:

bool empty() - Returns whether the string is empty.
size_t length() - Returns the string's length.
size_t find(s, index) - Find a character or substring s. The parameter s may be a C++ string, a C-style string, or a character. If you call it on a C++ string, it doesn't make a copy of it because it declares its argument as a reference type. Thus find() performs strchr() and strstr() both, depending on the type of the argument. That is known as polymorphism (more precisely, function overloading, which is one form of polymorphism). Remember that word.
The parameter index is the index of where to start searching. If you leave it out, the search starts at index 0. The return value is an index as well. If s is not found, then find() returns string::npos.
So, upon reading all of that, do you think you'll like find() better or worse than strchr() and strstr()?
string substr(size_t index, size_t length) - Create a substring from a string. If you leave out length, it will go from index to the end of the string. That's more polymorphism!

The program findthings.cpp illustrates three uses of find(): finding the character 'a', finding the first command line argument as a C-style string, and finding the second command line argument as a C++ string:

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char **argv)
{
  string s;
  string arg2;
  size_t index_a, index_1, index_2;   // find() returns a size_t, the same type as string::npos
  int line;

  if (argc != 3) {
    cerr << "usage: findthings string1 string2\n";
    exit(1);
  }

  arg2 = argv[2];

  line = 0;

  while (1) {
    getline(cin, s);
    if (cin.fail()) exit(0);
    line++;

    index_a = s.find('a');
    index_1 = s.find(argv[1]);
    index_2 = s.find(arg2);
    if (index_a == string::npos) {
      cout << "Line " << line << ": No character 'a'" << endl;
    } else {
      cout << "Line " << line << ": Character 'a' at index: " << index_a << endl;
    }
    if (index_1 == string::npos) {
      cout << "Line " << line << ": No string " << argv[1] << endl;
    } else {
      cout << "Line " << line << ": String " << argv[1] << " at index: " << index_1 << endl;
    }
    if (index_2 == string::npos) {
      cout << "Line " << line << ": No string " << argv[2] << endl;
    } else {
      cout << "Line " << line << ": String " << argv[2] << " at index: " << index_2 << endl;
    }
    cout << endl;
  }
}

Here it is on the file input.txt:

Neckbone
Candied Yams
Turnips
Smothered Steak
Smothered Steak!
Grits and Gravy

UNIX> findthings ea n < input.txt
Line 1: No character 'a'
Line 1: No string ea
Line 1: String n at index: 6

Line 2: Character 'a' at index: 1
Line 2: No string ea
Line 2: String n at index: 2

Line 3: No character 'a'
Line 3: No string ea
Line 3: String n at index: 3

Line 4: Character 'a' at index: 13
Line 4: String ea at index: 12
Line 4: No string n

Line 5: Character 'a' at index: 13
Line 5: String ea at index: 12
Line 5: No string n

Line 6: Character 'a' at index: 6
Line 6: No string ea
Line 6: String n at index: 7

UNIX>

Reference Parameters in C++: Swallowing the Red Pill, But Not In A Good Way

Reference parameters appall me, but they are ubiquitous in the STL, so unlike previous years, I will not pretend that they don't exist, but instead address them head-on.

You may declare a procedure parameter to take a reference to a variable. You do this by putting an ampersand (&) between the parameter's type and its name. When you do this, a reference is passed to the parameter rather than a copy, which means that if you change the parameter's value, it will change that value in the caller as well.

Let's see an example in refparam.cpp:

#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <string>
using namespace std;

void add_5_to_i_non_ref(int i)
{
  i += 5;
}

void add_5_to_i_ref(int& i)
{
  i += 5;
}

int main(int argc, char **argv)
{
  int i;

  if (argc != 2) {
    cerr << "usage: refparam number\n";
    exit(1);
  }

  i = atoi(argv[1]);
  cout << "I is " << i << endl;

  add_5_to_i_non_ref(i);
  cout << "After calling add_5_to_i_non_ref(i).  I is " << i << endl;
  
  add_5_to_i_ref(i);
  cout << "After calling add_5_to_i_ref(i).      I is " << i << endl;
  
  cout << "I feel ill." << endl;
  exit(0);
}

As you can see, the procedures add_5_to_i_non_ref() and add_5_to_i_ref() are identical except that add_5_to_i_ref() declares i as a reference parameter. That explains the output:

UNIX> refparam 3
I is 3
After calling add_5_to_i_non_ref(i).  I is 3
After calling add_5_to_i_ref(i).      I is 8
I feel ill.
UNIX>

Since add_5_to_i_ref() declares i as a reference parameter, the value of i is changed in main().

I believe that reference parameters are used for convenience, when you want to pass an object to a procedure that is not going to modify it, and evidently you are too lazy to pass a pointer. Then reference parameters are nice because you get efficiency and you can live in denial that pointers exist for a reason.

Why do they make me sick? Because when I see a procedure call like:

  add_5_to_i_ref(i);

I believe that i will not change, and that is enforced by a well-designed language like C. I don't want to have to go hunting down the prototype of add_5_to_i_ref() to see that i's value might get changed from under me. It is an atrocity.

Nevertheless, the STL uses reference parameters, and better yet, it will not change the things that you pass it. That makes me breathe easier. However, in this class:

IF YOU DECLARE PROCEDURES WITH REFERENCE TYPES, YOU'D BETTER HAVE A GOOD REASON!

What's a good reason? It may make the code read a lot easier, and you don't change the values inside the procedure. I can handle that. I can't handle much else.

Operator Overloading with C++ Strings

Operator overloading means that you can define how operators like plus and comparison work with a class. Take a look at oo.cpp:

#include <stdio.h>
#include <iostream>
#include <string>
using namespace std;

int main()
{
  string s1, s2, s3;

  s1 = "AAA";
  s2 = "BBB";
  s3 = "BBB";

  cout << s1 + s2 << endl;
  cout << (s1 == s2) << endl;
  cout << (s2 == s3) << endl;
  cout << (s1 < s3) << endl;

}

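There's lots of overloading here, but again it is convenient. Using '+' allows you to concatenate strings. Using '==' and '<' performs the equivalent of strcmp(). By the way, are copies made when you do these operations? The answer is no for the operands, which are passed by reference (although '+' does have to build a new string to hold the concatenation), but it's always good form to ask.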

UNIX> oo
AAABBB
0
1
1
UNIX>

Vectors

Vectors are part of the C++ Standard Template Library, which is a collection of tools that implements a large variety of data structures and algorithms of which you can take advantage. Vectors are like arrays, but are richer, as you can resize them more easily and create them incrementally. You can declare a vector with a specific size, or without a size and it will be sized upon assignment. An example is in v1.cpp:

#include <stdio.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
  vector <int> v1(10);
  vector <int> v2;
  int i;

  for (i = 0; i < 10; i++) v1[i] = 100+i;
  v2 = v1;
  for (i = 0; i < 10; i++) v1[i] = 200+i;

  for (i = 0; i < 10; i++) {
    cout << i << " " << v1[i] << " " << v2[i] << endl;
  }
}

Here we have two vectors. v1 is a vector of ten integers, and v2's size is unspecified when you declare it. We set the elements of v1 to 100-109, and then set v2 equal to v1. As with strings, this makes a copy of v1, so a new vector of 10 integers is created and initialized to be 100-109. We then set the elements of v1 to 200-209. Note that since v2 is a copy of v1, its elements remain unchanged. For that reason, the output of the program is:

UNIX> v1
0 200 100
1 201 101
2 202 102
3 203 103
4 204 104
5 205 105
6 206 106
7 207 107
8 208 108
9 209 109
UNIX>

Another useful feature of vectors is the ability to treat them like append-only lists. The push_back() method adds an element to the end of a vector, resizing it if necessary. The size() method returns the size of a vector. Therefore, performing a task such as reversing the lines of standard input is very simple with vectors and strings, requiring no malloc() or new statements (revstdin.cpp):

#include <stdio.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
  vector <string> lines(0);
  string s;
  int i;

  while (!cin.eof()) {
    getline(cin, s);
    if (!cin.fail()) lines.push_back(s);
  }

  for (i = lines.size()-1; i >= 0; i--) {
    cout << lines[i] << endl;
  }
}

UNIX> cat input.txt
Neckbone
Candied Yams
Turnips
Smothered Steak
Smothered Steak!
Grits and Gravy
UNIX> revstdin < input.txt
Grits and Gravy
Smothered Steak!
Smothered Steak
Turnips
Candied Yams
Neckbone
UNIX>

Two more vector methods: The method pop_back() removes the last element of the vector, and the method at(i) returns the element at index i, with error checking in case i is a bad index (it throws an exception rather than reading out of bounds). The program revstdinstack.cpp reverses standard input using these two methods:

#include <stdio.h>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
  vector <string> lines(0);
  string s;

  while (!cin.eof()) {
    getline(cin, s);
    if (!cin.fail()) lines.push_back(s);
  }

  while (!lines.empty()) {
    cout << lines.at(lines.size()-1) << endl;
    lines.pop_back();
  }
}