At the end of this lecture, I give you some optional material about C and C++ strings. I am making that optional, though, because I think at this stage it is more confusing than helpful. Thus, you should simply use C++ style strings for everything, but be aware that you will have to deal with C-style strings in a few situations.
#include <iostream>
#include <sstream>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main(int argc, char **argv)
{
  double d;
  int n, i;
  istringstream ss;

  /* Read the number of iterations from the command line. */
  if (argc != 2) { cerr << "usage: gendouble iterations\n"; exit(1); }
  ss.str(argv[1]);
  if (!(ss >> n)) { cerr << "usage: gendouble iterations\n"; exit(1); }

  /* Sum n random doubles, each between 0 and 1. */
  d = 0;
  for (i = 0; i < n; i++) {
    d += drand48();
  }
  cout << d << endl;
  return 0;
}
When we run it, we expect the final sum to be roughly n/2, and it runs pretty quickly on my MacBook Pro. In fact, running it for roughly 1G iterations (10^9) takes about 38 seconds (time prints out timing information from the operating system -- the third word is the wall-clock time):
UNIX> time gendouble 1000
497.784
0.000u 0.001s 0:00.00 0.0% 0+0k 0+0io 0pf+0w
UNIX> time gendouble 10000
4983.82
0.001u 0.001s 0:00.00 0.0% 0+0k 0+0io 0pf+0w
UNIX> time gendouble 100000
49964.4
0.006u 0.001s 0:00.00 0.0% 0+0k 0+0io 0pf+0w
UNIX> time gendouble 1000000
500184
0.047u 0.001s 0:00.05 80.0% 0+0k 0+0io 0pf+0w
UNIX> time gendouble 10000000
5.00124e+06
0.382u 0.002s 0:00.38 100.0% 0+0k 0+0io 0pf+0w
UNIX> time gendouble 100000000
5.00023e+07
3.794u 0.012s 0:03.82 99.4% 0+0k 0+0io 0pf+0w
UNIX> time gendouble 1000000000
4.99991e+08
37.854u 0.121s 0:38.13 99.5% 0+0k 0+0io 0pf+0w
UNIX>

Now, let's change the program slightly to append random doubles to a string, in genstring.cpp:
#include <iostream>
#include <sstream>
#include <string>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main(int argc, char **argv)
{
  string s;
  int n, i;
  istringstream ss;
  ostringstream so;

  /* Read the number of iterations from the command line. */
  if (argc != 2) { cerr << "usage: genstring iterations\n"; exit(1); }
  ss.str(argv[1]);
  if (!(ss >> n)) { cerr << "usage: genstring iterations\n"; exit(1); }

  /* Format each random double as text and append it to s. */
  s = "";
  for (i = 0; i < n; i++) {
    so.clear();
    so.str("");
    so << drand48() << endl;
    s += so.str();
  }

  if (n <= 10) cout << s;
  return 0;
}
When we run it with an argument of 10, we get ten random doubles:
UNIX> genstring 10
0.396465
0.840485
0.353336
0.446583
0.318693
0.886428
0.0155828
0.58409
0.159369
0.383716
UNIX>

And when we try to time it, we get much slower running times than with gendouble:
UNIX> time genstring 1000
0.002u 0.001s 0:00.00 0.0% 0+0k 0+0io 0pf+0w
UNIX> time genstring 10000
0.019u 0.001s 0:00.02 50.0% 0+0k 0+0io 0pf+0w
UNIX> time genstring 100000
0.150u 0.004s 0:00.15 100.0% 0+0k 0+0io 0pf+0w
UNIX> time genstring 1000000
1.417u 0.038s 0:01.46 98.6% 0+0k 0+0io 0pf+0w
UNIX> time genstring 10000000
14.000u 0.314s 0:14.37 99.5% 0+0k 0+0io 0pf+0w
UNIX> time genstring 100000000
136.256u 2.652s 2:25.52 95.4% 0+0k 0+0io 6pf+0w
UNIX>

That last output line -- where it says "6pf+0w" -- reports six page faults, which means that we're starting to have problems finding memory for the program. When we run it with twice that value, we actually run out of memory!
UNIX> time genstring 200000000
genstring(21164) malloc: *** mmap(size=1073745920) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
Abort
162.765u 2.998s 2:50.55 97.1% 0+0k 0+0io 10pf+0w
UNIX>

Think about it -- each random number consumes about 8 characters, so including the newline, each double consumes about 9 bytes. With 100,000,000 doubles, that is roughly 900 MB, or nearly a gigabyte of memory. My MacBook has 1.5 GB, so it's not surprising that I would run out of memory when n equals 200,000,000.
The reason I go over this program is that with C++, it's really easy to write programs that behave pathologically concerning memory, and memory-burning programs are much harder on a computer than CPU-burning programs. Thus, I want you to start thinking about memory when you write your programs.
#include <iostream>
#include <string>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main()
{
  string a, b;
  int i;

  a = "Lighting Strikes.  Lightning Strikes Again.";
  b = "Light";

  /* Print a ruler of index digits above the string. */
  printf("    ");
  for (i = 0; i < 43; i++) printf("%d", i%10);
  printf("\n");
  printf("a = %s\n", a.c_str());
  printf("b = %s\n", b.c_str());

  /* find() returns an unsigned index. Casting to long matches %ld,
     and makes string::npos print as -1. */
  printf("a.find(b) = %ld\n", (long) a.find(b));
  printf("a.find(b, 1) = %ld\n", (long) a.find(b, 1));
  printf("a.find(b, 20) = %ld\n", (long) a.find(b, 20));
  printf("a.find('g') = %ld\n", (long) a.find('g'));
  printf("a.find('g', 20) = %ld\n", (long) a.find('g', 20));
  printf("a.find(\"Strike\") = %ld\n", (long) a.find("Strike"));
  printf("a.find(\"Strike\", 20) = %ld\n", (long) a.find("Strike", 20));
  printf("a.find(\"Aging\", 0, 2) = %ld\n", (long) a.find("Aging", 0, 2));
  printf("string::npos = %ld\n", (long) string::npos);
  return 0;
}
The first three find() calls illustrate finding a C++ string within a string. find() returns the index of the first occurrence of the substring. If you call find() with a second argument, it starts looking at that index. The first occurrence of "Light" at or after character 1 is at character 19. If find() fails, it returns string::npos, which is the largest value of the unsigned type that find() returns; it prints as -1 when it is treated as a signed number. However, you should use string::npos rather than -1 to make your programs more portable.
The next two find() calls show finding a character, and the next two show finding a C-style substring. The last one shows that if you give it a C-style substring, a starting index and a third argument -- the length -- it will only look for that many characters of the substring. Thus, even though "Aging" doesn't appear in the string, we're only looking for the first two characters -- "Ag" -- which occur at index 37.
UNIX> string-find
    0123456789012345678901234567890123456789012
a = Lighting Strikes.  Lightning Strikes Again.
b = Light
a.find(b) = 0
a.find(b, 1) = 19
a.find(b, 20) = -1
a.find('g') = 2
a.find('g', 20) = 21
a.find("Strike") = 9
a.find("Strike", 20) = 29
a.find("Aging", 0, 2) = 37
string::npos = -1
UNIX>

The feature of C++ that lets you define multiple versions of a procedure or method that work on different types of arguments is called overloading, which is a form of polymorphism. If you give a combination of arguments that is not supported, then you will get a compilation error. For example, in bad-find.cpp we make a seemingly innocuous call of "a.find(b, 1, 3)":
#include <iostream>
#include <string>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main()
{
  string a, b;
  int i;

  a = "Lighting Strikes.  Lightning Strikes Again.";
  b = "Light";

  printf("    ");
  for (i = 0; i < 43; i++) printf("%d", i%10);
  printf("\n");

  printf("a = %s\n", a.c_str());
  printf("a.find(b, 1, 3) = %d\n", a.find(b, 1, 3));   /* This call does not compile. */
  return 0;
}
This doesn't compile, because there is no definition of find(string, int, int). The compiler's error message lists the definitions that do exist:
UNIX> g++ -o bad-find bad-find.cpp
bad-find.cpp: In function 'int main()':
bad-find.cpp:20: error: no matching function for call to 'std::basic_string<char, std::char_traits<char>, std::allocator<char> >::find(std::string&, int, int)'
/usr/include/c++/4.4/bits/basic_string.tcc:714: note: candidates are: typename std::basic_string<_CharT, _Traits, _Alloc>::size_type std::basic_string<_CharT, _Traits, _Alloc>::find(const _CharT*, typename _Alloc::rebind<_CharT>::other::size_type, typename _Alloc::rebind<_CharT>::other::size_type) const [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
/usr/include/c++/4.4/bits/basic_string.h:1660: note: typename _Alloc::rebind<_CharT>::other::size_type std::basic_string<_CharT, _Traits, _Alloc>::find(const std::basic_string<_CharT, _Traits, _Alloc>&, typename _Alloc::rebind<_CharT>::other::size_type) const [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
/usr/include/c++/4.4/bits/basic_string.h:1674: note: typename _Alloc::rebind<_CharT>::other::size_type std::basic_string<_CharT, _Traits, _Alloc>::find(const _CharT*, typename _Alloc::rebind<_CharT>::other::size_type) const [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
/usr/include/c++/4.4/bits/basic_string.tcc:737: note: typename std::basic_string<_CharT, _Traits, _Alloc>::size_type std::basic_string<_CharT, _Traits, _Alloc>::find(_CharT, typename _Alloc::rebind<_CharT>::other::size_type) const [with _CharT = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]
UNIX>

There are a bunch of other types of find(). Read the reference from www.cppreference.com to see how they all work.
The substr() method takes a starting index and an optional count, and returns a substring of a string. A simple example program is string-sub.cpp:
#include <iostream>
#include <string>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main()
{
  string a;
  int i;

  a = "Lighting Strikes.  Lightning Strikes Again.";

  /* Print a ruler of index digits above the string. */
  printf("    ");
  for (i = 0; i < 43; i++) printf("%d", i%10);
  printf("\n");
  printf("a = %s\n", a.c_str());

  printf("a.substr(19) = %s\n", a.substr(19).c_str());
  printf("a.substr(19, 13) = %s\n", a.substr(19, 13).c_str());
  printf("a.substr(19, 13).substr(5) = %s\n", a.substr(19, 13).substr(5).c_str());
  return 0;
}
When only one argument is given, it returns a substring from the given index to the end of the string. If two arguments are given, it returns the specified number of characters. Since the substring is a string, you can call its methods, such as c_str() and substr().
UNIX> string-sub
    0123456789012345678901234567890123456789012
a = Lighting Strikes.  Lightning Strikes Again.
a.substr(19) = Lightning Strikes Again.
a.substr(19, 13) = Lightning Str
a.substr(19, 13).substr(5) = ning Str
UNIX>
One inconvenient fact of life is that we have to acknowledge and utilize a second representation of strings: their representation in C. This is for many reasons:
#include <iostream>
#include <string>
#include <cstdlib>
#include <cstdio>
using namespace std;

int main(int argc, char **argv)
{
  string a, b;
  char *ca, *ca2;
  const char *ca3;

  if (argc != 2) { cerr << "usage: argv-mess arg1\n"; exit(1); }

  a = argv[1];        /* C-style string to C++ string: makes a copy. */
  ca = argv[1];       /* Pointer assignment: no copy. */
  ca2 = ca;           /* Pointer assignment: no copy. */
  b = a;              /* C++ string assignment: makes a copy. */
  ca3 = a.c_str();    /* Read-only pointer to a's underlying C-style string. */

  printf("%-30s %7s %7s %7s %7s %7s %7s\n", "",
         "a", "b", "ca", "ca2", "ca3", "argv[1]");
  printf("%-30s %7s %7s %7s %7s %7s %7s\n", "",
         "-------", "-------", "-------", "--------", "-------", "-------");
  printf("%-30s %7s %7s %7s %7s %7s %7s\n", "Start:",
         a.c_str(), b.c_str(), ca, ca2, ca3, argv[1]);

  a[0] = 'Y';
  printf("%-30s %7s %7s %7s %7s %7s %7s\n", "After setting a[0] to 'Y':",
         a.c_str(), b.c_str(), ca, ca2, ca3, argv[1]);

  ca[0] = 'L';
  printf("%-30s %7s %7s %7s %7s %7s %7s\n", "After setting ca[0] to 'L':",
         a.c_str(), b.c_str(), ca, ca2, ca3, argv[1]);

  a = "XX";
  printf("%-30s %7s %7s %7s %7s %7s %7s\n", "After setting a to \"XX\":",
         a.c_str(), b.c_str(), ca, ca2, ca3, argv[1]);
  return 0;
}
This program has two C++ strings (a and b), two (char *)'s (ca and ca2), and a const char * (ca3). It first sets a to equal argv[1]. This converts a C style string (argv[1]) to a C++ style string, which makes a copy. Second, it sets ca to equal argv[1]. This is different -- because ca is a pointer, it doesn't make a copy -- ca and argv[1] simply point to the same character array.
We next set ca2 to equal ca. Once again, that simply sets one pointer to another. It doesn't make a copy of the array's contents. The next statement, which sets b to equal a, does make a copy -- when you set one string equal to another, the string library makes a copy.
Finally, we set ca3 to equal a.c_str() -- the C++ string library maintains strings as C-style strings with extra information. When you ask for c_str(), you get a pointer to the underlying C-style string. However, the compiler makes you declare the pointer as a const, which means that you cannot modify the string through it. That is for safety -- you can look at the string, but you can't mess with it.
We print everything out, and then we change a[0] to 'Y'. We print everything out again, and then we change ca[0] to 'L'. We print everything out again, and then we set a to "XX". We finish by printing everything out again:
UNIX> argv-mess Ho
                                     a       b      ca     ca2     ca3 argv[1]
                               ------- ------- ------- -------- ------- -------
Start:                              Ho      Ho      Ho      Ho      Ho      Ho
After setting a[0] to 'Y':          Yo      Ho      Ho      Ho      Ho      Ho
After setting ca[0] to 'L':         Yo      Ho      Lo      Lo      Ho      Lo
After setting a to "XX":            XX      Ho      Lo      Lo      Ho      Lo
UNIX>

It's important for you to understand this output. In the beginning, all strings are "Ho", but there are in actuality three copies of the string:
Now, when we set ca[0] to 'L', you see that ca, ca2 and argv[1] are all changed. That's because they all point to the same character array, and we just changed the first character in that array.
Finally, when we set a to "XX", again only a is changed. Once again -- ca3's contents cannot be relied upon.
Hammering home the point: C-style strings are simply arrays of characters. A C-style string will be a (char *), which points to the first element of the array. Assigning one C-style string to another does not copy the characters -- you are simply assigning a pointer.
C++ style strings, on the other hand, are heavyweight objects that maintain extra information like the size of the string. When you copy a C++ string, you make a copy of the contents. That's usually what you want.