You declare that a variable is a pointer to a data type with the asterisk. For example, if I want the variable ip to point to an integer, I declare it as:
int *ip;
The pointer itself is a four or eight byte piece of data. For now, think of that data as being an arrow that points to other data. In the above code, it points to an integer.
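If you want to check the "four or eight byte" claim on your own machine, here is a minimal sketch. The helper names pointer_size_in_bytes() and print_sizes() are made up for this example, and the numbers you see depend on your compiler and platform.

```cpp
#include <cstdio>
#include <cstddef>
using namespace std;

// Returns the number of bytes that a pointer-to-int occupies on this machine.
// (pointer_size_in_bytes is a hypothetical helper name for this sketch.)
size_t pointer_size_in_bytes()
{
  return sizeof(int *);
}

// Prints the size of an int next to the size of a pointer to an int.
void print_sizes()
{
  printf("sizeof(int):   %lu\n", (unsigned long) sizeof(int));
  printf("sizeof(int *): %lu\n", (unsigned long) pointer_size_in_bytes());
}
```

Calling print_sizes() from main() on a typical 64-bit machine should report a pointer size of 8, but treat that as an expectation rather than a guarantee.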
You can set a pointer's value by having it point to a piece of data that already exists. You do that using the ampersand. When you want to reference what a pointer points to, you use the asterisk. For example, the program pointer_mess_1.cpp has the pointer ip1 point to the integer i, and then does some setting and printing of that integer:
#include <vector>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <cctype>
using namespace std;

int main()
{
  int i;
  int *ip1;

  ip1 = &i;
  i = 5;
  printf("i: %2d *ip1: %2d\n", i, *ip1);

  *ip1 = 10;
  printf("i: %2d *ip1: %2d\n", i, *ip1);

  i = 15;
  printf("i: %2d *ip1: %2d\n", i, *ip1);
}
When we run it, we see the following output:
UNIX> pointer_mess_1
i:  5 *ip1:  5
i: 10 *ip1: 10
i: 15 *ip1: 15
UNIX>

I'll try to illustrate this with pictures below. When you first declare i and ip1, that allocates space for them. Setting ip1 to be &i has ip1 "point" to i. Then, when you set i or *ip1, it sets the integer stored in i. For that reason, printing i or *ip1 prints the same value.
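The same mechanism lets a procedure change variables that belong to its caller. Here is a small sketch of that idea. The name swap_ints() is my own choice for this example and is not part of the lecture's programs.

```cpp
// Swaps the two integers that a and b point to.
// Because the procedure receives pointers, it changes the caller's integers.
void swap_ints(int *a, int *b)
{
  int tmp;

  tmp = *a;
  *a = *b;
  *b = tmp;
}
```

If i is 5 and j is 10, then after swap_ints(&i, &j), i is 10 and j is 5.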
The program pointer_mess_2.cpp is a little more complex:
#include <cstdio>
#include <cstdlib>
using namespace std;

int main()
{
  int i;
  int *ip1, *ip2;

  ip1 = &i;
  i = 5;
  printf("i: %2d *ip1: %2d\n", i, *ip1);

  ip1 = new int;
  *ip1 = 10;
  printf("i: %2d *ip1: %2d\n", i, *ip1);

  ip1 = new int;
  *ip1 = 15;
  printf("i: %2d *ip1: %2d\n", i, *ip1);

  ip2 = ip1;
  *ip2 = 20;
  printf("i: %2d *ip1: %2d *ip2: %2d\n", i, *ip1, *ip2);

  ip1 = new int;
  *ip1 = 25;
  printf("i: %2d *ip1: %2d *ip2: %2d\n", i, *ip1, *ip2);

  delete ip1;
  // Deliberately reads *ip1 after the delete. The result is unpredictable.
  printf("i: %2d *ip1: %2d *ip2: %2d\n", i, *ip1, *ip2);
}
This program has two pointers, ip1 and ip2. The first statements are the same as the previous program -- ip1 is set to be &i and then i is set to five:
In the next set of statements, we call new. This creates a new integer and has ip1 point to it. This is called "allocating memory." This memory is guaranteed to exist until either the program ends, or you explicitly call delete on the pointer. That makes it different from local variables or procedure parameters, which are created when a procedure starts running and are reclaimed when the procedure ends. Here's a picture -- you should see why the printf() statement prints out i as 5, and *ip1 as 10.
In the next set of statements, we call new again and get the following picture:
Printing out i and *ip1 yields 5 and 15. However, you'll note that the integer that holds 10 has nothing pointing to it. This is called a memory leak, because that memory will not be deallocated until the program ends, and no part of the program can access it. Memory leaks are things that you want to avoid, especially when you're writing programs that will run for a long time (like a web browser or editor).
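To make the leak visible, here is a sketch that counts how many heap objects are alive. Everything in it (the Counted class, its Live counter, and the two repoint procedures) is made up for this example. It stands in for the plain integers above, which give you no way to count them.

```cpp
#include <cstddef>

// A stand-in for "an integer on the heap" that counts how many instances exist.
// (Counted and Counted::Live are hypothetical names used only for this sketch.)
class Counted {
  public:
    static int Live;                    // Number of instances currently allocated.
    int Value;
    Counted()  { Value = 0; Live++; }
    ~Counted() { Live--; }
};

int Counted::Live = 0;

// Repoints p at a new instance without deleting the old one.
// If nothing else points to the old instance, it has leaked.
Counted *repoint_leaky(Counted *p, int value)
{
  p = new Counted;
  p->Value = value;
  return p;
}

// Deletes the old instance first, so nothing is left unreachable.
Counted *repoint_clean(Counted *p, int value)
{
  delete p;                             // Calling delete on NULL is legal and does nothing.
  p = new Counted;
  p->Value = value;
  return p;
}
```

Starting from a NULL pointer, repeated calls of p = repoint_clean(p, ...) keep Counted::Live at one, while each call of p = repoint_leaky(p, ...) raises it by one.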
The next statements have ip2 equal ip1. In other words, they point to the same integer. Thus, when you change *ip2 to 20, then both *ip2 and *ip1 equal 20:
Now we call new again, so that ip1 and ip2 point to different integers. This is not a memory leak, because ip2 still points to the integer that holds 20. Printing i, *ip1 and *ip2 prints three different values.
Finally, I call delete ip1. This releases the integer so that the memory can be reused by subsequent new calls. You'll note that I still print out *ip1 -- the results of this may differ from machine to machine. It might print out the old value because the memory hasn't been reused, or something else might happen. This is a source of bugs that are extremely difficult to track, and the compiler usually won't be helpful. This is another reason why students hate pointers -- they can get you into trouble. Regardless, you need to learn and understand them.
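One common habit that makes this kind of bug easier to catch is to set a pointer to NULL right after deleting it. Here is a sketch under that assumption. The names release_int() and is_usable() are mine, and this is only one defensive style, not the only one.

```cpp
#include <cstddef>

// Deletes the integer that p points to, and then sets p to NULL.
// p is passed by reference so that the caller's own pointer is the one that changes.
void release_int(int *&p)
{
  delete p;       // Calling delete on NULL is legal and does nothing.
  p = NULL;
}

// Returns true when p points to something, and false when p is NULL.
bool is_usable(int *p)
{
  return p != NULL;
}
```

This only protects the pointer that you pass in. If a second pointer (like ip2 above) points to the same integer, it still holds the old address after the delete, and dereferencing it is just as dangerous as before.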
Here's the output:
UNIX> pointer_mess_2
i:  5 *ip1:  5
i:  5 *ip1: 10
i:  5 *ip1: 15
i:  5 *ip1: 20 *ip2: 20
i:  5 *ip1: 25 *ip2: 20
i:  5 *ip1: 25 *ip2: 20
UNIX>
#include <vector>
#include <string>
#include <cstdio>
#include <cstdlib>
using namespace std;

typedef vector <int> IVec;

class VP {
  public:
    IVec iv;
};

int main()
{
  IVec iv;
  IVec *ivp;
  vector <IVec *> *ivpp;
  int i, j;
  VP *vp;

  for (i = 0; i < 10; i++) iv.push_back(lrand48()%100);
  for (i = 0; i < iv.size(); i++) printf("%2d ", iv[i]);
  printf("\n");

  ivp = &iv;
  for (i = 0; i < ivp->size(); i++) printf("%2d ",ivp->at(i));
  printf("\n");

  ivp->resize(4);
  for (i = 0; i < iv.size(); i++) printf("%2d ", iv[i]);
  printf("\n");

  ivpp = new vector <IVec *>;
  ivpp->push_back(ivp);
  ivpp->push_back(ivp);

  for (i = 0; i < ivpp->size(); i++) {
    printf("Element %d of ivpp points to the vector:", i);
    ivp = ivpp->at(i);
    for (j = 0; j < ivp->size(); j++) printf(" %2d", ivp->at(j));
    printf("\n");
  }

  iv[0] = -50;

  for (i = 0; i < ivpp->size(); i++) {
    printf("Element %d of ivpp points to the vector:", i);
    ivp = ivpp->at(i);
    for (j = 0; j < ivp->size(); j++) printf(" %2d", ivp->at(j));
    printf("\n");
  }

  vp = new VP;
  for (i = 0; i < 10; i++) vp->iv.push_back(lrand48()%100);
  for (i = 0; i < vp->iv.size(); i++) printf("%2d ", vp->iv[i]);
  printf("\n");
}
I'll illustrate again. First, we set iv to be a vector of 10 random integers. Then we set ivp to point to iv and we print it out. I use the at() method instead of square brackets just because it looks better. Instead of ivp->at(i), I could have done (*ivp)[i], but I don't like how that looks.
In case you've forgotten from CS102/ECE206, doing ivp->at() is equivalent to (*ivp).at(). It looks nicer.
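To see that these forms really are interchangeable, here is a small sketch that sums a vector through a pointer three times, changing only how each element is accessed. The function names are made up for this example.

```cpp
#include <vector>
#include <cstddef>
using namespace std;

// Sums the vector that ivp points to, using ivp->at(i).
int sum_with_arrow_at(const vector <int> *ivp)
{
  int total = 0;
  for (size_t i = 0; i < ivp->size(); i++) total += ivp->at(i);
  return total;
}

// The same sum, using (*ivp).at(i).
int sum_with_star_dot_at(const vector <int> *ivp)
{
  int total = 0;
  for (size_t i = 0; i < (*ivp).size(); i++) total += (*ivp).at(i);
  return total;
}

// The same sum, using (*ivp)[i].
int sum_with_star_brackets(const vector <int> *ivp)
{
  int total = 0;
  for (size_t i = 0; i < (*ivp).size(); i++) total += (*ivp)[i];
  return total;
}
```

All three return the same total for the same vector. One real difference is that at() checks the index and throws an exception when it is out of range, while square brackets do not check.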
Next, I call ivp->resize(4). As you can see, that resizes the vector to which ivp points, which ends up resizing iv:
The next part of the code will probably confuse you at first, but work through it. ivpp is a pointer to a vector. The elements of the vector are pointers to integer vectors. The new statement and two push_back() statements create a new vector, which has two elements. Each of them points to iv:
After printing ivpp we set iv[0] to -50. Thus, when we print out ivpp as before, both lines start with -50, since they point to the same thing:
The final piece of code involving vp shows how I like to handle pointers to vectors. I usually like to bundle them up in a class or struct, so that I don't have to use at() and it still looks nice.
Here's the output. As with the previous program, make sure you understand every line of output and why it happens.
UNIX> pointer_mess_3
18 87 91 23 17 65 14 97 19 66 
18 87 91 23 17 65 14 97 19 66 
18 87 91 23 
Element 0 of ivpp points to the vector: 18 87 91 23
Element 1 of ivpp points to the vector: 18 87 91 23
Element 0 of ivpp points to the vector: -50 87 91 23
Element 1 of ivpp points to the vector: -50 87 91 23
92 54  8 54 50 56 68 19 61 39 
UNIX>
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
using namespace std;

int main()
{
  vector <string> lines;
  string s;
  int i;

  while (getline(cin, s)) lines.push_back(s);

  sort(lines.begin(), lines.end());

  for (i = 0; i < lines.size(); i++) cout << lines[i] << endl;
}
By default, strings are sorted lexicographically and numbers numerically (duh):
UNIX> cat input.txt
Sydney Ingratiate
Arianna Exonerate
Sarah Charcoal
Alexis Kaitlyn Inferential
Cole Hunter Ourselves
Joseph Dilatory
Julian Fluvial
Mia Leah Skied
Oliver Celebrate
Liam Trisyllable
UNIX> sortlines1 < input.txt
Alexis Kaitlyn Inferential
Arianna Exonerate
Cole Hunter Ourselves
Joseph Dilatory
Julian Fluvial
Liam Trisyllable
Mia Leah Skied
Oliver Celebrate
Sarah Charcoal
Sydney Ingratiate
UNIX>

If you want to sort, but you don't want to use the default sorting mechanism, you can provide your own comparison function. It should take two parameters of the same type as the vector's elements and return a bool (true or false). It should return true if the first parameter is less than the second, and false otherwise. Therefore, the following program (sortlines2.cpp) sorts standard input by the value of the second character in each line:
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>
using namespace std;

bool string_compare(string s1, string s2)
{
  if (s1[1] < s2[1]) return true;
  return false;
}

int main()
{
  vector <string> lines;
  string s;
  int i;

  while (getline(cin, s)) lines.push_back(s);

  sort(lines.begin(), lines.end(), string_compare);

  for (i = 0; i < lines.size(); i++) cout << lines[i] << endl;
}
Look at the second character of each line to see that they are indeed sorted:
UNIX> sortlines2 < input.txt
Sarah Charcoal
Mia Leah Skied
Liam Trisyllable
Alexis Kaitlyn Inferential
Oliver Celebrate
Cole Hunter Ourselves
Joseph Dilatory
Arianna Exonerate
Julian Fluvial
Sydney Ingratiate
UNIX>
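The comparison function in sortlines2.cpp leaves two things open: lines with fewer than two characters, and lines whose second characters are equal. Here is a sketch of one way to handle both. The names second_char(), second_char_compare() and sort_by_second_char() are my own. Treating short lines as if their second character were zero, and breaking ties with the default string ordering, are choices I made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// Returns the second character of s, or 0 when s has fewer than two characters.
char second_char(const string &s)
{
  if (s.size() < 2) return 0;
  return s[1];
}

// Orders strings by their second character.
// When the second characters are equal, it falls back on the default string ordering.
bool second_char_compare(const string &s1, const string &s2)
{
  char c1, c2;

  c1 = second_char(s1);
  c2 = second_char(s2);
  if (c1 != c2) return c1 < c2;
  return s1 < s2;
}

// Sorts lines in place using the comparison function above.
void sort_by_second_char(vector <string> &lines)
{
  sort(lines.begin(), lines.end(), second_char_compare);
}
```

With this tie-breaking rule, "Liam Trisyllable" comes before "Mia Leah Skied", because both have 'i' as their second character and 'L' is less than 'M'. That differs from the sortlines2 output above, where the order of equal elements is left up to sort(). The parameters are also passed by const reference, which avoids copying the strings on every comparison.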
The way that many apps work is diagrammed below:
The app runs on people's phones, and then each app talks with a central server across the Internet (or via a cellular network) using a networking mechanism like TCP/IP sockets. You'll learn how to do that in CS360. The server contains all of the data that gets shared among the apps. When one app updates information, it is updated at the server so that the other apps may have access to it. It's a nice and simple way of getting everything to work.
The apps are written on the platform of the phone (e.g. iOS or Android). The server, though, is often a C++ program running on a Unix platform. Now, Zark is going to write the apps, and he wants you to write the server.
To start with, Bacefook is going to manage people, who have just four attributes:
You'd be surprised at how unsexy a server is -- very often, they read lines of text from their sockets, and write lines of text back to the sockets. And they store data in data structures that you'll learn about in this class.
What we're going to do is write a very primitive server for Zark's app. Instead of using TCP/IP sockets, it's going to read from standard input and write to standard output. We're just going to pretend that standard input comes from cell phones, and standard output goes to cell phones. Again, you'd be surprised at how close to reality this can be.
Our server is going to read lines of text. It will read a "key," and then more information in subsequent lines. Here are the keys that it processes:
Each person is represented with an instance of the Person class, defined below:
class Person {
  public:
    string Name;
    string Mood;
    Person *InRelationship;
    vector <Person *> Friends;
};
You'll note that InRelationship is a pointer to a person, and Friends is a vector of pointers to people. That allows you to access each person's data directly, and when it changes, you'll see the changes.
In reality, our "database" is a hash table that is accessed by names. We'll use djbhash as the hash function, and separate chaining to resolve collisions. Each vector in the hash table will hold pointers to instances of the Person class. When we first insert a person into the hash table, that's when we create the instance using new.
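In case the hashing lecture isn't fresh in your mind, here is a sketch of the two ideas. It uses a common formulation of the djb hash (start at 5381, then multiply by 33 and add each character). I am assuming that this matches the djb_hash() from the hashing lecture notes, which may differ in detail, so I've named mine djb_hash_sketch(). The chain_insert() procedure is also made up for this example, and it stores strings rather than Person pointers to keep things short.

```cpp
#include <string>
#include <vector>
#include <cstddef>
using namespace std;

// A common formulation of the djb hash.
// ASSUMPTION: the djb_hash() in the hashing lecture notes may differ in detail.
unsigned int djb_hash_sketch(const string &s)
{
  unsigned int h;
  size_t i;

  h = 5381;
  for (i = 0; i < s.size(); i++) {
    h = (h << 5) + h + (unsigned char) s[i];    // (h << 5) + h is h * 33.
  }
  return h;
}

// With separate chaining, each table entry is a vector (a "chain") that holds
// every key whose hash value maps to that entry. The table must not be empty.
void chain_insert(vector < vector <string> > &table, const string &key)
{
  size_t index;

  index = djb_hash_sketch(key) % table.size();
  table[index].push_back(key);
}
```

Two keys that hash to the same index simply end up in the same chain, so finding a key means hashing it and then scanning one chain.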
Take a look at the function find_person(). It takes a name and a hash table, and finds the person with that name in the hash table. If the person is there, it returns a pointer to the person. If the person is not there, then it looks at the add parameter. If that parameter is true, it creates a new person (with new), and inserts it into the hash table. The person's Mood is "Neutral", and its InRelationship pointer is set to NULL. That is a special pointer value that points to "nothing." If you try to dereference NULL, you'll get a segmentation violation. The person's friend list is empty automatically. The person is returned.
If add is false, then find_person() simply returns NULL.
typedef vector <Person *> PVec;

Person *find_person(string &name, vector <PVec> &HT, int add)
{
  int h;
  int i;
  Person *p;

  h = djb_hash(name) % HT.size();
  for (i = 0; i < HT[h].size(); i++) {
    if (HT[h][i]->Name == name) return HT[h][i];
  }
  if (add) {
    p = new Person;
    p->Name = name;
    p->Mood = "Neutral";
    p->InRelationship = NULL;
    HT[h].push_back(p);
    return p;
  } else {
    return NULL;
  }
}
int main(int argc, char **argv)
{
  istringstream iss;          // For processing argv[1]
  string n1, n2, mood;        // For reading in names and moods
  Person *p, *p2;             // Pointers to people, that we manipulate or print.
  int Table_Size;             // The table size is read from the command line.
  vector <PVec> Hash_Table;   // The hash table.
                              // Entries are vectors because we use separate chaining
  int i;                      // Temporary variable
  string s;                   // Temporary variable

  // --------------------------------------------------
  // Process the command line and create the hash table.

  if (argc != 2) {
    cerr << "usage: bacefook_server table-size\n";
    exit(1);
  }

  iss.clear();
  iss.str(argv[1]);
  if (!(iss >> Table_Size) || Table_Size <= 0) {
    cerr << "usage: bacefook_server table-size\n";
    cerr << "bad table-size\n";
    exit(1);
  }
  Hash_Table.resize(Table_Size);

  // --------------------------------------------------
  // Process the input from standard input.

  while (getline(cin, s)) {
Now we process standard input. Let's start with "NEW-PERSON". You'll note that when standard input ends, we simply exit the program. When we read the person's name, we call find_person() to look the person up in the hash table. We set add to zero, so that find_person() returns NULL if the person is not there. If it returns a pointer instead, the person already exists, and we print "UNSUCCESSFUL". Otherwise, we call find_person() with add set to one to create the person and put him/her into the table:
    if (s == "NEW-PERSON") {
      if (!getline(cin, n1)) exit(1);
      p = find_person(n1, Hash_Table, 0);
      if (p != NULL) {
        printf("UNSUCCESSFUL\n");
      } else {
        p = find_person(n1, Hash_Table, 1);
        printf("SUCCESSFUL\n");
      }
Let's test what we've done so far. At the very least, we should be able to add people to the hash table, and if we try to add the same person twice, we'll be unsuccessful.
UNIX> bacefook_server 100
NEW-PERSON
Scarlett O'Hara
SUCCESSFUL
NEW-PERSON
Rhett Butler
SUCCESSFUL
NEW-PERSON
Rhett Butler
UNSUCCESSFUL
<CNTL-D>
UNIX>

Let's set moods. This is straightforward. To test it, let's write the part of "QUERY" that prints out the name and mood:
    } else if (s == "MOOD") {
      if (!getline(cin, n1)) exit(1);
      if (!getline(cin, mood)) exit(1);
      p = find_person(n1, Hash_Table, 0);
      if (p == NULL) {
        printf("UNSUCCESSFUL\n");
      } else {
        p->Mood = mood;
        printf("SUCCESSFUL\n");
      }

    } else if (s == "QUERY") {
      if (!getline(cin, n1)) exit(1);
      p = find_person(n1, Hash_Table, 0);
      if (p == NULL) {
        printf("UNSUCCESSFUL\n");
      } else {
        printf("SUCCESSFUL\n");
        printf("NAME %s\n", p->Name.c_str());
        printf("MOOD %s\n", p->Mood.c_str());
        printf("END\n");
      }
Here we'll change Scarlett's mood:
UNIX> bacefook_server 100
NEW-PERSON
Scarlett O'Hara
SUCCESSFUL
QUERY
Scarlett O'Hara
SUCCESSFUL
NAME Scarlett O'Hara
MOOD Neutral
END
MOOD
Scarlett O'Hara
Impetuous
SUCCESSFUL
QUERY
Scarlett O'Hara
SUCCESSFUL
NAME Scarlett O'Hara
MOOD Impetuous
END
<CNTL-D>
UNIX>

Alright -- time for "IN-RELATIONSHIP". Here, we get pointers to the two people, and we set p->InRelationship to p2. This is where it is important to have a pointer. If we didn't use pointers in the hash table, and if p->InRelationship were a Person instead of a pointer to a Person, then setting p->InRelationship would make a copy of whatever is in the hash table.
Here's the code.
    } else if (s == "IN-RELATIONSHIP") {
      if (!getline(cin, n1)) exit(1);
      if (!getline(cin, n2)) exit(1);
      p = find_person(n1, Hash_Table, 0);
      p2 = find_person(n2, Hash_Table, 0);
      if (p == NULL || p2 == NULL) {
        printf("UNSUCCESSFUL\n");
      } else {
        p->InRelationship = p2;
        printf("SUCCESSFUL\n");
      }
And here's the relevant line of "QUERY" -- note how we test p->InRelationship to make sure that we only print out the relationship when it has been explicitly set:
        if (p->InRelationship != NULL) {
          printf("IN-RELATIONSHIP %s : %s\n",
                 p->InRelationship->Name.c_str(),
                 p->InRelationship->Mood.c_str());
        }
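To see why "IN-RELATIONSHIP" needs a pointer and not a copy, here is a stripped-down sketch of the difference. The classes MiniPerson, LinkByCopy and LinkByPointer, and the procedure compare_links(), are made up for this example and are not part of bacefook_server.

```cpp
#include <string>
using namespace std;

// A cut-down person with only a mood.
// (MiniPerson is a hypothetical name used only for this sketch.)
class MiniPerson {
  public:
    string Mood;
};

// Holds a copy of the partner. Later changes to the original are not seen.
class LinkByCopy {
  public:
    MiniPerson Partner;
};

// Holds a pointer to the partner. Later changes to the original are seen.
class LinkByPointer {
  public:
    MiniPerson *Partner;
};

// Links to one person both ways, changes that person's mood afterwards, and
// reports the mood that each kind of link sees.
void compare_links(string &mood_via_copy, string &mood_via_pointer)
{
  MiniPerson scarlett;
  LinkByCopy c;
  LinkByPointer p;

  scarlett.Mood = "Neutral";
  c.Partner = scarlett;           // Copies scarlett as she is right now.
  p.Partner = &scarlett;          // Remembers where scarlett is stored.

  scarlett.Mood = "Impetuous";    // Change the original after linking.

  mood_via_copy = c.Partner.Mood;
  mood_via_pointer = p.Partner->Mood;
}
```

After calling compare_links(a, b), a is "Neutral" and b is "Impetuous". The copy is frozen at the moment it was made, while the pointer always leads to the current data.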
To test this, let's change Scarlett's mood after Rhett has set his InRelationship. Because we are using pointers, you see the mood change when you query Rhett:
UNIX> bacefook_server 100
NEW-PERSON
Scarlett O'Hara
SUCCESSFUL
NEW-PERSON
Rhett Butler
SUCCESSFUL
IN-RELATIONSHIP
Rhett Butler
Scarlett O'Hara
SUCCESSFUL
QUERY
Rhett Butler
SUCCESSFUL
NAME Rhett Butler
MOOD Neutral
IN-RELATIONSHIP Scarlett O'Hara : Neutral
END
MOOD
Scarlett O'Hara
Impetuous
SUCCESSFUL
QUERY
Rhett Butler
SUCCESSFUL
NAME Rhett Butler
MOOD Neutral
IN-RELATIONSHIP Scarlett O'Hara : Impetuous
END
<CNTL-D>
UNIX>

Finally, let's add friends. The code for this is very similar to "IN-RELATIONSHIP":
    } else if (s == "ADD-FRIEND") {
      if (!getline(cin, n1)) exit(1);
      if (!getline(cin, n2)) exit(1);
      p = find_person(n1, Hash_Table, 0);
      p2 = find_person(n2, Hash_Table, 0);
      if (p == NULL || p2 == NULL) {
        printf("UNSUCCESSFUL\n");
      } else {
        p->Friends.push_back(p2);
        printf("SUCCESSFUL\n");
      }
And here's the code to print out friends in "QUERY":
        for (i = 0; i < p->Friends.size(); i++) {
          printf("FRIEND %s : %s\n",
                 p->Friends[i]->Name.c_str(),
                 p->Friends[i]->Mood.c_str());
        }
Let's test this. I'm going to do a moderate-sized example of Margaret Mitchell's famous love triangle from Gone With The Wind. Afterwards, we're going to take a very detailed look at how memory and the various data structures are laid out. I'm using a small hash table, because I want to be able to look at everything in my pictures.
Go ahead and cut-and-paste the following into bacefook_server 10:
NEW-PERSON
Scarlett O'Hara
NEW-PERSON
Rhett Butler
NEW-PERSON
Ashley Wilkes
NEW-PERSON
Melanie Hamilton

ADD-FRIEND
Scarlett O'Hara
Melanie Hamilton
ADD-FRIEND
Scarlett O'Hara
Ashley Wilkes
ADD-FRIEND
Melanie Hamilton
Scarlett O'Hara

IN-RELATIONSHIP
Rhett Butler
Scarlett O'Hara
IN-RELATIONSHIP
Scarlett O'Hara
Rhett Butler
IN-RELATIONSHIP
Ashley Wilkes
Melanie Hamilton
IN-RELATIONSHIP
Melanie Hamilton
Ashley Wilkes

MOOD
Rhett Butler
Amorous
MOOD
Scarlett O'Hara
Petulant, Angry
MOOD
Ashley Wilkes
Limpid and Wimpy
MOOD
Melanie Hamilton
Pious
If we do some queries on this state, we see all of those inter-relationships:
QUERY
Scarlett O'Hara
SUCCESSFUL
NAME Scarlett O'Hara
MOOD Petulant, Angry
IN-RELATIONSHIP Rhett Butler : Amorous
FRIEND Melanie Hamilton : Pious
FRIEND Ashley Wilkes : Limpid and Wimpy
END

QUERY
Rhett Butler
SUCCESSFUL
NAME Rhett Butler
MOOD Amorous
IN-RELATIONSHIP Scarlett O'Hara : Petulant, Angry
END

QUERY
Ashley Wilkes
SUCCESSFUL
NAME Ashley Wilkes
MOOD Limpid and Wimpy
IN-RELATIONSHIP Melanie Hamilton : Pious
END

QUERY
Melanie Hamilton
SUCCESSFUL
NAME Melanie Hamilton
MOOD Pious
IN-RELATIONSHIP Ashley Wilkes : Limpid and Wimpy
FRIEND Scarlett O'Hara : Petulant, Angry
END
Now, suppose we want to see what all of these structures look like in memory. First, open another window and run djbhash from the hashing lecture notes:
UNIX> djbhash
Scarlett O'Hara
1585424537
Rhett Butler
3831887322
Ashley Wilkes
2242265818
Melanie Hamilton
1839425308
<CNTL-D>
UNIX>

That means that "Scarlett O'Hara" goes into hash table entry 7, "Rhett Butler" goes into hash table entry 2, and both "Ashley Wilkes" and "Melanie Hamilton" go into hash table entry 8. Armed with that knowledge, take a look at the hash table, the four Person class instances, and all of their pointers. There are three places where pointers are used:
Now, let's go back to the bacefook_server program, and change some things. We'll reflect what happens at the end of "Gone with the Wind" -- Melanie dies, Scarlett professes her love to Ashley, Ashley dissolves into a weepy mess while he pronounces his everlasting love for Melanie, and Rhett gets pissed:
MOOD
Melanie Hamilton
Dead
SUCCESSFUL
IN-RELATIONSHIP
Scarlett O'Hara
Ashley Wilkes
SUCCESSFUL
MOOD
Ashley Wilkes
Weepy and still loving Melanie
SUCCESSFUL
MOOD
Rhett Butler
Pissed
SUCCESSFUL
MOOD
Scarlett O'Hara
Jilted and angry
SUCCESSFUL

If you query them, you'll see all of those changes reflected. Here's what the data structures look like now. Again, you should double-check them.
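The server above never calls delete, which is fine because all of its memory goes away when the program ends. If you did want to release the people explicitly, tying back to the earlier discussion of new and delete, here is a sketch of how it might look. The procedure free_all_people() is my own addition and is not part of bacefook_server. The Person class and PVec typedef are repeated from above so that the sketch stands on its own.

```cpp
#include <string>
#include <vector>
#include <cstddef>
using namespace std;

// Repeated from the lecture so that this sketch compiles on its own.
class Person {
  public:
    string Name;
    string Mood;
    Person *InRelationship;
    vector <Person *> Friends;
};

typedef vector <Person *> PVec;

// Deletes every Person that the hash table points to, and empties each chain.
// Returns the number of people that were deleted.
// (free_all_people is a hypothetical helper, not part of the original server.)
size_t free_all_people(vector <PVec> &HT)
{
  size_t h, i, deleted;

  deleted = 0;
  for (h = 0; h < HT.size(); h++) {
    for (i = 0; i < HT[h].size(); i++) {
      delete HT[h][i];
      deleted++;
    }
    HT[h].clear();
  }
  return deleted;
}
```

Each person is stored in exactly one chain of the hash table, so each one is deleted exactly once. The InRelationship and Friends pointers are deliberately not followed, because they point to people that are already in the table, and deleting through them would delete the same person twice.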