1986
Kurt Russell - Jack Burton
Kim Cattrall - Gracie Law
Dennis Dun - Wang Chi
James Hong - Lo Pan
Victor Wong - Egg Shen
...
All the files are in this directory. (I want it to be known that this is just a contrived example. The only Kurt Russell DVD I own is Big-Trouble-In-Little-China, one of the best movies of all time, BTW.)
UNIX> ls *.txt
3000-Miles-to-Graceland.txt         Miracle.txt
Amber-Waves.txt                     Overboard.txt
Backdraft.txt                       Poseidon.txt
Best-Of-Times.txt                   Silkwood.txt
Big-Trouble-In-Little-China.txt     Sky-High.txt
Breakdown.txt                       Soldier.txt
Captain-Ron.txt                     Stargate.txt
Dark-Blue.txt                       Swing-Shift.txt
Death-Proof.txt                     Tango-And-Cash.txt
Dreamer.txt                         Tequila-Sunrise.txt
Escape-From-L.A..txt                The-Thing.txt
Escape-From-New-York.txt            Tombstone.txt
Executive-Decision.txt              Unlawful-Entry.txt
Interstate.txt                      Used-Cars.txt
Jiminy-Glick-in-La-La-Wood.txt      Vanilla-Sky.txt
Mean-Season.txt                     Winter-People.txt
UNIX>

For each movie, there are three pieces of information: the title, which is in the filename; the year, which is in the first line of the file; and the actors/roles, which are in the remaining lines. We are going to write a program that reads all of this information into three sets of data structures: one for movies, one for actors and one for years.
We are then going to print out each set of data structures. The point of this exercise is to get further practice with maps, and also to see how to get data structures to point to each other.
We'll build the program incrementally. Our first pass is in kurtproc1.cpp. This defines all of the data structures that we will use:
#include <iostream>
#include <map>
#include <fstream>
#include <string>
#include <cstdlib>
using namespace std;

class Actor {
  public:
    string name;
    map <string, class Movie *> movies;
    map <int, class Year *> years;
};

class Movie {
  public:
    string name;
    int year;
    map <string, Actor *> actors;
};

class Year {
  public:
    int year;
    map <string, Movie *> movies;
    map <string, Actor *> actors;
};
These are straightforward, but there is a quirk. The Actor class has a map of Movie pointers, but we haven't defined Movie yet. This is a forward reference. To deal with it, we say "class Movie *" instead of "Movie *". This tells the compiler that Movie is a class that we will define later. We do the same with the Year class. Some kind of forward reference is unavoidable here, because each data structure points to the others.
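Another way to write the same forward reference is to declare the class names on their own lines before using them. The sketch below is not taken from kurtproc1.cpp: the classes are trimmed-down versions of the ones above, and the helper link_actor_movie() is a hypothetical name used only for illustration.

```cpp
#include <map>
#include <string>
using namespace std;

// Forward declarations: tell the compiler these are classes defined later.
class Movie;
class Year;

class Actor {
  public:
    string name;
    map <string, Movie *> movies;   // A pointer to a not-yet-defined class is ok.
    map <int, Year *> years;
};

class Movie {
  public:
    string name;
    int year;
    map <string, Actor *> actors;
};

class Year {
  public:
    int year;
    map <string, Movie *> movies;
};

// Hypothetical helper: make an actor and a movie point to each other.
void link_actor_movie(Actor *a, Movie *m)
{
  a->movies[m->name] = m;
  m->actors[a->name] = a;
}
```

Either style works because the maps only hold pointers. The compiler does not need to know what a Movie looks like in order to store a Movie *.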
Note also that all of the maps contain pointers to the various classes rather than instances of the classes themselves. Why? Because if we didn't use pointers, each map would hold its own copy of each class instance, and the maps would not refer to the same things. By using pointers, if a map contains a pointer to the actor Kurt Russell, that is the same pointer that all of the other maps use.
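To see the difference concretely, here is a small sketch. It is not part of the kurtproc programs: the nmovies field and both function names are made up for the illustration. Each function puts the same actor into two maps, changes the actor through the first map, and reports what the second map sees.

```cpp
#include <map>
#include <string>
using namespace std;

// A simplified, hypothetical Actor with a counter that we can modify.
class Actor {
  public:
    string name;
    int nmovies;
};

// Maps of instances: each map holds its own copy, so the update made
// through map a is invisible through map b.
int count_seen_with_copies()
{
  map <string, Actor> a, b;
  Actor kurt;

  kurt.name = "Kurt Russell";
  kurt.nmovies = 0;
  a[kurt.name] = kurt;
  b[kurt.name] = kurt;
  a[kurt.name].nmovies++;
  return b[kurt.name].nmovies;
}

// Maps of pointers: both maps point to the same Actor, so the update
// made through map a is visible through map b.
int count_seen_with_pointers()
{
  map <string, Actor *> a, b;
  Actor *kurt;
  int n;

  kurt = new Actor;
  kurt->name = "Kurt Russell";
  kurt->nmovies = 0;
  a[kurt->name] = kurt;
  b[kurt->name] = kurt;
  a[kurt->name]->nmovies++;
  n = b[kurt->name]->nmovies;
  delete kurt;
  return n;
}
```

The first function returns 0 and the second returns 1, which is the reason the classes above store pointers.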
Our first pass doesn't deal with the data structures -- it just deals with constructing the movie's name from its filename:
int main(int argc, char **argv)
{
  ifstream fin;
  int i, j;
  string s;

  for (i = 1; i < argc; i++) {
    fin.open(argv[i]);
    if (fin.fail()) {
      cerr << "Problem opening " << argv[i] << endl;
      exit(1);
    }

    s = argv[i];
    j = s.find(".txt");
    if (j == string::npos) {
      cerr << "File does not have a .txt extension: " << s << endl;
      exit(1);
    }
    s.resize(j);
    for (j = 0; j < s.length(); j++) {
      if (s[j] == '-') s[j] = ' ';
    }
    cout << s << endl;

    fin.close();
    fin.clear();
  }
  exit(0);
}
For each file, we strip out the ".txt" suffix, and then we substitute a space for each hyphen. To strip the suffix, we use the find() method of the string class to get the index of the ".txt" substring. Then we resize the string to that index, which cuts off the ".txt". After converting the hyphens, we print the string out.
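The same steps can be packaged as a small function, which makes them easy to test on their own. This is only a sketch: the name title_from_filename() and the choice to return an empty string on error are assumptions, not part of kurtproc1.cpp. It stores the result of find() in a size_t, which is the type that find() returns, so the comparison with string::npos involves no signed/unsigned conversion.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: turn "Captain-Ron.txt" into "Captain Ron".
// Returns the empty string if the name does not contain ".txt".
string title_from_filename(const string &filename)
{
  string s;
  size_t j;

  s = filename;
  j = s.find(".txt");               // Index of the first ".txt", or npos.
  if (j == string::npos) return "";
  s.resize(j);                      // Cut off ".txt" and anything after it.
  for (j = 0; j < s.length(); j++) {
    if (s[j] == '-') s[j] = ' ';
  }
  return s;
}
```

Note that a name like Escape-From-L.A..txt is handled correctly: the earlier periods are not followed by "txt", so find() skips over them.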
We also open and close each file. One quirk of many C++ implementations is that we have to call fin.clear() after closing the file. We have to do this because some C++ implementations don't clear the fin.fail() flag when they close a file, and if you don't call fin.clear(), the subsequent fin.open() call will fail. (Since C++11, a successful open() resets the flags itself, so the extra clear() is harmless there.) Remember this, because it will come up from time to time, and it's really irritating.
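Here is a sketch of the open/read/close/clear pattern with a single ifstream that is reused for several files. The function count_lines() is a hypothetical helper, not something from the kurtproc programs. Reading until getline() fails leaves the stream's error flags set, which is exactly the situation where the clear() call matters.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: count the lines of a file, reusing the caller's
// ifstream. Returns -1 if the file cannot be opened.
int count_lines(ifstream &fin, const string &filename)
{
  string line;
  int n;

  fin.open(filename.c_str());
  if (fin.fail()) {
    fin.clear();                     // Leave the stream usable for the next file.
    return -1;
  }

  n = 0;
  while (getline(fin, line)) n++;    // The loop ends with the fail flag set.

  fin.close();
  fin.clear();                       // Reset the flags before the next open().
  return n;
}
```

A caller can declare one ifstream and pass it to count_lines() once per command line argument, just as the main() loop above reuses fin.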
In class, I showed how you can double-check the output to make sure it is correct. I did the following sequence of operations:
UNIX> kurtproc1 *.txt | head
3000 Miles to Graceland
Amber Waves
Backdraft
Best Of Times
Big Trouble In Little China
Breakdown
Captain Ron
Dark Blue
Death Proof
Dreamer
UNIX> ls *.txt | sed 's/-/ /g' | sed 's/.txt//' | head
3000 Miles to Graceland
Amber Waves
Backdraft
Best Of Times
Big Trouble In Little China
Breakdown
Captain Ron
Dark Blue
Death Proof
Dreamer
UNIX> kurtproc1 *.txt > output1
UNIX> ls *.txt | sed 's/-/ /g' | sed 's/.txt//' > output2
UNIX> diff output1 output2
UNIX>

I'll leave it to you to read the sed man page. I am using the command to convert all hyphens to spaces and to strip out the ".txt" suffix. I then compare the output of that to the output of kurtproc1. They are the same, so I am satisfied.
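Our second pass, kurtproc2.cpp, also reads the year and the actors from each file. It adds one variable declaration:

int year;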
And in the movie file reading:
fin >> year;
if (fin.fail()) {
  cerr << "The first line of " << s << " should be the year\n";
  exit(1);
}
cout << "Movie: " << s << ". Year: " << year << endl;

while (!fin.fail()) {
  getline(fin, s);
  if (!fin.fail()) {
    j = s.find(" - ");
    if (j == string::npos) {
      cerr << "Actor specifications should be actor name '-' role name\n";
      cerr << "S: " << s << endl;
      exit(1);
    }
    while (s[j] == ' ') j--;
    s.resize(j+1);
    cout << "Actor: " << s << endl;
  }
}
Our first test reveals a bug:
UNIX> kurtproc2 *.txt
Movie: 3000 Miles to Graceland. Year: 2001
Actor specifications should be actor name '-' role name
S:
UNIX>

It appears that s is an empty string. Why? Because after reading the year, fin is positioned just before the newline that ends the first line, so the getline() call reads the empty remainder of that line. We fix this in kurtproc3.cpp, which gets rid of the end of the first line:
...
fin >> year;
if (fin.fail()) {
  cerr << "The first line of " << s << " should be the year\n";
  exit(1);
}
cout << "Movie: " << s << ". Year: " << year << endl;
getline(fin, s);    // Just one new line of code.
...
Now it works as it should. Eyeballing the output next to the raw files, the two agree:
UNIX> kurtproc3 *.txt | head -n 20
Movie: 3000 Miles to Graceland. Year: 2001
Actor: Kurt Russell
Actor: Kevin Costner
Actor: Courteney Cox Arquette
Actor: Christian Slater
Actor: Kevin Pollak
Actor: David Arquette
Actor: Jon Lovitz
Actor: Howie Long
Actor: Thomas Haden Church
Actor: Bokeem Woodbine
Actor: Ice-T
Actor: David Kaye
Actor: Louis Lombardi
Actor: Shawn Michael Howard
Actor: Peter Kent
Actor: Robert "Bobby Z" Zajonc
Movie: Amber Waves. Year: 1982
Actor: Fran Brill
Actor: Wilford Brimley
UNIX>

UNIX> cat *.txt | head -n 20
2001
Kurt Russell - Michael Zane
Kevin Costner - Thomas Murphy
Courteney Cox Arquette - Cybil Waingrow
Christian Slater - Joseph Hanson
Kevin Pollak - Federal Marshal Damitry
David Arquette - Gus Watson
Jon Lovitz - Jay Peterson
Howie Long - Jack
Thomas Haden Church - Federal Marshal Quigley
Bokeem Woodbine - Benjamin Franklin
Ice-T - Hamilton
David Kaye - Jesse Waingrow
Louis Lombardi - Otto Sinclair
Shawn Michael Howard - Roller Elvis
Peter Kent - SWAT Leader
Robert "Bobby Z" Zajonc - Helicoptor Pilot
1982
Fran Brill - Suze Winter
Wilford Brimley - Pete Alberts
UNIX>
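The kurtproc2 bug is easy to reproduce without any files by reading from a string. The sketch below uses an istringstream in place of fin. The function name line_after_year() and its flag are hypothetical, chosen only to show both behaviors side by side.

```cpp
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: read the year with >>, then return whatever the
// next getline() call that is meant to read an actor line produces.
// If skip_rest_of_line is true, one extra getline() first consumes the
// remainder of the year's line, as kurtproc3.cpp does.
string line_after_year(const string &contents, bool skip_rest_of_line)
{
  istringstream in(contents);
  int year;
  string s;

  in >> year;                          // Stops just before the newline.
  if (skip_rest_of_line) getline(in, s);
  getline(in, s);
  return s;
}
```

With the contents "2001\nKurt Russell - Michael Zane\n", the function returns the empty string when the flag is false, and the actor line when the flag is true.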
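In kurtproc4.cpp, we start using the Movie class. Here are the variables, which now include a map of movies keyed on the movie's name:

int main(int argc, char **argv)
{
  ifstream fin;
  int i, j;
  string s;
  int year;
  Movie *m;
  map <string, Movie *> movies;
  map <string, Movie *>::iterator mit;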
Here is the code that creates the instance of the movie class and inserts it into the new map:
m = new Movie;
m->name = s;

fin >> year;
if (fin.fail()) {
  cerr << "The first line of " << s << " should be the year\n";
  exit(1);
}
getline(fin, s);
m->year = year;
movies.insert(make_pair(m->name, m));
And here is the code that prints out the movies after reading everything:
for (mit = movies.begin(); mit != movies.end(); mit++) {
  m = mit->second;
  cout << "Movie: " << m->name << ". Year: "<< m->year << ".\n";
}
When we run it, all looks good:
UNIX> kurtproc4 *.txt | head
Movie: 3000 Miles to Graceland. Year: 2001.
Movie: Amber Waves. Year: 1982.
Movie: Backdraft. Year: 1991.
Movie: Best Of Times. Year: 1986.
Movie: Big Trouble In Little China. Year: 1986.
Movie: Breakdown. Year: 1997.
Movie: Captain Ron. Year: 1992.
Movie: Dark Blue. Year: 2003.
Movie: Death Proof. Year: 2007.
Movie: Dreamer. Year: 2005.
UNIX>
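The movies come out in alphabetical order because a map keeps its entries sorted by key, regardless of the order in which they were inserted. Here the shell happens to pass the files in sorted order already, so the following sketch inserts keys out of order to make the point. The function keys_in_iteration_order() is a hypothetical helper, not part of kurtproc4.cpp.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// Hypothetical helper: insert the names into a map in the order given,
// then return the keys in the order that an iterator visits them.
vector <string> keys_in_iteration_order(const vector <string> &names)
{
  map <string, int> m;
  map <string, int>::iterator mit;
  vector <string> rv;
  size_t i;

  for (i = 0; i < names.size(); i++) {
    m.insert(make_pair(names[i], (int) i));
  }
  for (mit = m.begin(); mit != m.end(); mit++) {
    rv.push_back(mit->first);          // Visited in sorted key order.
  }
  return rv;
}
```

Inserting "Dreamer", "Backdraft" and "Amber Waves" in that order yields "Amber Waves", "Backdraft", "Dreamer".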
Once we have an Actor * for the actor, we will insert the actor into the movie's actor map, and we will insert the movie into the actor's movie map.
The changes are in kurtproc5.cpp. Here are the variables:

Actor *a;
map <string, Actor *> actors;
map <string, Actor *>::iterator ait;
Here is the new code where we read actors:
/* Read the actors */

while (!fin.fail()) {
  getline(fin, s);
  if (!fin.fail()) {
    j = s.find(" - ");
    if (j == string::npos) {
      cerr << "Actor specifications should be actor name '-' role name\n";
      cerr << "S: " << s << endl;
      exit(1);
    }
    while (s[j] == ' ') j--;
    s.resize(j+1);

    /* Check the actors map to see if the actor exists already.
       If not, create a new actor and put it into the map. */

    ait = actors.find(s);
    if (ait == actors.end()) {
      a = new Actor;
      a->name = s;
      actors.insert(make_pair(a->name, a));
    } else {
      a = ait->second;
    }

    /* Put movies & actors into each others' maps. */

    m->actors.insert(make_pair(a->name, a));
    a->movies.insert(make_pair(m->name, m));
  }
}
And here's the code at the end that prints out each movie, followed by its actors sorted by name:
/* Print out the movies */

for (mit = movies.begin(); mit != movies.end(); mit++) {
  m = mit->second;
  cout << "Movie: " << m->name << ". Year: "<< m->year << ".\n";
  for (ait = m->actors.begin(); ait != m->actors.end(); ait++) {
    a = ait->second;
    cout << " Actor: " << a->name << endl;
  }
  cout << endl;
}
A quick scan of some output looks good:
UNIX> kurtproc5 *.txt | head
Movie: 3000 Miles to Graceland. Year: 2001.
 Actor: Bokeem Woodbine
 Actor: Christian Slater
 Actor: Courteney Cox Arquette
 Actor: David Arquette
 Actor: David Kaye
 Actor: Howie Long
 Actor: Ice-T
 Actor: Jon Lovitz
 Actor: Kevin Costner
UNIX> kurtproc5 *.txt | sed -n '/Vanilla Sky/,/^$/p'
Movie: Vanilla Sky. Year: 2001.
 Actor: Alicia Witt
 Actor: Cameron Diaz
 Actor: Jason Lee
 Actor: Johnny Galecki
 Actor: Kurt Russell
 Actor: Michael Shannon
 Actor: Noah Taylor
 Actor: Penelope Cruz
 Actor: Tilda Swinton
 Actor: Timothy Spall
 Actor: Tom Cruise
UNIX>

UNIX> sort 3000-Miles-to-Graceland.txt | head
2001
Bokeem Woodbine - Benjamin Franklin
Christian Slater - Joseph Hanson
Courteney Cox Arquette - Cybil Waingrow
David Arquette - Gus Watson
David Kaye - Jesse Waingrow
Howie Long - Jack
Ice-T - Hamilton
Jon Lovitz - Jay Peterson
Kevin Costner - Thomas Murphy
UNIX> sort Vanilla-Sky.txt
2001
Alicia Witt - Libby
Cameron Diaz - Julie Gianni
Jason Lee - Brian Shelby
Johnny Galecki - Peter Brown
Kurt Russell - McCabe
Michael Shannon - Aaron
Noah Taylor - Edmund Ventura
Penelope Cruz - Sofia Serrano
Tilda Swinton - Rebecca Dearborn
Timothy Spall - Thomas Tipp
Tom Cruise - David Aames
UNIX>
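The "look the actor up, and create it only if it is missing" step in kurtproc5.cpp is a pattern that comes up constantly with maps of pointers. As a sketch, it can be pulled out into a function. The name find_or_create() and the one-field Actor class are assumptions made for this example, not code from the lecture's programs.

```cpp
#include <map>
#include <string>
#include <utility>
using namespace std;

// A simplified, hypothetical Actor: just the name.
class Actor {
  public:
    string name;
};

// Hypothetical helper: return the Actor with the given name, creating it
// and inserting it into the map the first time that the name is seen.
Actor *find_or_create(map <string, Actor *> &actors, const string &name)
{
  map <string, Actor *>::iterator ait;
  Actor *a;

  ait = actors.find(name);
  if (ait != actors.end()) return ait->second;   // Seen before: reuse it.

  a = new Actor;
  a->name = name;
  actors.insert(make_pair(a->name, a));
  return a;
}
```

Calling it twice with "Kurt Russell" returns the same pointer both times, which is what lets every movie's actor map point to a single shared Actor.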
In kurtproc6.cpp, we print out the actors, along with the number of movies that each one appears in:

/* Print out the actors */

for (ait = actors.begin(); ait != actors.end(); ait++) {
  a = ait->second;
  cout << "Actor: " << a->name << ". # Movies: " << a->movies.size() << ".\n";
}
Again, we can check some of the output to make sure that it is ok:
UNIX> kurtproc6 *.txt | head
Actor: A.J. Langer. # Movies: 1.
Actor: Adam Tomei. # Movies: 1.
Actor: Adrienne Barbeau. # Movies: 1.
Actor: Al Cerullo. # Movies: 1.
Actor: Al Leong. # Movies: 1.
Actor: Al Lewis. # Movies: 1.
Actor: Alan Davidson. # Movies: 1.
Actor: Alan Toy. # Movies: 1.
Actor: Alana Stewart. # Movies: 1.
Actor: Alexis Cruz. # Movies: 1.
UNIX> kurtproc6 *.txt | grep 'Kurt Russell'
Actor: Kurt Russell. # Movies: 32.
UNIX> kurtproc6 *.txt | wc
911 5573 31877
UNIX> kurtproc6 *.txt | awk '{ n += $NF; print n}' | tail -n 1
977
UNIX>

UNIX> cat *.txt | sed -n 's/ *- .*//p' | sort | head
A.J. Langer
Adam Tomei
Adrienne Barbeau
Al Cerullo
Al Leong
Al Lewis
Alan Davidson
Alan Toy
Alana Stewart
Alexis Cruz
UNIX> grep 'Kurt Russell' *.txt | wc
32 160 1501
UNIX> cat *.txt | sed -n 's/ *- .*//p' | sort -u | wc
911 1929 12745
UNIX> cat *.txt | grep ' - ' | wc
977 4919 29219
UNIX>
The final program, kurtproc7.cpp, adds the years. Here is the whole program:

#include <iostream>
#include <map>
#include <fstream>
#include <string>
#include <cstdlib>
using namespace std;

class Actor {
  public:
    string name;
    map <string, class Movie *> movies;
    map <int, class Year *> years;
};

class Movie {
  public:
    string name;
    int year;
    map <string, Actor *> actors;
};

class Year {
  public:
    int year;
    map <string, Movie *> movies;
    map <string, Actor *> actors;
};

int main(int argc, char **argv)
{
  ifstream fin;
  int i, j;
  string s;
  int year;
  Movie *m;
  map <string, Movie *> movies;
  map <string, Movie *>::iterator mit;
  Actor *a;
  map <string, Actor *> actors;
  map <string, Actor *>::iterator ait;
  Year *y;
  map <int, Year *> years;
  map <int, Year *>::iterator yit;

  for (i = 1; i < argc; i++) {

    /* Open the movie file */

    fin.open(argv[i]);
    if (fin.fail()) {
      cerr << "Problem opening " << argv[i] << endl;
      exit(1);
    }

    /* Construct the movie's name from the file name */

    s = argv[i];
    j = s.find(".txt");
    if (j == string::npos) {
      cerr << "File does not have a .txt extension: " << s << endl;
      exit(1);
    }
    s.resize(j);
    for (j = 0; j < s.length(); j++) {
      if (s[j] == '-') s[j] = ' ';
    }

    /* Create the movie instance, read the year and insert the movie */

    m = new Movie;
    m->name = s;

    fin >> year;
    if (fin.fail()) {
      cerr << "The first line of " << s << " should be the year\n";
      exit(1);
    }
    getline(fin, s);
    m->year = year;
    movies.insert(make_pair(m->name, m));

    /* Find/Create the year and add the movie to the year map */

    yit = years.find(year);
    if (yit == years.end()) {
      y = new Year;
      y->year = year;
      years.insert(make_pair(year, y));
    } else {
      y = yit->second;
    }
    y->movies.insert(make_pair(m->name, m));

    /* Read the actors */

    while (!fin.fail()) {
      getline(fin, s);
      if (!fin.fail()) {
        j = s.find(" - ");
        if (j == string::npos) {
          cerr << "Actor specifications should be actor name '-' role name\n";
          cerr << "S: " << s << endl;
          exit(1);
        }
        while (s[j] == ' ') j--;
        s.resize(j+1);

        /* Check the actors map to see if the actor exists already.
           If not, create a new actor and put it into the map. */

        ait = actors.find(s);
        if (ait == actors.end()) {
          a = new Actor;
          a->name = s;
          actors.insert(make_pair(a->name, a));
        } else {
          a = ait->second;
        }

        /* Put movies & actors into each others' maps. */

        m->actors.insert(make_pair(a->name, a));
        a->movies.insert(make_pair(m->name, m));

        /* Put years & actors into each others' maps.
           Duplicates will be ignored, but that's ok */

        y->actors.insert(make_pair(a->name, a));
        a->years.insert(make_pair(year, y));
      }
    }

    fin.close();
    fin.clear();
  }

  /* Print out the movies */

  /*
  for (mit = movies.begin(); mit != movies.end(); mit++) {
    m = mit->second;
    cout << "Movie: " << m->name << ". Year: "<< m->year << ".\n";
    for (ait = m->actors.begin(); ait != m->actors.end(); ait++) {
      a = ait->second;
      cout << " Actor: " << a->name << endl;
    }
    cout << endl;
  }
  */

  /* Print out the actors */

  for (ait = actors.begin(); ait != actors.end(); ait++) {
    a = ait->second;
    cout << "Actor: " << a->name << ". # Movies: " << a->movies.size() << ". Years:";
    for (yit = a->years.begin(); yit != a->years.end(); yit++) {
      cout << " " << yit->first;
    }
    cout << ".\n";
  }

  /* Print out years: */

  for (yit = years.begin(); yit != years.end(); yit++) {
    y = yit->second;
    cout << "Year: " << y->year << ". Actors: " << y->actors.size() << ".\n";
    for (mit = y->movies.begin(); mit != y->movies.end(); mit++) {
      m = mit->second;
      cout << " Movie: " << m->name << ".\n";
    }
  }
}
We can sanity check this a little:
UNIX> kurtproc7 *.txt | head
Actor: A.J. Langer. # Movies: 1. Years: 1996.
Actor: Adam Tomei. # Movies: 1. Years: 2005.
Actor: Adrienne Barbeau. # Movies: 1. Years: 1981.
Actor: Al Cerullo. # Movies: 1. Years: 1981.
Actor: Al Leong. # Movies: 1. Years: 1986.
Actor: Al Lewis. # Movies: 1. Years: 1980.
Actor: Alan Davidson. # Movies: 1. Years: 2003.
Actor: Alan Toy. # Movies: 1. Years: 1984.
Actor: Alana Stewart. # Movies: 1. Years: 1984.
Actor: Alexis Cruz. # Movies: 1. Years: 1994.
UNIX> kurtproc7 *.txt | tail
Year: 2004. Actors: 45.
 Movie: Jiminy Glick in La La Wood.
 Movie: Miracle.
Year: 2005. Actors: 47.
 Movie: Dreamer.
 Movie: Sky High.
Year: 2006. Actors: 29.
 Movie: Poseidon.
Year: 2007. Actors: 31.
 Movie: Death Proof.
UNIX> grep 2004 *.txt
Jiminy-Glick-in-La-La-Wood.txt:2004
Miracle.txt:2004
UNIX> grep 2005 *.txt
Dreamer.txt:2005
Sky-High.txt:2005
UNIX> kurtproc7 *.txt | grep 'Kurt Russell'
Actor: Kurt Russell. # Movies: 32. Years: 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1991 1992 1993 1994 1996 1997 1998 2001 2002 2003 2004 2005 2006 2007.
UNIX> kurtproc7 *.txt | grep 'Kurt Russell' | sed 's/.*Years://'
 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1991 1992 1993 1994 1996 1997 1998 2001 2002 2003 2004 2005 2006 2007.
UNIX> kurtproc7 *.txt | grep 'Kurt Russell' | sed 's/.*Years://' | awk '{ print NF }'
24
UNIX> cat *.txt | grep '^....$' | sort -u | wc
24 24 120
UNIX>
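The comment in kurtproc7.cpp that says "Duplicates will be ignored" relies on how a map's insert() method behaves: if the key is already there, the map is left unchanged. The method also reports whether it did anything, through the bool in the pair that it returns. The sketch below shows this with a simplified map from year to a title string. The function add_year() is a hypothetical helper, not part of the program above.

```cpp
#include <map>
#include <string>
#include <utility>
using namespace std;

// Hypothetical helper: try to insert (year, title) into the map.
// Returns true if the entry was inserted, and false if the year was
// already a key, in which case the existing entry is left untouched.
bool add_year(map <int, string> &years, int year, const string &title)
{
  pair <map <int, string>::iterator, bool> result;

  result = years.insert(make_pair(year, title));
  return result.second;
}
```

This is why kurtproc7.cpp can insert Kurt Russell into the 1986 year's actor map once for each of his 1986 movies and still end up with one entry. It is also why the number of years printed for him (24) can be smaller than his number of movies (32).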