#include "jrb.h", and then compile the program with:

    gcc -I/blugreen/homes/plank/cs360/include

When you link your object files to make an executable, follow the directions in the libfdr lecture notes.
The makefile in this directory does both of these things for you.
The main struct for rb-trees is the JRB. Like dllists, all rb-trees have a header node. You create an rb-tree by calling make_jrb(), which returns a pointer to the header node of an empty rb-tree. This header points to the main body of the rb-tree, which you don't need to care about, and to the first and last external nodes of the tree. These external nodes are hooked together with flink and blink pointers, so that you can view rb-trees as being dllists with the property that they are sorted, and you can find any node in the tree in O(log(n)) time.
Like dllists, each node in the tree has a val field, which is a Jval. Additionally, each node has a key field, which is also a Jval. The rb-tree makes sure that the keys are sorted. How they are sorted depends on the tree.
Note that it returns a pointer to the new jrb tree node. Also note that if the key is already in the tree, then it still creates a new node and puts it into the tree. No guarantees are made concerning the relative ordering of duplicate keys.
Even though the key is a string, it will be converted into a Jval in the new node. Thus, if you want to get at the key of node s, you should either use jval_s(s->key) or s->key.s.
This lets you do more sophisticated things than simply sorting with integers and strings. For example, strisort.c sorts strings but ignores case. strrsort2.c sorts strings in reverse order. Read these over.
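To give a feel for the kind of comparison function involved, here is a minimal sketch of a case-insensitive comparison and a reversed comparison on plain C strings. The names compare_ignore_case and compare_reverse are made up for this illustration, and this is not the code in strisort.c or strrsort2.c, which you should still read.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: compares like strcmp(), but on the lowercase form
   of each character, so "Apple" and "apple" compare equal. */
int compare_ignore_case(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  while (*a != '\0' && *b != '\0') {
    int ca = tolower((unsigned char) *a);
    int cb = tolower((unsigned char) *b);
    if (ca != cb) return (ca < cb) ? -1 : 1;
    a++;
    b++;
  }
  if (*a == '\0' && *b == '\0') return 0;
  return (*a == '\0') ? -1 : 1;      /* The shorter string sorts first */
}

/* Hypothetical helper: swapping the arguments to strcmp() inverts the order. */
int compare_reverse(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  return strcmp(b, a);
}
```

To use functions like these with jrb_insert_gen(), you would wrap each one in a function that takes two Jvals and compares their .s fields.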
To find keys, you use one of jrb_find_str(), jrb_find_int(), jrb_find_dbl() or jrb_find_gen(). Obviously, if you inserted keys with jrb_insert_str(), then you should use jrb_find_str() to find them. If the key that you're looking for is not in the tree, then jrb_find_xxx() returns NULL.
Finally, there are also: jrb_find_gte_str(), jrb_find_gte_int(), jrb_find_gte_dbl() and jrb_find_gte_gen(). These return the jrb tree node whose key is either equal to the specified key, or whose key is the smallest one greater than the specified key. If the specified key is greater than any in the tree, they return a pointer to the sentinel node. Each of them takes an argument, found, which is set to tell you whether the key was found or not.
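The "greater than or equal" rule is easier to see on a plain sorted array. The following is a minimal sketch of the same rule, not libfdr's implementation: array_find_gte is a hypothetical function that works on a sorted array of ints, and the returned index n plays the role that the sentinel node plays in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical analog of the find_gte rule on a sorted array a[0..n-1].
   Returns the index of the first element >= key, or n if every element is
   smaller than key. *found is set to 1 on an exact match, 0 otherwise. */
size_t array_find_gte(const int *a, size_t n, int key, int *found)
{
  size_t lo = 0, hi = n;

  assert(found != NULL);
  assert(a != NULL || n == 0);

  /* Binary search for the leftmost position whose element is >= key. */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (a[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  *found = (lo < n && a[lo] == key);
  return lo;
}
```

With the array {10, 20, 30}, searching for 20 gives index 1 with found set, searching for 25 gives index 2 with found cleared, and searching for 35 gives index 3, which is one past the end.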
A second way to do this is to have a two-level tree. The first tree has integers as keys and is based on the atoi() value of each line. The val field of each node, however, is another red-black tree. This red-black tree contains each line whose atoi() value is equal to the key of the node, sorted lexicographically. Thus, when you read a line, you first see if its atoi() value is in the tree. If so, you get a pointer to the val field of that node. If not, you insert a new node into the tree whose key is the atoi(), and whose val field is a new, empty red-black tree. Now, you have a pointer to the red-black tree in the val field of the node whose key is the atoi() value of the string. What you do now is insert the string into this second red-black tree using jrb_insert_str(). When you're done, you have a big two-level red-black tree. You traverse it by traversing the top level tree, and for each node in that tree, you traverse the tree in its val field and print out the strings. See the code. It is in nsort3.c.
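Traversing the two-level tree visits the lines in order of atoi() value, and lexicographically among lines with the same atoi() value. As a way to check what output to expect, here is a minimal sketch that produces that same order with qsort() instead of red-black trees. It is not the code in nsort3.c, and the names compare_atoi_then_string and sort_lines are made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator over an array of char *: order by atoi() value first,
   then lexicographically among lines with the same atoi() value. */
int compare_atoi_then_string(const void *p1, const void *p2)
{
  const char *s1 = *(const char * const *) p1;
  const char *s2 = *(const char * const *) p2;
  int n1, n2;

  assert(s1 != NULL && s2 != NULL);
  n1 = atoi(s1);
  n2 = atoi(s2);
  if (n1 != n2) return (n1 < n2) ? -1 : 1;
  return strcmp(s1, s2);
}

/* Sorts n lines in place using the comparator above. */
void sort_lines(const char **lines, size_t n)
{
  qsort(lines, n, sizeof(lines[0]), compare_atoi_then_string);
}
```

Note that a line that does not start with a number has an atoi() value of zero, so it sorts ahead of lines with positive numbers.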
    Name sunday-score F total-score

For example, the first few lines of 1999_Majors/Masters are:

    Jose Maria Olazabal -1 F -8
    Davis Love III -1 F -6
    Greg Norman +1 F -5
    Bob Estes +0 F -4
    Steve Pate +1 F -4
    David Duval -2 F -3
    Phil Mickelson -1 F -3
    ...

Note that the name can have any number of words.
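Because the name can have any number of words, the easiest way to take a line apart is from the right: the last three words are fixed, and everything before them is the name. Here is a minimal sketch of that idea using strtok() from the standard library. The function parse_score_line, the constant MAX_FIELDS and the buffer sizes are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 100   /* Arbitrary limit chosen for this sketch */

/* Parses one line of the form "Name sunday-score F total-score", where Name
   may span several words. On success, copies the name into name[], stores
   the total score in *total, and returns 1. Returns 0 on a malformed line
   or if the name does not fit in name_size bytes. */
int parse_score_line(const char *line, char *name, size_t name_size, int *total)
{
  char buf[1000];
  char *fields[MAX_FIELDS];
  char *tok;
  int nf = 0, i, sunday, score;
  size_t used = 0;

  assert(line != NULL && name != NULL && total != NULL);
  if (name_size == 0 || strlen(line) >= sizeof(buf)) return 0;
  strcpy(buf, line);     /* strtok() modifies its argument, so use a copy */

  /* Split the line into whitespace-separated words. */
  for (tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
    if (nf == MAX_FIELDS) return 0;
    fields[nf++] = tok;
  }

  /* The last three words are fixed; at least one word must be the name. */
  if (nf < 4 || strcmp(fields[nf-2], "F") != 0 ||
      sscanf(fields[nf-3], "%d", &sunday) != 1 ||
      sscanf(fields[nf-1], "%d", &score) != 1) return 0;

  /* Join the remaining words with single spaces to form the name. */
  name[0] = '\0';
  for (i = 0; i < nf - 3; i++) {
    size_t len = strlen(fields[i]) + (i > 0 ? 1 : 0);
    if (used + len >= name_size) return 0;
    if (i > 0) strcat(name, " ");
    strcat(name, fields[i]);
    used += len;
  }
  *total = score;
  return 1;
}
```

For the line "Davis Love III -1 F -6", this sets the name to "Davis Love III" and the total score to -6.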
Now, suppose that we want to do some data processing on these files. For example, suppose we'd like to sort each player so that we first print out the players that have played the most tournaments, and then within that, we sort by the player with the lowest average score.
This is what golf.c does. It takes score files on the command line, then reads in all the players and scores. Then it sorts them by number of tournaments/average score, and prints them out in that order, along with their score for each tournament. For example, look at score1:
    Jose Maria Olazabal -1 F -8
    Davis Love III -1 F -6
    Greg Norman +1 F -5

and score2:

    Greg Norman +1 F +9
    David Frost +3 F +10
    Davis Love III -2 F +11

The golf program reads in these two files, and ranks the four players by number of tournaments, and then average score:

    UNIX> golf score1 score2
    Greg Norman : 2 tournaments : 2.00
      -5 : score1
      9 : score2
    Davis Love III : 2 tournaments : 2.50
      -6 : score1
      11 : score2
    Jose Maria Olazabal : 1 tournament : -8.00
      -8 : score1
    David Frost : 1 tournament : 10.00
      10 : score2
Ok, now how does golf work? Well, it works in three phases. In the first phase, it reads the input files to create a struct for each golfer. The data structure for this is a red-black tree keyed on the golfer's name, and whose val fields are Golfer structs that have the following definition:
    typedef struct {
      char *name;
      int ntourn;
      int tscore;
      Dllist scores;
    } Golfer;

The first three fields are obvious. The last field is a list of the golfer's scores. Each element of the list points to a Score struct with the following definition:

    typedef struct {
      char *tname;   /* File name */
      int score;     /* Total score */
    } Score;

Note, in each file, we are going to ignore the ``sunday score.''
So, to read in the golfers, we create the jrb tree golfers, and then read in each line of each input file. For each line, we construct the golfer's name, and then we look to see if the golfer has an entry in the golfers tree. If there is no such entry, then one is created. Once the entry is found/created, the score for that file is added. When all the files have been read, phase 1 is completed:
    Golfer *g;
    Score *s;
    JRB golfers, rnode;
    int i, fn;
    int tmp;
    IS is;
    char name[1000];
    Dllist dnode;

    golfers = make_jrb();

    for (fn = 1; fn < argc; fn++) {
      is = new_inputstruct(argv[fn]);
      if (is == NULL) { perror(argv[fn]); exit(1); }

      while(get_line(is) >= 0) {

        /* Error check each line */

        if (is->NF < 4 || strcmp(is->fields[is->NF-2], "F") != 0 ||
            sscanf(is->fields[is->NF-1], "%d", &tmp) != 1 ||
            sscanf(is->fields[is->NF-3], "%d", &tmp) != 1) {
          fprintf(stderr, "File %s, Line %d: Not the proper format\n",
                  is->name, is->line);
          exit(1);
        }

        /* Construct the golfer's name */

        strcpy(name, is->fields[0]);
        for (i = 1; i < is->NF-3; i++) {
          strcat(name, " ");
          strcat(name, is->fields[i]);
        }

        /* Search for the name */

        rnode = jrb_find_str(golfers, name);

        /* Create an entry if none exists. */

        if (rnode == NULL) {
          g = (Golfer *) malloc(sizeof(Golfer));
          g->name = strdup(name);
          g->ntourn = 0;
          g->tscore = 0;
          g->scores = new_dllist();
          jrb_insert_str(golfers, g->name, new_jval_v(g));
        } else {
          g = (Golfer *) rnode->val.v;
        }

        /* Add the information to the golfer's struct */

        s = (Score *) malloc(sizeof(Score));
        s->tname = argv[fn];
        s->score = atoi(is->fields[is->NF-1]);
        g->ntourn++;
        g->tscore += s->score;
        dll_append(g->scores, new_jval_v(s));
      }

      /* Go on to the next file */

      jettison_inputstruct(is);
    }

Now, this gives us all the information on the golfers, but they are sorted by the golfers' names, not by number of tournaments / average score. Thus, in phase 2, we construct a second red-black tree which will sort the golfers correctly. To do this, we need to construct our own comparison function that compares golfers by number of tournaments / average score. Here is the comparison function:
    int golfercomp(Jval j1, Jval j2)
    {
      Golfer *g1, *g2;

      g1 = (Golfer *) j1.v;
      g2 = (Golfer *) j2.v;

      if (g1->ntourn > g2->ntourn) return 1;
      if (g1->ntourn < g2->ntourn) return -1;
      if (g1->tscore < g2->tscore) return 1;
      if (g1->tscore > g2->tscore) return -1;
      return 0;
    }

And here is the part of main where the second red-black tree is built:

    sorted_golfers = make_jrb();
    jrb_traverse(rnode, golfers) {
      jrb_insert_gen(sorted_golfers, rnode->val, JNULL, golfercomp);
    }

Note, you pass a Jval to jrb_insert_gen.
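Note also that golfercomp compares total scores rather than averages. That is enough, because the scores are only compared when the two golfers have played the same number of tournaments, in which case the lower total is also the lower average. To see the resulting order without libfdr, here is a minimal sketch that applies the same rule with qsort(). The GolferSummary struct and the functions summary_compare and rank_golfers are made up for this illustration; they are not part of golf.c.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cut-down struct holding only what the ordering depends on.
   The Golfer struct in the notes also carries a list of scores. */
typedef struct {
  const char *name;
  int ntourn;
  int tscore;
} GolferSummary;

/* Same rule as golfercomp: more tournaments compares greater; with equal
   tournaments, the lower total score compares greater. */
int summary_compare(const GolferSummary *g1, const GolferSummary *g2)
{
  assert(g1 != NULL && g2 != NULL);
  if (g1->ntourn > g2->ntourn) return 1;
  if (g1->ntourn < g2->ntourn) return -1;
  if (g1->tscore < g2->tscore) return 1;
  if (g1->tscore > g2->tscore) return -1;
  return 0;
}

/* qsort() adapter that sorts greatest-first by swapping the arguments, which
   gives the same best-first order as traversing the tree in reverse. */
static int best_first(const void *p1, const void *p2)
{
  return summary_compare((const GolferSummary *) p2,
                         (const GolferSummary *) p1);
}

/* Sorts the n golfers in g[] so that the best-ranked golfer comes first. */
void rank_golfers(GolferSummary *g, size_t n)
{
  qsort(g, n, sizeof(g[0]), best_first);
}
```

Using the totals from score1 and score2 (Greg Norman 2 tournaments and 4, Davis Love III 2 and 5, Jose Maria Olazabal 1 and -8, David Frost 1 and 10), rank_golfers() puts them in exactly that order.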
Finally, the third phase is to traverse the sorted_golfers tree, printing out the correct information for each golfer. This is straightforward, and done below:
    jrb_rtraverse(rnode, sorted_golfers) {
      g = (Golfer *) rnode->key.v;
      printf("%-40s : %3d tournament%1s : %7.2f\n", g->name, g->ntourn,
             (g->ntourn == 1) ? "" : "s",
             (float) g->tscore / (float) g->ntourn);
      dll_traverse(dnode, g->scores) {
        s = (Score *) dnode->val.v;
        printf(" %3d : %s\n", s->score, s->tname);
      }
    }

Try it out. You'll see that Tiger Woods did the best in all four majors this year:

    UNIX> golf 1999_Majors/*
    Tiger Woods : 4 tournaments : 0.25
      10 : 1999_Majors/British_Open
      1 : 1999_Majors/Masters
      -11 : 1999_Majors/PGA_Champ
      1 : 1999_Majors/US_Open
    Colin Montgomerie : 4 tournaments : 3.75
      12 : 1999_Majors/British_Open
      -1 : 1999_Majors/Masters
      -6 : 1999_Majors/PGA_Champ
      10 : 1999_Majors/US_Open
    Davis Love III : 4 tournaments : 4.50
      10 : 1999_Majors/British_Open
      -6 : 1999_Majors/Masters
      5 : 1999_Majors/PGA_Champ
      9 : 1999_Majors/US_Open
    Jim Furyk : 4 tournaments : 4.50
      11 : 1999_Majors/British_Open
      0 : 1999_Majors/Masters
      -4 : 1999_Majors/PGA_Champ
      11 : 1999_Majors/US_Open
    Nick Price : 4 tournaments : 4.75
      17 : 1999_Majors/British_Open
      -3 : 1999_Majors/Masters
      -7 : 1999_Majors/PGA_Champ
      12 : 1999_Majors/US_Open
    ...