insert(Btree tree, int key, char *record)
{
    int level = 0;
    PAGE parent = NULL;
    PAGE node;
    int i, parent_link;
    int disk_num, page_num;
    int num_index_levels;

    /* get the root node */
    parent = make_page_buffer(M_I, INDEX_ENTRY_SIZE, '\0');
    node = make_page_buffer(M_I, INDEX_ENTRY_SIZE, '\0');
After allocating the buffers for the parent and node, we read the root page into memory. For simplicity, we will assume that the root page is on disk rather than in main memory. The root page is accessed via the root_disk and root_page_num fields of the B-tree data structure. Both of these fields are integers:
read_disk(tree->root_disk, node, tree->root_page_num);
inspect_disk_page(int disk, int page). Let's say you wanted to see the contents of the root page on the disk. You can either put the inspect_disk_page command in your program or, when in a debugger such as gdb, simply type at the gdb prompt:
(gdb) p inspect_disk_page(tree->root_disk, tree->root_page_num)
You will get a list of the page's contents. For example:
rec    record contents
-------------------------------------------------------
0      0    0    5
1      0    3    8
2      1    1    12
3      1    2    18
4      0    1    19
5      1    0    -1
inspect_page(PAGE page_buffer). Once the page has been read from the disk into a page buffer, you can inspect the contents of the page buffer at any time using the inspect_page command. It prints the contents of a page buffer using the same format shown above.
    /* save the number of index levels in the tree because it could be
       increased by a root split */
    num_index_levels = tree->num_index_levels;

    /* search the index and retrieve the node into which the record
       should be inserted */
    while (level < num_index_levels) {
        /* locate the link to the next node */
        for (i = 0; (key >= get_key(node, i)) && (i < node->num_recs-1); i++)
            ;

int get_key(PAGE node, int index_num)
{
    read_rec(input_buffer, node, index_num);
    return atoi(input_buffer->fields[2]);
}
The decision not to explicitly pass input_buffer to get_key is somewhat questionable. If we passed it explicitly, a reader of the code would be explicitly reminded that a side effect of get_key is to read an index record into input_buffer. However, input_buffer is used ubiquitously throughout the program, so it seemed better to keep the size of the parameter list down by assuming that input_buffer is the buffer that will receive the index record. There is always a tension between explicitly passing parameters to a function, and hence explicitly showing all the variables from the caller that the function might manipulate, and keeping the parameter list small by using some global variables. The choice of how to resolve this tension is one that comes with experience.
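To make the trade-off concrete, here is a minimal sketch of the explicit-parameter alternative. The types REC_BUFFER and NODE and the helper read_rec_sketch are hypothetical stand-ins for the disk package's record buffer, page buffer, and read_rec routine. They are not the package's real definitions, and only the shape of the parameter list is the point:

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_FIELDS 3   /* assumed layout: disk, page number, key */
#define MAX_RECS   8   /* assumed node capacity, for illustration only */

/* Hypothetical stand-in for the record buffer (input_buffer's type). */
typedef struct {
    const char *fields[NUM_FIELDS];
} REC_BUFFER;

/* Hypothetical stand-in for an index node held in a page buffer. */
typedef struct {
    const char *recs[MAX_RECS][NUM_FIELDS];
    int num_recs;
} NODE;

/* Hypothetical stand-in for read_rec: copy one record into the buffer. */
static void read_rec_sketch(REC_BUFFER *buf, const NODE *node, int index_num)
{
    int f;

    assert(index_num >= 0 && index_num < node->num_recs);
    for (f = 0; f < NUM_FIELDS; f++)
        buf->fields[f] = node->recs[index_num][f];
}

/* The buffer that receives the record is now visible in the signature. */
int get_key_explicit(REC_BUFFER *buf, const NODE *node, int index_num)
{
    read_rec_sketch(buf, node, index_num);
    return atoi(buf->fields[2]);
}
```

With this signature a caller writes something like get_key_explicit(&buf, &node, i), so the fact that buf is overwritten is visible at every call site, at the cost of one more argument on every call.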
        /* save the disk and page number of the child node */
        disk_num = atoi(input_buffer->fields[0]);
        page_num = atoi(input_buffer->fields[1]);

        /* split the index node if it is full */
        if ((node->num_recs - 1) == tree->max_keys)
            index_split(parent, parent_link, node, tree, level);

        /* exchange the node and parent pointers. The parent pointer will
           now point to node. The current parent buffer is no longer needed
           so we can use it to hold node's child */
        swap_ptrs(&node, &parent);
Why do you think we passed the addresses of these pointers rather than the pointers themselves? The reason is that C uses call-by-value. If we simply passed the pointers and swapped them using the following code, the swap would be lost as soon as we returned from the swap function:
void swap_ptrs(PAGE node1, PAGE node2)
{
    PAGE tmp;

    tmp = node1;
    node1 = node2;
    node2 = tmp;
}
To avoid losing the effect of the swap, we must pass the pointers' addresses. This leads to the following, correct, swap routine:
void swap_ptrs(PAGE *node1, PAGE *node2)
{
    PAGE tmp;

    tmp = *node1;
    *node1 = *node2;
    *node2 = tmp;
}
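The difference between the two versions is easy to check in isolation. The sketch below puts both side by side under different names so that they can coexist in one file. The struct page definition is a hypothetical stand-in for the disk package's PAGE type, and the names swap_by_value and swap_by_address are ours, not the program's:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the disk package's PAGE type. */
typedef struct page {
    int page_num;
} *PAGE;

/* Broken version: only the function's local copies are exchanged. */
void swap_by_value(PAGE node1, PAGE node2)
{
    PAGE tmp;

    tmp = node1;
    node1 = node2;
    node2 = tmp;
}

/* Correct version: the caller's pointers are exchanged via their addresses. */
void swap_by_address(PAGE *node1, PAGE *node2)
{
    PAGE tmp;

    assert(node1 != NULL && node2 != NULL);
    tmp = *node1;
    *node1 = *node2;
    *node2 = tmp;
}
```

If node and parent point at two different pages, calling swap_by_value(node, parent) leaves both pointers unchanged in the caller, while swap_by_address(&node, &parent) exchanges them.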
        /* read node's child */
        read_disk(disk_num, node, page_num);

        /* saving the link we followed is helpful if we need to split
           the child */
        parent_link = i;
        level++;
    } /* end of the while */

    /* insert the record into the node, but first, change the formatting
       information so that the node is treated as a record node rather
       than an index node */
    node->max_recs = M_B;
    node->rec_length = RECORD_SIZE;
Once the reformatting is accomplished, we insert the record into the node, splitting it if necessary:
    if ((node->num_recs - 1) == tree->max_recs) {
        record_node_split(parent, parent_link, node, tree, key, record, level);
    }
    else {
        insert_rec(node, key, record);
        write_disk(node);
    }
Note that we are assuming that record_node_split writes the node out to disk but that insert_rec does not. Hence there is a write_disk call after insert_rec but not after record_node_split.
Finally we clean up the insert routine by destroying the page buffers we've been using and exiting:
    /* free the node and parent page_buffers */
    destroy_page_buffer(node);
    destroy_page_buffer(parent);
}

/* use insertion sort to insert the record */
insert_rec(PAGE node, int key, char *new_record)
{
    int i;
    int num_recs;

    num_recs = node->num_recs;

    /* use insertion sort to insert the new record */
    for (i = num_recs; (i > 1) && (key < get_rec_key(node, i-1)); i--)
        move_rec(node, i-1, i);
    write_rec(node, i, new_record);
}
Two things should be noted about this code:
The first operation involves allocating a page buffer for the new node and allocating disk space for it. The creation of the page buffer is performed via the disk package's make_page_buffer command. We pass the number of records the new node can contain (M_B rather than M_B-1 because the node contains M_B-1 database records plus 1 for the header node). The allocation of disk space involves a clever use of the mod operator. We keep a running count of the total number of pages in use thus far using a variable called page_count. Each time we want a new page we use the integer divide and mod operators to compute a free disk and page number, then increment the page_count variable so that it points to the next free space:
assign_disk_space(PAGE page_buffer)
{
    page_buffer->disk = page_count / PAGE_LIMIT;
    page_buffer->page_num = page_count % PAGE_LIMIT;
    page_count++;
}
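To see how the divide and mod operators walk through the disks, here is a small standalone sketch of the same arithmetic. The PAGE_LIMIT value of 4, the page_addr struct, and the function name next_free_page are assumptions made for illustration; the real program stores the result in the page buffer instead of returning it:

```c
#include <assert.h>

#define PAGE_LIMIT 4   /* assumed number of pages per disk, illustration only */

/* Hypothetical holder for the two values assign_disk_space computes. */
struct page_addr {
    int disk;
    int page_num;
};

/* running count of the pages handed out so far */
static int page_count = 0;

/* Return the next free (disk, page) pair and advance the counter. */
struct page_addr next_free_page(void)
{
    struct page_addr addr;

    assert(PAGE_LIMIT > 0);
    addr.disk = page_count / PAGE_LIMIT;      /* which disk */
    addr.page_num = page_count % PAGE_LIMIT;  /* which page on that disk */
    page_count++;
    return addr;
}
```

With a limit of 4 pages per disk, the first four calls return pages 0 through 3 on disk 0, and the fifth call rolls over to page 0 on disk 1.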
The code for creating a page buffer for the new node and assigning it disk space can now be written as:
/* create the new node */
new_node = make_page_buffer(M_B, RECORD_SIZE, '\0');
assign_disk_space(new_node);
The second operation of interest is transferring records to the new node. We use two counters to perform this move--one keeps track of our location in the old node and one keeps track of our location in the new node. For no particular reason we move backwards in each node (i.e., start at the end of each node and move toward the beginning):
/* transfer records to the new node */
for (i = num_recs_to_move, j = node->num_recs-1; i >= 1; i--, j--) {
    read_rec(input_buffer, node, j);
    delete_rec(node, j);
    write_rec(new_node, i, input_buffer->text1);
}
Two points should be made about this code: