This is a three-week lab, and it is every bit of a three-week lab. It requires attention to detail and disciplined testing. It is the hardest lab that I give, and for that reason, you should allocate enough time to do it. Don't think that because you are a better programmer than your friends, you can do this lab in a day or two. I have seen too many excellent students flail on this lab because they underestimated the time it takes. I mean really good students!
On the flip side, I think that this is the most rewarding lab that I give. When you finish, and your B-tree is working as it should, it's a really great feeling!
Some completion stats:
You are going to implement the "B+ Tree" variant, where internal nodes only hold keys and pointers to other nodes, and external nodes hold keys and pointers to values. One change from the St. Vincent description is that the value of a key in an internal node is going to be held in the largest val pointer in the key's predecessor subtree. Let me show an example, lifted from their notes:
In their example, the val pointer for J is the pointer to the left of K. In our trees, the val pointer for J will be the pointer to the right of I. Similarly:
The last page of B-Tree.pdf also has a tree pictured with where its val pointers should be pointing.
UNIX> cp -r /home/jplank/cs494/labs/Lab-3-B-Tree/src .
UNIX> cp -r /home/jplank/cs494/labs/Lab-3-B-Tree/include .
UNIX> cp /home/jplank/cs494/labs/Lab-3-B-Tree/makefile .
UNIX> cp /home/jplank/cs494/labs/Lab-3-B-Tree/tree* .
UNIX> mkdir obj
UNIX> mkdir bin
You are now ready to start the lab.
#ifndef _BP_TREE_
#define _BP_TREE_

#include "jdisk.h"

void *b_tree_create(char *filename, long size, int key_size);
void *b_tree_attach(char *filename);

unsigned int b_tree_insert(void *b_tree, void *key, void *record);
unsigned int b_tree_find(void *b_tree, void *key);
void *b_tree_disk(void *b_tree);
int b_tree_key_size(void *b_tree);
void b_tree_print_tree(void *b_tree);

#endif
What you are going to do is implement B-trees on top of jdisks. The keys will be buffers of exactly key_size bytes, which is defined when you create a btree. The vals will be buffers of exactly JDISK_SECTOR_SIZE bytes. Each node of the tree will fit into a sector of the disk.
The jdisks must have a specific format. That means that the jdisks that your programs create must be readable by my btree programs and vice versa (so long as they have the same sizes for longs and the same endian-ness). They don't have to be identical to mine. They just have to work. Here is the format:
UNIX> ls -l tree-1.jdisk
-rw-r--r--. 1 jplank jplank 2048000 Sep 18 19:46 tree-1.jdisk
UNIX> xxd -g 1 -len 16 tree-1.jdisk
0000000: 17 00 00 00 29 00 00 00 f1 01 00 00 00 00 00 00  ....)...........
# Remember, you need to view these groupings of bytes as numbers in little endian.
UNIX> xxd -g 4 -e -len 16 tree-1.jdisk      # Or if you have -e, that makes life easier.
00000000: 00000017 00000029 000001f1 00000000  ....)...........
UNIX>
The file is roughly 2MB, composed of 2,000 sectors. When we look at the first 16 bytes, we see the numbers in little endian format (which is the format of our current Intel processors). The key size is 0x17, or 23 bytes. The LBA of the root node is 0x29, and the first free sector is LBA 0x1f1 = 497.
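As a small illustration of reading those 16 bytes, here is a sketch that decodes them one byte at a time. The names `Header`, `le_uint` and `decode_header` are made up for this example and are not part of the lab code. It assumes the layout shown above: a 4-byte key size, a 4-byte root LBA, and an 8-byte first free block, all little endian.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the three values at the start of sector 0. */
typedef struct {
  uint32_t key_size;
  uint32_t root_lba;
  uint64_t first_free_block;   /* assumed to be 8 bytes, as in the xxd dump */
} Header;

/* Assemble an n-byte little endian unsigned integer, low byte first. */
uint64_t le_uint(const unsigned char *p, int n)
{
  uint64_t v = 0;
  int i;

  assert(n >= 1 && n <= 8);
  for (i = n - 1; i >= 0; i--) v = (v << 8) | p[i];
  return v;
}

/* Decode the first 16 bytes of sector 0. */
Header decode_header(const unsigned char *sector0)
{
  Header h;

  h.key_size = (uint32_t) le_uint(sector0, 4);
  h.root_lba = (uint32_t) le_uint(sector0 + 4, 4);
  h.first_free_block = le_uint(sector0 + 8, 8);
  return h;
}
```

On a little endian machine with 8-byte longs, a plain memcpy() of the three fields does the same job. The byte-by-byte version is only here to make the byte order explicit.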
The next MAXKEY * key_size bytes are the keys. Then there can be some wasted bytes. The last (MAXKEY+1)*4 bytes are the LBA's, which are the pointers of the B-Trees. If the node is internal, then they are the LBA's of nodes that are pointed to by the node. If the node is external, then they are the LBA's of vals. If there are nkeys keys in the node, then there are nkeys+1 LBA's.
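The layout arithmetic can be sketched in a few lines of C. This is only an illustration: `SECTOR_SIZE` stands in for JDISK_SECTOR_SIZE (1024 in the example files), and `max_keys`, `key_offset` and `lba_offset` are hypothetical helper names. It assumes that a node needs 2 bytes for the internal and nkeys fields, MAXKEY keys, and MAXKEY+1 four-byte LBA's, so MAXKEY is the largest value with 2 + MAXKEY*key_size + (MAXKEY+1)*4 <= 1024.

```c
#include <assert.h>

#define SECTOR_SIZE 1024   /* stands in for JDISK_SECTOR_SIZE from jdisk.h */

/* MAXKEY = floor((sector size - 6) / (key_size + 4)). */
int max_keys(int key_size)
{
  assert(key_size > 0);
  return (SECTOR_SIZE - 6) / (key_size + 4);
}

/* Byte offset of key i from the start of the sector (keys follow the 2 header bytes). */
int key_offset(int key_size, int i)
{
  return 2 + i * key_size;
}

/* Byte offset of LBA i from the start of the sector (LBA's fill the last (MAXKEY+1)*4 bytes). */
int lba_offset(int key_size, int i)
{
  return SECTOR_SIZE - (max_keys(key_size) + 1) * 4 + i * 4;
}
```

For a key size of 200, this gives a MAXKEY of 4, with the LBA's starting 1004 bytes into the sector.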
UNIX> ls -l tree-2.jdisk
-rw-r--r--. 1 jplank jplank 20480 Sep 18 19:46 tree-2.jdisk
UNIX> xxd -g 1 -len 16 tree-2.jdisk
0000000: c8 00 00 00 08 00 00 00 0c 00 00 00 00 00 00 00  ................
UNIX> xxd -g 4 -e -len 16 tree-2.jdisk
00000000: 000000c8 00000008 0000000c 00000000  ................
UNIX>
This is a file with 20 sectors, of which 12 (0x0c) are currently in use. The key size is 200 (0xc8). The LBA of the root node is 8. Let's take a look at the first 202 bytes of that block:
UNIX> echo '8 1024 * p' | dc
8192
UNIX> xxd -g 1 -s 8192 -len 202 tree-2.jdisk
0002000: 01 01 45 6c 69 00 00 00 00 00 00 00 00 00 00 00  ..Eli...........
0002010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0002020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0002030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0002040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0002050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0002060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0002070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0002080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0002090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00020a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00020b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00020c0: 00 00 00 00 00 00 00 00 00 00                    ..........
UNIX>
So, this is an internal node, because the first byte is one. It holds one key, because the second byte is also one. The first key is in the next 200 bytes -- as you see, they are all zeros except for the first three. I know they hold a string (because I created this file), and you can see from the output to xxd, that the string is "Eli."
Just a note here about keys that are strings, and the program bin/jdisk_test, if you want to use it instead of or in addition to xxd. Since the strings are null terminated, jdisk_test will print the string, regardless of whether you specify 3 characters or 200. In the example below, jdisk_test is reading 200 characters, but since the fourth character is '\0', it only prints out "Eli".
UNIX> bin/jdisk_test R tree-2.jdisk string 8194 3
Eli
UNIX> bin/jdisk_test R tree-2.jdisk string 8194 200
Eli
UNIX>
So, there is one key, which means that there are two pointers out of the node. Each of these pointers is an LBA of the sector holding the node to which the pointer points. How do we find these LBA's? Well, first, let's figure out what MAXKEY is: (1024 - 6) / (200+4) = 4.99. That means MAXKEY is four (and there is quite a bit of wasted space in our nodes. So it goes). Since a node can hold four keys, it can hold 5 LBA pointers. Those are in the last 5*4=20 bytes of the sector. Let's look at them:
UNIX> echo 8192+1024-20 | bc
9196
UNIX> xxd -g 1 -s 9196 -len 20 tree-2.jdisk
00023ec: 01 00 00 00 07 00 00 00 00 00 00 00 00 00 00 00  ................
00023fc: 00 00 00 00                                      ....
UNIX> xxd -g 4 -e -s 9196 -len 20 tree-2.jdisk
000023ec: 00000001 00000007 00000000 00000000  ................
000023fc: 00000000                             ....
UNIX>
So, the first pointer is to block 1, and the second is to block 7. Let's look at block 1:
UNIX> xxd -g 1 -s 1024 -len 32 tree-2.jdisk
0000400: 00 04 41 6c 65 78 69 73 00 00 00 00 00 00 00 00  ..Alexis........
0000410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
UNIX>
It is an external node that holds four keys. The keys will start at bytes 1026 (1024+2), 1226 (1024+2+200), 1426 (1024+2+400) and 1626 (1024+2+600). You can see from xxd that the first key is "Alexis". Here are the other three keys:
UNIX> xxd -g 1 -s 1226 -len 16 tree-2.jdisk
00004ca: 41 6c 6c 69 73 6f 6e 00 00 00 00 00 00 00 00 00  Allison.........
UNIX> xxd -g 1 -s 1426 -len 16 tree-2.jdisk
0000592: 43 61 6c 65 62 00 00 00 00 00 00 00 00 00 00 00  Caleb...........
UNIX> xxd -g 1 -s 1626 -len 16 tree-2.jdisk
000065a: 44 61 6e 69 65 6c 00 00 00 00 00 00 00 00 00 00  Daniel..........
UNIX>
Nice -- this is looking like the keys are all string-based (however, they are still 200 characters -- it just so happens that all of the characters after a string are byte 0x0).
Now, let's look at the LBA's, which will start 20 bytes from the end of the sector:
UNIX> xxd -g 1 -s 2028 -len 20 tree-2.jdisk
00007ec: 04 00 00 00 0b 00 00 00 0a 00 00 00 02 00 00 00  ................
00007fc: 06 00 00 00                                      ....
UNIX>
These are the vals:
UNIX> xxd -g 1 -s 4096 -len 1024 tree-2.jdisk
0001000: 47 79 72 6f 73 63 6f 70 65 00 00 00 00 00 00 00  Gyroscope.......
0001010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0001020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
# ..... Lots of lines of zeros
00013f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
UNIX> echo '1024 * 11' | bc
11264
UNIX> xxd -s 11264 -len 16 -g 1 tree-2.jdisk
0002c00: 45 6d 62 65 6c 6c 69 73 68 00 00 00 00 00 00 00  Embellish.......
UNIX> xxd -s 10240 -len 16 -g 1 tree-2.jdisk
0002800: 53 75 64 61 6e 65 73 65 00 00 00 00 00 00 00 00  Sudanese........
UNIX> xxd -s 2048 -len 16 -g 1 tree-2.jdisk
0000800: 53 68 61 64 6f 77 79 00 00 00 00 00 00 00 00 00  Shadowy.........
UNIX> echo '1024 * 6' | bc
6144
UNIX> xxd -s 6144 -len 16 -g 1 tree-2.jdisk
0001800: 45 69 64 65 72 00 00 00 00 00 00 00 00 00 00 00  Eider...........
UNIX>
So, now we know:
UNIX> echo '7*1024' | bc
7168
UNIX> xxd -g 1 -s 7168 -len 16 tree-2.jdisk
0001c00: 00 03 47 72 61 63 65 00 00 00 00 00 00 00 00 00  ..Grace.........
UNIX> xxd -g 4 -e -s 8172 -len 20 tree-2.jdisk
00001fec: 00000005 00000009 00000003 00000000  ................
00001ffc: 00000000                             ....
UNIX> xxd -g 1 -s 7370 -len 16 tree-2.jdisk
0001cca: 4a 61 6d 65 73 00 00 00 00 00 00 00 00 00 00 00  James...........
UNIX> xxd -g 1 -s 7570 -len 16 tree-2.jdisk
0001d92: 4c 61 6e 64 6f 6e 00 00 00 00 00 00 00 00 00 00  Landon..........
UNIX>
Let's see the vals, which start 20 bytes from the end of sector 7:
UNIX> echo 7168+1024-20 | bc
8172
UNIX> xxd -g 1 -s 8172 -len 20 tree-2.jdisk
0001fec: 05 00 00 00 09 00 00 00 03 00 00 00 00 00 00 00  ................
0001ffc: 00 00 00 00                                      ....
UNIX> echo '5*1024' | bc
5120
UNIX> xxd -g 1 -s 5120 -len 16 tree-2.jdisk
0001400: 50 72 6f 63 74 65 72 00 00 00 00 00 00 00 00 00  Procter.........
UNIX> echo '9*1024' | bc
9216
UNIX> xxd -g 1 -s 9216 -len 16 tree-2.jdisk
0002400: 43 68 75 67 00 00 00 00 00 00 00 00 00 00 00 00  Chug............
UNIX> echo '3*1024' | bc
3072
UNIX> xxd -g 1 -s 3072 -len 16 tree-2.jdisk
0000c00: 44 65 6c 69 6e 71 75 65 6e 74 00 00 00 00 00 00  Delinquent......
UNIX>
So, armed with that information, we can draw our B-Tree as follows:
I have starred that last LBA, because it is unused.
I want to stress here that even though our examples above used null-terminated strings as keys, our btrees can take any keys that are key_size bytes. Use memcmp() for key comparison. (And use memcpy() to copy keys and vals to their respective homes if need be).
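To make the memcmp() point concrete, here is a minimal sketch of a linear scan over the keys of one node. The function name `find_slot` and its interface are hypothetical. It assumes the keys are stored back to back, each exactly key_size bytes, in ascending memcmp() order.

```c
#include <assert.h>
#include <string.h>

/* Return the index of the first key that is >= target (by memcmp), or nkeys
   if every key is smaller. Sets *found to 1 on an exact match, 0 otherwise. */
int find_slot(const unsigned char *keys, int nkeys, int key_size,
              const unsigned char *target, int *found)
{
  int i, c;

  assert(key_size > 0 && nkeys >= 0);
  *found = 0;
  for (i = 0; i < nkeys; i++) {
    c = memcmp(target, keys + (size_t) i * key_size, (size_t) key_size);
    if (c == 0) { *found = 1; return i; }
    if (c < 0) return i;
  }
  return nkeys;
}
```

Because memcmp() compares all key_size bytes as unsigned chars, this works for keys that contain zero bytes or bytes above 0x7f, where strcmp() would stop early or misorder.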
bin/b_tree_test file [CREATE file_size key_size]
If you call it with "CREATE", then it creates a btree file with the given size and key size. If you don't call it with "CREATE", then it attaches to the preexisting btree file.
Once the file is created, it accepts three commands:
UNIX> rm tree-2.jdisk
UNIX> bin/b_tree_test tree-2.jdisk CREATE 20480 200
I Daniel Shadowy
Insert return value: 2
I Landon Delinquent
Insert return value: 3
I Alexis Gyroscope
Insert return value: 4
I Grace Procter
Insert return value: 5
P
LBA 0x00000001. Internal: 0
Entry 0: Key: Alexis LBA: 0x00000004
Entry 1: Key: Daniel LBA: 0x00000002
Entry 2: Key: Grace LBA: 0x00000005
Entry 3: Key: Landon LBA: 0x00000003
Entry 4: LBA: 0x00000000
I Eli Eider
Insert return value: 6
P
LBA 0x00000008. Internal: 1
Entry 0: Key: Eli LBA: 0x00000001
Entry 1: LBA: 0x00000007
LBA 0x00000001. Internal: 0
Entry 0: Key: Alexis LBA: 0x00000004
Entry 1: Key: Daniel LBA: 0x00000002
Entry 2: LBA: 0x00000006
LBA 0x00000007. Internal: 0
Entry 0: Key: Grace LBA: 0x00000005
Entry 1: Key: Landon LBA: 0x00000003
Entry 2: LBA: 0x00000000
I James Chug
Insert return value: 9
I Caleb Sudanese
Insert return value: 10
I Allison Embellish
Insert return value: 11
P
LBA 0x00000008. Internal: 1
Entry 0: Key: Eli LBA: 0x00000001
Entry 1: LBA: 0x00000007
LBA 0x00000001. Internal: 0
Entry 0: Key: Alexis LBA: 0x00000004
Entry 1: Key: Allison LBA: 0x0000000b
Entry 2: Key: Caleb LBA: 0x0000000a
Entry 3: Key: Daniel LBA: 0x00000002
Entry 4: LBA: 0x00000006
LBA 0x00000007. Internal: 0
Entry 0: Key: Grace LBA: 0x00000005
Entry 1: Key: James LBA: 0x00000009
Entry 2: Key: Landon LBA: 0x00000003
Entry 3: LBA: 0x00000000
<CNTL-D>
Reads: 18 Writes: 28
UNIX>
bin/random_tester_1 seed nevents key_size val_size tree_file input_file output_file
Here are the parameters:
At the end, it prints the number of reads and writes to the jdisk. Here are some examples:
UNIX> bin/random_tester_1 0 100 50 30 tree-3.jdisk - tree-3.txt > tmp-output.txt
Take a look at the program's output, in tmp-output.txt. Here are the first few lines:
UNIX> head -n 5 tmp-output.txt
I mjeglqyuapnuiutrwhuvvjmvgglhhapmuclvaynkuh ajujjsdaaeuuuzhqroq
I xcedrqfnxavvqfguowkwpx jlkrkbpjgahf
I lgltigobrvwkgathopicmh sxstblhiqfyowhyvefbweptotgp
I bkcz ybsmyalyricsxgmptjelqds
F xcedrqfnxavvqfguowkwpx
UNIX>
It inserted four keys/vals, and then it performed a find on "xcedrqfnxavvqfguowkwpx". After that find, it double-checked that the LBA matched the original insert, and that the bytes are "jlkrkbpjgahf". We can double-check that with b_tree_test and jdisk_test:
UNIX> echo F xcedrqfnxavvqfguowkwpx | bin/b_tree_test tree-3.jdisk
Attached to tree-3.jdisk. FS: 10240000 - KS: 50
Find return value: 3
Reads: 3 Writes: 0
UNIX> echo '3*1024' | bc
3072
UNIX> xxd -s 3072 -len 16 -g 1 tree-3.jdisk
0000c00: 6a 6c 6b 72 6b 62 70 6a 67 61 68 66 00 00 00 00  jlkrkbpjgahf....
UNIX>
The file tree-3.txt has the output from random_tester_1. I can use it to make another call to random_tester_1 that attaches to the tree that it just created. The input file tells me what's in the tree (keys, vals and LBA's). Now it generates new events (using the old keys and the new keys for finding):
UNIX> bin/random_tester_1 1 10 50 30 tree-3.jdisk tree-3.txt -
I ndbqtrkuwgsgmilthoqwkhsvhxerjetjbcxzakw ctvztvyrhnbig
I hwe delscbm
F auaerduagfi
F dmkaaoptzhmyszbbybrfzfqyyfshbq
F yh
I murekjom mxv
I epunbavuprsytivhufivhxhpt j
F jblrqvbhhtssacoxqeksrxosrhnhpeiqmxjjaxrfiue
F riosrojnkzqcyqgkasvfwybfiqgpvfzacsfbcodp
I wdxszkdgjxqyakiqlweuah lt
Reads: 26 Writes: 17
UNIX>
You'll note, it called b_tree_find() on "yh", which was in the old tree, and made sure that the LBA and val for that key is what it was when we inserted it on the first call to random_tester_1.
random_tester_1 will be a good way for you to test that your btree programs and mine are interoperable. For example, you can call it once with my version of random_tester_1, and then again on your version with the input generated from the first call. They should both work together!
random_tester_2 seed nevents key_size tree_file input_file output_file
The only parameter that is missing is val_size. random_tester_2 now generates random keys that are exactly key_size bytes, and they can be any bytes. No longer are they strings. It also generates random vals that are exactly JDISK_SECTOR_SIZE bytes. This is the ultimate test for your programs, because you can't debug with nice strings. You'll have to debug with a little moxie.
The input and output files are now binary.
UNIX> bin/b_tree_dcs tree-2.jdisk
Key_Size: 200
Key 0: S Alexis
Key 1: S Allison
Key 2: S Caleb
Key 3: S Daniel
Key 4: S Eli
Key 5: S Grace
Key 6: S James
Key 7: S Landon
Val 0: S Gyroscope
Val 1: S Embellish
Val 2: S Sudanese
Val 3: S Shadowy
Val 4: S Eider
Val 5: S Procter
Val 6: S Chug
Val 7: S Delinquent
UNIX> rm -f tmp.jdisk; bin/random_tester_2 0 2 25 tmp.jdisk - - > /dev/null
UNIX> bin/b_tree_dcs tmp.jdisk
Key_Size: 25
Key 0: H A95E509709556B8137127CE6160C538C53E4488F4B083ADA9A
Key 1: H C00A91D0F269AEBAB4483B0B5CEA668381C81552B9297D3A7D
Val 0: H 7824B141A5586A73755C3CFC645E2F4FC454332A25B44DE8CFCDEC636CC ...
Val 1: H 5AD463298B1EBB204624070D8C848B448EE15246053A1B632E59E07A666 ...
UNIX>
For example, gradescript number 1 uses b_tree_test to insert one key into the B-tree file:
UNIX> sh /home/jplank/www-home/cs494/labs/Lab-3-B-Tree/grader.sh 1 N
Problem 1 Correct.
tree-answer.jdisk and tree-jplank.jdisk created as follows:
----------------------------------------------------------
rm -f tree-jplank.jdisk tree-answer.jdisk
/home/jplank/cs494/labs/Lab-3-B-Tree/bin/b_tree_test tree-answer.jdisk CREATE 51200 26 < input.txt > /dev/null
./bin/b_tree_test tree-jplank.jdisk CREATE 51200 26 < input.txt > /dev/null
UNIX> cat input.txt
I Mackenzie Malignant
UNIX> bin/b_tree_dcs tree-answer.jdisk
Key_Size: 26
Key 0: S Mackenzie
Val 0: S Malignant
UNIX> bin/b_tree_dcs tree-jplank.jdisk
Key_Size: 26
Key 0: S Mackenzie
Val 0: S Malignant
UNIX>
Gradescript number 2 is a little more complex -- it creates an input file with 161 random insertions (some of which will replace keys). It calls the b_tree_test program in the lab directory to create tree-answer.jdisk, from the first 137 entries of the input file. Then it copies tree-answer.jdisk to tree-$USER.jdisk, and processes the remaining 24 insertions using the lab program and your program:
UNIX> sh /home/jplank/www-home/cs494/labs/Lab-3-B-Tree/grader.sh 2 N
Problem 2 Correct.
tree-answer.jdisk and tree-jplank.jdisk created as follows:
----------------------------------------------------------
rm -f tree-jplank.jdisk tree-answer.jdisk
head -n 137 input.txt | /home/jplank/cs494/labs/Lab-3-B-Tree/bin/b_tree_test tree-answer.jdisk CREATE 614400 129 > /dev/null
cp tree-answer.jdisk tree-jplank.jdisk
sed 1,137d input.txt | /home/jplank/cs494/labs/Lab-3-B-Tree/bin/b_tree_test tree-answer.jdisk > /dev/null
sed 1,137d input.txt | ./bin/b_tree_test tree-jplank.jdisk > /dev/null
UNIX> wc input.txt
161 483 2807 input.txt
UNIX> head input.txt
I Blake Shelter
I Mackenzie Effluvium
I Brianna Reluctant
I Evelyn Vigilantism
I Mason Drawn
I Anthony Singlet
I Anna Microjoule
I Kate Howell
I Paige Tinder
I Xavier Roulette
UNIX> tail input.txt
I Peyton Tammany
I Caleb Abe
I Landon Knick
I Makayla Fortescue
I Brody Repeat
I Nathan Swizzle
I Madison Nonetheless
I Ava Astronautic
I Zoey Indebted
I Chloe Smirk
UNIX> bin/b_tree_dcs tree-answer.jdisk | head
Key_Size: 129
Key 0: S Aaron
Key 1: S Abigail
Key 2: S Addison
Key 3: S Aidan
Key 4: S Aiden
Key 5: S Alex
Key 6: S Alexander
Key 7: S Alexis
Key 8: S Allison
UNIX>
The tests performed by the gradescript are based on the number given mod ten. The tests are as follows:
typedef struct {
  int key_size;                    /* These are the first 16/12 bytes in sector 0 */
  unsigned int root_lba;
  unsigned long first_free_block;

  void *disk;                      /* The jdisk */
  unsigned long size;              /* The jdisk's size */
  unsigned long num_lbas;          /* size/JDISK_SECTOR_SIZE */
  int keys_per_block;              /* MAXKEY */
  int lbas_per_block;              /* MAXKEY+1 */
  Tree_Node *free_list;            /* Free list of nodes */

  Tree_Node *tmp_e;                /* When find() fails, this is a pointer to the external node */
  int tmp_e_index;                 /* and the index where the key should have gone */

  int flush;                       /* Should I flush sector[0] to disk after b_tree_insert() */
} B_Tree;
Note how the first three fields are such that you can write them to sector 0 (and read them from sector 0). As it turns out, I write the whole struct to sector 0, but the remaining bytes are ignored.
A Tree_Node is the internal representation of a node. Here's its struct:
typedef struct tnode {
  unsigned char bytes[JDISK_SECTOR_SIZE+256]; /* This holds the sector for reading and writing.
                                                 It has extra room because your internal representation
                                                 will hold an extra key. */
  unsigned char nkeys;                      /* Number of keys in the node */
  unsigned char flush;                      /* Should I flush this to disk at the end of b_tree_insert()? */
  unsigned char internal;                   /* Internal or external node */
  unsigned int lba;                         /* LBA when the node is flushed */
  unsigned char **keys;                     /* Pointers to the keys. Size = MAXKEY+1 */
  unsigned int *lbas;                       /* Pointer to the array of LBA's. Size = MAXKEY+2 */
  struct tnode *parent;                     /* Pointer to my parent -- useful for splitting */
  int parent_index;                         /* My index in my parent */
  struct tnode *ptr;                        /* Free list link */
} Tree_Node;
I maintain my own free list of Tree_Node structs. When I allocate one, I do three malloc() calls. One is for the Tree_Node itself, one is for keys and one is for lbas. I set the values of keys right after I allocate them, since they point to fixed locations in bytes. For example, keys[0] will point to bytes+2. keys[1] will point to bytes+2+key_size. Etc. The size of keys is MAXKEY+1. Moreover, the size of lbas is MAXKEY+2. The reason is that in my internal representation of a B-Tree node, I am allowed to store an extra key and val. This simplifies the implementation of splitting in an enormous way. If you don't believe me, ask anyone in the 2015 version of CS494 who tried to implement B-Trees without it. When I'm done with a Tree_Node, I put it onto my free list. That way, if I need a new Tree_Node later, and there's one on the free list, I don't have to do any malloc() calls -- I simply grab it from the free list. Keys and lbas are already allocated, and the values of keys are already set correctly. That's convenient.
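The allocation scheme described above might look roughly like the following. This is a sketch, not the lab solution: the struct is trimmed down to the fields that matter for allocation, `SECTOR_SIZE` stands in for JDISK_SECTOR_SIZE, and `get_node` and `release_node` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define SECTOR_SIZE 1024   /* stands in for JDISK_SECTOR_SIZE */

typedef struct tnode {     /* trimmed version of Tree_Node, for illustration only */
  unsigned char bytes[SECTOR_SIZE + 256];
  unsigned char **keys;    /* MAXKEY+1 pointers into bytes */
  unsigned int *lbas;      /* MAXKEY+2 LBA's */
  struct tnode *ptr;       /* free list link */
} Tree_Node;

/* Take a node from the free list if there is one, otherwise do the three mallocs. */
Tree_Node *get_node(Tree_Node **free_list, int key_size, int maxkey)
{
  Tree_Node *n;
  int i;

  assert(key_size > 0 && maxkey > 0);
  assert(2 + (maxkey + 1) * key_size <= SECTOR_SIZE + 256);  /* the extra key must fit */

  if (*free_list != NULL) {          /* reuse: keys and lbas are already set up */
    n = *free_list;
    *free_list = n->ptr;
    return n;
  }

  n = malloc(sizeof(Tree_Node));
  if (n == NULL) return NULL;
  n->keys = malloc(sizeof(unsigned char *) * (maxkey + 1));
  n->lbas = malloc(sizeof(unsigned int) * (maxkey + 2));
  if (n->keys == NULL || n->lbas == NULL) {
    free(n->keys);
    free(n->lbas);
    free(n);
    return NULL;
  }
  for (i = 0; i <= maxkey; i++) n->keys[i] = n->bytes + 2 + i * key_size;
  n->ptr = NULL;
  return n;
}

/* Put a node back onto the free list instead of freeing it. */
void release_node(Tree_Node **free_list, Tree_Node *n)
{
  n->ptr = *free_list;
  *free_list = n;
}
```

Reusing nodes this way only works if every node on the list was built for the same key_size and MAXKEY, which holds when the list belongs to a single tree.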
I have a procedure which reads a Tree_Node from a jdisk. It reads it into bytes, and then it copies the LBA's from the end of the sector into lbas (using memcpy). It has to do this, because it uses the end of the sector for that extra key. It also reads nkeys and internal from the sector. A convenient thing is that the keys pointers are already pointing to the correct place, so you don't need to do anything special with the keys.
I set the flush field whenever I modify a B-Tree node, and the modified node needs to be flushed to disk. I do the final flushing at the end of b_tree_insert(). When I flush a node, I need to create the sector. I do this by copying nkeys and internal into their proper place in bytes. I then copy lbas to their proper place in bytes. The keys are already in their proper place. I then write bytes using jdisk_write().
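Here is a minimal sketch of the in-memory half of reading and flushing, leaving out the jdisk calls. The names `unpack_node` and `pack_node` are hypothetical. It assumes 4-byte unsigned ints in native byte order, with byte 0 of the sector holding internal and byte 1 holding nkeys, as in the xxd dumps earlier.

```c
#include <assert.h>
#include <string.h>

#define SECTOR_SIZE 1024   /* stands in for JDISK_SECTOR_SIZE */

/* After a sector has been read into bytes: pull out internal, nkeys and the
   MAXKEY+1 LBA's stored at the end of the sector. lbas needs room for maxkey+2. */
void unpack_node(const unsigned char *bytes, int maxkey,
                 unsigned char *internal, unsigned char *nkeys, unsigned int *lbas)
{
  assert(sizeof(unsigned int) == 4 && maxkey > 0);
  *internal = bytes[0];
  *nkeys = bytes[1];
  memcpy(lbas, bytes + SECTOR_SIZE - (maxkey + 1) * 4, (size_t) (maxkey + 1) * 4);
}

/* Before a sector is written: put internal, nkeys and the LBA's back into
   their places in bytes. The keys are already where they belong. */
void pack_node(unsigned char *bytes, int maxkey,
               unsigned char internal, unsigned char nkeys, const unsigned int *lbas)
{
  assert(sizeof(unsigned int) == 4 && maxkey > 0);
  assert(nkeys <= maxkey);           /* a node must be split before it is flushed */
  bytes[0] = internal;
  bytes[1] = nkeys;
  memcpy(bytes + SECTOR_SIZE - (maxkey + 1) * 4, lbas, (size_t) (maxkey + 1) * 4);
}
```

Keeping the LBA's in a separate array matters because, while a node temporarily holds MAXKEY+1 keys, the extra key can spill into the region of bytes where the LBA's live on disk.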
What I did for splitting was as follows -- I'd go ahead and insert the key/lba. Since there's room for an extra key and lba in the Tree_Node, this works even when the node is full. I then checked to see if the node needed to be split, and if so, I called a procedure called split_node. This procedure has to be recursive, BTW.
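To illustrate the "insert first, split afterwards" idea, here is a small self-contained sketch on an in-memory node. It is not the lab's split_node: it handles one node only and leaves the recursion into the parent out. `Node`, `insert_at`, `split`, `KS` and `MAXKEY` are made up for the example, and the choice of the middle key (index nkeys/2) is an assumption that happens to reproduce the A-Aron example shown in this writeup.

```c
#include <assert.h>
#include <string.h>

#define KS 8        /* hypothetical key size for the demo */
#define MAXKEY 4    /* hypothetical MAXKEY for the demo */

typedef struct {
  int nkeys;
  unsigned char keys[MAXKEY + 1][KS];  /* one spare key */
  unsigned int lbas[MAXKEY + 2];       /* one spare LBA */
} Node;

/* Insert key/lba at slot i, shifting later keys and LBA's to the right.
   Thanks to the spare slots this works even when the node already holds MAXKEY keys. */
void insert_at(Node *n, int i, const unsigned char *key, unsigned int lba)
{
  assert(n->nkeys <= MAXKEY && i >= 0 && i <= n->nkeys);
  if (i < n->nkeys) {
    memmove(n->keys[i + 1], n->keys[i], (size_t) (n->nkeys - i) * KS);
  }
  memmove(&n->lbas[i + 1], &n->lbas[i],
          (size_t) (n->nkeys + 1 - i) * sizeof(unsigned int));
  memcpy(n->keys[i], key, KS);
  n->lbas[i] = lba;
  n->nkeys++;
}

/* Split an overfull node. The keys before the middle stay in left, the keys
   after it move to right, and the middle key is copied to up_key so that the
   caller can insert it into the parent. */
void split(Node *left, Node *right, unsigned char *up_key)
{
  int mid, i;

  assert(left->nkeys == MAXKEY + 1);
  mid = left->nkeys / 2;
  memcpy(up_key, left->keys[mid], KS);
  right->nkeys = left->nkeys - mid - 1;
  for (i = 0; i < right->nkeys; i++) memcpy(right->keys[i], left->keys[mid + 1 + i], KS);
  for (i = 0; i <= right->nkeys; i++) right->lbas[i] = left->lbas[mid + 1 + i];
  left->nkeys = mid;   /* left keeps lbas[0..mid]; the leftover entries are ignored */
}
```

Note that in an external node, the last LBA kept by the left half is the val of the key that moved up, which matches the rule that an internal key's val lives in the largest val pointer of its predecessor subtree.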
When I call find(), I have it read in all of the nodes on the path to the external node, setting their parent fields, but setting flush to 0. The flush field is set by b_tree_insert() when the node changes. The parent fields are used on splitting. When I'm done with b_tree_insert() or b_tree_find(), I call free_and_flush() on every node from the external node up to the root. This flushes the node if flush is set, and puts the node onto the free list. I found free_and_flush super-helpful.
UNIX> cp tree-2.jdisk tmp.jdisk
UNIX> echo P | bin/b_tree_test tmp.jdisk
Attached to tmp.jdisk. FS: 20480 - KS: 200
LBA 0x00000008. Internal: 1
Entry 0: Key: Eli LBA: 0x00000001
Entry 1: LBA: 0x00000007
LBA 0x00000001. Internal: 0
Entry 0: Key: Alexis LBA: 0x00000004
Entry 1: Key: Allison LBA: 0x0000000b
Entry 2: Key: Caleb LBA: 0x0000000a
Entry 3: Key: Daniel LBA: 0x00000002
Entry 4: LBA: 0x00000006
LBA 0x00000007. Internal: 0
Entry 0: Key: Grace LBA: 0x00000005
Entry 1: Key: James LBA: 0x00000009
Entry 2: Key: Landon LBA: 0x00000003
Entry 3: LBA: 0x00000000
Reads: 4 Writes: 0
UNIX> echo I A-Aron Stoae | bin/b_tree_test tmp.jdisk
Attached to tmp.jdisk. FS: 20480 - KS: 200
Insert return value: 12
Reads: 3 Writes: 5
UNIX> echo P | bin/b_tree_test tmp.jdisk
Attached to tmp.jdisk. FS: 20480 - KS: 200
LBA 0x00000008. Internal: 1
Entry 0: Key: Allison LBA: 0x00000001
Entry 1: Key: Eli LBA: 0x0000000d
Entry 2: LBA: 0x00000007
LBA 0x00000001. Internal: 0
Entry 0: Key: A-Aron LBA: 0x0000000c
Entry 1: Key: Alexis LBA: 0x00000004
Entry 2: LBA: 0x0000000b
LBA 0x0000000d. Internal: 0
Entry 0: Key: Caleb LBA: 0x0000000a
Entry 1: Key: Daniel LBA: 0x00000002
Entry 2: LBA: 0x00000006
LBA 0x00000007. Internal: 0
Entry 0: Key: Grace LBA: 0x00000005
Entry 1: Key: James LBA: 0x00000009
Entry 2: Key: Landon LBA: 0x00000003
Entry 3: LBA: 0x00000000
Reads: 5 Writes: 0
UNIX>
Now, here are the tree nodes as they are read in the beginning of the insertion, and as they are flushed after the insertion:
UNIX> cp tree-2.jdisk tmp.jdisk
UNIX> echo I A-Aron Stoae | bin/b_tree_test_inst tmp.jdisk
Attached to tmp.jdisk. FS: 20480 - KS: 200
Find(): Read -- Tree Node 0x1e840c0
Bytes: 0x1e840c0
Nkeys Address: 0x1e845c0
Nkeys Value: 1
Flush Value: 0
Internal Value: 1
LBA Value: 8
Keys: 0x1e84600                 keys is allocated with malloc().
Keys[0] 0x1e840c2 (Eli)         This is bytes+2
Keys[1] 0x1e8418a (Landon)      This is bytes+2+key_size. Since nkeys equals 1, the contents here are meaningless. They happen to hold "Landon" from a previous time, when "Landon" was the second key. You have to ignore this, because Nkeys is one.
Keys[2] 0x1e84252 ()            This is bytes+2+key_size*2
Keys[3] 0x1e8431a ()
Keys[4] 0x1e843e2 ()            This is the extra key.
Lbas: 0x1e84630                 lbas is allocated with malloc().
Lbas[0] 0x1                     Lba of the external node before "Eli".
Lbas[1] 0x7                     Lba of the external node after "Eli".
Lbas[2] 0x0                     The values of these are meaningless. They happen to be zero.
Lbas[3] 0x0
Lbas[4] 0x0
Lbas[5] 0x0                     This is the extra lba
Parent 0x0
Parent_index -1
&ptr 0x1e845e8                  This is meaningless, because the tree_node isn't on the free list.
Find(): Read -- Tree Node 0x1e84650
Bytes: 0x1e84650
Nkeys Address: 0x1e84b50
Nkeys Value: 4
Flush Value: 0
Internal Value: 0
LBA Value: 1
Keys: 0x1e84b90
Keys[0] 0x1e84652 (Alexis)
Keys[1] 0x1e8471a (Allison)
Keys[2] 0x1e847e2 (Caleb)
Keys[3] 0x1e848aa (Daniel)
Keys[4] 0x1e84972 ()
Lbas: 0x1e84bc0
Lbas[0] 0x4
Lbas[1] 0xb
Lbas[2] 0xa
Lbas[3] 0x2
Lbas[4] 0x6
Lbas[5] 0x0
Parent 0x1e840c0
Parent_index 0
&ptr 0x1e84b78
Free_and_flush(): Write -- Tree Node 0x1e84be0
Bytes: 0x1e84be0
Nkeys Address: 0x1e850e0
Nkeys Value: 2
Flush Value: 1
Internal Value: 0
LBA Value: 13
Keys: 0x1e85120
Keys[0] 0x1e84be2 (Caleb)
Keys[1] 0x1e84caa (Daniel)
Keys[2] 0x1e84d72 ()
Keys[3] 0x1e84e3a ()
Keys[4] 0x1e84f02 ()
Lbas: 0x1e85150
Lbas[0] 0xa
Lbas[1] 0x2
Lbas[2] 0x6
Lbas[3] 0x0
Lbas[4] 0x0
Lbas[5] 0x0
Parent 0x0
Parent_index 0
&ptr 0x1e85108
Free_and_flush(): Write -- Tree Node 0x1e84650
Bytes: 0x1e84650
Nkeys Address: 0x1e84b50
Nkeys Value: 2
Flush Value: 1
Internal Value: 0
LBA Value: 1
Keys: 0x1e84b90
Keys[0] 0x1e84652 (A-Aron)
Keys[1] 0x1e8471a (Alexis)
Keys[2] 0x1e847e2 (Allison)     These are leftover from before the split. nkeys is 2,
Keys[3] 0x1e848aa (Caleb)       so they are ignored, but they are written to disk.
Keys[4] 0x1e84972 (Daniel)      As you can see there is room for 5 keys, which makes splitting easier.
Lbas: 0x1e84bc0
Lbas[0] 0xc
Lbas[1] 0x4
Lbas[2] 0xb
Lbas[3] 0xa                     Same with these LBA's.
Lbas[4] 0x2
Lbas[5] 0x6
Parent 0x1e840c0
Parent_index 0
&ptr 0x1e84b78
Free_and_flush(): Write -- Tree Node 0x1e840c0
Bytes: 0x1e840c0
Nkeys Address: 0x1e845c0
Nkeys Value: 2
Flush Value: 1
Internal Value: 1
LBA Value: 8
Keys: 0x1e84600
Keys[0] 0x1e840c2 (Allison)
Keys[1] 0x1e8418a (Eli)
Keys[2] 0x1e84252 ()
Keys[3] 0x1e8431a ()
Keys[4] 0x1e843e2 ()
Lbas: 0x1e84630
Lbas[0] 0x1
Lbas[1] 0xd
Lbas[2] 0x7
Lbas[3] 0x0
Lbas[4] 0x0
Lbas[5] 0x0
Parent 0x0
Parent_index -1
&ptr 0x1e845e8
Insert return value: 12
Reads: 3 Writes: 5
UNIX>
How hard can it be?
Inserting keys at random
Segmentation fault
David C. - 2015

Time to start split node.
I saved the best part for last.
What day is it now?
Tyler M. - 2016

Five hundred lines long
One hundred problems correct
Please, now can I sleep?
David C. - 2015

Seg fault on case 5...
I guess I should try Valgrind.
Seg fault on case 6...
Tyler M. - 2016

Passes the gradescript
Allocates 10 million bytes
Pattern not found: free()
Elliot G. - 2016

My data structure
Does not include a spare key.
No spring break for me.
Dr. Plank - 2015

Seven hundred lines?
I guess I should have used find..
Insert can find too!
Tyler M. - 2016