This lab is designed to give you experience with various sorting algorithms and with using objects to implement them. In particular, you will implement a sort-merge algorithm for external sorting and a quicksort for internal sorting.
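As a reference point for the internal sorting step, here is a minimal quicksort sketch over records ordered by their first field. The `Record` type and the `quicksort` functions are hypothetical names, not part of the provided classes. The pivot choice (middle element, Lomuto partition) is just one reasonable option.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical record: each input line has three integer fields,
// and the first one is the sort key.
struct Record {
    int key;
    int field2;
    int field3;
};

// Lomuto partition of v[lo..hi] (inclusive) around the middle element.
// Returns the pivot's final index. Many equal keys degrade this to O(n^2).
static long partition_by_key(std::vector<Record>& v, long lo, long hi) {
    long mid = lo + (hi - lo) / 2;
    std::swap(v[mid], v[hi]);            // park the pivot at the end
    int pivot = v[hi].key;
    long store = lo;
    for (long i = lo; i < hi; ++i) {
        if (v[i].key < pivot) std::swap(v[i], v[store++]);
    }
    std::swap(v[store], v[hi]);          // pivot lands in its final slot
    return store;
}

// Sorts v[lo..hi]: recurses on the smaller side and loops on the larger
// side, so the recursion depth stays O(log n).
static void quicksort(std::vector<Record>& v, long lo, long hi) {
    while (lo < hi) {
        long p = partition_by_key(v, lo, hi);
        if (p - lo < hi - p) {
            quicksort(v, lo, p - 1);
            lo = p + 1;
        } else {
            quicksort(v, p + 1, hi);
            hi = p - 1;
        }
    }
}

// Sorts a whole in-memory run by key.
void quicksort(std::vector<Record>& v) {
    if (!v.empty()) quicksort(v, 0, static_cast<long>(v.size()) - 1);
}
```

In bsort, a routine like this would sort each batch of `run_size` records before the batch is written out as an initial run.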
You should copy the following files from the /ruby/homes/ftp/pub/bvz/classes/cs302/labs/lab8 directory to your directory:
For the Makefile to work, you will need to name the file you create bsort.cc.
In order to make the graphics package work, you will need to type the following two commands in every window in which you run your program (or you can place them in your .cshrc file):
setenv AMULET_DIR /sunshine/homes/bvz/amulet/amulet3/
setenv AMULET_VARS_FILE Makefile.vars.gcc.Solaris
If you have certain X display access protections, you may also get a message saying that your display could not be opened when you run your program. If this happens, type the following command:
xhost +machine_name
where machine_name is the name of your machine (e.g., cetus4a).
In this lab you will implement the balanced, multi-way external sorting algorithm presented in class. You will also experiment with the initial run size and the number of ways, P, to see how these parameters affect the performance of the algorithm. Here is a man page-like description of the program you are to write:
Name
bsort - sort and collate lines of a file.
Synopsis
bsort [ -p num_ways ] [ -r run_size] input_file output_file
Description
bsort sorts the lines in input_file from smallest to largest and stores the sorted lines in output_file.
You may assume that each line in input_file has three integer fields. The first field is the sort key. An example data file may be found in the file data.
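Reading and writing one line can be sketched as follows. This assumes the three fields are separated by whitespace, because the exact layout of data is not specified here. `Record`, `parse_record`, `key_less`, and `format_record` are hypothetical names. The provided File classes may already handle this for you.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical layout of one input line: three integers, the first is the key.
struct Record {
    int key;
    int field2;
    int field3;
};

// Parses one line; returns false unless it holds exactly three integers
// (assumes whitespace-separated fields).
bool parse_record(const std::string& line, Record& out) {
    std::istringstream in(line);
    Record r;
    if (!(in >> r.key >> r.field2 >> r.field3)) return false;
    std::string extra;
    if (in >> extra) return false;       // junk after the third field
    out = r;
    return true;
}

// Records are ordered by key only; the other two fields ride along.
bool key_less(const Record& a, const Record& b) { return a.key < b.key; }

// Writes a record back out as three space-separated fields (assumed format).
std::string format_record(const Record& r) {
    std::ostringstream out;
    out << r.key << ' ' << r.field2 << ' ' << r.field3;
    return out.str();
}
```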
Options
Examples
bsort data output-data
bsort -p 10 data output-data
bsort -r 6 -p 4 data output-data
The output of bsort is the sorted output file; bsort should not print any messages unless the command-line arguments are invalid. A working version of bsort is available at /ruby/homes/ftp/pub/bvz/classes/cs302/bin/bsort. If you have a question about what your program should do, first see what this program does.
Write this program using the pseudo-code provided in class and the various classes that we have provided for you.
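The balanced, multi-way algorithm itself can be modeled entirely in memory, with vectors standing in for the disk files. This is a simplified sketch under stated assumptions, not the pseudo-code from class:

- keys are plain ints;
- each "file" keeps explicit run boundaries;
- `std::sort` stands in for your quicksort;
- the names `merge_runs` and `balanced_merge_sort` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// In-memory stand-ins: a "file" is a list of sorted runs, a "bank" is P files.
using Run  = std::vector<int>;
using File = std::vector<Run>;
using Bank = std::vector<File>;

// P-way merge of sorted runs using a min-heap of (value, (run, index)).
Run merge_runs(const std::vector<const Run*>& runs) {
    using Entry = std::pair<int, std::pair<std::size_t, std::size_t>>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r]->empty()) heap.push({(*runs[r])[0], {r, 0}});
    Run out;
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        out.push_back(top.first);
        std::size_t r = top.second.first, i = top.second.second + 1;
        if (i < runs[r]->size()) heap.push({(*runs[r])[i], {r, i}});
    }
    return out;
}

// Balanced P-way sort-merge. Returns the sorted data; *passes receives the
// number of merge passes made after the initial runs were created.
std::vector<int> balanced_merge_sort(const std::vector<int>& input,
                                     std::size_t run_size, std::size_t p,
                                     int* passes) {
    assert(run_size >= 1 && p >= 2);     // p == 1 would never reduce the runs
    Bank in(p), out(p);
    // Phase 1: cut the input into runs, sort each in memory, and deal them
    // round-robin onto the P input files.
    std::size_t next = 0;
    for (std::size_t start = 0; start < input.size(); start += run_size) {
        std::size_t end = std::min(start + run_size, input.size());
        Run run(input.begin() + start, input.begin() + end);
        std::sort(run.begin(), run.end());
        in[next].push_back(std::move(run));
        next = (next + 1) % p;
    }
    auto total_runs = [](const Bank& b) {
        std::size_t n = 0;
        for (const File& f : b) n += f.size();
        return n;
    };
    // Phase 2: merge the k-th run of every input file into one run on output
    // file k % P, then swap the roles of the two banks.
    *passes = 0;
    while (total_runs(in) > 1) {
        for (File& f : out) f.clear();
        std::size_t longest = 0;
        for (const File& f : in) longest = std::max(longest, f.size());
        for (std::size_t k = 0; k < longest; ++k) {
            std::vector<const Run*> group;
            for (const File& f : in)
                if (k < f.size()) group.push_back(&f[k]);
            out[k % p].push_back(merge_runs(group));
        }
        std::swap(in, out);
        ++*passes;
    }
    for (const File& f : in)
        if (!f.empty()) return f.front();
    return {};
}
```

Each pass turns R runs into ceil(R / P) runs. Both a larger P and larger initial runs therefore reduce the number of passes, which is the effect the experiments below measure.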
I created the bsort program incrementally using the following steps:
To assist you with this lab, we have prepared a visual debugging environment that shows your disk banks and arrays and lets you pause your program and inspect these data structures. The visual debugger is written as a driver program that initializes the environment and then calls your external sorting function. Hence, for this lab, you will not write a main function. Instead, you will write a function named external_sort, which our driver program (named driver.cc) will call. external_sort takes two parameters, argc and argv:
void external_sort (int argc, char** argv);
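Because external_sort receives argc and argv directly, its first job is to interpret the command line from the synopsis. The sketch below is minimal, and `BsortArgs` and `parse_bsort_args` are hypothetical helpers. The Options section does not list defaults here, so an option that is not given is left as 0 for the caller to fill in.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical holder for bsort's parsed command line. A value of 0 means the
// option was not given; the caller substitutes the lab's documented default.
struct BsortArgs {
    long num_ways = 0;   // -p num_ways
    long run_size = 0;   // -r run_size
    std::string input_file;
    std::string output_file;
};

// Accepts only a positive decimal integer ("4" yes; "", "4x", "0", "-3" no).
static bool parse_positive(const char* s, long& value) {
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0) return false;
    value = v;
    return true;
}

static void print_usage(const char* prog) {
    std::fprintf(stderr,
                 "usage: %s [ -p num_ways ] [ -r run_size ] input_file output_file\n",
                 prog);
}

// Parses: bsort [ -p num_ways ] [ -r run_size ] input_file output_file
// Options may appear in either order. On any error it prints a usage line to
// stderr (the only message bsort should produce) and returns false.
bool parse_bsort_args(int argc, char** argv, BsortArgs& args) {
    const char* prog = argc > 0 ? argv[0] : "bsort";
    int i = 1;
    while (i < argc && argv[i][0] == '-') {
        std::string opt = argv[i];
        long* target = opt == "-p" ? &args.num_ways
                     : opt == "-r" ? &args.run_size
                     : nullptr;
        if (target == nullptr || i + 1 >= argc ||
            !parse_positive(argv[i + 1], *target)) {
            print_usage(prog);
            return false;
        }
        i += 2;
    }
    if (argc - i != 2) {
        print_usage(prog);
        return false;
    }
    args.input_file = argv[i];
    args.output_file = argv[i + 1];
    return true;
}
```

external_sort could call a helper like this first and simply return if it fails, so bsort stays silent whenever the arguments are valid. Whether the reference bsort rejects `-p 1` is not stated here, so check the working version in the bin directory.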
You will also need to use the classes we have provided so that the visual debugging environment works. You can find documentation for these classes, a list of the methods they support, and an explanation of how to make the visual debugger pause here.
Once you have completed bsort, you will perform a series of experiments that vary the initial run size and P parameters. You should use initial run sizes of 100, 1000, and 10000 and values of P equal to 2, 4, 8, and 16. For each experiment you should sort the file data found in lab8/data. You should record 1) the aggregate number of file reads and writes (i.e., the number of calls you make to the Read and Write methods for the File objects), and 2) the number of passes your program requires once the initial set of runs has been created. The results should be recorded in a table that looks as follows:
Run size \ P   2          4          8          16
100            i/o, pp    i/o, pp    i/o, pp    i/o, pp
1000           i/o, pp    i/o, pp    i/o, pp    i/o, pp
10000          i/o, pp    i/o, pp    i/o, pp    i/o, pp
In the table, i/o stands for the aggregate number of reads and writes, and pp stands for the number of passes.
You should also produce a graph of the results. The x-axis should be P, the y-axis should be the number of reads and writes, and there should be one curve per run size, giving three curves (for 100, 1000, and 10000). Space the x-axis values 2, 4, 8, and 16 evenly. Even if you cannot get bsort to work, you can use the bsort in the bin directory to conduct these experiments.
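To sanity-check your measurements, you can predict the pass count. With N records and initial run size r, there are ceil(N / r) initial runs, and each balanced P-way pass reduces R runs to ceil(R / P).

The sketch below computes this. Its I/O estimate relies on two further assumptions:

- every Read or Write call transfers exactly one record;
- run creation reads and writes each record once.

Treat the estimate as a rough guide rather than the expected answer. The function names are hypothetical, and N must be supplied because the size of lab8/data is not given here.

```cpp
#include <cassert>
#include <cstdint>

// Merge passes after run creation: ceil(N / r) initial runs, and each
// balanced P-way pass turns R runs into ceil(R / P) runs.
int predicted_passes(std::uint64_t records, std::uint64_t run_size, std::uint64_t p) {
    assert(run_size >= 1 && p >= 2);
    std::uint64_t runs = (records + run_size - 1) / run_size;   // ceil(N / r)
    int passes = 0;
    while (runs > 1) {
        runs = (runs + p - 1) / p;                              // ceil(R / P)
        ++passes;
    }
    return passes;
}

// Rough I/O count, ASSUMING one record per Read/Write call: run creation and
// every merge pass each read and write all N records once.
std::uint64_t predicted_io(std::uint64_t records, std::uint64_t run_size, std::uint64_t p) {
    return 2 * records *
           (1 + static_cast<std::uint64_t>(predicted_passes(records, run_size, p)));
}
```

For example, 10000 records with runs of 100 give 100 initial runs. That means 7 passes for P = 2 and 2 passes for P = 16. If your measured counts differ a lot from such estimates, check how your program counts Read and Write calls.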
You should submit the following items to Hui. The first two items should be sent in one email message, and the third item should be packaged using 302submit and sent to Hui in a separate email message.