Your second project will be to build a sorting utility called
volsort in C++ that will mimic the UNIX tool "sort" with a
twist: the underlying container must be a linked list.
Your program must implement the following sorting methods or modes:
qsort: This sorting method simply uses the qsort function.
merge: This sorting method uses a custom implementation of the merge sort algorithm.
quick: This sorting method uses a custom implementation of the quick sort algorithm.
This project is to be done in groups of 1 - 4 students and is due Wednesday February 12th, 2025 at 5:00pm.
My implementation, which you should mimic, has the following usage message:
$ volsort -h
usage: volsort
-m MODE Sorting mode (oblivious, stl, qsort, merge, quick)
-n Perform numerical ordering
volsort therefore must take two optional arguments: the -m
flag, which lets the user select which sorting method to use; and
the -n flag, which informs the program to treat the input values as numeric
integers rather than textual strings.
As with most Unix filters, volsort reads from [standand input] and then
outputs to standard output:
# Use the volsort solution to order 10 random numbers in the file input $ volsort -m quick < input 12 32 37 37 50 61 69 71 90 97
Your header file should contains the definition of the struct Node and struct List such as this:
struct Node { std::string string; int number; Node *next; }; struct List { Node *head; size_t size; List(); ~List(); void push_front(const std::string &s); };
This struct List is a singly-linked list with a
constructor, destructor, and a single push_front method. This
struct List consists of struct Node entries that contain both string
values, the corresponding number value, and a pointer to the next struct
Node.
It is required that you implement the struct List constructor,
destructor, and push_front method. The push_front method should
insert a new struct Node into the front of the list. To get the number
value for the new struct Node, you should use the std::stoi function (if
this conversion fails, you can default to 0 for the number value).
You should implement the struct
Node comparison functions for STD::sort in C
++ style as:
// C++ Style comparison function bool node_number_compare(const Node *a, const Node *b); bool node_string_compare(const Node *a, const Node *b);
This will differ for qsort, as we will discuss in class and the accompanying video for this assignment.
The first sorting mode is to use the std::sort method. Because our struct List does not implement random access iterators,
we will need to copy the struct Nodes in the struct List into another
container such as an vector. Once we have this copy, we can then call the
std::sort
function on this new container with the appropriate comparison function.
Note that our Node struct has three values, one of which is a string.
The most efficient way to reconstitute the sorted order is to update
the links between the struct Nodes -- not to simply
replacing the list data with node data. For example, for a list of
n strings you may have to run atoi n times, versus
simply updating n pointers as suggested here.
Hint: You should store struct Node * in your new container.
Hint: Make sure you set the head of the struct List after you have
set the links of all the struct Nodes.
The second sorting mode is highly similar to the first function, except you will use qsort instead of std::sort. As a hint, here is the C-style integer comparison function also outlined in the course notes:
int qComparisonInt(const void * a, const void * b) {
const Node * ia = *(const Node **) a;
const Node * ib = *(const Node **) b;
return ia->number - ib->number; // sorts in ascending order
}
The * notation above isn't very intuitive but since void * in C is the most generic thing you can think of, we need to tell the compiler that the pointer is itself a pointer. This is not something I'll test you on but you may need to understand void * for CS360. Lets see.
Both of the custom merge sort and quick sort implementations must use the following divide-and-conquer strategy. Msort is used here as the initial example:
Node *msort(Node *head, CompareFunction compare) { // Handle base case // Divide into left and right sublists // Conquer left and right sublists // Combine left and right sublists }
Specifically, you could implement the third sorting mode as follows:
// Public functions void merge_sort(List &l, bool numeric); // Internal functions Node *msort(Node *head, CompareFunction compare); void split(Node *head, Node *&left, Node *&right); Node *merge(Node *left, Node *right, CompareFunction compare);
merge_sort takes a struct List and whether or not the comparison should
be numeric and performs the top-down merge sort algorithm. This
function serves as a wrapper or helper function for the recursive msort
function.
msort is the recursive portion of the algorithm and calls split to
divide and calls merge to conquer. It returns the new head of
the list.
split is a helper function that splits the singly-linked list in half
by using the slow and faster pointer trick (aka [tortoise and hare]).
merge is a helper function that combines both the left and right
lists and returns the new head of the list.
The fourth sorting mode is a custom implementation of quick sort. You are to implement a simple version of the algorithm where the first element is the pivot.
// Public functions void quick_sort(List &l, bool numeric); // Internal functions Node *qsort(Node *head, CompareFunction compare) void partition(Node *head, Node *pivot, Node *&left, Node *&right, CompareFunction compare); Node *concatenate(Node *left, Node *right);
quick_sort takes a struct List and whether or not the comparison should
be numeric and performs the quick sort algorithm. This function serves
as a wrapper or helper function for the recursive qsort function.
qsort is the recursive portion of the algorithm and calls partition to
divide the list and calls concatenate to conquer. It returns the
new head of the list.
partition is a helper function that splits the singly-linked list into
two left and right lists such that all elements less than the pivot
are in the left side and everything else is on the right.
concatenate is a helper function that combines both the left and
right lists and returns the new head of the list.
To test your programs, simply run make test. Your programs must correctly
process the input correctly, have no memory errors, and pass the test cases provided.
When you are finished implementing your volsort sorting utility answer the
following questions in the project02/README.md:
Benchmark all four different sorting modes on files with the following number of integers:
100000, 500000, 1000000, 10000000, 25000000
Record the elapsed time and memory consumption of each mode and file size in a Markdown table:
| Mode | Size | Elapsed Time | |---------|-----------|---------------| | STL | 100000 | 0 | | STL | 500000 | 0 | | ... | ... | ... | | QSORT | 100000 | 0 | | ... | ... | ... | | MERGE | 100000 | 0 | | ... | ... | ... | | QUICK | 100000 | 0 | | ... | ... | ... |
Hint: Use the Unix utility time to get the elapsed time per test.
Hint: Write a simple C/C++ program or create a Python script to generate the input files.
Optional: If you have prior experience, write another simple script (shell script for example) to automate the benchmarking.
You may wish to create your test files in /tmp as that is likely larger than your EECS local directory.
Likewise, you should redirect the output of volsort to /dev/null for
the benchmarking. Doing this on tesla01 my implementation of merge sort takes a little under a minute for 50 million numbers.
After you have performed your benchmark:
Discuss the relative performance of each sorting method and try to explain the differences.
What do these results reveal about the relationship between theoretical complexity discussed in class and actual performance?
In your opinion, which sorting mode is the best? Justify your conclusion by examining the trade-offs for the chosen mode.
In addition to the questions, please provide a brief summary of each individual group members contributions to the project.
To faciliate testing, you were previously asked to clone the course Github repository as follows:
git clone https://github.com/semrich/cs302-25.git cs302
For this assignment, update this clone by using the following:
git pull
We'll discuss this in class but note that your program must be compilable using make. To test your solution against ours, type:
make test
As we grade the assignment, "as intended" means you achieve the correct output in a given mode as outlined in this project document above. We may choose to assign up to 50% of the total points for as many alternative implementations, e.g., volsort -m stl copies list into a vector, sorts using std:sort(v.begin(), v.end()), and copies back into the list if:
Full rubric in three total parts follows:
Part 1: Project high-level requirements
Part 2: Make test
Part 3: Report in Git readme (as requested) or separate doc
To submit your solution, you must submit a single TAR archive (.tar) on Canvas prior to the deadline.
Note: Although submission will be faciliated by Canvas, we will compile and test on EECS lab machines!
If you develop your solution elsewhere please make sure it works on the lab computers prior to the deadline