class DSPrivate;

class DisjSets {
  public:
    DisjSets(int numelements);
    ~DisjSets();
    int find(int x);
    void unionSets(int root1, int root2);
    int maxheight();
    double avgheight();
    void print();
  private:
    DSPrivate *d;
};

The constructor, destructor, find() and unionSets() methods are all straightforward and described in the book. Note that you pass set names (i.e., return values from find()) to unionSets(), not set elements.
The last three methods help you look at the data structure that implements the disjoint set. maxheight() returns the maximum height of a set, that is, the maximum number of parent pointers that must be chased on a find() operation. avgheight() returns the average number of parent pointers that must be chased on a find() operation. Finally, print() prints out the array that implements the disjoint set.
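As a rough illustration of what these two statistics measure, the sketch below works on a bare parent array rather than on the DisjSets class. It assumes that roots are stored as -1 and that an element's height counts every array entry visited during a find(), including the root's own entry, so a singleton set has height 1. The names chaseCount, maxHeightOf and avgHeightOf are hypothetical helpers, not part of the lab's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of array entries visited when walking from x to its root,
// counting the root's own entry. Assumes roots are marked with -1.
int chaseCount(const std::vector<int> &parent, int x)
{
    assert(x >= 0 && x < static_cast<int>(parent.size()));
    int count = 1;
    while (parent[x] != -1) {
        x = parent[x];
        count++;
    }
    return count;
}

// Largest chase count over all elements (0 for an empty array).
int maxHeightOf(const std::vector<int> &parent)
{
    int best = 0;
    for (std::size_t i = 0; i < parent.size(); i++) {
        int h = chaseCount(parent, static_cast<int>(i));
        if (h > best) best = h;
    }
    return best;
}

// Mean chase count over all elements (0.0 for an empty array).
double avgHeightOf(const std::vector<int> &parent)
{
    if (parent.empty()) return 0.0;
    long total = 0;
    for (std::size_t i = 0; i < parent.size(); i++) {
        total += chaseCount(parent, static_cast<int>(i));
    }
    return static_cast<double>(total) / static_cast<double>(parent.size());
}
```

Under these assumptions, an array of ten -1 entries gives a maximum and average of 1, and the array 2 2 9 7 5 3 7 -1 6 3 gives a maximum of 5 and an average of 3.2.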
The first thing that you should do is implement the DisjSets class using the basic book implementation. In this implementation, you simply make root2 the parent of root1 in unionSets().
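A minimal sketch of the basic scheme follows. It is not the lab's DisjSets class: BasicSets is a hypothetical stand-in that keeps the parent array in a public vector instead of behind the DSPrivate pointer, and it assumes roots are marked with -1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the basic implementation (not the lab's class).
struct BasicSets {
    std::vector<int> parent;   // parent[i] == -1 marks i as a root

    explicit BasicSets(int numelements) : parent(numelements, -1) {}

    // Follow parent pointers until a root is reached.
    int find(int x) const
    {
        assert(x >= 0 && x < static_cast<int>(parent.size()));
        while (parent[x] != -1) x = parent[x];
        return x;
    }

    // Both arguments must be set names, i.e. return values of find().
    void unionSets(int root1, int root2)
    {
        assert(parent[root1] == -1 && parent[root2] == -1);
        if (root1 == root2) return;   // already the same set
        parent[root1] = root2;        // root2 becomes the parent of root1
    }
};
```

Calling s.unionSets(s.find(1), s.find(2)) on a fresh ten-element BasicSets sets parent[1] to 2 and leaves every other entry at -1.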
You can test your implementation with the program SimpDisjTest.cpp. This program sets up a disjoint set with 10 elements, and then does 9 union operations, printing out the names of all 10 roots, and calling print() after each find operation. Here is the output when linked with my SimpDisjBase:
UNIX> SimpDisjTestBase
Start:                   0  1  2  3  4  5  6  7  8  9
                  Array:-1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Max Height: 1
Avg Height: 1.0
Union(find(1), find(2)): 0  2  2  3  4  5  6  7  8  9
                  Array:-1  2 -1 -1 -1 -1 -1 -1 -1 -1
Union(find(0), find(2)): 2  2  2  3  4  5  6  7  8  9
                  Array: 2  2 -1 -1 -1 -1 -1 -1 -1 -1
Union(find(0), find(9)): 9  9  9  3  4  5  6  7  8  9
                  Array: 2  2  9 -1 -1 -1 -1 -1 -1 -1
Union(find(8), find(6)): 9  9  9  3  4  5  6  7  6  9
                  Array: 2  2  9 -1 -1 -1 -1 -1  6 -1
Union(find(4), find(5)): 9  9  9  3  5  5  6  7  6  9
                  Array: 2  2  9 -1  5 -1 -1 -1  6 -1
Union(find(4), find(3)): 9  9  9  3  3  3  6  7  6  9
                  Array: 2  2  9 -1  5  3 -1 -1  6 -1
Union(find(2), find(5)): 3  3  3  3  3  3  6  7  6  3
                  Array: 2  2  9 -1  5  3 -1 -1  6  3
Union(find(8), find(7)): 3  3  3  3  3  3  7  7  7  3
                  Array: 2  2  9 -1  5  3  7 -1  6  3
Union(find(2), find(7)): 7  7  7  7  7  7  7  7  7  7
                  Array: 2  2  9  7  5  3  7 -1  6  3
Max Height: 5
Avg Height: 3.2
UNIX>

As you can see, each unionSets() operation simply makes root2 the parent of root1. For example, the first unionSets() operation makes 2 the parent of 1. This is seen both by the fact that 1's root becomes 2, and that 1's parent pointer in the array becomes 2.
I would recommend that you make sure that your output of this program matches mine exactly before you move on.
Next, implement DisjSetsSize.cpp. The only difference between this and DisjSetsBase.cpp is that the unionSets() operation checks the size of each set, and makes the larger one the parent of the smaller one. In the case of ties, I make root2 the parent of root1. You'll need to keep a second array of sizes as recommended by the book. Test this with SimpDisjTest.cpp and make sure that the output of your SimpDisjTestSize matches mine.
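The union-by-size rule described above can be sketched as follows. SizeSets and its setSize member are hypothetical names, the arrays are public for brevity, and roots are assumed to be marked with -1; only the tie-breaking rule (root2 becomes the parent of root1) is taken from the text.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of union by size with a second array of sizes.
struct SizeSets {
    std::vector<int> parent;    // parent[i] == -1 marks i as a root
    std::vector<int> setSize;   // setSize[r] is meaningful only when r is a root

    explicit SizeSets(int numelements)
        : parent(numelements, -1), setSize(numelements, 1) {}

    int find(int x) const
    {
        assert(x >= 0 && x < static_cast<int>(parent.size()));
        while (parent[x] != -1) x = parent[x];
        return x;
    }

    // The root of the larger set becomes the parent of the other root.
    // On a tie, root2 becomes the parent of root1.
    void unionSets(int root1, int root2)
    {
        assert(parent[root1] == -1 && parent[root2] == -1);
        if (root1 == root2) return;
        if (setSize[root1] > setSize[root2]) {
            parent[root2] = root1;
            setSize[root1] += setSize[root2];
        } else {
            parent[root1] = root2;
            setSize[root2] += setSize[root1];
        }
    }
};
```

With this rule, the third union of the test sequence, unionSets(find(0), find(9)), attaches the singleton 9 under root 2 (size 3) rather than the other way around.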
Next, implement DisjSetsHeight.cpp and test SimpDisjTestHeight. Once again, ties in unionSets mean that root2 becomes the parent of root1.
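A comparable sketch for union by height is shown below. HeightSets is a hypothetical name, and the sketch assumes a singleton set has height 1, which is consistent with the "Max Height: 1" reported for ten fresh sets earlier in this writeup. The tie rule is the one stated above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of union by height with a second array of heights.
struct HeightSets {
    std::vector<int> parent;   // parent[i] == -1 marks i as a root
    std::vector<int> height;   // height[r] is meaningful only when r is a root

    explicit HeightSets(int numelements)
        : parent(numelements, -1), height(numelements, 1) {}

    int find(int x) const
    {
        assert(x >= 0 && x < static_cast<int>(parent.size()));
        while (parent[x] != -1) x = parent[x];
        return x;
    }

    // The taller tree's root becomes the parent. On a tie, root2 becomes
    // the parent of root1 and root2's height grows by one.
    void unionSets(int root1, int root2)
    {
        assert(parent[root1] == -1 && parent[root2] == -1);
        if (root1 == root2) return;
        if (height[root1] > height[root2]) {
            parent[root2] = root1;
        } else {
            if (height[root1] == height[root2]) height[root2]++;
            parent[root1] = root2;
        }
    }
};
```

The height of a root only changes when two trees of equal height are merged; attaching a shorter tree under a taller one leaves the taller tree's height untouched.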
Finally, implement path compression in DisjSetsComp.cpp. Do this by starting with DisjSetsBase.cpp and simply modifying the find() method to do path compression. I didn't use recursion as in the book because it's inefficient, but you can use recursion if you want. Again, test this with SimpDisjTestComp and make sure that your output matches mine exactly.
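One non-recursive way to compress paths is a two-pass loop: first walk up to the root, then walk the same path again and point every node on it directly at the root. The sketch below is only an illustration of that idea on a bare parent array (roots assumed to be -1); findCompress is a hypothetical name, and this is not necessarily how the reference solution does it.

```cpp
#include <cassert>
#include <vector>

// Iterative find with path compression on a parent array.
// Assumes roots are marked with -1.
int findCompress(std::vector<int> &parent, int x)
{
    assert(x >= 0 && x < static_cast<int>(parent.size()));

    // Pass 1: locate the root.
    int root = x;
    while (parent[root] != -1) root = parent[root];

    // Pass 2: repoint every node on the path directly at the root.
    while (x != root) {
        int next = parent[x];
        parent[x] = root;
        x = next;
    }
    return root;
}
```

For example, on the array 2 2 9 7 5 3 7 -1 6 3, calling findCompress on element 0 returns 7 and rewrites the entries for 0, 2, 9 and 3 to 7, while element 1, which is not on the path, keeps its parent 2.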
Here is an example of a 2*2 maze:
_ ___
| | |
|__ |

In this maze:
_ ___________________________________________________________ | ___ | __| | __| | | __| __|__ |____ __| __| __| | | | | |__ |_| |__ | | | |__ ___ _____ |__ |__ |_| | |_| | | | |___| | | |___| |__ |_| | | ___ | |_| |__ | | | __| |_|__ | __|___|__ ________| | __|_| |_| |__ __|_| | |__ |___| | | |_| __|____ __|__ __| | | | |_| __| ___ __| |_| | |__ | ___ |__ ___ __|__ |____ |_| | |_| |____ |___| | | | ____| __| | __|__ ___ | _____ |___| | |__ |__ __| |_|_| |_| | |__ |__ | |_|__ __|____ |__ | | __| | | | | | ____| | |__ | | | ___ __| | | | | | | ____| ____| | | |_________|___|_|_|_|_|_____|_____|_|___|_|_|_____|___|____ |
MAZE rows X cols

Obviously, these define how big the maze is. You must have a size specification line before any cell specification lines.
CELL row column [LRTB]

These specify the walls in the given cell. If a cell is not specified with a cell specification line, then it is assumed to have no walls.
MAZE 2 X 2
CELL 0 0 L R
CELL 0 1 T L R
CELL 1 0 B L
CELL 1 1 R
You can have maze files that don't satisfy the path criteria of mazes. For example, the following is a valid maze file, even though each cell is boxed in:
MAZE 2 X 2
CELL 0 0 L R T B
CELL 0 1 L R T B
CELL 1 0 L R T B
CELL 1 1 L R T B
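To make the file format concrete, here is a small parsing sketch. It assumes one specification per line with the wall letters separated by whitespace, as in the examples above; MazeFile and parseMaze are hypothetical names, and the sketch does not check the path criteria or handle any other kind of line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of a maze file.
struct MazeFile {
    int rows = 0;
    int cols = 0;
    std::vector<std::string> walls;   // walls[r * cols + c] holds letters from "LRTB"
};

// Reads a size line followed by zero or more cell lines.
// Returns false if the input does not follow that shape.
bool parseMaze(std::istream &in, MazeFile &m)
{
    std::string word, x, line;
    if (!(in >> word >> m.rows >> x >> m.cols)) return false;
    if (word != "MAZE" || x != "X" || m.rows <= 0 || m.cols <= 0) return false;
    m.walls.assign(static_cast<std::size_t>(m.rows) * m.cols, "");
    std::getline(in, line);                    // discard the rest of the size line

    while (std::getline(in, line)) {
        std::istringstream ls(line);
        int r, c;
        if (!(ls >> word)) continue;           // skip blank lines
        if (word != "CELL" || !(ls >> r >> c)) return false;
        if (r < 0 || r >= m.rows || c < 0 || c >= m.cols) return false;
        std::string letters;
        while (ls >> letters) {
            for (char ch : letters) {
                if (std::string("LRTB").find(ch) == std::string::npos) return false;
                m.walls[static_cast<std::size_t>(r) * m.cols + c] += ch;
            }
        }
    }
    return true;   // cells never mentioned keep an empty wall string
}
```

Cells that never appear in a CELL line end up with an empty string, which matches the rule that unspecified cells have no walls.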
UNIX> maztotxt < 02x02.maz
_ ___
| | |
|__ |
UNIX> maztotxt < 05x04.maz
_________ ___ | | | | |__ | |___|__ | | | __| |___|__ |
UNIX>

Try it out.
Then you pick a random wall that separates two cells. If these cells belong to different sets, then there is no path between them. If you remove the wall, you create a path between them. You do so, and then perform a union on the two sets. If the two cells already belong to the same set, then you don't remove the wall, since there is already a path between them.
You continue picking random walls and deleting them if you can, until you have deleted r*c-1 walls. At that point, all cells belong to the same set, and therefore, there is one (and only one) path between every pair of cells in the maze. Cool, no?
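The procedure just described can be sketched in a few lines. This is an illustration only, not the wall-choosing algorithm required for Maze1.cpp: it assumes cells are numbered row*cols + col, picks a random cell and then its right or bottom wall, uses std::mt19937 as the random source, and keeps the disjoint set as a bare parent array with the basic union. Wall, findRoot and generateMaze are hypothetical names.

```cpp
#include <cassert>
#include <random>
#include <vector>

// The two cell numbers that a removed wall used to separate.
struct Wall { int cellA; int cellB; };

// Basic find on a parent array whose roots are marked with -1.
static int findRoot(const std::vector<int> &parent, int x)
{
    while (parent[x] != -1) x = parent[x];
    return x;
}

// Removes random interior walls until rows*cols - 1 walls are gone,
// and returns the removed walls in the order they were removed.
std::vector<Wall> generateMaze(int rows, int cols, unsigned seed)
{
    assert(rows > 0 && cols > 0);
    std::vector<int> parent(rows * cols, -1);
    std::vector<Wall> removed;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pickCell(0, rows * cols - 1);
    std::uniform_int_distribution<int> pickDir(0, 1);

    while (static_cast<int>(removed.size()) < rows * cols - 1) {
        int a = pickCell(rng);
        int b;
        if (pickDir(rng) == 0) {
            if (a % cols + 1 >= cols) continue;   // right wall is external
            b = a + 1;
        } else {
            if (a / cols + 1 >= rows) continue;   // bottom wall is external
            b = a + cols;
        }
        int ra = findRoot(parent, a);
        int rb = findRoot(parent, b);
        if (ra == rb) continue;                   // path already exists: keep the wall
        parent[ra] = rb;                          // union the two sets
        removed.push_back({a, b});
    }
    return removed;
}
```

A wall that has already been removed needs no special bookkeeping here, because the two cells on either side of it are in the same set and the attempt is simply skipped.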
Now, write a program that does this. First, it sets up the grid of cells and the disjoint sets. Then it iterates choosing a random wall and deleting it if possible, until it has a maze. When it's done, it prints out the maze if desired, and prints out the maximum and average height of the disjoint set data structure.
This program should be Maze1.cpp and has the following command line arguments:
Maze1 rows columns seed print-maze

Obviously, rows, columns and seed are integers. Print-maze is "yes" or "no", and directs whether to print the maze, or to simply print the max and average height of the disjoint set data structure.
The wall-choosing algorithm for Maze1.cpp should work as follows:
UNIX> Maze1-print 3 3 9 yes
# Trying row 1 column 2 BOTTOM : Remove wall: 5,8
# Trying row 1 column 1 RIGHT : Remove wall: 4,8
# Trying row 1 column 1 BOTTOM : Remove wall: 8,7
# Trying row 0 column 2 TOP : External wall -- not removing
# Trying row 0 column 2 TOP : External wall -- not removing
# Trying row 2 column 0 BOTTOM : External wall -- not removing
# Trying row 1 column 0 TOP : Remove wall: 3,0
# Trying row 0 column 2 LEFT : Remove wall: 2,1
# Trying row 0 column 2 BOTTOM : Remove wall: 1,7
# Trying row 1 column 1 LEFT : Remove wall: 7,0
# Trying row 0 column 0 LEFT : External wall -- not removing
# Trying row 0 column 2 BOTTOM : No wall
# Trying row 2 column 1 RIGHT : Keep wall: 0
# Trying row 2 column 0 TOP : Remove wall: 6,0
MAZE 3 X 3
CELL 0 0 L R
CELL 0 1 T B L
CELL 0 2 T R
CELL 1 0 L
CELL 1 1 T
CELL 1 2 R
CELL 2 0 B L R
CELL 2 1 B L R
CELL 2 2 L R
# Max Height: 4
# Avg Height: 2.44
UNIX>

This makes the following maze:
_ _____
| |__ |
|     |
|_|_| |

You'll note that it creates this maze by removing 7 walls to get to this maze:
_ _____
| |__ |
|__   |
|_|_| |

It then tries to remove wall (0,0,left), which it can't, since that's an external wall. Next, it tries to remove (0,2,bottom), which it can't because that wall has already been removed. Next, it tries to remove wall (2,1,right), which it can't because cells (2,1) and (2,2) already belong to the same set (i.e., you can already get from cell (2,1) to (2,2)). Finally, it removes wall (2,0,top) to get the final maze.
The makefile will compile Maze1.cpp with each of the disjoint set implementations. I got the following timings of the various versions of Maze1 on castor3:
MazeBase1 200 200 1 no      # Max Height: 1820   # Avg Height: 1535.56   Time: 1:16
MazeSize1 200 200 1 no      # Max Height: 10     # Avg Height: 4.57      Time: 0:02
MazeHeight1 200 200 1 no    # Max Height: 10     # Avg Height: 4.90      Time: 0:02
MazeComp1 200 200 1 no      # Max Height: 5      # Avg Height: 2.40      Time: 0:02

Note what a difference the disjoint set implementation makes.
In Maze2.cpp, what you'll do is make an array of deletable walls. This array will start with (r-1)*c + r*(c-1) walls (there are (r-1)*c top/bottom interior walls, and r*(c-1) left/right interior walls). Then you choose a random wall by choosing a random element from this array (use lrand48()%(array_size)). Then you delete the element from the array by swapping it with the last element in the array and decrementing the array's size. Then you test it to see if you can remove the wall, and remove it if you can. You do this until you are done with the maze.
I created the array by traversing all the cells, and for each cell, putting that cell's top wall in the array, and then putting that cell's left wall in the array. Thus, for a 2x2 maze, my array of walls would be:
[ (0,1,left), (1,0,top), (1,1,top), (1,1,left) ]

Once again, I have a version of Maze2 called Maze2-print that prints out what it's doing. Try it out. Note, Maze2 and Maze1 produce different mazes for the same seeds. This is because they pick different walls to remove.
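The wall array and the swap-with-last deletion used by Maze2 can be sketched as below. The traversal order follows the description above (for each cell, its top wall and then its left wall, interior walls only). The names Side, InteriorWall, buildWalls and takeRandomWall are hypothetical, and std::mt19937 is used here as a portable substitute for lrand48(), so the walls chosen will not match the reference program's.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

enum Side { TOP, LEFT };

// An interior wall, named by the cell it belongs to and which side it is on.
struct InteriorWall { int row; int col; Side side; };

// Lists every interior wall once: for each cell, its top wall and then
// its left wall, skipping walls on the outside of the maze.
std::vector<InteriorWall> buildWalls(int rows, int cols)
{
    std::vector<InteriorWall> walls;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            if (r > 0) walls.push_back({r, c, TOP});
            if (c > 0) walls.push_back({r, c, LEFT});
        }
    }
    return walls;
}

// Picks a random wall, swaps it with the last element, and shrinks the
// array by one, so each wall is tried at most once.
InteriorWall takeRandomWall(std::vector<InteriorWall> &walls, std::mt19937 &rng)
{
    assert(!walls.empty());
    std::uniform_int_distribution<std::size_t> pick(0, walls.size() - 1);
    std::size_t i = pick(rng);
    std::swap(walls[i], walls.back());
    InteriorWall chosen = walls.back();
    walls.pop_back();
    return chosen;
}
```

Because every wall leaves the array the first time it is chosen, no attempt is wasted on external walls or on walls that were already tried, which is the main difference from picking walls blindly.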
Again, the makefile compiles Maze2.cpp with all of the disjoint set implementations. Here are timings, again on castor3:
MazeBase2 200 200 1 no      # Max Height: 1866   # Avg Height: 1570.77   Time: 0:43
MazeSize2 200 200 1 no      # Max Height: 10     # Avg Height: 4.51      Time: 0:01
MazeHeight2 200 200 1 no    # Max Height: 9      # Avg Height: 4.76      Time: 0:01
MazeComp2 200 200 1 no      # Max Height: 6      # Avg Height: 2.99      Time: 0:01

Note that it is indeed faster than Maze1.cpp.
My .o files for DisjSetsxx and Maze1/Maze2 are also in this directory.