CS302 -- Lab 7 - Mazes and Disjoint Sets

Url: http://www.cs.utk.edu/~plank/plank/classes/cs302/302/labs/lab7

Directory: /blugreen/homes/plank/cs302/labs/lab7

Makefile: /blugreen/homes/plank/cs302/labs/lab7/makefile

In this lab, you are going to write 6 C++ modules:

DisjSetsBase.cpp: A basic implementation of disjoint sets.
DisjSetsSize.cpp: An implementation of disjoint sets where you perform the union by making the smaller set the child of the larger set.
DisjSetsHeight.cpp: An implementation of disjoint sets where you perform the union by making the set with the smaller height the child of the set with the larger height.
DisjSetsComp.cpp: An implementation of disjoint sets where you perform the union as in DisjSetsBase.cpp, but perform path compression on the find.
Maze1.cpp: A random maze generator that uses a rather inefficient algorithm for discovering walls to remove.
Maze2.cpp: A random maze generator that uses a better algorithm for discovering walls to remove.

Disjoint Sets

Read the book chapter on Disjoint Sets. You will be implementing these in the same way that the book recommends. Now, look at DisjSets.h:

class DSPrivate;

class DisjSets {
  public:
    DisjSets(int numelements);
    ~DisjSets();
    int find(int x);
    void unionSets(int root1, int root2);
    int maxheight();
    double avgheight();
    void print();
  private:
    DSPrivate *d;
};

The constructor, destructor, find() and unionSets() methods are all straightforward and described in the book. Note that you pass set names (i.e. return values from find()) to unionSets(), not set elements.

The last three methods help you look at the data structure that implements the disjoint set. Maxheight() returns the maximum height of a set. This is the maximum number of parent pointers that must be chased on a find() operation. Avgheight() returns the average number of parent pointers that must be chased on a find() operation. Finally, print() prints out the array that implements the disjoint set.
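One way to read these two statistics off the parent array is sketched below. It assumes the array convention visible in the SimpDisjTest output in this handout (-1 marks a root) and counts the root itself as one step, which is what makes a fresh structure report a height of 1 there. The names chaseLength, maxHeightOf and avgHeightOf are hypothetical helpers, not part of DisjSets.h.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: number of array entries visited when chasing parent
// links from x up to and including its root. Assumes -1 marks a root.
int chaseLength(const std::vector<int> &parent, int x)
{
  assert(x >= 0 && x < (int) parent.size());
  int steps = 1;
  while (parent[x] != -1) {
    x = parent[x];
    steps++;
  }
  return steps;
}

// Longest chase over all elements.
int maxHeightOf(const std::vector<int> &parent)
{
  int best = 0;
  for (int i = 0; i < (int) parent.size(); i++) {
    best = std::max(best, chaseLength(parent, i));
  }
  return best;
}

// Mean chase length over all elements (0.0 for an empty array).
double avgHeightOf(const std::vector<int> &parent)
{
  if (parent.empty()) return 0.0;
  double total = 0;
  for (int i = 0; i < (int) parent.size(); i++) {
    total += chaseLength(parent, i);
  }
  return total / parent.size();
}
```

Applied to the final array of the sample run (2 2 9 7 5 3 7 -1 6 3), these helpers give a maximum of 5 and an average of 3.2, which agrees with the numbers printed there.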

The first thing that you should do is implement the DisjSets class using the basic book implementation. In this implementation, you simply make root2 the parent of root1 in unionSets.
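A minimal sketch of the basic version follows. It repeats a trimmed copy of the class declaration so that it stands alone, and it assumes that DSPrivate holds nothing more than a vector of parent links with -1 marking a root. The contents of DSPrivate are an assumption made for illustration; the header deliberately leaves them up to you.

```cpp
#include <cassert>
#include <vector>

// Assumed contents of the private class: one parent link per element,
// with -1 marking a root.
class DSPrivate {
  public:
    std::vector<int> parent;
};

// Trimmed copy of the DisjSets.h interface (statistics and print() omitted).
class DisjSets {
  public:
    DisjSets(int numelements);
    ~DisjSets();
    int find(int x);
    void unionSets(int root1, int root2);
  private:
    DSPrivate *d;
};

DisjSets::DisjSets(int numelements)
{
  d = new DSPrivate;
  d->parent.assign(numelements, -1);   // every element starts as its own root
}

DisjSets::~DisjSets()
{
  delete d;
}

// Chase parent links until a root is reached.
int DisjSets::find(int x)
{
  assert(x >= 0 && x < (int) d->parent.size());
  while (d->parent[x] != -1) x = d->parent[x];
  return x;
}

// Both arguments must be set names, i.e. roots returned by find().
void DisjSets::unionSets(int root1, int root2)
{
  assert(d->parent[root1] == -1 && d->parent[root2] == -1);
  if (root1 == root2) return;          // already the same set
  d->parent[root1] = root2;            // root2 becomes the parent of root1
}
```

The early return when both roots are equal is a defensive choice of this sketch; without it, a root would be made its own parent.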

You can test your implementation with the program SimpDisjTest.cpp. This program sets up a disjoint set with 10 elements, and then does 9 union operations. After each union, it prints the set name (root) of each of the 10 elements and calls print(). Here is the output when linked with my DisjSetsBase:

UNIX> SimpDisjTestBase
Start:                   0 1 2 3 4 5 6 7 8 9 Array:-1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Max Height: 1    Avg Height: 1.0

Union(find(1), find(2)): 0 2 2 3 4 5 6 7 8 9 Array:-1  2 -1 -1 -1 -1 -1 -1 -1 -1
Union(find(0), find(2)): 2 2 2 3 4 5 6 7 8 9 Array: 2  2 -1 -1 -1 -1 -1 -1 -1 -1
Union(find(0), find(9)): 9 9 9 3 4 5 6 7 8 9 Array: 2  2  9 -1 -1 -1 -1 -1 -1 -1
Union(find(8), find(6)): 9 9 9 3 4 5 6 7 6 9 Array: 2  2  9 -1 -1 -1 -1 -1  6 -1
Union(find(4), find(5)): 9 9 9 3 5 5 6 7 6 9 Array: 2  2  9 -1  5 -1 -1 -1  6 -1
Union(find(4), find(3)): 9 9 9 3 3 3 6 7 6 9 Array: 2  2  9 -1  5  3 -1 -1  6 -1
Union(find(2), find(5)): 3 3 3 3 3 3 6 7 6 3 Array: 2  2  9 -1  5  3 -1 -1  6  3
Union(find(8), find(7)): 3 3 3 3 3 3 7 7 7 3 Array: 2  2  9 -1  5  3  7 -1  6  3
Union(find(2), find(7)): 7 7 7 7 7 7 7 7 7 7 Array: 2  2  9  7  5  3  7 -1  6  3
Max Height: 5    Avg Height: 3.2
UNIX>

As you can see, each unionSets() operation simply makes root2 the parent of root1. For example, the first unionSets() operation makes 2 the parent of 1. This is seen both by the fact that 1's root becomes 2, and that 1's parent pointer in the array becomes 2.

I would recommend that you make sure that your output of this program matches mine exactly before you move on.

Next, implement DisjSetsSize.cpp. The only difference between this and DisjSetsBase.cpp is that the unionSets() operation checks the size of each set, and makes the larger one the parent of the smaller one. In the case of ties, I make root2 the parent of root1. You'll need to keep a second array of sizes as recommended by the book. Test this with SimpDisjTest.cpp and make sure that the output of your SimpDisjTestSize matches mine.
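The size rule can be sketched as below, under the assumption that a second vector holds the number of elements in each set and is only meaningful at roots. SizeSets and its members are hypothetical names for a stand-alone illustration, not the layout required by DisjSets.h.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-alone structure; the names are not from DisjSets.h.
struct SizeSets {
  std::vector<int> parent;   // -1 marks a root
  std::vector<int> size;     // meaningful only at roots: elements in the set

  explicit SizeSets(int n) : parent(n, -1), size(n, 1) {}

  int find(int x) const {
    while (parent[x] != -1) x = parent[x];
    return x;
  }

  void unionSets(int root1, int root2) {
    assert(parent[root1] == -1 && parent[root2] == -1);
    if (root1 == root2) return;
    if (size[root1] > size[root2]) {
      // root1's set is strictly larger, so it becomes the parent.
      parent[root2] = root1;
      size[root1] += size[root2];
    } else {
      // Smaller or tied: root2 becomes the parent of root1.
      parent[root1] = root2;
      size[root2] += size[root1];
    }
  }
};
```

Using a strict comparison in the first branch is what sends ties to the "root2 becomes the parent" case.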

Next, implement DisjSetsHeight.cpp and test SimpDisjTestHeight. Once again, ties in unionSets mean that root2 becomes the parent of root1.
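A comparable sketch for union by height is shown here. It assumes a second vector of heights that is only meaningful at roots and that counts a single-element set as height 1; HeightSets is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-alone structure; the names are not from DisjSets.h.
struct HeightSets {
  std::vector<int> parent;   // -1 marks a root
  std::vector<int> height;   // meaningful only at roots; a lone element is 1

  explicit HeightSets(int n) : parent(n, -1), height(n, 1) {}

  int find(int x) const {
    while (parent[x] != -1) x = parent[x];
    return x;
  }

  void unionSets(int root1, int root2) {
    assert(parent[root1] == -1 && parent[root2] == -1);
    if (root1 == root2) return;
    if (height[root1] > height[root2]) {
      // root1 is strictly taller: hanging root2 under it adds no height.
      parent[root2] = root1;
    } else {
      // Shorter or tied: root2 becomes the parent of root1.
      // Only a tie makes the resulting tree one level taller.
      if (height[root1] == height[root2]) height[root2]++;
      parent[root1] = root2;
    }
  }
};
```

Note that the height only changes on a tie, which is why trees built this way stay shallow.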

Finally, implement path compression in DisjSetsComp.cpp. Do this by starting with DisjSetsBase.cpp and simply modifying the find() method to do path compression. I didn't use recursion as in the book because it's inefficient. You can use recursion if you want. I don't care. Again, test this with SimpDisjTestComp and make sure that your output matches mine exactly.
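One non-recursive way to compress paths is a two-pass find: locate the root first, then walk the same path again and point every node on it at the root. The sketch below works on a bare parent array with -1 marking a root; findCompress is a hypothetical name, and this is not necessarily the approach used in the reference solution.

```cpp
#include <cassert>
#include <vector>

// Two-pass find with path compression on a parent array (-1 marks a root).
int findCompress(std::vector<int> &parent, int x)
{
  assert(x >= 0 && x < (int) parent.size());

  // Pass 1: walk up to the root without changing anything.
  int root = x;
  while (parent[root] != -1) root = parent[root];

  // Pass 2: walk the same path again, pointing each node at the root.
  while (x != root) {
    int next = parent[x];
    parent[x] = root;
    x = next;
  }
  return root;
}
```

Elements that are not on the path from x to the root keep their old parent links, so a single find() only flattens one branch.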

Random Mazes

The book has a very nice way of using disjoint sets to generate random mazes. A maze is what you think it is. Our mazes will consist of a rectangular grid of r*c cells that can each have four walls. The upper lefthand cell (row 0, column 0) will have no top wall. The bottom righthand cell (row r-1, column c-1) will have no bottom wall. There will be exactly one simple path from the upper lefthand cell to every other cell in the maze. Obviously, if you want to solve the maze, you do so by finding the path from the upper lefthand cell to the bottom righthand cell.

Here is an example of a 2*2 maze:

_ ___
| | |
|__ |

In this maze:

Cell (0,0) has no top or bottom wall.
Cell (0,1) has no bottom wall.
Cell (1,0) has no top or right wall.
Cell (1,1) has no top, left or bottom wall.

Here's a 10*30 maze:

_ ___________________________________________________________
| ___ | __|   |   __| |   | __| __|__ |____ __|   __| __|   |
| | |       |__ |_| |__ | |   | |__ ___ _____   |__   |__ |_|
|   |_| | | | |___| | | |___| |__ |_| | | ___ |   |_| |__   |
| | __| |_|__ |   __|___|__ ________|   | __|_| |_| |__ __|_|
| |__ |___| | | |_| __|____ __|__ __| | | |   |_| __| ___ __|
|_| | |__ |   ___ |__ ___ __|__ |____ |_| | |_| |____ |___| |
|     | ____| __| | __|__   ___ | _____ |___| | |__   |__ __|
|_|_| |_| |   |__   |__ | |_|__ __|____   |__ | | __| |   | |
| | ____| | |__ | | | ___ __| | | |   | |   | ____| ____| | |
|_________|___|_|_|_|_|_____|_____|_|___|_|_|_____|___|____ |

Maze Files

I have defined a maze file format that is very simple. It consists of lines of text. These lines can fall into one of four categories:

Blank lines are ignored.
Comment lines. These start with a pound sign (#) and are ignored.
Size specification lines. These are of the form:
```
      MAZE  rows  X cols
```
Obviously, these define how big the maze is. You must have a size specification line before any cell specification lines.
Cell specification lines. These are of the form
```
      CELL row column [LRTB]
```
These specify the walls in the given cell. If a cell is not specified with a cell specification line, then it is assumed to have no walls.
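A reader for this format could look roughly like the sketch below. The in-memory form (one bitmask of walls per cell) and the names Wall, MazeGrid and readMaze are assumptions made for illustration. The sketch also accepts wall letters run together (such as LR) and lets a second MAZE line reset the grid, neither of which the format description settles.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form: one bitmask of walls per cell.
enum Wall { TOP = 1, BOTTOM = 2, LEFT = 4, RIGHT = 8 };

struct MazeGrid {
  int rows = 0, cols = 0;
  std::vector<int> walls;   // walls[row*cols + col] is an OR of Wall bits
};

// Returns false on a malformed line or if no MAZE line was seen.
bool readMaze(std::istream &in, MazeGrid &m)
{
  std::string line, word, x;
  bool sized = false;

  while (std::getline(in, line)) {
    std::istringstream ss(line);
    if (!(ss >> word) || word[0] == '#') continue;   // blank or comment line

    if (word == "MAZE") {
      if (!(ss >> m.rows >> x >> m.cols) || x != "X") return false;
      if (m.rows <= 0 || m.cols <= 0) return false;
      m.walls.assign(m.rows * m.cols, 0);            // unspecified cells: no walls
      sized = true;
    } else if (word == "CELL") {
      int r, c;
      if (!sized || !(ss >> r >> c)) return false;   // size line must come first
      if (r < 0 || r >= m.rows || c < 0 || c >= m.cols) return false;
      assert((int) m.walls.size() == m.rows * m.cols);
      while (ss >> word) {
        for (char ch : word) {
          if (ch == 'T') m.walls[r * m.cols + c] |= TOP;
          else if (ch == 'B') m.walls[r * m.cols + c] |= BOTTOM;
          else if (ch == 'L') m.walls[r * m.cols + c] |= LEFT;
          else if (ch == 'R') m.walls[r * m.cols + c] |= RIGHT;
          else return false;
        }
      }
    } else {
      return false;                                  // unknown line type
    }
  }
  return sized;
}
```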

I have the following example maze files:

02x02.maz. The 2*2 example maze above:

MAZE 2 X 2
CELL 0 0 L R
CELL 0 1 T L R
CELL 1 0 B L
CELL 1 1 R

05x04.maz. A 5x4 maze.
10x30.maz. The 10*30 example maze above.
40x39.maz. A 40x39 example maze.

For a maze file to be valid, each internal wall of the maze must be specified in both cells to which it is incident. For example, the right wall of cell (0,0) in the 2*2 maze above is specified in both cell (0,0) and cell (0,1).
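The validity rule can be checked mechanically by comparing each cell with its right and lower neighbors. The sketch below assumes the walls have already been loaded into one bitmask per cell; WallBit and wallsAgree are hypothetical names, not part of any provided code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bit values for the four walls of a cell.
enum WallBit { W_TOP = 1, W_BOTTOM = 2, W_LEFT = 4, W_RIGHT = 8 };

// True if every internal wall is recorded identically in both cells
// that share it. walls[row*cols + col] is an OR of WallBit values.
bool wallsAgree(const std::vector<int> &walls, int rows, int cols)
{
  assert((int) walls.size() == rows * cols);
  for (int r = 0; r < rows; r++) {
    for (int c = 0; c < cols; c++) {
      int here = walls[r * cols + c];
      if (c + 1 < cols) {
        // My right wall is my right-hand neighbor's left wall.
        int right = walls[r * cols + c + 1];
        if (((here & W_RIGHT) != 0) != ((right & W_LEFT) != 0)) return false;
      }
      if (r + 1 < rows) {
        // My bottom wall is the top wall of the cell below me.
        int below = walls[(r + 1) * cols + c];
        if (((here & W_BOTTOM) != 0) != ((below & W_TOP) != 0)) return false;
      }
    }
  }
  return true;
}
```

Checking only the right and bottom neighbors is enough, since each internal wall is shared by exactly one such pair.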

You can have maze files that don't satisfy the path criteria of mazes. For example, the following is a valid maze file, even though each cell is boxed in:

MAZE 2 X 2
CELL 0 0 L R T B
CELL 0 1 L R T B
CELL 1 0 L R T B
CELL 1 1 L R T B

Maztotxt

I have written the program maztotxt.cpp to print out mazes from maze files. It takes a maze file on standard input and prints out a nice cheap ASCII representation of the maze:

UNIX> maztotxt < 02x02.maz
_ ___
| | |
|__ |

UNIX> maztotxt < 05x04.maz
_________
  ___ | |
| | |__ |
|___|__ |
| |   __|
|___|__ |

UNIX>

Try it out.

Maze1.cpp

Ok, now as described in the book, you can generate a random maze by starting with a maze where every cell except (0,0) and (r-1, c-1) has all four walls. You then create a disjoint set data structure with each cell as an element in its own set (you number cell (row,col) as row*c + col).
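The starting grid and the cell numbering can be sketched as follows. The bitmask representation and the names WallBit, cellId and initialWalls are assumptions for illustration; the only facts taken from the handout are the numbering row*c + col and the two missing outer walls (the top of the first cell and the bottom of the last cell).

```cpp
#include <cassert>
#include <vector>

// Hypothetical bit values for the four walls of a cell.
enum WallBit { W_TOP = 1, W_BOTTOM = 2, W_LEFT = 4, W_RIGHT = 8 };

// Element number of cell (row, col) in the disjoint set structure.
inline int cellId(int row, int col, int cols)
{
  return row * cols + col;
}

// Every cell starts fully walled, except that the entrance cell has no
// top wall and the exit cell has no bottom wall.
std::vector<int> initialWalls(int rows, int cols)
{
  assert(rows > 0 && cols > 0);
  const int all = W_TOP | W_BOTTOM | W_LEFT | W_RIGHT;
  std::vector<int> walls(rows * cols, all);
  walls[cellId(0, 0, cols)] &= ~W_TOP;
  walls[cellId(rows - 1, cols - 1, cols)] &= ~W_BOTTOM;
  return walls;
}
```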

Then you pick a random wall that separates two cells. If these cells belong to different sets, then there is no path between them. If you remove the wall, you create a path between them. You do so, and then perform a union on the two sets. If the two cells already belong to the same set, then you don't remove the wall, since there is already a path between them.

You continue picking random walls and deleting them if you can, until you have deleted r*c-1 walls. At that point, all cells belong to the same set, and therefore, there is one (and only one) path between every pair of cells in the maze. Cool, no?
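The decision for a single wall reduces to two finds and possibly one union. Here is a minimal sketch on a bare parent array using the basic union rule; findRoot and removeIfSeparated are hypothetical names, and the caller is assumed to clear the wall in both cells' records whenever the function returns true.

```cpp
#include <cassert>
#include <vector>

// Plain find on a parent array where -1 marks a root.
int findRoot(const std::vector<int> &parent, int x)
{
  assert(x >= 0 && x < (int) parent.size());
  while (parent[x] != -1) x = parent[x];
  return x;
}

// a and b are the element numbers of two cells that share a wall.
// Returns true if the wall should come down, in which case the two
// sets have been merged; returns false if a path already exists.
bool removeIfSeparated(std::vector<int> &parent, int a, int b)
{
  int ra = findRoot(parent, a);
  int rb = findRoot(parent, b);
  if (ra == rb) return false;   // same set: keep the wall
  parent[ra] = rb;              // basic union: rb becomes the parent of ra
  return true;
}
```

Each successful removal reduces the number of sets by one, so starting from r*c singleton sets, exactly r*c-1 removals leave a single set.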

Now, write a program that does this. First, it sets up the grid of cells and the disjoint sets. Then it iterates choosing a random wall and deleting it if possible, until it has a maze. When it's done, it prints out the maze if desired, and prints out the maximum and average height of the disjoint set data structure.

This program should be Maze1.cpp and has the following command line arguments:

Maze1 rows columns seed print-maze

Obviously, rows, columns and seed are integers. Print-maze is "yes" or "no", and directs whether to print the maze, or to simply print the max and average height of the disjoint set data structure.

The wall-choosing algorithm for Maze1.cpp should work as follows:

Choose a random row (using lrand48()).
Choose a random column (again using lrand48()).
Choose a random wall (using lrand48()%4 -- 0 will be the top wall, 1 the bottom wall, 2 the left wall, and 3 the right wall).
If the wall is an internal wall, has not been deleted already, and separates two cells in different sets, remove it.
Do this until all cells belong to the same set.
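The selection steps above can be sketched as two small functions: one that draws a candidate and one that works out which cell lies on the other side of the chosen wall. Reducing the row and column draws with % rows and % cols is an assumption (the handout only says to use lrand48()), and Candidate, pickCandidate and neighborOf are hypothetical names.

```cpp
#include <cassert>
#include <stdlib.h>   // lrand48() and srand48() are POSIX functions

// Wall codes as given in the handout: 0 top, 1 bottom, 2 left, 3 right.
struct Candidate {
  int row, col, wall;
};

// Draws in the order the steps list them: row, then column, then wall.
Candidate pickCandidate(int rows, int cols)
{
  assert(rows > 0 && cols > 0);
  Candidate c;
  c.row = lrand48() % rows;
  c.col = lrand48() % cols;
  c.wall = lrand48() % 4;
  return c;
}

// Element number of the cell across the chosen wall,
// or -1 if the wall is external.
int neighborOf(int rows, int cols, const Candidate &c)
{
  int r = c.row, k = c.col;
  if (c.wall == 0) r--;        // top
  else if (c.wall == 1) r++;   // bottom
  else if (c.wall == 2) k--;   // left
  else k++;                    // right
  if (r < 0 || r >= rows || k < 0 || k >= cols) return -1;
  return r * cols + k;
}
```

A caller would presumably seed the generator once with srand48(seed) before the first draw. Whether the reference solution seeds it exactly this way is not stated, so matching its mazes may require experimenting.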

Write this and test it. You can see which walls my program tries by trying Maze1-print in the answers directory. For example:

UNIX> Maze1-print 3 3 9 yes
# Trying row 1    column 2    BOTTOM : Remove wall: 5,8
# Trying row 1    column 1    RIGHT  : Remove wall: 4,8
# Trying row 1    column 1    BOTTOM : Remove wall: 8,7
# Trying row 0    column 2    TOP    : External wall -- not removing
# Trying row 0    column 2    TOP    : External wall -- not removing
# Trying row 2    column 0    BOTTOM : External wall -- not removing
# Trying row 1    column 0    TOP    : Remove wall: 3,0
# Trying row 0    column 2    LEFT   : Remove wall: 2,1
# Trying row 0    column 2    BOTTOM : Remove wall: 1,7
# Trying row 1    column 1    LEFT   : Remove wall: 7,0
# Trying row 0    column 0    LEFT   : External wall -- not removing
# Trying row 0    column 2    BOTTOM : No wall
# Trying row 2    column 1    RIGHT  : Keep wall: 0
# Trying row 2    column 0    TOP    : Remove wall: 6,0
MAZE 3 X 3
CELL 0 0 L R
CELL 0 1 T B L
CELL 0 2 T R
CELL 1 0 L
CELL 1 1 T
CELL 1 2 R
CELL 2 0 B L R
CELL 2 1 B L R
CELL 2 2 L R
# Max Height: 4
# Avg Height: 2.44  
UNIX>

This makes the following maze:

_ _____
| |__ |
|     |
|_|_| |

You'll note that it builds this maze by first removing 7 walls, which gives the following intermediate maze:

_ _____
| |__ |
|__   |
|_|_| |

It then tries to remove wall (0,0,left), which it can't, since that's an external wall. Next it tries to remove (0,2,bottom), which it can't because that wall has already been removed. Next, it tries to remove wall (2,1,right), which it can't because cells (2,1) and (2,2) already belong to the same set (i.e. you can already get from cell (2,1) to (2,2)). Finally, it removes wall (2,0,top) to get the final maze.

The makefile will compile Maze1.cpp with each of the disjoint set implementations. I got the following timings of the various versions of Maze1 on castor3:

MazeBase1 200 200 1 no
# Max Height: 1820
# Avg Height: 1535.56
Time: 1:16

MazeSize1 200 200 1 no
# Max Height: 10
# Avg Height: 4.57  
Time: 0:02

MazeHeight1 200 200 1 no
# Max Height: 10
# Avg Height: 4.90  
Time: 0:02

MazeComp1 200 200 1 no
# Max Height: 5
# Avg Height: 2.40  
Time: 0:02

Note what a difference the disjoint set implementation makes.

Maze2.cpp

Maze1.cpp is rather inefficient because it can randomly select bad walls to delete. For example, as the mazes get bigger, it often selects external walls to delete, or walls that have already been deleted. You will fix this problem in Maze2.cpp.

In Maze2.cpp, what you'll do is make an array of deletable walls. This array will start with (r-1)*c + r*(c-1) walls (there are (r-1)*c top/bottom interior walls, and r*(c-1) left/right interior walls). Then you choose a random wall by choosing a random element from this array (use lrand48()%(array_size)). Then you delete the element from the array by swapping it with the last element in the array and decrementing the array's size. Then you test it to see if you can remove the wall, and remove it if you can. You do this until you are done with the maze.
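The swap-with-last deletion takes constant time because the order of the remaining candidates does not matter. A minimal sketch, with WallRef, takeAt and takeRandom as hypothetical names and a std::vector standing in for the array:

```cpp
#include <cassert>
#include <stdlib.h>   // lrand48() is a POSIX function
#include <vector>

// Hypothetical record for one candidate wall (wall: 0 top, 2 left).
struct WallRef {
  int row, col, wall;
};

// Remove and return walls[index] in constant time by overwriting it
// with the last element and shrinking the vector by one.
WallRef takeAt(std::vector<WallRef> &walls, size_t index)
{
  assert(index < walls.size());
  WallRef chosen = walls[index];
  walls[index] = walls.back();
  walls.pop_back();
  return chosen;
}

// Pick a random remaining wall and delete it from the candidate list.
WallRef takeRandom(std::vector<WallRef> &walls)
{
  assert(!walls.empty());
  return takeAt(walls, lrand48() % walls.size());
}
```

Because every wall is drawn at most once, the number of draws is bounded by the initial array size, unlike the repeated misses that Maze1.cpp can suffer.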

I created the array by traversing all the cells, and for each cell, putting that cell's top wall in the array, and then putting that cell's left wall in the array. Thus, for a 2x2 maze, my array of walls would be:

[ (0,1,left), (1,0,top), (1,1,top), (1,1,left) ]
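The traversal order described above can be sketched as follows. InteriorWall and buildWalls are hypothetical names, and the wall codes reuse the numbering from Maze1.cpp (0 for top, 2 for left). A top wall is interior when the row is greater than 0, and a left wall is interior when the column is greater than 0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record for one interior wall (wall: 0 top, 2 left).
struct InteriorWall {
  int row, col, wall;
};

// Visit the cells in row-major order; for each cell add its top wall
// and then its left wall, skipping walls on the outer boundary.
std::vector<InteriorWall> buildWalls(int rows, int cols)
{
  assert(rows > 0 && cols > 0);
  std::vector<InteriorWall> walls;
  for (int r = 0; r < rows; r++) {
    for (int c = 0; c < cols; c++) {
      if (r > 0) walls.push_back({r, c, 0});   // top wall is interior
      if (c > 0) walls.push_back({r, c, 2});   // left wall is interior
    }
  }
  return walls;
}
```

For a 2x2 maze this yields the four walls in the order listed above, and in general it yields (r-1)*c + r*(c-1) entries.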

Once again, I have a version of Maze2 called Maze2-print that prints out what it's doing. Try it out. Note, Maze2 and Maze1 produce different mazes for the same seeds. This is because they pick different walls to remove.

Again, the makefile compiles Maze2.cpp with all of the disjoint set implementations. Here are timings, again on castor3:

MazeBase2 200 200 1 no
# Max Height: 1866
# Avg Height: 1570.77
Time: 0:43

MazeSize2 200 200 1 no
# Max Height: 10
# Avg Height: 4.51  
Time: 0:01

MazeHeight2 200 200 1 no
# Max Height: 9
# Avg Height: 4.76  
Time: 0:01

MazeComp2 200 200 1 no
# Max Height: 6
# Avg Height: 2.99  
Time: 0:01

Note that it is indeed faster than Maze1.cpp.

To summarize

You are to write and hand in the following C++ files:

DisjSetsBase.cpp
DisjSetsSize.cpp
DisjSetsHeight.cpp
DisjSetsComp.cpp
Maze1.cpp
Maze2.cpp

You should make these work exactly like mine. All of my executables are in the directory /blugreen/homes/plank/cs302/labs/answers/lab7.

My .o files for DisjSetsxx and Maze1/Maze2 are also in this directory.