#include <vector>
#include <iostream>
using namespace std;

class Disjoint {
  public:
    Disjoint(int nelements);
    int Union(int s1, int s2);
    int Find(int element);
    void Print();
  protected:
    vector <int> links;
    vector <int> ranks;
};
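The links vector holds a parent pointer for each element. If links[e] equals -1, then e is the root of its set, and e is also that set's id. If links[e] does not equal -1, then the set id of e is equal to the set id of links[e].
The ranks vector holds additional information about each set, and what it holds depends on the implementation. With union-by-size, ranks[e] is the number of elements in the set whose root is e. With union-by-height, it is the height of the tree rooted at e, counted as the number of nodes on the longest path from e down to a leaf. With union-by-rank, it is the height that the tree would have had without path compression.
In all cases, if e is not the root of a set, ranks[e] is immaterial.
I have three implementations, one per strategy: DJ-size.cpp, DJ-height.cpp and DJ-rank.cpp. We start with union-by-size in DJ-size.cpp.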
The constructor sets up the two vectors. Each element is in its own set, so all links are -1 and all ranks are 1.
Disjoint::Disjoint(int nelements)
{
  links.resize(nelements, -1);
  ranks.resize(nelements, 1);
}
The Find(e) operation follows links[e] from parent to parent until it reaches an element whose link is -1. That element is the root:
int Disjoint::Find(int element)
{
  while (links[element] != -1) element = links[element];
  return element;
}
And the Union(s1, s2) operation first checks that s1 and s2 are valid set ids (that is, roots), and then chooses a parent and a child from the two. The parent is the root of the larger set; if the two sets are the same size, s2 becomes the parent. It changes the link field of the child to point to the parent, and then it adds the child's size to the parent's entry in the ranks vector:
int Disjoint::Union(int s1, int s2)
{
  int p, c;

  if (links[s1] != -1 || links[s2] != -1) {
    cerr << "Must call union on a set, and not just an element.\n";
    exit(1);
  }

  if (ranks[s1] > ranks[s2]) {
    p = s1;
    c = s2;
  } else {
    p = s2;
    c = s1;
  }

  links[c] = p;
  ranks[p] += ranks[c];     /* HERE */
  return p;
}
I won't show Print(): it simply prints out the vectors.
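Since Print() isn't shown, here is a minimal sketch of what such a routine might look like, written as a free function so that it stands alone. The name format_state and the three-character column width are my assumptions and are not taken from DJ-size.cpp; only the Elts/Links/Ranks row labels come from the program output shown later in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders the two vectors as three labeled rows.
// The column width of 3 is an assumption, chosen so that -1 lines up.
std::string format_state(const std::vector<int> &links,
                         const std::vector<int> &ranks)
{
  assert(links.size() == ranks.size());
  std::ostringstream out;

  out << "Elts: ";
  for (std::size_t i = 0; i < links.size(); i++) out << std::setw(3) << i;
  out << "\nLinks:";
  for (std::size_t i = 0; i < links.size(); i++) out << std::setw(3) << links[i];
  out << "\nRanks:";
  for (std::size_t i = 0; i < ranks.size(); i++) out << std::setw(3) << ranks[i];
  out << "\n";
  return out.str();
}
```

With something like this in hand, a Print() method could simply send format_state(links, ranks) to cout.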
The only difference between union-by-size and union-by-height is that ranks keeps track of the height of each tree, meaning the number of nodes on its longest root-to-leaf path, rather than its size. It is a one-line change to union-by-size -- in DJ-height.cpp, the line marked HERE is changed to:
  if (ranks[s1] == ranks[s2]) ranks[p]++;
This is because a set's height only changes if the two sets being merged have equal heights.
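To see the whole operation in one place, here is a sketch of union-by-height written as a free function over two plain vectors. The function name and signature are hypothetical; in DJ-height.cpp this logic lives inside Disjoint::Union, and I use assert() rather than the error message so that the sketch stays short.

```cpp
#include <cassert>
#include <vector>

// Sketch of union-by-height. s1 and s2 must be two different roots.
// Returns the root of the merged set.
int union_by_height(std::vector<int> &links, std::vector<int> &ranks,
                    int s1, int s2)
{
  assert(links[s1] == -1 && links[s2] == -1 && s1 != s2);

  int p, c;
  if (ranks[s1] > ranks[s2]) { p = s1; c = s2; }
  else                       { p = s2; c = s1; }   // Ties go to s2.

  links[c] = p;

  // A taller tree absorbs a shorter one without growing. Only a tie adds
  // a level, because the child's whole tree now hangs one link below p.
  if (ranks[p] == ranks[c]) ranks[p]++;
  return p;
}
```

Merging two single-element sets gives a height of 2, and merging a third single element into that set leaves the height at 2.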
Finally, union-by-rank is equivalent to union-by-height, except that you perform path compression on find operations. With path compression, each time you perform a Find(e) operation, you update the links field of all elements on the path to the root, so that they equal the root. I do this with a vector that holds all the non-root elements in the path:
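int Disjoint::Find(int element)
{
  vector <int> q;
  size_t i;

  // Collect every non-root element on the path to the root.
  while (links[element] != -1) {
    q.push_back(element);
    element = links[element];
  }

  // element is now the root, so point each collected element directly at it.
  for (i = 0; i < q.size(); i++) links[q[i]] = element;
  return element;
}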
This is one of those convenient things about the STL -- I don't have to call new or delete. When the Find() operation is over, the vector is deallocated.
I could implement path compression in two other ways. The first is with simple recursion:
int Disjoint::Find(int element)
{
  if (links[element] == -1) return element;
  links[element] = Find(links[element]);
  return links[element];
}
The second is to traverse links to the root while reversing them as you go, setting links[element] to be element's child on the path. In that way, once you find the root, you can follow the reversed links back to the original element, pointing each node at the root along the way. The code is here -- if you're a little leery of this code, copy it to your directory and put in some print statements. This should be the best implementation performance-wise, because it needs neither the extra vector of the first version nor the stack frames of the recursive one.
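int Disjoint::Find(int e)
{
  int p, c;   // p is the parent, c is the child.

  // First pass: walk up to the root, making each node point to its child.
  c = -1;
  while (links[e] != -1) {
    p = links[e];
    links[e] = c;
    c = e;
    e = p;
  }

  // Second pass: e is the root. Walk back down the reversed links,
  // pointing each node at the root.
  p = e;
  e = c;
  while (e != -1) {
    c = links[e];
    links[e] = p;
    e = c;
  }
  return p;
}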
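If you'd rather test the link-reversal version than trace it by hand, one option is to run it next to the recursive version on copies of the same tree and compare both the return values and the compressed links. The sketch below does that with free functions over a bare links vector; the names find_recursive, find_reversal and finds_agree are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Recursive path compression, mirroring the recursive Find() above.
int find_recursive(std::vector<int> &links, int e)
{
  if (links[e] == -1) return e;
  links[e] = find_recursive(links, links[e]);
  return links[e];
}

// Link-reversal path compression, mirroring the last Find() above.
int find_reversal(std::vector<int> &links, int e)
{
  int p, c = -1;
  while (links[e] != -1) {   // Walk up, making each node point to its child.
    p = links[e];
    links[e] = c;
    c = e;
    e = p;
  }
  p = e;                     // p is the root.
  e = c;
  while (e != -1) {          // Walk back down, pointing each node at the root.
    c = links[e];
    links[e] = p;
    e = c;
  }
  return p;
}

// Runs both versions on private copies of links and reports whether they
// return the same root and leave the same links behind.
bool finds_agree(const std::vector<int> &links, int e)
{
  assert(e >= 0 && e < (int) links.size());
  std::vector<int> a = links, b = links;
  int ra = find_recursive(a, e);
  int rb = find_reversal(b, e);
  return ra == rb && a == b;
}
```

A long chain such as 0 -> 1 -> 2 -> 3 -> 4 is a good test case, since calling Find on element 0 has to rewrite every link on the path.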
Running make compiles the example program dj-ex1.cpp against each of the three implementations:
UNIX> make
g++ -c -O dj-ex1.cpp
g++ -c -O DJ-size.cpp
g++ -O -o dj-ex1-size dj-ex1.o DJ-size.o
g++ -c -O DJ-height.cpp
g++ -O -o dj-ex1-height dj-ex1.o DJ-height.o
g++ -c -O DJ-rank.cpp
g++ -O -o dj-ex1-rank dj-ex1.o DJ-rank.o
UNIX>
We first run it with union-by-size. Let's look at the output incrementally. When the program starts, it sets up a Disjoint with ten elements, each in its own set:
UNIX> dj-ex1-size
Starting State:
Elts:   0  1  2  3  4  5  6  7  8  9
Links: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Ranks:  1  1  1  1  1  1  1  1  1  1
Next, it performs three union operations: Union(0, 1), Union(2, 3), and Union(4, 5). Since each set in all three operations is the same size, the choice of parent and child is arbitrary. Here's the output and how it looks pictorially (I've added the sizes to the roots of each set):
Doing d.Union(0, 1). Resulting set = 1
Doing d.Union(2, 3). Resulting set = 3
Doing d.Union(4, 5). Resulting set = 5
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1 -1  3 -1  5 -1 -1 -1 -1 -1
Ranks:  1  2  1  2  1  2  1  1  1  1
Next it performs four more union operations: Union(1, 3), Union(5, 6), Union(5, 7), and Union(5, 8). The first union operation merges two sets of the same size, so the parent/child selection is arbitrary. The remaining three union operations merge sets of size 1 (sets 6, 7 and 8) with set 5 which is larger. Thus, in each case, set 5 becomes the parent. The resulting sets are pictured to the right.
The Find() operations return the root of each set -- 3 for elements of the set {0, 1, 2, 3}, and 5 for elements of the set {4, 5, 6, 7, 8}.
You should make sure that you understand how the output of the program maps to the picture. In particular, make sure you understand the Links and Ranks lines and what they mean.
Doing d.Union(1, 3). Resulting set = 3
Doing d.Union(5, 6). Resulting set = 5
Doing d.Union(5, 7). Resulting set = 5
Doing d.Union(5, 8). Resulting set = 5
d.Find(1) = 3
d.Find(2) = 3
d.Find(4) = 5
d.Find(7) = 5
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3 -1  5 -1  5  5  5 -1
Ranks:  1  2  1  4  1  5  1  1  1  1
Now, we perform Union(3, 5). Since set 5 has more elements than set 3, it is the parent and 3 is the child. Subsequent Find() operations on 3, 5, 7 and 0 all return 5 as the set id:
Doing d.Union(3, 5). Resulting set = 5
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3  5  5 -1  5  5  5 -1
Ranks:  1  2  1  4  1  9  1  1  1  1
d.Find(3) = 5
d.Find(5) = 5
d.Find(7) = 5
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3  5  5 -1  5  5  5 -1
Ranks:  1  2  1  4  1  9  1  1  1  1
d.Find(0) = 5
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3  5  5 -1  5  5  5 -1
Ranks:  1  2  1  4  1  9  1  1  1  1
UNIX>
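If you want to check the union-by-size walkthrough without building dj-ex1, the sketch below replays the same eight unions and exposes the vectors so that they can be compared with the Links and Ranks lines. SizeSets and replay_size_example are hypothetical names; the struct is a stand-in for the Disjoint class with the vectors made public, and it uses assert() in place of the error message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the union-by-size Disjoint class.
struct SizeSets {
  std::vector<int> links, ranks;

  explicit SizeSets(int n) : links(n, -1), ranks(n, 1) {}

  int Find(int e) const {
    while (links[e] != -1) e = links[e];
    return e;
  }

  int Union(int s1, int s2) {
    assert(links[s1] == -1 && links[s2] == -1 && s1 != s2);
    int p = (ranks[s1] > ranks[s2]) ? s1 : s2;   // Ties go to s2.
    int c = (p == s1) ? s2 : s1;
    links[c] = p;
    ranks[p] += ranks[c];                        // ranks holds set sizes.
    return p;
  }
};

// Replays the eight unions from the walkthrough and returns the result.
SizeSets replay_size_example()
{
  SizeSets d(10);
  d.Union(0, 1); d.Union(2, 3); d.Union(4, 5);
  d.Union(1, 3); d.Union(5, 6); d.Union(5, 7); d.Union(5, 8);
  d.Union(3, 5);
  return d;
}
```

The final links and ranks should match the last state printed above, with 5 as the root of a nine-element set and 9 alone in its own set.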
Next we run it with union-by-height:
UNIX> dj-ex1-height
Starting State:
Elts:   0  1  2  3  4  5  6  7  8  9
Links: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Ranks:  1  1  1  1  1  1  1  1  1  1

Doing d.Union(0, 1). Resulting set = 1
Doing d.Union(2, 3). Resulting set = 3
Doing d.Union(4, 5). Resulting set = 5
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1 -1  3 -1  5 -1 -1 -1 -1 -1
Ranks:  1  2  1  2  1  2  1  1  1  1

Doing d.Union(1, 3). Resulting set = 3
Doing d.Union(5, 6). Resulting set = 5
Doing d.Union(5, 7). Resulting set = 5
Doing d.Union(5, 8). Resulting set = 5
d.Find(1) = 3
d.Find(2) = 3
d.Find(4) = 5
d.Find(7) = 5
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3 -1  5 -1  5  5  5 -1
Ranks:  1  2  1  3  1  2  1  1  1  1
Although the trees look the same, the ranks fields are different, now holding heights rather than sizes. So, when we perform the last union of 3 and 5, 3 becomes the parent, since it has greater height. Subsequent Find() operations all return 3 now:
Doing d.Union(3, 5). Resulting set = 3
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3 -1  5  3  5  5  5 -1
Ranks:  1  2  1  3  1  2  1  1  1  1
d.Find(3) = 3
d.Find(5) = 3
d.Find(7) = 3
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3 -1  5  3  5  5  5 -1
Ranks:  1  2  1  3  1  2  1  1  1  1
d.Find(0) = 3
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3 -1  5  3  5  5  5 -1
Ranks:  1  2  1  3  1  2  1  1  1  1
UNIX>
Finally, we run it with union-by-rank, picking up the output at the last union:
UNIX> dj-ex1-rank
....
....
Doing d.Union(3, 5). Resulting set = 3
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3 -1  5  3  5  5  5 -1
Ranks:  1  2  1  3  1  2  1  1  1  1
When we perform the three Find() operations, the last one -- Find(7) -- performs path compression, setting node 7's link to 3, the root of the set:
d.Find(3) = 3
d.Find(5) = 3
d.Find(7) = 3
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  1  3  3 -1  5  3  5  3  5 -1
Ranks:  1  2  1  3  1  2  1  1  1  1
Similarly, the last Find(0) operation also performs path compression:
d.Find(0) = 3
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  3  3  3 -1  5  3  5  3  5 -1
Ranks:  1  2  1  3  1  2  1  1  1  1
Were we to call Find(4), Find(6) and Find(8), then path compression would make those nodes point directly to node 3 as well. In that case, the state would be the following:
d.Find(4) = 3
d.Find(6) = 3
d.Find(8) = 3
Elts:   0  1  2  3  4  5  6  7  8  9
Links:  3  3  3 -1  3  3  3  3  3 -1
Ranks:  1  2  1  3  1  2  1  1  1  1
I draw this picture because you should see that ranks[3] remains at three, even though the tree's height is now two. This is because the ranks field tracks what the height of the tree would be with no path compression, which makes it an upper bound on the true height. We can't keep it exact without adding to the running time of the Union() or Find() operations. Fortunately, it doesn't matter -- the fine theoreticians of the world have proved that with union-by-rank and path compression, Find() operations run in O(α(n)) amortized time, where α is the inverse Ackermann function, which is less than 5 for any n you will ever encounter. Union() operations are still O(1).
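To watch the rank and the true height drift apart, the sketch below replays the example with union-by-rank and measures the actual height of the tree before and after compressing every path. RankSets, Height and replay_rank_example are hypothetical names of mine; Height() is a deliberately slow helper for illustration only and is not something you would put in a real implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the union-by-rank Disjoint class.
struct RankSets {
  std::vector<int> links, ranks;

  explicit RankSets(int n) : links(n, -1), ranks(n, 1) {}

  int Find(int e) {                        // Recursive path compression.
    if (links[e] == -1) return e;
    links[e] = Find(links[e]);
    return links[e];
  }

  int Union(int s1, int s2) {
    assert(links[s1] == -1 && links[s2] == -1 && s1 != s2);
    int p = (ranks[s1] > ranks[s2]) ? s1 : s2;
    int c = (p == s1) ? s2 : s1;
    links[c] = p;
    if (ranks[p] == ranks[c]) ranks[p]++;
    return p;
  }

  // True height of the tree whose root is root: the largest number of
  // nodes on any path from an element up to root. Does not compress.
  int Height(int root) const {
    int best = 0;
    for (int e = 0; e < (int) links.size(); e++) {
      int nodes = 1, cur = e;
      while (links[cur] != -1) { cur = links[cur]; nodes++; }
      if (cur == root) best = std::max(best, nodes);
    }
    return best;
  }
};

// Replays the eight unions from the walkthrough, with no Find() calls yet.
RankSets replay_rank_example()
{
  RankSets d(10);
  d.Union(0, 1); d.Union(2, 3); d.Union(4, 5);
  d.Union(1, 3); d.Union(5, 6); d.Union(5, 7); d.Union(5, 8);
  d.Union(3, 5);
  return d;
}
```

Right after the unions, ranks[3] and Height(3) are both 3. After calling Find() on elements 0 through 8, Height(3) drops to 2 while ranks[3] stays at 3.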
We can view a maze as a graph in which each cell is a node, and two adjacent cells share an edge when there is no wall between them. A good maze is one where this graph is connected, so that every cell is reachable from the start/end cells, but there are no cycles. We can generate such a maze using disjoint sets. We start with a completely disconnected graph, where each cell is surrounded by walls. If this graph has r rows and c columns, then the graph contains r*c nodes and no edges.
What we'll do is choose a random wall to remove. If that wall separates nodes in different connected components, then we'll remove it, thereby lowering the number of connected components. If it doesn't separate nodes in different connected components, we keep it.
This can be done with disjoint sets. We start with each cell in its own set, and then we choose a random wall. If that wall connects two nodes in different sets, we remove the wall and call Union() on the two sets. Otherwise, we keep the wall. We keep doing this until we have just one set.
The code is in maze-gen.cpp. It's a little tricky. We first generate all the walls. Walls that separate vertically adjacent cells are indexed by the smaller cell number. Walls that separate horizontally adjacent cells are indexed by the smaller cell number plus r*c. We insert each wall into a multimap keyed by a random number, which puts the walls in random order. Then we traverse the multimap, deleting walls if they separate different components, until we have just one component. Then we print out the walls that remain:
#include <vector>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <iostream>
#include "DJ.h"
using namespace std;

typedef multimap <double, int> DIMap;
typedef DIMap::iterator DIMit;

int main(int argc, char **argv)
{
  int r, c, row, column, c1, c2, ncomp, s1, s2;
  Disjoint *d;
  DIMap walls;
  DIMit wit;
  DIMit tmp;

  if (argc != 3) { fprintf(stderr, "usage: maze-gen rows cols\n"); exit(1); }
  r = atoi(argv[1]);
  c = atoi(argv[2]);
  if (r <= 0 || c <= 0) { fprintf(stderr, "rows and cols must be positive\n"); exit(1); }

  d = new Disjoint(r*c);

  for (row = 0; row < r-1; row++) {      // Generate walls that separate vertical cells.
    for (column = 0; column < c; column++) {
      c1 = row*c + column;
      walls.insert(make_pair(drand48(), c1));
    }
  }

  for (row = 0; row < r; row++) {        // Generate walls that separate horizontal cells.
    for (column = 0; column < c-1; column++) {
      c1 = (row*c + column) + r*c;
      walls.insert(make_pair(drand48(), c1));
    }
  }

  ncomp = r*c;
  wit = walls.begin();
  while (ncomp > 1) {
    c1 = wit->second;
    if (c1 < r*c) {                      // This is a wall separating vertical cells
      c2 = c1 + c;
    } else {                             // This is a wall separating horizontal cells
      c1 -= r*c;
      c2 = c1+1;
    }
    s1 = d->Find(c1);
    s2 = d->Find(c2);
    if (s1 != s2) {                      // Test for different connected components.
      d->Union(s1, s2);
      tmp = wit;                         // Advance before erasing, so wit stays valid.
      wit++;
      walls.erase(tmp);
      ncomp--;
    } else {
      wit++;
    }
  }

  printf("ROWS %d COLS %d\n", r, c);
  for (wit = walls.begin(); wit != walls.end(); wit++) {
    c1 = wit->second;
    if (c1 < r*c) {
      c2 = c1 + c;
    } else {
      c1 -= r*c;
      c2 = c1+1;
    }
    printf("WALL %d %d\n", c1, c2);
  }

  delete d;
  return 0;
}
We can run this and pipe the output to the program maze_ppm (from a CS302 lab that you may not have done yet), and that lets us generate mazes of all sizes:
UNIX> maze-gen-size 50 100 | maze_ppm 5 | convert - maze2.jpg
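A handy sanity check on any maze generated this way is to count the walls that are left. A grid with r rows and c columns starts with (r-1)*c + r*(c-1) interior walls, and merging r*c components into one removes exactly r*c - 1 of them, so (r-1)*(c-1) walls must remain. For the 50 by 100 maze above, that is 49*99 = 4851 walls. The sketch below is a condensed variant of the generator that makes this easy to test. It is not maze-gen.cpp: the function name and the seed parameter are my own, it shuffles a vector with std::shuffle instead of using a multimap with drand48() keys, it stores each wall as a pair of cells instead of a single index, and it scans every wall instead of stopping when one component is left (the extra walls all land in the kept list, so the result is the same kind of maze).

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Plain Find() with no path compression, over a bare links vector.
int find_root(const std::vector<int> &links, int e)
{
  while (links[e] != -1) e = links[e];
  return e;
}

// Returns the walls that remain, each as a pair of adjacent cell numbers.
// Requires r >= 1 and c >= 1. The seed makes runs repeatable.
std::vector<std::pair<int, int>> generate_maze(int r, int c, unsigned seed)
{
  assert(r >= 1 && c >= 1);
  std::vector<std::pair<int, int>> walls, kept;

  for (int row = 0; row < r; row++) {
    for (int col = 0; col < c; col++) {
      int cell = row * c + col;
      if (row < r - 1) walls.push_back(std::make_pair(cell, cell + c));
      if (col < c - 1) walls.push_back(std::make_pair(cell, cell + 1));
    }
  }

  std::mt19937 gen(seed);
  std::shuffle(walls.begin(), walls.end(), gen);   // Random wall order.

  std::vector<int> links(r * c, -1), sizes(r * c, 1);
  for (const auto &w : walls) {
    int s1 = find_root(links, w.first);
    int s2 = find_root(links, w.second);
    if (s1 == s2) {            // Same component: removing it would add a cycle.
      kept.push_back(w);
      continue;
    }
    int p = (sizes[s1] > sizes[s2]) ? s1 : s2;     // Union-by-size.
    int ch = (p == s1) ? s2 : s1;
    links[ch] = p;
    sizes[p] += sizes[ch];
  }
  return kept;
}
```

Whatever the seed, generate_maze(50, 100, seed) should return exactly 4851 walls, and a 1 by 1 maze should return none.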