Disjoint Sets

James S. Plank - CS140


Here is the API for disjoint sets, in disjoint.h. The source code is in disjoint.c.

#ifndef _DISJOINT_
#define _DISJOINT_

typedef struct {
  int *links;
  int *sizes;
  int *ranks;
  int maxindex;
  int nsets;
} DisjointSet;
  
extern DisjointSet *new_disjoint_set(int maxindex);
extern void free_disjoint_set(DisjointSet *dj);
extern void disjoint_makeset(DisjointSet *dj, int index);
extern int disjoint_union(DisjointSet *dj, int s1, int s2);
extern int disjoint_find(DisjointSet *dj, int index);

#endif

The call new_disjoint_set(maxindex) allocates a new DisjointSet, along with three arrays of maxindex integers: links, sizes and ranks. It sets maxindex to the parameter and nsets to zero. A "set" is identified by a number between 0 and maxindex-1, and you can create up to maxindex of them by calling disjoint_makeset(). Each call increments nsets, sets sizes[index] and ranks[index] to one, and sets links[index] to -1.

disjoint_union() takes two sets (not two arbitrary elements, but two set roots, i.e. two elements whose links[] entry is -1) and performs a union on them. disjoint_find() returns the root of an element's set: the element of that set whose links[] entry is -1. All elements in a set return the same value from disjoint_find(), so to union the sets containing elements a and b, you call disjoint_union(dj, disjoint_find(dj, a), disjoint_find(dj, b)).
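The description above does not say how disjoint.c merges sets or finds roots, so what follows is only a minimal sketch of one way to implement this API, not the actual contents of disjoint.c. It assumes union by rank (the higher-rank root wins, and ties go to s1), keeps sizes[] correct at the roots, uses path compression in disjoint_find(), and decrements nsets on every union, which the maze program below relies on. It also assumes disjoint_union() returns the new root (the real return value is not documented here), and xmalloc() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Same layout as the DisjointSet struct in disjoint.h. */
typedef struct {
  int *links;     /* links[i] is i's parent, or -1 if i is a root */
  int *sizes;     /* sizes[root] is the number of elements in root's set */
  int *ranks;     /* ranks[root] is an upper bound on the height of root's tree */
  int maxindex;
  int nsets;
} DisjointSet;

/* Hypothetical helper: malloc() or exit, in the style of maze.c. */
static void *xmalloc(size_t n)
{
  void *p = malloc(n);
  if (p == NULL) { perror("malloc"); exit(1); }
  return p;
}

DisjointSet *new_disjoint_set(int maxindex)
{
  DisjointSet *dj = xmalloc(sizeof(DisjointSet));

  dj->links = xmalloc(sizeof(int) * maxindex);
  dj->sizes = xmalloc(sizeof(int) * maxindex);
  dj->ranks = xmalloc(sizeof(int) * maxindex);
  dj->maxindex = maxindex;
  dj->nsets = 0;
  return dj;
}

void free_disjoint_set(DisjointSet *dj)
{
  free(dj->links);
  free(dj->sizes);
  free(dj->ranks);
  free(dj);
}

void disjoint_makeset(DisjointSet *dj, int index)
{
  assert(index >= 0 && index < dj->maxindex);
  dj->links[index] = -1;
  dj->sizes[index] = 1;
  dj->ranks[index] = 1;
  dj->nsets++;
}

/* Walk up to the root, then make every element on the path point directly
   at the root (path compression), so later finds are faster. */
int disjoint_find(DisjointSet *dj, int index)
{
  int root = index, next;

  while (dj->links[root] != -1) root = dj->links[root];
  while (index != root) {
    next = dj->links[index];
    dj->links[index] = root;
    index = next;
  }
  return root;
}

/* s1 and s2 must be two different roots.  The root with the larger rank
   becomes the root of the merged set; on a tie, s1 wins and its rank grows
   by one.  Returns the new root (an assumption of this sketch). */
int disjoint_union(DisjointSet *dj, int s1, int s2)
{
  int tmp;

  assert(s1 != s2 && dj->links[s1] == -1 && dj->links[s2] == -1);
  if (dj->ranks[s2] > dj->ranks[s1]) {
    tmp = s1; s1 = s2; s2 = tmp;
  } else if (dj->ranks[s1] == dj->ranks[s2]) {
    dj->ranks[s1]++;
  }
  dj->links[s2] = s1;
  dj->sizes[s1] += dj->sizes[s2];
  dj->nsets--;
  return s1;
}
```

With that tie-breaking rule, this sketch happens to produce the same roots and sizes as the two example runs below (root 0 for example.c, root 8 for example2.c), but that alone does not show that disjoint.c is written this way.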

As an example, consider example.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "disjoint.h"


int main(void)
{
   DisjointSet *dj;
   int i;

   /* Ten singleton sets: the elements 0 through 9. */
   dj = new_disjoint_set(10);
   for (i = 0; i < 10; i++) {
     disjoint_makeset(dj, i);
   }
   for (i = 0; i < 10; i++) {
     printf("Find(%d) is %d\n", i, disjoint_find(dj, i));
   }

   /* Union the set containing 0 with the set containing each even number. */
   printf("\n");
   for (i = 2; i < 10; i+= 2) {
     disjoint_union(dj, disjoint_find(dj, 0), disjoint_find(dj, i));
   }
   for (i = 0; i < 10; i++) {
     printf("Find(%d) is %d of size %d\n", i, disjoint_find(dj, i), dj->sizes[disjoint_find(dj, i)]);
   }

   free_disjoint_set(dj);
   return 0;
}

This first sets up 10 sets, one for each of the elements 0 through 9, and then unions the set containing 0 with the set containing each even number. When it runs, you can see that all the even numbers end up in the same set, whose root is 0 and whose size is 5:

UNIX> example
Find(0) is 0
Find(1) is 1
Find(2) is 2
Find(3) is 3
Find(4) is 4
Find(5) is 5
Find(6) is 6
Find(7) is 7
Find(8) is 8
Find(9) is 9

Find(0) is 0 of size 5
Find(1) is 1 of size 1
Find(2) is 0 of size 5
Find(3) is 3 of size 1
Find(4) is 0 of size 5
Find(5) is 5 of size 1
Find(6) is 0 of size 5
Find(7) is 7 of size 1
Find(8) is 0 of size 5
Find(9) is 9 of size 1
UNIX> 

The program example2.c performs the unions in a different order, joining the sets containing 8 and 6, then 6 and 4, then 4 and 2, and finally 2 and 0:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "disjoint.h"

int main(void)
{
   DisjointSet *dj;
   int i;

   dj = new_disjoint_set(10);
   for (i = 0; i < 10; i++) {
     disjoint_makeset(dj, i);
   }
   for (i = 0; i < 10; i++) {
     printf("Find(%d) is %d\n", i, disjoint_find(dj, i));
   }

   /* Union the sets containing 8 and 6, then 6 and 4, then 4 and 2, then 2 and 0. */
   printf("\n");
   for (i = 8; i > 0; i -= 2) {
     disjoint_union(dj, disjoint_find(dj, i), disjoint_find(dj, i-2));
   }
   for (i = 0; i < 10; i++) {
     printf("Find(%d) is %d of size %d\n", i, disjoint_find(dj, i), dj->sizes[disjoint_find(dj, i)]);
   }

   free_disjoint_set(dj);
   return 0;
}

You'll note that the output is similar. The value returned by disjoint_find() is different (8 instead of 0), but what matters is that it is the same for all even numbers, since they are all in the same set.

UNIX> example2
Find(0) is 0
Find(1) is 1
Find(2) is 2
Find(3) is 3
Find(4) is 4
Find(5) is 5
Find(6) is 6
Find(7) is 7
Find(8) is 8
Find(9) is 9

Find(0) is 8 of size 5
Find(1) is 1 of size 1
Find(2) is 8 of size 5
Find(3) is 3 of size 1
Find(4) is 8 of size 5
Find(5) is 5 of size 1
Find(6) is 8 of size 5
Find(7) is 7 of size 1
Find(8) is 8 of size 5
Find(9) is 9 of size 1
UNIX> 

The Maze Program

I don't have too much time to write this one up, but the maze program that we wrote in class is in maze.c. A maze is represented with the following struct:

typedef struct {
  int r;
  int c;
  int pixwidth;
  int *hwalls;
  int *vwalls;
} Maze;

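There are r rows of cells and c columns. As such, there are r+1 rows of c horizontal walls (including the top and bottom borders), and r rows of c+1 vertical walls (including the left and right borders). We denote them with two integer arrays: hwalls, which holds the (r+1)*c horizontal walls, and vwalls, which holds the r*(c+1) vertical walls. A nonzero entry means that the wall is present; removing a wall sets its entry to zero.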
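The indexing in the generation code and in print_maze() implies the following layout: hwalls[i*c + j] is the wall above cell (i, j), so the last row of hwalls is the bottom border, and vwalls[i*(c+1) + j] is the wall to the left of cell (i, j), so the last column of vwalls is the right border. A minimal sketch of that arithmetic follows; the helper names are hypothetical, since maze.c does this indexing inline.

```c
#include <assert.h>

/* Hypothetical helpers: indices of the four walls around cell (i, j)
   in a maze with c columns. */
int top_wall(int c, int i, int j)    { return i * c + j; }           /* hwalls */
int bottom_wall(int c, int i, int j) { return (i + 1) * c + j; }     /* hwalls */
int left_wall(int c, int i, int j)   { return i * (c + 1) + j; }     /* vwalls */
int right_wall(int c, int i, int j)  { return i * (c + 1) + j + 1; } /* vwalls */

/* Remove the wall between adjacent cells e1 < e2, where cell (i, j) is
   number i*c + j.  Uses the same indices as the maze-generation loop. */
void open_wall(int *hwalls, int *vwalls, int c, int e1, int e2)
{
  if (e2 == e1 + c) {
    hwalls[e2] = 0;                               /* e2 is directly below e1 */
  } else {
    assert(e2 == e1 + 1 && e1 % c != c - 1);      /* e2 must be right of e1 */
    vwalls[(e1 / c) * (c + 1) + e1 % c + 1] = 0;
  }
}
```

For example, with c = 3, bottom_wall(3, 0, 2) and top_wall(3, 1, 2) both return 5: the wall below cell (0, 2) is the same wall as the one above cell (1, 2).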

Creating the maze is pretty simple: start with each of the r*c cells in its own set (cell (i, j) is element i*c + j), then repeatedly choose two adjacent cells at random. If they are in different sets, perform a union on them and remove the wall separating them. You are done when all cells are in the same set. This is done with the following code:

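  /* One set per cell; cell (i, j) is element i*c + j. */
  ds = new_disjoint_set(m->r * m->c);
  for (i = 0; i < m->r*m->c; i++) disjoint_makeset(ds, i);

  srand48(time(0));     /* seed the generator; time() needs <time.h> */

  /* disjoint_union() decrements nsets, so the loop ends when every cell
     is in a single set. */
  while (ds->nsets > 1) {
    /* A one-column maze only has horizontal walls to remove and a one-row
       maze only vertical ones; choosing accordingly avoids taking
       lrand48() modulo zero. */
    if (m->c == 1 || (m->r > 1 && lrand48()%2)) {
      /* A random cell e1 in rows 0..r-2 and the cell e2 directly below it.
         hwalls[e2] is the wall on top of e2. */
      e1 = lrand48()%((m->r-1)*m->c);
      e2 = e1 + m->c;
      s1 = disjoint_find(ds, e1);
      s2 = disjoint_find(ds, e2);
      if (s1 != s2) {
        disjoint_union(ds, s1, s2);
        m->hwalls[e2] = 0;
      }
    } else {
      /* A random cell e1 = (i, j) in columns 0..c-2 and the cell e2 to its
         right.  vwalls[i*(c+1)+j+1] is the wall between them. */
      i = lrand48()%m->r;
      j = lrand48()%(m->c-1);
      e1 = i*m->c+j;
      e2 = e1+1;
      s1 = disjoint_find(ds, e1);
      s2 = disjoint_find(ds, e2);
      if (s1 != s2) {
        disjoint_union(ds, s1, s2);
        m->vwalls[i*(m->c+1)+j+1] = 0;
      }
    }
  }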
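Because a wall is removed only when it separates two different sets, and each removal merges two sets into one, the loop removes exactly r*c - 1 walls before nsets reaches 1. No cycles can form, so the finished maze is a spanning tree of the grid: there is exactly one path between any two cells. The sketch below checks that count. To stay self-contained it uses a bare-bones union-find (a links array with no union by rank or path compression) and rand() in place of lrand48(); carve_maze() is a hypothetical name, not a function in maze.c.

```c
#include <stdlib.h>

/* Bare-bones find: follow links until reaching a root (-1). */
static int find(int *links, int e)
{
  while (links[e] != -1) e = links[e];
  return e;
}

/* Carve an r x c maze the same way as the loop above and return how many
   walls were removed (-1 if malloc fails).  hwalls must hold (r+1)*c ints
   and vwalls r*(c+1) ints, all initially nonzero. */
int carve_maze(int r, int c, int *hwalls, int *vwalls)
{
  int *links = malloc(sizeof(int) * r * c);
  int nsets = r * c, removed = 0, i, j, e1, e2, s1, s2;

  if (links == NULL) return -1;
  for (i = 0; i < r * c; i++) links[i] = -1;
  while (nsets > 1) {
    if (r > 1 && (c == 1 || rand() % 2)) {   /* cell e2 below cell e1 */
      e1 = rand() % ((r - 1) * c);
      e2 = e1 + c;
    } else {                                 /* cell e2 right of cell e1 */
      i = rand() % r;
      j = rand() % (c - 1);
      e1 = i * c + j;
      e2 = e1 + 1;
    }
    s1 = find(links, e1);
    s2 = find(links, e2);
    if (s1 != s2) {                          /* different sets: merge, open wall */
      links[s2] = s1;
      nsets--;
      removed++;
      if (e2 == e1 + c) hwalls[e2] = 0;
      else vwalls[(e1 / c) * (c + 1) + e1 % c + 1] = 0;
    }
  }
  free(links);
  return removed;
}
```

The border walls are never touched: horizontal removals only use hwalls indices c through r*c - 1, and vertical removals only use columns 1 through c-1 of vwalls.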

Printing the maze to a PGM file was a bit of in-class performance art. We create an array of (r+1)*(c+1)*pixwidth*pixwidth pixels and set them all to white (255). Each row of pixels has (c+1)*pixwidth pixels, and every cell is represented by a pixwidth * pixwidth square of pixels. The cell in maze row i and column j has its upper left-hand corner at the pixel in row i*pixwidth + (pixwidth/2+1) and column j*pixwidth + (pixwidth/2+1) of the pixel map. The pixel map is a one-dimensional array stored row by row, so that corner is pixmap[(i*pixwidth + pixwidth/2 + 1) * (c+1) * pixwidth + j*pixwidth + pixwidth/2 + 1]. To draw a wall, we find that corner and set pixwidth pixels to black (0), either to the right (for a horizontal wall) or downward (for a vertical wall). Here's the code:

void print_maze(Maze *m)
{
  int i, j, k;
  int *pixmap;
  int r1, c1, si, pw;

  r1 = m->r+1;
  c1 = m->c+1;
  pw = m->pixwidth;

  /* With pw < 3 the borders fall outside the image, and the bottom border
     runs past the end of pixmap. */
  if (pw < 3) { fprintf(stderr, "pixwidth must be at least 3\n"); exit(1); }

  /* An (r+1)*pw by (c+1)*pw image, stored row by row, initially all white. */
  pixmap = (int *) malloc(sizeof(int) * r1 * c1 * pw * pw);
  if (pixmap == NULL) { perror("malloc"); exit(1); }
  for (i = 0; i < r1 * c1 * pw * pw; i++) pixmap[i] = 255;

  /* Horizontal walls: pw black pixels rightward from the cell's corner. */
  for (i = 0; i < r1; i++) {
    for (j = 0; j < m->c; j++) {
      if (m->hwalls[i*m->c+j]) {
         si = (i * c1 * pw * pw) + pw * c1 * (pw/2+1) + j * pw + pw/2+1;
         for (k = 0; k < pw; k++) pixmap[si+k] = 0;
      }
    }
  }

  /* Vertical walls: pw black pixels downward; one image row is c1*pw pixels. */
  for (i = 0; i < m->r; i++) {
    for (j = 0; j < c1; j++) {
      if (m->vwalls[i*c1+j]) {
        si = (i * c1 * pw * pw) + pw * c1 * (pw/2+1) + j * pw + pw/2+1;
        for (k = 0; k < pw; k++) pixmap[si+k*c1*pw] = 0;
      }
    }
  }

  /* Plain PGM: magic number, width, height, maximum gray value, then pixels. */
  printf("P2\n%d %d\n255\n", c1 * pw, r1 * pw);
  for (i = 0; i < r1 * c1 * pw * pw; i++) printf("%d\n", pixmap[i]);
  free(pixmap);
  exit(0);
}
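The corner formula can be checked in isolation. The sketch below uses hypothetical helper names that are not part of maze.c: corner_index() computes a cell's corner from its pixel row and column, which gives the same value as the si expression in print_maze(), and walls_fit() checks whether the lowest and rightmost wall pixels stay inside the image. That check shows why pixwidth must be at least 3: the bottom border starts in pixel row r*pixwidth + pixwidth/2 + 1, which lies inside the image's (r+1)*pixwidth rows only when pixwidth/2 + 1 < pixwidth.

```c
/* Index in pixmap of the upper-left corner of cell (i, j), for a maze with
   c columns drawn with pw x pw pixel cells.  The image is (c+1)*pw pixels
   wide and stored row by row. */
int corner_index(int c, int pw, int i, int j)
{
  int row = i * pw + pw / 2 + 1;
  int col = j * pw + pw / 2 + 1;

  return row * (c + 1) * pw + col;
}

/* Returns 1 if the pixels print_maze() draws for an r x c maze stay inside
   the (r+1)*pw by (c+1)*pw image, and 0 otherwise.  The lowest pixel drawn
   is on the bottom border (hwalls row r), and the rightmost is on the right
   border (vwalls column c). */
int walls_fit(int r, int c, int pw)
{
  int lowest_row = r * pw + pw / 2 + 1;
  int rightmost_col = c * pw + pw / 2 + 1;

  return lowest_row < (r + 1) * pw && rightmost_col < (c + 1) * pw;
}
```

For example, walls_fit(10, 10, 2) is 0: the bottom border would start in pixel row 22 of an image whose rows are numbered 0 through 21.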

It's pretty neat. Here's how to generate a maze and convert the PGM output to a JPEG:

UNIX> maze 100 100 5 > m.pgm
UNIX> convert m.pgm m.jpg