CS140 Lecture notes -- Unions

Jim Plank (with some modifications by Brad Vander Zanden)

Unions

Unions are a modification of structs that let you use the same field for different types. This is only useful at very specific times, one of those being when you want to have a field in your struct that can be anything -- int, float, double, pointer, etc.

In this class, you should not attempt to use unions in your code unless you are asked to do so. However, some of my code will have unions, so you'll have to know how to deal with them.

A union is very simple -- it looks just like a struct:

  typedef union {
    int i;
    float f;
    double d;
  } Ifd;

However, in a struct, you can use any and all of those fields. In a union, you can only use one at a time. Think of it as an either-or kind of thing. If I declare an Ifd as above, then I can use it as an int, a float or a double, but I can't interchange them.
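One way to see the "either-or" behavior is to look at where the fields live. The following is a minimal sketch, not part of the lecture's code; the function names members_overlap and print_sizes are made up for illustration. It checks that all three fields of an Ifd start at the same address, and prints the size of the union next to the size of its largest field.

```c
#include <stdio.h>

typedef union {
  int i;
  float f;
  double d;
} Ifd;

/* Returns 1 if every field of the union starts at the same address
   as the union itself, and 0 otherwise. */
int members_overlap(void)
{
  Ifd ifd;

  return (void *) &ifd.i == (void *) &ifd &&
         (void *) &ifd.f == (void *) &ifd &&
         (void *) &ifd.d == (void *) &ifd;
}

/* Prints the size of the union next to the size of its largest field. */
void print_sizes(void)
{
  printf("sizeof(Ifd) = %lu, sizeof(double) = %lu\n",
         (unsigned long) sizeof(Ifd), (unsigned long) sizeof(double));
}
```

On most machines print_sizes() should report the same number twice (typically 8). The fields share the same bytes, which is why setting one of them destroys whatever was stored through another.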

Here is a trivial example, using the above union (this is in union1.c):

#include <stdio.h>

typedef union {
  int i;
  float f;
  double d;
} Ifd;

int main()
{
  Ifd ifd;
  int i;
  float f;

  ifd.i = 1;
  i = ifd.i;
  printf("%d %d\n", i, ifd.i);

  ifd.f = 5.55;
  f = ifd.f;
  printf("%f %f\n", ifd.f, f);

  printf("%d\n", ifd.i);
}

So, when we use ifd.i, this means we can use ifd as an integer. When we use ifd.f, we can use ifd as a floating point number. However, unlike a struct, we can't use both ifd.i and ifd.f at the same time. This is because ifd is not an aggregate data type with three fields -- it is a type that holds one value, which can be any of three types -- integer, float or double.

So, when we look at the above code, the first three lines say:

Treat ifd as an integer, and set its value to 1.
Set i to ifd. This is fine, since we're using ifd as an integer. The variable i will get the value 1.
Print out ifd and i. We expect it to print out "1 1".

The next three lines say:

Treat ifd as a float, and set its value to 5.55. Note that once we do this, we can't use ifd.i, because we are now using ifd as a float.
Set f to ifd. This is fine, since we're using ifd as a float now. The variable f will get the value 5.55.
Print out ifd and f. We expect it to print out "5.550000 5.550000".

Now, the last line is problematic, and a good example of where people get tripped up with unions. Here we print out ifd.i. The problem is that we're currently using ifd as a float -- therefore, the value of ifd.i is not meaningful. We might expect the C compiler to automatically convert ifd.f to an integer, but it doesn't. It simply treats the bytes that hold the float as if they held an int, so when you use a union incorrectly, as in this last line, you can get anything.

Here's the output -- note the last line -- indeed, using ifd.i when you have set ifd.f gives you something that is not very useful:

UNIX> union1
1 1
5.550000 5.550000
1085381018
UNIX>
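The number 1085381018 is not random. It is what you get when the bytes of the float 5.55 are read as an integer. Here is a small sketch that shows this without going through the union. It assumes that unsigned int and float are both 4 bytes and that floats use the IEEE 754 format, which is true on most machines but is not guaranteed by C. The function name float_bits is made up for illustration, and it uses memcpy rather than a union to copy the bytes.

```c
#include <stdio.h>
#include <string.h>

/* Copies the bytes of a float into an unsigned int and returns it.
   Assumption: float and unsigned int are both 4 bytes. */
unsigned int float_bits(float f)
{
  unsigned int u = 0;

  memcpy(&u, &f, sizeof(f));
  return u;
}

/* Prints a float next to the integer that its bytes spell out. */
void print_float_bits(float f)
{
  printf("%f has the bit pattern %u\n", f, float_bits(f));
}
```

Under those assumptions, float_bits(5.55f) returns 1085381018, the same value that union1 printed for ifd.i.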

When do we use unions?

We use unions when we want to have our code work on any type, but we don't know what that type will be until run time. Again, we'll see useful examples of this later in class. Here is a slightly contrived one. Suppose we want to write code that reads five items into an array, and then prints them out. The problem is that we don't know what an item will be -- it could be an int, a float, or a one-word character string. When we read an item, we'll make the user specify what kind of item it is. For example, here's some valid input:

int 3
float -4.33
string Jim
int -67
float 40000.1

Now, we could structure this code in one of two ways. First, let's write it without using unions. We can have a struct for each item, and this struct has four fields: type, i, f and s. The type field is a character that identifies the type: 'i' for integer, 'f' for float, or 's' for string. If type is 'i', then the i field contains the value. If type is 'f', then the f field contains the value. If type is 's', then the s field contains the value.

Here's the code (this is in union2.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fields.h"

typedef struct {
  char type;
  int i;
  float f;
  char *s;
} Item;

int main()
{
  Item array[5];
  int i;
  IS is;

  is = new_inputstruct(NULL);

  /* Read in the items -- if "int", read it into array[i].i
                          if "float", read it into array[i].f
                          if "string", read it into array[i].s */

  for (i = 0; i < 5; i++) {
    if (get_line(is) != 2) exit(1); 

    if (strcmp(is->fields[0], "int") == 0) {
      array[i].type = 'i';
      if (sscanf(is->fields[1], "%d", &(array[i].i)) != 1) exit(1); 

    } else if (strcmp(is->fields[0], "float") == 0) {
      array[i].type = 'f';
      if (sscanf(is->fields[1], "%f", &(array[i].f)) != 1) exit(1); 

    } else if (strcmp(is->fields[0], "string") == 0) {
      array[i].type = 's';
      array[i].s = strdup(is->fields[1]);

    } else {
      exit(1);
    }
  }

  /* Write out the items. */

  for (i = 0; i < 5; i++) {
    printf("Item %d: Type %c -- ", i, array[i].type);
    if (array[i].type == 'i') {
      printf("Value: %d\n", array[i].i);
    } else if (array[i].type == 'f') {
      printf("Value: %f\n", array[i].f);
    } else if (array[i].type == 's') {
      printf("Value: %s\n", array[i].s);
    } else {
      exit(1);
    }
  }
 
  /* Print the size of the item struct */

  printf("\n");
  printf("Sizeof(Item): %d\n", (int) sizeof(Item));
}

Now, when you give it the example input above, you get the following (the five lines right after the command are the input):

UNIX> union2
int 3
float -4.33
string Jim
int -67
float 40000.1
Item 0: Type i -- Value: 3
Item 1: Type f -- Value: -4.330000
Item 2: Type s -- Value: Jim
Item 3: Type i -- Value: -67
Item 4: Type f -- Value: 40000.101562

Sizeof(Item): 16
UNIX>

A little yucky, but certainly code that you all are capable of writing. One of the problems with this code is how wasteful it is. Every item contains two fields (8 bytes) that it does not use. Of course, in this example, that leads to a whopping 40 bytes, so no, it's not a problem here. But if we had a million items, then it would be 8 megabytes worth of wasted space.

The solution is to use a union for the value of the item. The code is in union3.c. Note how similar it is to union2.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "fields.h"

typedef struct {
  char type;
  union {
    int i;
    float f;
    char *s;
  } value;
} Item;

int main()
{
  Item array[5];
  int i;
  IS is;

  is = new_inputstruct(NULL);

  /* Read in the items -- if "int", read it into array[i].value.i
                          if "float", read it into array[i].value.f
                          if "string", read it into array[i].value.s */

  for (i = 0; i < 5; i++) {
    if (get_line(is) != 2) exit(1); 

    if (strcmp(is->fields[0], "int") == 0) {
      array[i].type = 'i';
      if (sscanf(is->fields[1], "%d", &(array[i].value.i)) != 1) exit(1); 

    } else if (strcmp(is->fields[0], "float") == 0) {
      array[i].type = 'f';
      if (sscanf(is->fields[1], "%f", &(array[i].value.f)) != 1) exit(1); 

    } else if (strcmp(is->fields[0], "string") == 0) {
      array[i].type = 's';
      array[i].value.s = strdup(is->fields[1]);

    } else {
      exit(1);
    }
  }

  /* Write out the items. */

  for (i = 0; i < 5; i++) {
    printf("Item %d: Type %c -- ", i, array[i].type);
    if (array[i].type == 'i') {
      printf("Value: %d\n", array[i].value.i);
    } else if (array[i].type == 'f') {
      printf("Value: %f\n", array[i].value.f);
    } else if (array[i].type == 's') {
      printf("Value: %s\n", array[i].value.s);
    } else {
      exit(1);
    }
  }
 
  /* Print the size of the item struct */

  printf("\n");
  printf("Sizeof(Item): %d\n", (int) sizeof(Item));
}

The output is the same, only now you'll notice that the size of an item is only 8 bytes:

UNIX> union3
int 3
float -4.33
string Jim
int -67
float 40000.1
Item 0: Type i -- Value: 3
Item 1: Type f -- Value: -4.330000
Item 2: Type s -- Value: Jim
Item 3: Type i -- Value: -67
Item 4: Type f -- Value: 40000.101562

Sizeof(Item): 8
UNIX>

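This is because the union only allocates enough space for its biggest field -- here they are all 4 bytes (pointers are 4 bytes on this machine), so the union is just four bytes. The type character, padded out to 4 bytes so that the union stays aligned, accounts for the other four.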
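If you want to check these sizes on your own machine, a sketch like the following may help. The type names StructItem and UnionItem and the function print_layout are made up for illustration; the two layouts are copied from union2.c and union3.c.

```c
#include <stddef.h>
#include <stdio.h>

/* The layout used in union2.c: every field gets its own storage. */
typedef struct {
  char type;
  int i;
  float f;
  char *s;
} StructItem;

/* The layout used in union3.c: the three values share storage. */
typedef struct {
  char type;
  union {
    int i;
    float f;
    char *s;
  } value;
} UnionItem;

/* Prints both sizes, and where the value starts inside a UnionItem. */
void print_layout(void)
{
  printf("sizeof(StructItem): %lu\n", (unsigned long) sizeof(StructItem));
  printf("sizeof(UnionItem):  %lu\n", (unsigned long) sizeof(UnionItem));
  printf("offset of value:    %lu\n",
         (unsigned long) offsetof(UnionItem, value));
}
```

On the machine used for these notes, print_layout() would be expected to print 16, 8 and 4. On a typical 64-bit machine, pointers are 8 bytes, so expect something like 24, 16 and 8 instead. Either way, the version with the union should be the smaller of the two.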

Also note that 40000.1 has not been exactly duplicated in the output. The reason is that fractions such as .1 have to be approximated using fractional powers of 2 since numbers are stored as bits. So .1 gets approximated as:

0*1/2¹ + 0*1/2² + 0*1/2³ + 1*1/2⁴ + 1*1/2⁵ + 0*1/2⁶ + 1*1/2⁷

= 1/16 + 1/32 + 1/128 = 0.1015625

At this point the computer ran out of bits in which to store the fraction: a float has only about 24 bits of precision, and most of them were needed to represent 40000. Note that -4.33 did get faithfully duplicated because the computer could spend more bits on the fraction, and so the approximation got close enough to .33.

This example shows why it can be dangerous to use floats rather than doubles. You can only get limited precision with 4 bytes and as the size of the number increases, the accuracy of the fraction will decrease. This loss of precision could be important when doing scientific calculations, dollars and cents calculations, etc.
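The rounding can be observed directly with a sketch like the one below. It assumes IEEE 754 floats and doubles, which most machines use; the function names through_float and show_rounding are made up for illustration.

```c
#include <stdio.h>

/* Rounds a double to the nearest float, then widens it back to a
   double so that the rounding error can be printed. */
double through_float(double x)
{
  float f = (float) x;

  return (double) f;
}

/* Prints a value next to what it becomes when stored in a float. */
void show_rounding(double x)
{
  printf("%.10f stored as a float is %.10f\n", x, through_float(x));
}
```

Under that assumption, through_float(40000.1) returns exactly 40000.1015625, which matches the arithmetic above. For -4.33 the result is off by less than one ten-millionth, so it still prints as -4.330000 with %f.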

union4.c is a rewritten version of union3.c that uses a double rather than a float to store the floating point numbers. Here is its output when run with the same input as union3.c:

UNIX> union4
int 3
float -4.33
string Jim
int -67
float 40000.1
Item 0: Type i -- Value: 3
Item 1: Type f -- Value: -4.330000
Item 2: Type s -- Value: Jim
Item 3: Type i -- Value: -67
Item 4: Type f -- Value: 40000.100000

Sizeof(Item): 16
UNIX>

Note that all floating point values are now printed accurately (to the six decimal places shown), but the size of the struct has also gone from 8 to 16 bytes. There's no free lunch in this world!
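The source of union4.c is not shown here, so the following is only a guess at what the changed struct might look like: the Item from union3.c with the float replaced by a double. The field name d and the functions make_float_item and print_item are assumptions made for illustration.

```c
#include <stdio.h>

/* Assumed layout for union4.c: same as union3.c, but with a double. */
typedef struct {
  char type;
  union {
    int i;
    double d;
    char *s;
  } value;
} Item;

/* Builds an item of type 'f' that holds d. */
Item make_float_item(double d)
{
  Item item;

  item.type = 'f';
  item.value.d = d;
  return item;
}

/* Prints one item in the same format as the loops in union2.c and
   union3.c, using the type character to pick the field to read. */
void print_item(int index, Item item)
{
  printf("Item %d: Type %c -- ", index, item.type);
  switch (item.type) {
    case 'i': printf("Value: %d\n", item.value.i); break;
    case 'f': printf("Value: %f\n", item.value.d); break;
    case 's': printf("Value: %s\n", item.value.s); break;
    default:  printf("Unknown type\n"); break;
  }
}
```

One detail to watch if you make this change yourself: reading into a double with sscanf requires "%lf" rather than "%f". The union is now as big as a double, which is where the extra bytes in Sizeof(Item) come from.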