CS360 -- Lab 2


Before we begin, learn about getchar() and fread()

Getchar() and fread() are two library calls from the C standard I/O library. You will be using them in this lab, so let me explain them to you.

The simplest is getchar(). It reads one byte from standard input and returns it as an int. If there are no more bytes in standard input, it returns the value EOF, which is typically defined in stdio.h to equal -1. Thus, when it reads a byte, it returns a value between 0 and 255, and when it fails, it returns EOF. That is why you store the return value in an int rather than a char.

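The program read10-getchar.c reads up to ten bytes from standard input using getchar(), and prints each one in up to four ways: as an unsigned decimal number, as a signed decimal number, in hexadecimal, and, if it is printable, as a character.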

The program is straightforward. You need <ctype.h>, because it defines isprint():

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(int argc, char **argv)
{
  int i;
  int c;     /* int, not char, so that EOF can be told apart from byte 255 */

  if (argc != 1) {
    fprintf(stderr, "usage: read10 (file on standard input)\n");
    exit(1);
  }

  for (i = 0; i < 10; i++) {
    c = getchar();
    if (c == EOF) {
      if (i == 0) printf("The input is empty.\n");
      exit(0);
    }
    printf("Byte %d: Unsigned Decimal: %3d. Signed: %4d.  Hex: %02x.", i, 
           (unsigned char) c, (char) c, (unsigned int) c);
    if (isprint(c)) printf("  Character: %c", c);
    printf("\n");
  }
  exit(0);
}

Let's see it in action:

UNIX> echo abcd | read10-getchar
Byte 0: Unsigned Decimal:  97. Signed:   97.  Hex: 61.  Character: a
Byte 1: Unsigned Decimal:  98. Signed:   98.  Hex: 62.  Character: b
Byte 2: Unsigned Decimal:  99. Signed:   99.  Hex: 63.  Character: c
Byte 3: Unsigned Decimal: 100. Signed:  100.  Hex: 64.  Character: d
Byte 4: Unsigned Decimal:  10. Signed:   10.  Hex: 0a.
UNIX>
It detected that input ended after the newline character (10) when getchar() returned EOF.
UNIX> echo "a 1 # )" | read10-getchar
Byte 0: Unsigned Decimal:  97. Signed:   97.  Hex: 61.  Character: a
Byte 1: Unsigned Decimal:  32. Signed:   32.  Hex: 20.  Character:  
Byte 2: Unsigned Decimal:  49. Signed:   49.  Hex: 31.  Character: 1
Byte 3: Unsigned Decimal:  32. Signed:   32.  Hex: 20.  Character:  
Byte 4: Unsigned Decimal:  35. Signed:   35.  Hex: 23.  Character: #
Byte 5: Unsigned Decimal:  32. Signed:   32.  Hex: 20.  Character:  
Byte 6: Unsigned Decimal:  41. Signed:   41.  Hex: 29.  Character: )
Byte 7: Unsigned Decimal:  10. Signed:   10.  Hex: 0a.
UNIX> 
You'll see that isprint() says that spaces are "printable," but newlines are "not."
UNIX> read10-getchar < read10-getchar.c
Byte 0: Unsigned Decimal:  35. Signed:   35.  Hex: 23.  Character: #
Byte 1: Unsigned Decimal: 105. Signed:  105.  Hex: 69.  Character: i
Byte 2: Unsigned Decimal: 110. Signed:  110.  Hex: 6e.  Character: n
Byte 3: Unsigned Decimal:  99. Signed:   99.  Hex: 63.  Character: c
Byte 4: Unsigned Decimal: 108. Signed:  108.  Hex: 6c.  Character: l
Byte 5: Unsigned Decimal: 117. Signed:  117.  Hex: 75.  Character: u
Byte 6: Unsigned Decimal: 100. Signed:  100.  Hex: 64.  Character: d
Byte 7: Unsigned Decimal: 101. Signed:  101.  Hex: 65.  Character: e
Byte 8: Unsigned Decimal:  32. Signed:   32.  Hex: 20.  Character:  
Byte 9: Unsigned Decimal:  60. Signed:   60.  Hex: 3c.  Character: <
UNIX>
In the run above, we have redirected read10-getchar.c to standard input, and it has printed the first 10 bytes in all four formats.
UNIX> read10-getchar < Dog.jpg
Byte 0: Unsigned Decimal: 255. Signed:   -1.  Hex: ff.
Byte 1: Unsigned Decimal: 216. Signed:  -40.  Hex: d8.
Byte 2: Unsigned Decimal: 255. Signed:   -1.  Hex: ff.
Byte 3: Unsigned Decimal: 224. Signed:  -32.  Hex: e0.
Byte 4: Unsigned Decimal:   0. Signed:    0.  Hex: 00.
Byte 5: Unsigned Decimal:  16. Signed:   16.  Hex: 10.
Byte 6: Unsigned Decimal:  74. Signed:   74.  Hex: 4a.  Character: J
Byte 7: Unsigned Decimal:  70. Signed:   70.  Hex: 46.  Character: F
Byte 8: Unsigned Decimal:  73. Signed:   73.  Hex: 49.  Character: I
Byte 9: Unsigned Decimal:  70. Signed:   70.  Hex: 46.  Character: F
UNIX> 
The final call above shows what happens when you get a byte between 128 and 255. Those bytes aren't in standard printable text files, but they are in files like binary program files and JPG files.
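
Bytes like these are also the reason that the return value of getchar() goes into an int. Here is a minimal sketch of the idea. The helper count_bytes() is made up for illustration and is not part of the lab; it uses fgetc(), which behaves like getchar() but reads from any stream, so it is easy to try on a file. If c were a char, then on a machine where char is signed, a byte of 255 (like the first byte of Dog.jpg) could compare equal to EOF and end the loop early.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes remaining on a stream.  fgetc(f) behaves like
   getchar(), except that it reads from f instead of standard input. */
long count_bytes(FILE *f)
{
  int c;        /* int, not char: it must hold 0..255 and also EOF */
  long n = 0;

  assert(f != NULL);
  while ((c = fgetc(f)) != EOF) n++;
  return n;
}
```

Calling count_bytes(stdin) counts the bytes on standard input, including any that are 255.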

Fread() is a way to read multiple bytes at a time from a file. You call it as:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

This says to read size*nmemb bytes from the "stream" called stream, and put those bytes into the memory denoted by ptr. The two parameters are specified so that you can ask for nmemb items each of whose size is size. It returns the number of items actually read. If you call it with size equal to one, then it returns the number of bytes that it read. If, for example, the input stream is a file and there are not nmemb items in the file, it simply reads the items that are in the file and returns that number. It returns 0 if there are no items to read.
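
To make the return value concrete, here is a small sketch under one assumption: a stream that holds exactly ten bytes. The function fread_counts() is a made-up name, and it builds its own ten-byte stream with tmpfile() so that it does not depend on any file from the lab. It reads the same ten bytes twice, once with size equal to one and once with size equal to four.

```c
#include <assert.h>
#include <stdio.h>

/* Put ten bytes in a temporary file and read them back two ways.
   The two fread() return values come back through the pointers.
   Returns 0 on success, -1 if the temporary file cannot be created. */
int fread_counts(size_t *as_bytes, size_t *as_items)
{
  unsigned char buf[12];
  FILE *f;

  assert(as_bytes != NULL && as_items != NULL);
  f = tmpfile();
  if (f == NULL) return -1;
  fwrite("0123456789", 1, 10, f);

  rewind(f);
  *as_bytes = fread(buf, 1, 12, f);  /* asks for 12 bytes; only 10 exist */

  rewind(f);
  *as_items = fread(buf, 4, 3, f);   /* asks for 3 items; only 2 are complete */

  fclose(f);
  return 0;
}
```

With size equal to one, the count is in bytes (10 here). With size equal to four, the count is in complete items (2 here), and the two leftover bytes are not counted.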

As usual with C, you have to already have the bytes allocated. You can use the global variable stdin (which is defined in stdio.h) to read from standard input. If you want to read from a file, you can create a "stream" using fopen().

The program below, in read10-fread.c, performs identically to read10-getchar.c, except it uses fread() to read all of the characters at one time, and it allows you to specify a file to read from on the command line:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

int main(int argc, char **argv)
{
  FILE *f;
  int i, nread;
  unsigned char buf[10];

  if (argc != 2) {
    fprintf(stderr, "usage: read10 file(- for standard input)\n");
    exit(1);
  }

  if (strcmp("-", argv[1]) == 0) {
    f = stdin;
  } else {
    f = fopen(argv[1], "r");
    if (f == NULL) { perror(argv[1]); exit(1); }
  }

  nread = fread(buf, 1, 10, f);
  if (ferror(f)) {     /* fread() never returns a negative value, so check ferror() */
    perror("fread");
    exit(1);
  } else if (nread == 0) {
    printf("The input is empty.\n");
  } else {
    for (i = 0; i < nread; i++) {
      printf("Byte %d: Unsigned Decimal: %3d. Signed: %4d.  Hex: %02x.", i,
         buf[i], (char) buf[i], (unsigned int) buf[i]);
      if (isprint(buf[i])) printf("  Character: %c", buf[i]);
      printf("\n");
    }
  }
  exit(0);
}

Here are the same four calls as above, only using read10-fread:

UNIX> echo abcd | read10-fread -
Byte 0: Unsigned Decimal:  97. Signed:   97.  Hex: 61.  Character: a
Byte 1: Unsigned Decimal:  98. Signed:   98.  Hex: 62.  Character: b
Byte 2: Unsigned Decimal:  99. Signed:   99.  Hex: 63.  Character: c
Byte 3: Unsigned Decimal: 100. Signed:  100.  Hex: 64.  Character: d
Byte 4: Unsigned Decimal:  10. Signed:   10.  Hex: 0a.
UNIX> echo "a 1 # )" | read10-fread -
Byte 0: Unsigned Decimal:  97. Signed:   97.  Hex: 61.  Character: a
Byte 1: Unsigned Decimal:  32. Signed:   32.  Hex: 20.  Character:  
Byte 2: Unsigned Decimal:  49. Signed:   49.  Hex: 31.  Character: 1
Byte 3: Unsigned Decimal:  32. Signed:   32.  Hex: 20.  Character:  
Byte 4: Unsigned Decimal:  35. Signed:   35.  Hex: 23.  Character: #
Byte 5: Unsigned Decimal:  32. Signed:   32.  Hex: 20.  Character:  
Byte 6: Unsigned Decimal:  41. Signed:   41.  Hex: 29.  Character: )
Byte 7: Unsigned Decimal:  10. Signed:   10.  Hex: 0a.
UNIX> read10-fread read10-getchar.c
Byte 0: Unsigned Decimal:  35. Signed:   35.  Hex: 23.  Character: #
Byte 1: Unsigned Decimal: 105. Signed:  105.  Hex: 69.  Character: i
Byte 2: Unsigned Decimal: 110. Signed:  110.  Hex: 6e.  Character: n
Byte 3: Unsigned Decimal:  99. Signed:   99.  Hex: 63.  Character: c
Byte 4: Unsigned Decimal: 108. Signed:  108.  Hex: 6c.  Character: l
Byte 5: Unsigned Decimal: 117. Signed:  117.  Hex: 75.  Character: u
Byte 6: Unsigned Decimal: 100. Signed:  100.  Hex: 64.  Character: d
Byte 7: Unsigned Decimal: 101. Signed:  101.  Hex: 65.  Character: e
Byte 8: Unsigned Decimal:  32. Signed:   32.  Hex: 20.  Character:  
Byte 9: Unsigned Decimal:  60. Signed:   60.  Hex: 3c.  Character: <
UNIX> read10-fread Dog.jpg
Byte 0: Unsigned Decimal: 255. Signed:   -1.  Hex: ff.
Byte 1: Unsigned Decimal: 216. Signed:  -40.  Hex: d8.
Byte 2: Unsigned Decimal: 255. Signed:   -1.  Hex: ff.
Byte 3: Unsigned Decimal: 224. Signed:  -32.  Hex: e0.
Byte 4: Unsigned Decimal:   0. Signed:    0.  Hex: 00.
Byte 5: Unsigned Decimal:  16. Signed:   16.  Hex: 10.
Byte 6: Unsigned Decimal:  74. Signed:   74.  Hex: 4a.  Character: J
Byte 7: Unsigned Decimal:  70. Signed:   70.  Hex: 46.  Character: F
Byte 8: Unsigned Decimal:  73. Signed:   73.  Hex: 49.  Character: I
Byte 9: Unsigned Decimal:  70. Signed:   70.  Hex: 46.  Character: F
UNIX> 
You now know enough about getchar() and fread() to do this lab.

data_uncompress.c - using the C standard I/O library

Your job is to write the program data_uncompress, using the C standard I/O library. In particular, you should read using getchar() and fread().

This program assumes that standard input is in a compressed format. The format is a stream of bytes as follows. You read a character, and that character tells you how to proceed:

Here's an example file in comp-1.txt -- each cell of the table is a character of the file:

115 ('s') 1 2 74 ('J') 105 ('i') 109 ('m') 4 80 ('P')
108 ('l') 97 ('a') 110 ('n') 107 ('k') 110 ('n') 115 ('s') 0 4
67 ('C') 83 ('S') 51 ('3') 54 ('6') 48 ('0') 110 ('n')

When you uncompress this, you first see the 's', and then a one. That means you need to read two strings (the count is always one more than the byte you read). The first consists of three characters, which are 'J', 'i' and 'm', and the second consists of five characters: "Plank". You print "Jim", a space and "Plank". Then you read an 'n', and print a newline. Next you read an 's' and the number 0, which means to read one string. That string has five characters: "CS360". You print "CS360", then read the final 'n' and print a newline.

Thus:

UNIX> data_uncompress < comp-1.txt
Jim Plank
CS360
UNIX> 

I have written a program called data_compress, which turns standard input into the proper format to be read by data_uncompress. It turns most numbers into integers and doubles, and the rest of its words into strings. For example, if we do:

UNIX> echo "1234 3.14159 Fred" | data_compress > comp-2.txt
Then comp-2.txt is composed of the following bytes:

105('i') 0 210 4 0 0 100('d') 0
110 134 27 240 249 33 9 64
115('s') 0 3 70('F') 114('r') 101('e') 100('d') 110('n')

To uncompress, you first read the 'i' and the 0, which say to read one integer. You will read the next four bytes (210, 4, 0, 0) using fread(). You next read the byte 'd' and a 0, which say to read one double. You will thus read the next eight bytes (110, 134, 27, 240, 249, 33, 9, 64), again using fread(). Next you read an 's' and 0, which say to read one string. You read 3, which says that the string is four characters (Fred). After reading those, you read the 'n', which says to print a newline.
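
You can check those byte values with a short sketch. It assumes a little-endian machine with 4-byte ints and 8-byte IEEE doubles, which is what the listing of comp-2.txt implies. The helper names bytes_to_int() and bytes_to_double() are made up for illustration. memcpy() stands in for fread() here, since both simply deposit raw bytes into the variable's memory.

```c
#include <assert.h>
#include <string.h>

/* Reinterpret the first sizeof(int) bytes at b as an int.
   Assumption: the bytes are in this machine's own byte order. */
int bytes_to_int(const unsigned char *b)
{
  int v;

  assert(b != NULL);
  memcpy(&v, b, sizeof(int));
  return v;
}

/* Reinterpret the first sizeof(double) bytes at b as a double,
   under the same byte-order assumption. */
double bytes_to_double(const unsigned char *b)
{
  double d;

  assert(b != NULL);
  memcpy(&d, b, sizeof(double));
  return d;
}
```

On such a machine, the four bytes 210, 4, 0, 0 give 1234 (210 + 4*256), and the eight bytes that follow the 'd' give 3.14159.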

The output is thus:

UNIX> data_uncompress < comp-2.txt
1234 3.14159 Fred
UNIX> 
If you call data_compress on a file and pipe the output to data_uncompress, you will get a file that is roughly equivalent to the original. There may be some formatting that is different. Your output must match mine exactly, however.

A Common Mistake

When you read n and size, you read them as unsigned chars. However, when you use them in your program, you should convert them to ints. Why? Because if you hold 255 in an unsigned char, add one to it, and store the result back in an unsigned char, it becomes zero, and that is not what you want, is it?
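
Here is a small sketch of the difference. Both function names are made up for illustration; the second one takes the int that getchar() returns.

```c
#include <assert.h>

/* Wrong: the sum is stored back into an unsigned char, so 255 + 1
   wraps around to 0. */
int items_wrong(unsigned char n)
{
  unsigned char count = (unsigned char) (n + 1);
  return count;
}

/* Right: c is the value that getchar() returned, which is already
   an int between 0 and 255, so c + 1 can reach 256. */
int items_right(int c)
{
  assert(c >= 0 && c <= 255);
  return c + 1;
}
```

With 255 as input, items_wrong() returns 0 and items_right() returns 256.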

Error catching

Your program must catch the following errors, and then print the given strings on standard error. If you are in doubt about what your output should be, test it against the program in the lab directory.

When you are reading multiple characters, integers or doubles, read them all at once. For example, if you are supposed to read 10 integers, do it with one fread() call, and then check to make sure that the call actually read 10 integers. Don't do 10 fread() calls. That will help you match the proper output.

Just a little more information on this -- in my program, I had three different buffers: One for chars, one for integers and one for doubles. Each buffer can hold the maximum number of chars, integers and doubles (257 chars, since we will null terminate strings, and 256 integers and doubles). These were the buffers that I used for the fread() calls.
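
As a rough sketch of how the char buffer might be used, the function below reads one string the way the comp-1.txt walkthrough describes: a size byte, then size+1 characters. The names read_string() and MAX_CHARS are made up, and this is only one way to do it, not necessarily how my program does it. It leaves out the error messages, which you still have to print yourself.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_CHARS 256   /* largest string: size byte 255, plus one */

/* Read one string from f: a size byte, then size+1 characters.
   buf must have room for MAX_CHARS + 1 bytes.  Returns the length
   of the string, or -1 if the input ends early. */
int read_string(FILE *f, char *buf)
{
  int size;
  size_t want, got;

  assert(f != NULL && buf != NULL);
  size = fgetc(f);                /* an int, so size + 1 cannot wrap */
  if (size == EOF) return -1;

  want = (size_t) size + 1;
  got = fread(buf, 1, want, f);   /* one call for the whole string */
  if (got != want) return -1;

  buf[want] = '\0';               /* null terminate: the 257th byte */
  return (int) want;
}
```

Because there is exactly one fread() call, a short count tells you right away that the input ended in the middle of the string.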

Walking you through gradescript 100

This is a Piazza post from 2017. Please take a look if gradescripts 90-100 are giving you trouble, or if you want some ideas on debugging.