The code associates strings with sequences of bits. For example, one code associates the following strings with the following sequences of bits:
| String | Sequence of bits |
| " " | 10 |
| "\n" | 1100 |
| "o" | 00 |
| "p" | 010 |
| "f" | 1101 |
| "t" | 111 |
| "the" | 011 |
You'll note that no sequence of bits is a prefix of another sequence of bits. This means that you can decode a stream of bits into unique sequences. For example:
maps to this stream of unique sequence of bits:
The code definition file defines the associations. Its format is a stream of null-terminated strings that come in pairs: the string and the sequence of bits defined as a stream of zero and one characters. Thus, for example, the above code is stored in t-code.txt, which is composed of the following bytes:
| 111 'o' | 0 | 48 '0' | 48 '0' | 0 | 112 'p' | 0 | 48 '0' |
| 49 '1' | 48 '0' | 0 | 116 't' | 104 'h' | 101 'e' | 0 | 48 '0' |
| 49 '1' | 49 '1' | 0 | 32 space | 0 | 49 '1' | 48 '0' | 0 |
| 10 eol | 0 | 49 '1' | 49 '1' | 48 '0' | 48 '0' | 0 | 102 'f' |
| 0 | 49 '1' | 49 '1' | 48 '0' | 49 '1' | 0 | 116 't' | 0 |
| 49 '1' | 49 '1' | 49 '1' | 0 |
The input file is in a specific format. The last four bytes represent an unsigned integer, which says how many bits should be read from the input file. The preceding bytes contain the stream of bits. If the total number of bits is t then the size of the file should be:
Within each byte, the stream of bits starts with the least significant bit. Thus, the stream "10000000" would be represented by the byte 0x1, and the stream "00000001" would be represented by the byte 0x80.
Take for example, the following file t-encoded.txt:
| 0x27 | 0x77 | 0x34 | 0x18 | 0x00 | 0x00 | 0x00 |
The last four bytes contain the integer 0x18, which equals 24. Thus, the bit stream contains 24 bits, which are stored in three bytes: 0x27, 0x77 and 0x34. Converting them to a bit stream:
Let's try another example, in t-encoded-2.txt:
| 0x47 | 0x0c | 0x0e | 0x00 | 0x00 | 0x00 |
The last four bytes contain 0xe = 14. Thus, the first two bytes contain a string of 14 bits: 0x47 = "11100010" and 0x0c = "00110000". The string is "11100010001100" = "111","00","010","00","1100" = "topo\n".
Armed with this knowledge, you are now equipped to write huf_dec, which should decode the input file using the code defined in the code definition file. Use the C standard I/O library to read both the code definition file and the input file.
You may assume that the code definition file defines a code in which no sequence of bits is a prefix of another. You may also assume that no string or sequence of bits is longer than 10000 characters.
You must test for the following errors:
You may assume that the output of huff_dec is composed solely of printable characters.
The program huff_enc takes two command line arguments -- a code defintion file and the name of an output file. It then encodes standard input to the output file using the code.
For example:
UNIX> cat celebrate.txt
I just want to celebrate another day of living
I just want to celebrate another day of life
Put my faith in the people but the people let me down
So I turned the other way and I carried on anyhow
I just want to celebrate another day of living
I just want to celebrate another day of life
UNIX> huff_create_code < celebrate.txt > celebrate-code.txt
UNIX> wc celebrate-code.txt
1 3 275 celebrate-code.txt
UNIX> huff_enc < celebrate.txt celebrate-code.txt celebrate-encoded.txt
UNIX> huff_dec celebrate-code.txt celebrate-encoded.txt
I just want to celebrate another day of living
I just want to celebrate another day of life
Put my faith in the people but the people let me down
So I turned the other way and I carried on anyhow
I just want to celebrate another day of living
I just want to celebrate another day of life
UNIX> ls -l cel*
-rw-r--r-- 1 plank staff 275 Jan 31 14:08 celebrate-code.txt
-rw-r--r-- 1 plank staff 109 Jan 31 14:09 celebrate-encoded.txt
-rw-r--r-- 1 plank staff 288 Jan 31 14:08 celebrate.txt
UNIX>
Here's the code in celebrate-code.txt:
| 'r' | 0 | '0' | '0' | '0' | '0' | '0' | 0 | 'w' | 'a' | 'n' | 't' |
| 0 | '0' | '0' | '0' | '0' | '1' | 0 | 'g' | 0 | '0' | '0' | '0' |
| '1' | '0' | '0' | 0 | 'm' | 0 | '0' | '0' | '0' | '1' | '0' | '1' |
| 0 | 'v' | 0 | '0' | '0' | '0' | '1' | '1' | '0' | 0 | 'P' | 0 |
| '0' | '0' | '0' | '1' | '1' | '1' | '0' | 0 | 'S' | 0 | '0' | '0' |
| '0' | '1' | '1' | '1' | '1' | 0 | 'i' | 0 | '0' | '0' | '1' | '0' |
| 0 | 'a' | 0 | '0' | '0' | '1' | '1' | '0' | 0 | 'b' | 0 | '0' |
| '0' | '1' | '1' | '1' | '0' | '0' | 0 | 'c' | 0 | '0' | '0' | '1' |
| '1' | '1' | '0' | '1' | 0 | 'u' | 0 | '0' | '0' | '1' | '1' | '1' |
| '1' | 0 | eol | 0 | '0' | '1' | '0' | '0' | '0' | 0 | 'I' | 0 |
| '0' | '1' | '0' | '0' | '1' | 0 | 'h' | 0 | '0' | '1' | '0' | '1' |
| '0' | 0 | 'w' | 0 | '0' | '1' | '0' | '1' | '1' | '0' | 0 | 'y' |
| 0 | '0' | '1' | '0' | '1' | '1' | '1' | 0 | 't' | 0 | '0' | '1' |
| '1' | '0' | 0 | 'e' | 0 | '0' | '1' | '1' | '1' | 0 | space | 0 |
| '1' | '0' | 0 | 'f' | 0 | '1' | '1' | '0' | '0' | '0' | 0 | 'l' |
| 0 | '1' | '1' | '0' | '0' | '1' | 0 | 'o' | 0 | '1' | '1' | '0' |
| '1' | 0 | 'n' | 0 | '1' | '1' | '1' | '0' | '0' | 0 | 'a' | 'n' |
| 'o' | 't' | 'h' | 'e' | 'r' | 0 | '1' | '1' | '1' | '0' | '1' | '0' |
| 0 | 'c' | 'e' | 'l' | 'e' | 'b' | 'r' | 'a' | 't' | 'e' | 0 | '1' |
| '1' | '1' | '0' | '1' | '1' | 0 | 'd' | 0 | '1' | '1' | '1' | '1' |
| '0' | '0' | 0 | 'd' | 'a' | 'y' | 0 | '1' | '1' | '1' | '1' | '0' |
| '1' | 0 | 'j' | 'u' | 's' | 't' | 0 | '1' | '1' | '1' | '1' | '1' |
| '0' | 0 | 'p' | 0 | '1' | '1' | '1' | '1' | '1' | '1' | 0 |