Sounds like fun, no?
Here's the program that you are going to write:
The program reads a WAV file on standard input. WAV files are uncompressed audio files that have a pretty easy-to-read format. You can see a format specification at http://www.sonicspot.com/guide/wavefiles.html (and if that link is gone, try https://web.archive.org/web/20130822004502/http://www.sonicspot.com/guide/wavefiles.html). Your program needs to work on 8, 16, and 32-bit PCM files.
If you call the program with insert as the command line argument, then it will insert watermarks into the file, and write the resulting file on standard output. If you call the program with check as the command line argument, then the program will check to see if the file is authentic. If so, it prints "Authentic." If not, it prints "Inauthentic."
Here's how you insert/check watermarks. WAV files are composed of "chunks." The "music" part of a WAV file is stored in "data" chunks. When your program checks for watermarks, it ignores all chunks except the data chunks. When your program inserts watermarks, it only puts them into data chunks. It simply copies all of the other chunks from the input file to the output file.
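To make the chunk structure concrete, here is a minimal sketch of walking the chunks of a WAV file that has already been read into memory. It assumes the standard RIFF layout (a 12-byte "RIFF"/size/"WAVE" header, then chunks that each have a 4-byte ID, a 4-byte little-endian size, and a payload padded to an even length). The names read_le32() and total_data_bytes() are made up for illustration; they are not part of the lab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value, independent of the host's byte order. */
static uint32_t read_le32(const unsigned char *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
         ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Hypothetical helper: walk the chunks of a WAV image held in buf[0..len)
   and return the total number of payload bytes in "data" chunks.
   Returns -1 if the header is wrong or a chunk runs past the end. */
long total_data_bytes(const unsigned char *buf, size_t len)
{
  size_t pos;
  uint32_t size;
  long total = 0;

  assert(buf != NULL);
  if (len < 12) return -1;
  if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) return -1;

  pos = 12;                                 /* First chunk follows the header. */
  while (pos + 8 <= len) {
    size = read_le32(buf + pos + 4);
    if (size > len - pos - 8) return -1;    /* Payload would run off the end. */
    if (memcmp(buf + pos, "data", 4) == 0) total += (long) size;
    pos += 8 + (size_t) size + (size & 1);  /* Odd-sized chunks have a pad byte. */
  }
  return total;
}
```

A real solution would watermark or copy each chunk at the point where this sketch just adds up sizes, and it would decide for itself how to react to a malformed file.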
The data chunks are composed of "samples". These samples are 8, 16 or 32-bit numbers, stored in little-endian (least significant byte first) order. You are going to read the data chunks in segments of 4096 bytes. The last segment may be smaller than 4096 bytes. You are going to ignore that segment for the purpose of watermarking (that means you ignore it when you check watermarks, and you simply copy it when you insert watermarks).
Now, consider a 4096-byte segment, composed of 4096, 2048 or 1024 samples. To calculate your watermark, you are going to zero the least significant bit of the first 32 samples. Then, you are going to feed the 4096 bytes into the djb_hash() hash function from CS140 (see http://web.eecs.utk.edu/~plank/plank/classes/cs140/Notes/Hashing/index.html -- you need to change this so that it works in C, and it hashes 4096 bytes, rather than null-terminated character strings).
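As a rough sketch of that calculation: the code below copies a segment, clears the least significant bit of the first 32 samples, and hashes all 4096 bytes. Because samples are little-endian, the least significant bit of sample i lives in bit 0 of byte i * bytes_per_sample. The names djb_hash_bytes() and segment_watermark() are made up here. One loud assumption: this version adds each byte as an unsigned value (0 to 255). If the reference program adds them as signed chars, then hashes of segments containing bytes of 0x80 or more will differ, so test against the example files before trusting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEGMENT_BYTES 4096
#define WATERMARK_BITS 32

/* DJB hash over an explicit byte count instead of a null-terminated string.
   Assumption: bytes are treated as unsigned (0..255). */
uint32_t djb_hash_bytes(const unsigned char *buf, size_t len)
{
  uint32_t h = 5381;
  size_t i;

  for (i = 0; i < len; i++) h = (h << 5) + h + buf[i];
  return h;
}

/* Hypothetical helper: compute the watermark of one full 4096-byte segment
   without modifying it. bytes_per_sample must be 1, 2 or 4. */
uint32_t segment_watermark(const unsigned char *seg, int bytes_per_sample)
{
  unsigned char copy[SEGMENT_BYTES];
  int i;

  assert(bytes_per_sample == 1 || bytes_per_sample == 2 || bytes_per_sample == 4);
  memcpy(copy, seg, SEGMENT_BYTES);

  /* Zero the LSB of each of the first 32 samples (low byte comes first). */
  for (i = 0; i < WATERMARK_BITS; i++) copy[i * bytes_per_sample] &= 0xfe;

  return djb_hash_bytes(copy, SEGMENT_BYTES);
}
```

Since the 32 low bits are cleared before hashing, the watermark of a segment is the same before and after the watermark is written into it, which is what makes checking possible.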
This gives you a 32-bit, unsigned hash value. You are going to insert each bit of the hash value into the data chunk so that the highest bit of the hash value is the least significant bit of the first sample. The second highest bit is the least significant bit of the second sample. And so on. Therefore, the first 32 samples contain the watermark in each of their least significant bits.
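The bit placement can be sketched as a pair of small functions: one that writes a 32-bit value into the low bits of the first 32 samples, highest bit first, and one that reads it back. The names embed_watermark() and extract_watermark() are hypothetical, and the sketch assumes the segment holds at least 32 samples, which is always true for a full 4096-byte segment.

```c
#include <assert.h>
#include <stdint.h>

#define WATERMARK_BITS 32

/* Hypothetical helper: store bit (31 - i) of wm in the LSB of sample i.
   Samples are little-endian, so the LSB is bit 0 of the sample's first byte. */
void embed_watermark(unsigned char *seg, int bytes_per_sample, uint32_t wm)
{
  int i;
  unsigned char bit;

  assert(bytes_per_sample == 1 || bytes_per_sample == 2 || bytes_per_sample == 4);
  for (i = 0; i < WATERMARK_BITS; i++) {
    bit = (unsigned char) ((wm >> (31 - i)) & 1u);
    seg[i * bytes_per_sample] =
        (unsigned char) ((seg[i * bytes_per_sample] & 0xfe) | bit);
  }
}

/* Hypothetical helper: rebuild the 32-bit value from the LSBs of the
   first 32 samples, treating sample 0 as the highest bit. */
uint32_t extract_watermark(const unsigned char *seg, int bytes_per_sample)
{
  uint32_t wm = 0;
  int i;

  assert(bytes_per_sample == 1 || bytes_per_sample == 2 || bytes_per_sample == 4);
  for (i = 0; i < WATERMARK_BITS; i++) {
    wm = (wm << 1) | (uint32_t) (seg[i * bytes_per_sample] & 1u);
  }
  return wm;
}
```

With these, checking a segment amounts to comparing the extracted value against the hash recomputed with those 32 bits zeroed. Any mismatch in any full segment means the file is inauthentic.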
A 2015 student (Connor Minton) made a Piazza post on how to view binary files with VI. I'm including that below in case you want to do that:
For Lab2, it might be helpful to hand-edit binary files. To do this in VIM:
Here is a link that shows an example: http://makezine.com/2008/08/09/edit-binary-files-in-vi/
UNIX> wav_water insert < Liszt-16.wav > Liszt-16-WM.wav
UNIX> wav_water check < Liszt-16.wav
Inauthentic.
UNIX> wav_water check < Liszt-16-WM.wav
Authentic.
UNIX> ls -l Liszt-16*
-rw-r--r-- 1 plank loci 862648 Jan 24 15:23 Liszt-16.wav
-rw-r--r-- 1 plank loci 862648 Jan 24 15:25 Liszt-16-WM.wav
UNIX>

Now, let's examine this a little more. I've modified wav_water to print the watermark in each segment, and then to print the first 32 samples. I've named the executable wav_waterprint. Here it is on Liszt-16.wav:
UNIX> ( wav_waterprint insert < Liszt-16.wav > Liszt-16-WM.wav ) >& tmp.txt ; head -n 33 tmp.txt
Watermark 0x82785e24. Here are the first 32 samples:
0xfa43 (1)
0xf744 (0)
0xfed8 (0)
0xfa7a (0)
0x0130 (0)
0xfbbc (0)
0x0077 (1)
0xfbd6 (0)
0xfeba (0)
0xfc3d (1)
0xfea3 (1)
0xfd8d (1)
0x00bb (1)
0xff28 (0)
0x043c (0)
0x0088 (0)
0x078c (0)
0x01c1 (1)
0x08e2 (0)
0x0345 (1)
0x085f (1)
0x0513 (1)
0x0751 (1)
0x060e (0)
0x0654 (0)
0x05a8 (0)
0x059d (1)
0x0494 (0)
0x0494 (0)
0x0301 (1)
0x0294 (0)
0x012e (0)
UNIX>

You should see how the watermark is embedded in the least significant bits of the samples. See how the first four samples have least significant bits of 1000 = 0x8? That's the high four bits of the watermark. The next four bits are 0x2: 0010. Etc.
Let's try it on an 8-bit version of the same music, which is in Liszt-08.wav:
UNIX> ( wav_waterprint insert < Liszt-08.wav > Liszt-08-WM.wav ) > & tmp.txt ; head -n 33 tmp.txt
Watermark 0xd9c39427. Here are the first 32 samples:
0x7b (1)
0x77 (1)
0x7e (0)
0x7b (1)
0x81 (1)
0x7a (0)
0x80 (0)
0x7b (1)
0x7f (1)
0x7d (1)
0x7e (0)
0x7c (0)
0x80 (0)
0x7e (0)
0x85 (1)
0x81 (1)
0x87 (1)
0x80 (0)
0x88 (0)
0x83 (1)
0x88 (0)
0x85 (1)
0x86 (0)
0x86 (0)
0x86 (0)
0x84 (0)
0x85 (1)
0x84 (0)
0x84 (0)
0x83 (1)
0x83 (1)
0x81 (1)
UNIX>

And on a 32-bit version. I created the three files with Audacity, so they are the same music, but different file formats.
UNIX> ( wav_waterprint insert < Liszt-32.wav > Liszt-32-WM.wav ) > & tmp.txt ; head -n 33 tmp.txt
Watermark 0xe277fe56. Here are the first 32 samples:
0xfa420001 (1)
0xf7440001 (1)
0xfed70001 (1)
0xfa7c0000 (0)
0x012f0000 (0)
0xfbbf0000 (0)
0x00740001 (1)
0xfbd70000 (0)
0xfeba0000 (0)
0xfc3d0001 (1)
0xfea20001 (1)
0xfd8c0001 (1)
0x00b90000 (0)
0xff2b0001 (1)
0x04380001 (1)
0x008c0001 (1)
0x07890001 (1)
0x01c30001 (1)
0x08e10001 (1)
0x03440001 (1)
0x085f0001 (1)
0x05120001 (1)
0x074f0001 (1)
0x06110000 (0)
0x06540000 (0)
0x05a90001 (1)
0x059c0000 (0)
0x04960001 (1)
0x04940000 (0)
0x03030001 (1)
0x02920001 (1)
0x01300000 (0)
UNIX>
There is no gradescript for this -- the TA can do that on his or her own. However, grading should be obvious -- your program's output should match my program's output verbatim. Don't worry about error messages -- just make sure that you can handle erroneous files.
For fun, see if you can identify the 30 wav clips. I used to give some extra credit for this, but students ran it through Shazam, which batted about 0.700 on it. So no extra credit. If you want me to verify your guesses, simply email them to me.