CS367 -- Honors Lab 1 -- Adding Watermarks to WAV Files


You work for a music production company. Your boss wants to be able to put a "watermark" into a music file. This will be data in the file that allows you to determine whether the music file is "authentic" or not. He wants the watermark in the music itself, and not in, for example, the metadata or tags of the file. He wants it to be imperceptible audibly. If you make an exact copy of the file, it will remain authentic. However, if you modify the music, you will destroy the watermark.

Sounds like fun, no?

Here's the program that you are going to write:

wav_water insert|check

The program reads a WAV file on standard input. WAV files are uncompressed audio files that have a pretty easy-to-read format. You can see a format specification at http://www.sonicspot.com/guide/wavefiles.html (and if that link is gone, try https://web.archive.org/web/20130822004502/http://www.sonicspot.com/guide/wavefiles.html). Your program needs to work on 8, 16, and 32-bit PCM files.

If you call the program with insert as the command line argument, then it will insert watermarks into the file, and write the resulting file on standard output. If you call the program with check as the command line argument, then the program will check to see if the file is authentic. If so, it prints "Authentic." If not, it prints "Inauthentic."
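As a starting point, here is a minimal sketch of how the command line argument might be turned into a mode. The names Mode and parse_mode() are made up for illustration; they are not part of the assignment, and you can structure this any way you like.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
typedef enum { MODE_INSERT, MODE_CHECK, MODE_INVALID } Mode;

/* Map the single command line argument ("insert" or "check") to a mode.
   Anything else, including a missing or extra argument, is invalid. */
Mode parse_mode(int argc, char **argv)
{
  assert(argv != NULL);
  if (argc != 2) return MODE_INVALID;
  if (strcmp(argv[1], "insert") == 0) return MODE_INSERT;
  if (strcmp(argv[1], "check") == 0) return MODE_CHECK;
  return MODE_INVALID;
}
```

A main() built on this would call parse_mode(argc, argv) first and bail out on MODE_INVALID before reading anything from standard input.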

Here's how you insert/check watermarks. WAV files are composed of "chunks." The "music" part of a WAV file is stored in "data" chunks. When your program checks for watermarks, it ignores all chunks except the data chunks. When your program inserts watermarks, it only puts them into data chunks. It simply copies all of the other chunks from the input file to the output file.
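To make the chunk idea concrete, here is a small sketch of decoding one chunk header. It assumes the standard RIFF layout described in the format specification: a four-character ID followed by a 32-bit little-endian size, with the chunk's data after that. ChunkHeader, parse_chunk_header() and is_data_chunk() are hypothetical names of mine, not something the lab requires.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: one chunk header from a RIFF/WAV file. */
typedef struct {
  char id[5];      /* four-character chunk ID plus a terminating '\0' */
  uint32_t size;   /* number of data bytes that follow the header */
} ChunkHeader;

/* Decode an 8-byte chunk header: 4 ID bytes, then a little-endian size. */
ChunkHeader parse_chunk_header(const unsigned char *p)
{
  ChunkHeader h;
  assert(p != NULL);
  memcpy(h.id, p, 4);
  h.id[4] = '\0';
  h.size = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
           ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
  return h;
}

/* Only "data" chunks get watermarked; all others are copied through. */
int is_data_chunk(const ChunkHeader *h)
{
  return strcmp(h->id, "data") == 0;
}
```

Building the size byte by byte, rather than casting the buffer to an integer pointer, keeps the code independent of the byte order of the machine that runs it. Check the specification for details such as the 12-byte RIFF/WAVE header at the start of the file and the pad byte after odd-sized chunks.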

The data chunks are composed of "samples". These samples are 8, 16 or 32-bit numbers, stored in little-endian (least significant byte first) order. You are going to read each data chunk in segments of 4096 bytes. The last segment of a chunk may be smaller than 4096 bytes. If it is, you ignore it for the purpose of watermarking (that means you skip it when you check watermarks, and you simply copy it when you insert watermarks).
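Here is a sketch of what "little-endian" and "segments of 4096 bytes" mean in code. The function names and the bytes_per_sample parameter (1, 2 or 4) are assumptions for illustration. The sample is returned as an unsigned bit pattern, which is all you need for looking at least significant bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEGMENT_BYTES 4096

/* Decode one little-endian sample of 1, 2 or 4 bytes as an unsigned value. */
uint32_t read_sample_le(const unsigned char *p, int bytes_per_sample)
{
  uint32_t v = 0;
  int i;
  assert(bytes_per_sample == 1 || bytes_per_sample == 2 || bytes_per_sample == 4);
  for (i = bytes_per_sample - 1; i >= 0; i--) {
    v = (v << 8) | p[i];   /* highest-addressed byte is the most significant */
  }
  return v;
}

/* Number of complete 4096-byte segments in a data chunk of the given size. */
size_t full_segments(size_t data_bytes) { return data_bytes / SEGMENT_BYTES; }

/* Bytes left over after the last full segment; these are never watermarked. */
size_t tail_bytes(size_t data_bytes) { return data_bytes % SEGMENT_BYTES; }
```

For instance, the two bytes 0x43 0xfa decode to the 16-bit sample 0xfa43, and a 10000-byte data chunk has two full segments followed by a 1808-byte tail that is copied untouched.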

Now, consider a 4096-byte segment, composed of 4096, 2048 or 1024 samples. To calculate your watermark, you are going to zero the least significant bit of the first 32 samples. Then, you are going to feed the 4096 bytes into the djb_hash() hash function from CS140 (see http://web.eecs.utk.edu/~plank/plank/classes/cs140/Notes/Hashing/index.html -- you need to change this so that it works in C, and it hashes 4096 bytes, rather than null-terminated character strings).
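The two steps above might be sketched as follows. This is my reading of the djb recurrence (start at 5381, multiply by 33, add the next byte), adapted to take an explicit byte count. Check the constants against the CS140 notes. One assumption to watch: the sketch treats each byte as unsigned, whereas a hash written over char may see bytes above 0x7f as negative, so compare your hash values against the examples before trusting either choice. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear the least significant bit of the first 32 samples in a segment.
   Samples are little-endian, so the LSB lives in each sample's first byte. */
void clear_watermark_bits(unsigned char *seg, int bytes_per_sample)
{
  int i;
  assert(bytes_per_sample == 1 || bytes_per_sample == 2 || bytes_per_sample == 4);
  for (i = 0; i < 32; i++) {
    seg[(size_t)i * bytes_per_sample] &= 0xFE;
  }
}

/* djb-style hash over n bytes rather than a null-terminated string.
   ASSUMPTION: bytes are treated as unsigned; the reference may differ. */
uint32_t djb_hash_bytes(const unsigned char *buf, size_t n)
{
  uint32_t h = 5381;
  size_t i;
  for (i = 0; i < n; i++) {
    h = (h << 5) + h + buf[i];   /* h * 33 + byte, wrapping at 32 bits */
  }
  return h;
}
```

For a full segment you would call clear_watermark_bits(seg, bytes_per_sample) and then djb_hash_bytes(seg, 4096). Using uint32_t rather than unsigned int pins the hash to 32 bits regardless of platform.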

This gives you a 32-bit, unsigned hash value. You are going to insert each bit of the hash value into the data chunk so that the highest bit of the hash value is the least significant bit of the first sample. The second highest bit is the least significant bit of the second sample. And so on. Therefore, the first 32 samples contain the watermark in each of their least significant bits.
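The bit ordering is the part that is easiest to get backwards, so here is a sketch of writing the 32 bits and reading them back. embed_watermark() and extract_watermark() are hypothetical names, and bytes_per_sample is again assumed to be 1, 2 or 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the hash, highest bit first, in the LSBs of the first 32 samples. */
void embed_watermark(unsigned char *seg, int bytes_per_sample, uint32_t hash)
{
  int i;
  assert(bytes_per_sample == 1 || bytes_per_sample == 2 || bytes_per_sample == 4);
  for (i = 0; i < 32; i++) {
    unsigned char bit = (unsigned char)((hash >> (31 - i)) & 1u);
    unsigned char *lsb_byte = seg + (size_t)i * bytes_per_sample;
    *lsb_byte = (unsigned char)((*lsb_byte & 0xFE) | bit);
  }
}

/* Read the 32 LSBs back; the first sample supplies the highest bit. */
uint32_t extract_watermark(const unsigned char *seg, int bytes_per_sample)
{
  uint32_t w = 0;
  int i;
  assert(bytes_per_sample == 1 || bytes_per_sample == 2 || bytes_per_sample == 4);
  for (i = 0; i < 32; i++) {
    w = (w << 1) | (seg[(size_t)i * bytes_per_sample] & 1u);
  }
  return w;
}
```

Checking a segment would then presumably amount to extracting the stored 32 bits, zeroing those least significant bits, rehashing the 4096 bytes, and comparing the two values.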


Constraints, VI

You are not allowed to use any external libraries here -- all you need is fread(), fwrite() and memcpy(). Yes, you can grab libraries off the web that help you edit WAV files. You are not allowed to use them. For reference, my program was just 220 lines.

A 2015 student (Connor Minton) made a Piazza post on how to view binary files with VI. I'm including that below in case you want to do that:

For Lab2, it might be helpful to hand-edit binary files. To do this in VIM:

  1. Open the file with the "-b" option, as in "vim -b foo.dat"
  2. Type the command ":%!xxd"
  3. Edit the hex part (editing the ascii representation will not affect the output)
  4. When you're done, type the command ":%!xxd -r"
  5. Save and exit

Here is a link that shows an example: http://makezine.com/2008/08/09/edit-binary-files-in-vi/


An Example

The file Liszt-16.wav is a small 5 second WAV file with 16-bit samples. Below, we create a watermarked version of it in Liszt-16-WM.wav, and then show that wav_water correctly identifies that Liszt-16-WM.wav is authentic, but Liszt-16.wav is not. As you can see, they are the same size, and if you listen to them, they will sound identical. How cool is that?
UNIX> wav_water insert < Liszt-16.wav > Liszt-16-WM.wav
UNIX> wav_water check < Liszt-16.wav
Inauthentic.
UNIX> wav_water check < Liszt-16-WM.wav
Authentic.
UNIX> ls -l Liszt-16*
-rw-r--r-- 1 plank loci 862648 Jan 24 15:23 Liszt-16.wav
-rw-r--r-- 1 plank loci 862648 Jan 24 15:25 Liszt-16-WM.wav
UNIX> 
Now, let's examine this a little more. I've modified wav_water to print the watermark in each segment, and then to print the first 32 samples. I've named the executable wav_waterprint. Here it is on Liszt-16.wav:
UNIX> ( wav_waterprint insert < Liszt-16.wav > Liszt-16-WM.wav ) >& tmp.txt ; head -n 33 tmp.txt
Watermark 0x82785e24.  Here are the first 32 samples:
  0xfa43 (1)
  0xf744 (0)
  0xfed8 (0)
  0xfa7a (0)
  0x0130 (0)
  0xfbbc (0)
  0x0077 (1)
  0xfbd6 (0)
  0xfeba (0)
  0xfc3d (1)
  0xfea3 (1)
  0xfd8d (1)
  0x00bb (1)
  0xff28 (0)
  0x043c (0)
  0x0088 (0)
  0x078c (0)
  0x01c1 (1)
  0x08e2 (0)
  0x0345 (1)
  0x085f (1)
  0x0513 (1)
  0x0751 (1)
  0x060e (0)
  0x0654 (0)
  0x05a8 (0)
  0x059d (1)
  0x0494 (0)
  0x0494 (0)
  0x0301 (1)
  0x0294 (0)
  0x012e (0)
UNIX>
You should see how the watermark is embedded in the least significant bits of the samples. See how the first four samples have least significant bits of 1000 = 0x8? That's the high four bits of the watermark. The next four bits are 0x2: 0010. Etc.
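If you want to convince yourself, the following sketch rebuilds the watermark from the 32 sample values printed above for Liszt-16.wav. The array is copied from that listing; watermark_from_samples() is a hypothetical helper that shifts in one least significant bit per sample, first sample first.

```c
#include <assert.h>
#include <stdint.h>

/* The 32 sample values printed above for the first segment. */
const uint16_t liszt16_samples[32] = {
  0xfa43, 0xf744, 0xfed8, 0xfa7a, 0x0130, 0xfbbc, 0x0077, 0xfbd6,
  0xfeba, 0xfc3d, 0xfea3, 0xfd8d, 0x00bb, 0xff28, 0x043c, 0x0088,
  0x078c, 0x01c1, 0x08e2, 0x0345, 0x085f, 0x0513, 0x0751, 0x060e,
  0x0654, 0x05a8, 0x059d, 0x0494, 0x0494, 0x0301, 0x0294, 0x012e
};

/* Collect the LSB of each sample, with the first sample as the highest bit. */
uint32_t watermark_from_samples(const uint16_t *samples, int count)
{
  uint32_t w = 0;
  int i;
  assert(count == 32);
  for (i = 0; i < count; i++) {
    w = (w << 1) | (samples[i] & 1u);
  }
  return w;
}
```

Calling watermark_from_samples(liszt16_samples, 32) yields 0x82785e24, matching the watermark on the first line of the output.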

Let's try it on an 8-bit version of the same music, which is in Liszt-08.wav:

UNIX> ( wav_waterprint insert < Liszt-08.wav > Liszt-08-WM.wav ) >& tmp.txt ; head -n 33 tmp.txt
Watermark 0xd9c39427.  Here are the first 32 samples:
  0x7b (1)
  0x77 (1)
  0x7e (0)
  0x7b (1)
  0x81 (1)
  0x7a (0)
  0x80 (0)
  0x7b (1)
  0x7f (1)
  0x7d (1)
  0x7e (0)
  0x7c (0)
  0x80 (0)
  0x7e (0)
  0x85 (1)
  0x81 (1)
  0x87 (1)
  0x80 (0)
  0x88 (0)
  0x83 (1)
  0x88 (0)
  0x85 (1)
  0x86 (0)
  0x86 (0)
  0x86 (0)
  0x84 (0)
  0x85 (1)
  0x84 (0)
  0x84 (0)
  0x83 (1)
  0x83 (1)
  0x81 (1)
UNIX> 
And on a 32-bit version. I created the three files with Audacity, so they are the same music, but different file formats.
UNIX> ( wav_waterprint insert < Liszt-32.wav > Liszt-32-WM.wav ) >& tmp.txt ; head -n 33 tmp.txt
Watermark 0xe277fe56.  Here are the first 32 samples:
  0xfa420001 (1)
  0xf7440001 (1)
  0xfed70001 (1)
  0xfa7c0000 (0)
  0x012f0000 (0)
  0xfbbf0000 (0)
  0x00740001 (1)
  0xfbd70000 (0)
  0xfeba0000 (0)
  0xfc3d0001 (1)
  0xfea20001 (1)
  0xfd8c0001 (1)
  0x00b90000 (0)
  0xff2b0001 (1)
  0x04380001 (1)
  0x008c0001 (1)
  0x07890001 (1)
  0x01c30001 (1)
  0x08e10001 (1)
  0x03440001 (1)
  0x085f0001 (1)
  0x05120001 (1)
  0x074f0001 (1)
  0x06110000 (0)
  0x06540000 (0)
  0x05a90001 (1)
  0x059c0000 (0)
  0x04960001 (1)
  0x04940000 (0)
  0x03030001 (1)
  0x02920001 (1)
  0x01300000 (0)
UNIX>

Some Other Example Files, Grading, A Challenge

I have 30 wav files in 01.wav through 30.wav. They are all clips of music, and their sample sizes vary between 8, 16 and 32 bits.

There is no gradescript for this -- the TA can do that on his or her own. However, grading should be obvious -- your program's output should match my program's output verbatim. Don't worry about error messages -- just make sure that you can handle erroneous files.

For fun, see if you can identify the 30 wav clips. I used to give some extra credit for this, but students ran it through Shazam, which batted about 0.700 on it. So no extra credit. If you want me to verify your guesses, simply email them to me.