Technical Report UT-CS-07-593
Department of Computer Science
University of Tennessee
The online home for this document is:
http://web.eecs.utk.edu/~jplank/plank/papers/CS-07-593
When w = 8, the Galois Field GF(28) comprises the elements 0, 1, ..., 255. This is an important field because it allows you to perform arithmetic that adheres to the above properties on single bytes. Again, this is essential in Reed-Solomon coding and other kinds of coding. Similarly, w=16 and w=32 are other important fields.
There are useful applications for fields with other values of w. For example, Cauchy Reed-Solomon coding employs Galois Fields with other values of w, and converts them to strictly binary (XOR) codes.
Galois Fields are covered in standard texts on Error Correcting Codes such as Peterson & Weldon [PW72], MacWilliams and Sloane [MS77], and Van Lint [VL82]. These treatments are thorough, and take a bit of time to understand. For a briefer and more pragmatic treatment, see Plank's 1997 tutorial on Reed-Solomon coding [P97], [PD05].
Unix-Prompt> gf_xor 15 7 8 Unix-Prompt> gf_xor 8 7 15 Unix-Prompt> gf_xor 230498 2738947 2772833 Unix-Prompt> gf_xor 2772833 230498 2738947 Unix-Prompt> |
The program gf_mult takes three arguments: a, b and w, and prints out the product of a and b in GF(2w). The program gf_div performs division, and the program gf_inverse returns the multiplicative inverse of a number in GF(2w):
Unix-Prompt> gf_mult 3 7 4 9 Unix-Prompt> gf_div 9 3 4 7 Unix-Prompt> gf_div 9 7 4 3 Unix-Prompt> gf_mult 1234567 2345678 32 1404360778 Unix-Prompt> gf_div 1404360778 1234567 32 2345678 Unix-Prompt> gf_div 1404360778 2345678 32 1234567 Unix-Prompt> gf_inverse 1404360778 32 106460795 Unix-Prompt> gf_mult 1404360778 106460795 32 1 Unix-Prompt> |
The other command line tools are discussed at the end of this document.
The files galois.h and galois.c implement a library of procedures for Galois Field Arithmetic in GF(2w) for w between 1 and 32. The library is written in C, but will work in C++ as well. It is especially tailored for w equal to 8, 16 and 32, but it is also applicable for any other value of w. For the smaller values of w (where multiplication or logarithm tables fit into memory), these procedures should be very fast.
In the following sections, we describe the procedures implemented by the library.
int galois_single_multiply(int a, int b, int w); int galois_single_divide(int a, int b, int w); |
Galois_single_multiply() returns the product of a and b in GF(2w), and galois_single_divide() returns the qoutient of a and b in GF(2w). w may have any value from 1 to 32.
The decision to make this procedure use regular, signed integers instead of unsigned integers was largely for convenience. It only makes a difference when w equals 32, in which case the sign bit of a, b, or the return values may be set. If it matters, simply convert the integers to unsigned integers. The procedures in this library will work regardless -- when w equals 32, they are treated as streams of bits and not integers.
It is anticipated that most applications that need to perform single multiplications and divisions only need reasonable performance, which is what these two procedures give you. If you need faster multiplication and division, then see the procedures below, which allow you to get much faster performance.
void galois_w08_region_multiply(char *region, int multby, int nbytes, char *r2, int add); void galois_w16_region_multiply(char *region, int multby, int nbytes, char *r2, int add); void galois_w32_region_multiply(char *region, int multby, int nbytes, char *r2, int add); |
These multiply the region of bytes specified by region and nbytes by the number multby in the field specified by the procedure's name. Region should be long-word aligned, otherwise these routines will generate a bus error. There are three separate functionalities of these procedures denoted by the values of r2 and add.
The performance of these procedures has been tuned to be very fast. A multiplication table is employed when w=8. Log and inverse log tables are employed with w=16, and seven multiplication tables are employed when w=32.
void galois_region_xor(char *r1, char *r2, char *r3, int nbytes); |
R3 may equal either r1 or r2 if you wish to overwrite either of them. Again, all pointers should be long-word aligned.
To use multiplication and division tables directly, use one or more of the following routines:
int galois_create_mult_tables(int w); int galois_multtable_multiply(int a, int b, int w); int galois_multtable_divide(int a, int b, int w); int *galois_get_mult_table(int w); int *galois_get_div_table(int w); |
Galois_create_mult_tables(w) creates multiplication and division tables for a given value of w and stores them internally. If you call it twice with the same value of w, it will not create new tables the second time. You may call it with different values of w and the tables will be stored separately.
If successful, galois_create_mult_tables() will return 0. Otherwise, it will return -1, and any allocated memory will be freed.
Galois_multtable_multiply() and galois_multtable_divide() work just like galois_single_multiply() and galois_single_divide(), except they assume that you have called galois_create_mult_tables() for the appropriate value of w and that it was successful. They do not error-check, so if you have not called galois_create_mult_tables(), they will seg-fault. This decision was made for speed -- although for small values of w (between 1 and 9), galois_single_multiply() uses galois_multtable_multiply(), it is significantly slower because of the error checking that it does.
Finally, to free yourself of procedure call overhead, the routines galois_get_mult_table() and galois_get_div_table() return the tables themselves. The product/quotient of a and b is in element a*2w+b, which of course may be computed quickly via bit arithmetic as ( (a << w) | b). You do not need to call galois_create_mult_tables() before calling galois_get_mult_table() or galois_get_div_table().
To use the log tables, use one or more of the following routines:
int galois_create_log_tables(int w); int galois_logtable_multiply(int a, int b, int w); int galois_logtable_divide(int a, int b, int w); int galois_log(int value, int w); int galois_ilog(int value, int w); int *galois_get_log_table(int w); int *galois_get_ilog_table(int w); |
Galois_create_log_tables(w) creates log and inverse log tables for the given value of w, and stores them internally. If you call it twice with the same value of w, it will not create new tables the second time. You may call it with different values of w and the tables will be stored separately.
If successful, galois_create_log_tables() will return 0. Otherwise, it will return -1, and any allocated memory will be freed.
Galois_logtable_multiply() and galois_logtable_divide() work just like galois_single_multiply() and galois_single_divide(), except they assume that you have called galois_create_log_tables() for the appropriate value of w and that it was successful. They do not error-check, so if you have not called galois_create_log_tables(), they will seg-fault. This decision was made for speed -- although for medium values of w (between 10 and 22), galois_single_multiply() uses galois_logtable_multiply(), it is significantly slower because of the error checking that it does.
Galois_log() and galois_ilog() return the log and inverse log of an element of GF(2w). You can use them to multiply using the following identity:
The division identity takes into account C's weird definition of modular arithmetic with negative numbers.
To perform the fastest multiplication and division with these tables, you should get access to the tables themselves using galois_get_log_table(int w) and galois_get_ilog_table(int w). Then you may calculate the product/quotient of a and b as:
You do not have to worry about modular arithmetic because the ilog table contains three copies of the inverse logs, and is defined for indices between -2w+1 and 22w-2. This saves a few instructions.
int galois_shift_multiply(int a, int b, int w); int galois_shift_divide(int a, int b, int w); |
Galois_shift_multiply() converts b into a w * w bit matrix and multiplies it by the bit vector a to create the product vector. You may see a quasi-tutorial description of this technique in the paper [P05]. It is significantly slower than the methods that use tables. However, it is general-purpose and requires no preallocation of memory.
For division, Galois_shift_divide() also converts b into a bit matrix, inverts it, and then multiplies the inverse by a. As such, it is *really* slow. If I get a clue how to implement this one faster, I will.
Galois_single_multiply() uses galois_shift_multiply() for w between 23 and 31. Galois_single_divide() uses galois_shift_divide() for w between 23 and 32.
int galois_create_split_w8_tables(); int galois_split_w8_multiply(int a, int b); |
Galois_create_split_w8_tables() creates seven tables that are 256 KB each. Galois_split_w8_multiply() employs these tables to multiply the 32-bit numbers by breaking them into four eight-bit numbers each, and then performing sixteen multiplications and exclusive-ors to calculate the product. It's a cool technique suggested to me by Cheng Huang of Microsoft, and is a good 16 times faster than using galois_shift_multiply().
"But couldn't you use this technique for other values of w?" Yes, you could, but I'm not implementing it, because I don't think it's that important. If that view changes, I'll fix it.
Galois_single_multiply() uses galois_split_w8_multiply() for w = 32.
Since galois_single_multiply() and galois_single_divide() call the table creation routines whenever the tables do not exist, if you are worried about thread safety, then for each value of w that you will use, you should make sure that the first call to galois_single_multiply() or galois_single_divide() is protected. After that, no protection is required.
Gf_basic_tester and gf_xor_tester test both correctness and speed. Call gf_xor_tester with no arguments to test the speed of gf_region_xor() on your system. Here it is on a Macbook whose CPU is a little busy doing other things:
Unix-Prompt> gf_xor_tester XOR Tester Seeding random number generator with 1172533188 Passed correctness test -- doing 10-second timing 1827.79986 Megabytes of XORs per second Unix-Prompt>
Gf_basic_tester takes the following command line arguments:
For example (again, on my MacBook):
Unix-Prompt> gf_basic_tester 16 default 100000 W: 16 Method: default Seeding random number generator with 1172533569 Doing 100000 trials for single-operation correctness. Passed Single-Operations Correctness Tests. Doing galois_w16_region_multiply correctness test. Passed galois_w16_region_multiply correctness test. Speed Test #1: 10 Seconds of Multiply operations Speed Test #1: 42.23862 Mega Multiplies per second Speed Test #2: 10 Seconds of Divide operations Speed Test #2: 43.42448 Mega Divides per second Doing 10 seconds worth of region_multiplies - Three tests: Test 0: Overwrite initial region Test 1: Products to new region Test 2: XOR products into second region Test 0: 253.45548 Megabytes of Multiplies per second Test 1: 238.59569 Megabytes of Multiplies per second Test 2: 167.99968 Megabytes of Multiplies per second