- January, 2017
- Latest Revision: January, 2017
- James S. Plank
- Directory: **/home/plank/cs302/Notes/Bits**

- SRM 484 D2 250-pointer (NumberMagicEasy): You can solve this with a laundry list of **if** statements, or with a **set** of integers; however, this is a really nice fit with the paradigm of using bits to represent elements of a set. I have lecture notes for this problem here, and often go over it in class.
- SRM 699 D1 250-pointer (OthersXor): This is a nice problem that makes you think about bits. (Hints)
- TCO 2015 Q1A 250-pointer (Similars): Good practice with bits. (Here's a writeup that walks you through it.)
- SRM 596 D1 250-pointer (IncrementAndDoubling): You don't really need bit arithmetic here, but it is a good practice problem for it. (Hints)

I'm assuming that bit operations are review for you, but I'm also assuming that you've had very little practice programming with them. This lecture shows you a little programming with bit operations, and gives you a program that you can use to practice bit arithmetic.

The standard bit operations AND, OR and XOR work on bits, and I'm assuming that you learned them in CS130, if not in high school:

- 1 AND 1 equals one. Anything AND zero equals zero.
- 0 OR 0 equals zero. Anything OR one equals one.
- XOR is equal to addition modulo 2.

When you apply AND, OR or XOR to two numbers in C or C++:

- Both input numbers are treated as ordered collections of bits.
- The operation is done on each bit of the input numbers.
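To make the bullets above concrete, here is a small C++11 sketch that applies the single-bit rules to each of the 64 bit positions of two numbers, one position at a time, and reassembles the result. The function names (`and_bit`, `or_bit`, `xor_bit`, `bitwise`) are my own, for illustration only; in real code you would simply use C's built-in bit operators.

```cpp
#include <cassert>
#include <cstdint>

// The single-bit rules. Each argument must be 0 or 1.
unsigned and_bit(unsigned a, unsigned b) {
  assert(a <= 1 && b <= 1);
  return (a == 1 && b == 1) ? 1 : 0;   // 1 AND 1 is one; anything AND zero is zero
}

unsigned or_bit(unsigned a, unsigned b) {
  assert(a <= 1 && b <= 1);
  return (a == 0 && b == 0) ? 0 : 1;   // 0 OR 0 is zero; anything OR one is one
}

unsigned xor_bit(unsigned a, unsigned b) {
  assert(a <= 1 && b <= 1);
  return (a + b) % 2;                  // XOR is addition modulo 2
}

// Apply a single-bit rule to every one of the 64 bit positions of x and y.
// This computes the same thing as the built-in &, | and ^ operators.
std::uint64_t bitwise(std::uint64_t x, std::uint64_t y,
                      unsigned (*rule)(unsigned, unsigned)) {
  std::uint64_t result = 0;
  for (int i = 0; i < 64; i++) {
    unsigned a = static_cast<unsigned>((x >> i) & 1);   // bit i of x
    unsigned b = static_cast<unsigned>((y >> i) & 1);   // bit i of y
    if (rule(a, b) == 1) result |= (std::uint64_t(1) << i);
  }
  return result;
}
```

For example, `bitwise(5, 9, and_bit)` returns 1 and `bitwise(5, 9, xor_bit)` returns 12, which agree with 5 AND 9 and 5 XOR 9.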

For example, to compute 5 AND 9, you take the AND of each bit:

```
5 = 0101
9 = 1001
    ----
    0001
```

The AND of each bit gives you 0001 in binary, which is 1. If we're doing XOR, then you take the XOR of each bit:

```
5 = 0101
9 = 1001
    ----
    1100
```

The XOR of each bit gives you 1100 in binary, which is 12. Other bit operations are NOT (which flips each bit), and the shift operations:

- If you "left-shift" by *n*, then you move each bit *n* binary digits to the left. The *n* right-most bits will be set to zero, and any bits that started out within *n* binary digits of the left end of the word are discarded.
- If you "right-shift" by *n* bits, then you're doing the same operation, except you are moving the bits to the right and not the left. For unsigned numbers, the *n* left-most bits are set to zero.

In C and C++, the following are the bit arithmetic operators:

- AND: this is a single ampersand: `&`
- OR: this is a single vertical bar: `|`
- XOR: this is a single caret: `^`
- NOT: this is a single tilde: `~`
- Left-shift: this is two less-than signs: `<<`
- Right-shift: this is two greater-than signs: `>>`
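Here is a minimal C++11 sketch that applies each of these operators to a pair of **unsigned long long** values. The struct `BitResults` and the function `apply_operators()` are hypothetical names for illustration. I use **unsigned long long** so that NOT and the shifts work on all 64 bits, and I assert that the shift amount is less than 64, because shifting a 64-bit value by 64 or more bits is undefined in C and C++.

```cpp
#include <cassert>

// Results of applying each C/C++ bit operator to a and b (and shifting a by n).
struct BitResults {
  unsigned long long and_result;   // a & b
  unsigned long long or_result;    // a | b
  unsigned long long xor_result;   // a ^ b
  unsigned long long not_a;        // ~a
  unsigned long long a_left;       // a << n
  unsigned long long a_right;      // a >> n
};

BitResults apply_operators(unsigned long long a, unsigned long long b, unsigned n) {
  assert(n < 64);   // shifting a 64-bit value by 64 or more bits is undefined
  BitResults r;
  r.and_result = a & b;
  r.or_result  = a | b;
  r.xor_result = a ^ b;
  r.not_a      = ~a;       // flips all 64 bits
  r.a_left     = a << n;   // zeros come in on the right
  r.a_right    = a >> n;   // unsigned, so zeros come in on the left
  return r;
}
```

With a = 5, b = 9 and n = 2, the fields come out as 1, 13, 12, 0xfffffffffffffffa, 20 and 1.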

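The program **ba_helper** reads problems of the form "number operator number", where the operator is AND, OR, XOR, LS (left-shift), RS (right-shift) or ANDNOT (the first number AND the NOT of the second). The numbers can be represented in any of three ways: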

- Standard decimal, up to 2^{64}-1 in value.
- Hexadecimal, preceded by "0x" and up to 16 digits.
- Binary, preceded by "B" and up to 64 digits.

Here are some examples:

```
UNIX> g++ -o ba_helper -std=c++98 ba_helper.cpp
UNIX> ba_helper
When entering numbers, you can enter:
 A normal decimal number as big as 2^{64}-1.
 A number in hex up to 16 digits, starting with 0x.
 A number in binary up to 64 digits, starting with B.
Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
5 AND 9

Operator: AND
A:                     5 0x0000000000000005 0000000000000000000000000000000000000000000000000000000000000101
B:                     9 0x0000000000000009 0000000000000000000000000000000000000000000000000000000000001001
C:                     1 0x0000000000000001 0000000000000000000000000000000000000000000000000000000000000001

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
B101 XOR 0x9

Operator: XOR
A:                     5 0x0000000000000005 0000000000000000000000000000000000000000000000000000000000000101
B:                     9 0x0000000000000009 0000000000000000000000000000000000000000000000000000000000001001
C:                    12 0x000000000000000c 0000000000000000000000000000000000000000000000000000000000001100

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
837261 OR 276591827

Operator: OR
A:                837261 0x00000000000cc68d 0000000000000000000000000000000000000000000011001100011010001101
B:             276591827 0x00000000107c74d3 0000000000000000000000000000000000010000011111000111010011010011
C:             276625119 0x00000000107cf6df 0000000000000000000000000000000000010000011111001111011011011111

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
<CNTL-D>
UNIX>
```

The first two examples are explained above. The last one is kind of a pain, but I'm hoping that you see that looking at the hex is a nice way to solve the problem. With hex, each digit corresponds to four bits. So, you can iterate through the hex digits and solve the OR problem for each of those. Start with the right-most one: 0xd is 1101 and 0x3 is 0011. So (0xd OR 0x3) is equal to 1111, which is 0xf. Moving left: 0x8 is 1000 and 0xd is 1101, so their OR is 0xd. And so on. Yes, looking at the bits is easier, but you can do it directly from the hex after a little practice.
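The hex-digit method for OR can be sketched in C++11 as follows, assuming 64-bit numbers: pull out each hex digit (four bits) of both numbers with a shift and a mask of 0xf, OR the two digits, and shift the result back into place. `or_by_hex_digits()` is a hypothetical name; its `assert` just confirms that the digit-by-digit answer matches the built-in `|`.

```cpp
#include <cassert>
#include <cstdint>

// OR two 64-bit numbers one hex digit (four bits) at a time, the way you
// would do it by hand from their hex representations.
std::uint64_t or_by_hex_digits(std::uint64_t a, std::uint64_t b) {
  std::uint64_t result = 0;
  for (int digit = 0; digit < 16; digit++) {
    int shift = 4 * digit;                    // hex digit d covers bits 4d through 4d+3
    std::uint64_t da = (a >> shift) & 0xf;    // this hex digit of a
    std::uint64_t db = (b >> shift) & 0xf;    // this hex digit of b
    result |= (da | db) << shift;             // put the ORed digit back in place
  }
  assert(result == (a | b));                  // agrees with the built-in operator
  return result;
}
```

`or_by_hex_digits(837261, 276591827)` returns 276625119 (0x107cf6df), the same answer that **ba_helper** prints.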

Left-shifts and right-shifts are easy if the shift amount is a multiple of four. When that happens, you can divide the shift amount by four and shift the hex digits by that many places. For example, in the first call, since we are left-shifting the bits by eight, that's the same as left-shifting the hex by 8/4 = 2 digits. And in the second call, right-shifting by 16 bits is the same as right-shifting the hex by 16/4 = 4 digits:

```
UNIX> ba_helper
When entering numbers, you can enter:
 A normal decimal number as big as 2^{64}-1.
 A number in hex up to 16 digits, starting with 0x.
 A number in binary up to 64 digits, starting with B.
Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x83726987264 LS 8

Operator: LS
A:         9032963748452 0x0000083726987264 0000000000000000000010000011011100100110100110000111001001100100
B:                     8 0x0000000000000008 0000000000000000000000000000000000000000000000000000000000001000
C:      2312438719603712 0x0008372698726400 0000000000001000001101110010011010011000011100100110010000000000

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0xabcdef123454 RS 16

Operator: RS
A:       188900967593044 0x0000abcdef123454 0000000000000000101010111100110111101111000100100011010001010100
B:                    16 0x0000000000000010 0000000000000000000000000000000000000000000000000000000000010000
C:            2882400018 0x00000000abcdef12 0000000000000000000000000000000010101011110011011110111100010010

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
<CNTL-D>
UNIX>
```

Left-shift and right-shift become a pain when the shift amount is not a multiple of four. Then the best thing to do is convert the hex to binary, do the bit shift, partition the binary digits into groups of four, and convert back into hex. Here's an example. Suppose you want to do:

0x8a7e6c8 LS 7

First, convert the hex to binary, digit by digit:

```
   8    a    7    e    6    c    8
1000 1010 0111 1110 0110 1100 1000
```

Now, add seven 0's to the right side, and get rid of the spaces. In VI, you can do that with "`:s/ //g`".

```
1000 1010 0111 1110 0110 1100 1000 0000000
10001010011111100110110010000000000
```

Now, add a zero to the beginning, so that the number of digits is a multiple of four, and then group
the digits in groups of four. In VI, you can
do that with "`:s/\(....\)/\1 /g`". And then you can go back to hex:

```
010001010011111100110110010000000000
0100 0101 0011 1111 0011 0110 0100 0000 0000
   4    5    3    f    3    6    4    0    0
```

So the answer is 0x453f36400:

```
UNIX> ba_helper
When entering numbers, you can enter:
 A normal decimal number as big as 2^{64}-1.
 A number in hex up to 16 digits, starting with 0x.
 A number in binary up to 64 digits, starting with B.
Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x8a7e6c8 LS 7

Operator: LS
A:             145221320 0x0000000008a7e6c8 0000000000000000000000000000000000001000101001111110011011001000
B:                     7 0x0000000000000007 0000000000000000000000000000000000000000000000000000000000000111
C:           18588328960 0x0000000453f36400 0000000000000000000000000000010001010011111100110110010000000000

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
<CNTL-D>
UNIX>
```

The nice thing about
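If you'd like to automate the by-hand step of writing a number in binary, one group of four bits per hex digit, here is a small C++11 sketch. `binary_nibbles()` is a hypothetical helper, not part of **ba_helper**; it drops leading zero hex digits but always prints at least one group.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Return v in binary, split into groups of four bits (one group per hex digit).
std::string binary_nibbles(std::uint64_t v) {
  std::string s;
  bool started = false;
  for (int digit = 15; digit >= 0; digit--) {
    unsigned nibble = static_cast<unsigned>((v >> (4 * digit)) & 0xf);
    // Skip leading zero hex digits, but always print the last one, so 0 prints as "0000".
    if (nibble == 0 && !started && digit > 0) continue;
    started = true;
    if (!s.empty()) s += ' ';
    for (int bit = 3; bit >= 0; bit--) s += ((nibble >> bit) & 1) ? '1' : '0';
  }
  assert(s.size() % 5 == 4);   // n groups of 4 characters, plus n-1 separating spaces
  return s;
}
```

`binary_nibbles(0x8a7e6c8)` returns "1000 1010 0111 1110 0110 1100 1000", and `binary_nibbles(0x8a7e6c8ULL << 7)` returns "0100 0101 0011 1111 0011 0110 0100 0000 0000", which is 0x453f36400.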

In this discussion, I'm going to talk about "bit *x* of a number." When I say that, I mean bit number *x* of the binary representation of the number, counting from zero at the right side. For example, with the number 12, which is (1100) in
binary, bits 0 and 1 are equal to zero, and bits 2 and 3 are equal to one. If the number is a 64-bit number, then
bits 4 through 63 are also zero.

**Setting a bit**

In C/C++, to set bit *x* in number *v*, you do:

v |= (1ULL << x);

That operator is "OR-Equals", which is like "+=" and "*=", only with OR instead of addition or multiplication.
It's a good idea to put the left-shift in parentheses, because operator precedence is a little odd with bit arithmetic.
The "ULL" is something you need if you are dealing with 64-bit numbers (like **unsigned long long**). The ULL
tells the compiler to treat the number one as a 64-bit **unsigned long long**, so that the shift is done on 64-bit numbers.
If you don't use "ULL", then the compiler treats one as an **int** (typically 32 bits), and then, for example, (1 << 32) is undefined behavior,
because it shifts the one all the way off a 32-bit number. Depending on the compiler and machine you may get zero, one or something else, but you will not get 2^{32}.
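Wrapped up as a function, setting a bit might look like the C++11 sketch below. `set_bit()` is a hypothetical name; the assertion documents the assumption that *x* is between 0 and 63, since a shift by 64 or more bits is undefined even with **unsigned long long**.

```cpp
#include <cassert>

// Return v with bit x set (a hypothetical helper, not part of ba_helper.cpp).
// 1ULL makes the shifted one a 64-bit unsigned long long.
unsigned long long set_bit(unsigned long long v, unsigned x) {
  assert(x < 64);            // shifting by 64 or more bits is undefined
  v |= (1ULL << x);          // OR-Equals: bit x becomes 1, all other bits are unchanged
  return v;
}
```

`set_bit(0x83, 6)` returns 0xc3, and `set_bit(0x64, 6)` returns 0x64 unchanged.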

Here's an example using **ba_helper** that shows how you make sure that bit 6 is set in two numbers: 0x83, where
the bit is not set already, and 0x64, where the bit is set already:

```
UNIX> ba_helper
When entering numbers, you can enter:
 A normal decimal number as big as 2^{64}-1.
 A number in hex up to 16 digits, starting with 0x.
 A number in binary up to 64 digits, starting with B.
Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
1 LS 6

Operator: LS
A:                     1 0x0000000000000001 0000000000000000000000000000000000000000000000000000000000000001
B:                     6 0x0000000000000006 0000000000000000000000000000000000000000000000000000000000000110
C:                    64 0x0000000000000040 0000000000000000000000000000000000000000000000000000000001000000
                                                                                                     ^
                                     As you can see, this creates a number where only bit six is set |

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x83 OR 0x40

Operator: OR
A:                   131 0x0000000000000083 0000000000000000000000000000000000000000000000000000000010000011
B:                    64 0x0000000000000040 0000000000000000000000000000000000000000000000000000000001000000
C:                   195 0x00000000000000c3 0000000000000000000000000000000000000000000000000000000011000011
                                                                                                     ^
                                                                            Now, bit six is set in C |

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x64 OR 0x40

Operator: OR
A:                   100 0x0000000000000064 0000000000000000000000000000000000000000000000000000000001100100
B:                    64 0x0000000000000040 0000000000000000000000000000000000000000000000000000000001000000
C:                   100 0x0000000000000064 0000000000000000000000000000000000000000000000000000000001100100
                                                                                                     ^
                                         In this example, bit 6 was already set in A, so C equals A. |

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
<CNTL-D>
UNIX>
```

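To clear bit *x* in number *v*, you AND *v* with the NOT of `(1ULL << x)`, which has every bit set except bit *x*:

v &= (~(1ULL << x));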

In the example below, we are going to clear bit 6 of 0x64 (where it is set) and of 0x83 (where it is not). As above, (1 << 6) equals 0x40, so we do AND-NOT with 0x40:

```
UNIX> ba_helper
When entering numbers, you can enter:
 A normal decimal number as big as 2^{64}-1.
 A number in hex up to 16 digits, starting with 0x.
 A number in binary up to 64 digits, starting with B.
Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x64 ANDNOT 0x40

Operator: ANDNOT
A:                   100 0x0000000000000064 0000000000000000000000000000000000000000000000000000000001100100
B:                    64 0x0000000000000040 0000000000000000000000000000000000000000000000000000000001000000
C:                    36 0x0000000000000024 0000000000000000000000000000000000000000000000000000000000100100
                                                                                                     ^
                                                                    In this example, we clear bit 6: |

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x83 ANDNOT 0x40

Operator: ANDNOT
A:                   131 0x0000000000000083 0000000000000000000000000000000000000000000000000000000010000011
B:                    64 0x0000000000000040 0000000000000000000000000000000000000000000000000000000001000000
C:                   131 0x0000000000000083 0000000000000000000000000000000000000000000000000000000010000011
                                                                                                     ^
                                                          In this example, bit 6 is already cleared: |

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
<CNTL-D>
UNIX>
```

To see this more clearly, let's AND with (NOT 0x40) directly. That is equal to 0xffffffffffffffbf, and I think you can see the clearing of bit six a little more clearly here:

```
UNIX> ba_helper
When entering numbers, you can enter:
 A normal decimal number as big as 2^{64}-1.
 A number in hex up to 16 digits, starting with 0x.
 A number in binary up to 64 digits, starting with B.
Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x64 AND 0xffffffffffffffbf

Operator: AND
A:                   100 0x0000000000000064 0000000000000000000000000000000000000000000000000000000001100100
B:  18446744073709551551 0xffffffffffffffbf 1111111111111111111111111111111111111111111111111111111110111111
C:                    36 0x0000000000000024 0000000000000000000000000000000000000000000000000000000000100100
                                                                                                     ^
                                                                    In this example, we clear bit 6: |

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x83 AND 0xffffffffffffffbf

Operator: AND
A:                   131 0x0000000000000083 0000000000000000000000000000000000000000000000000000000010000011
B:  18446744073709551551 0xffffffffffffffbf 1111111111111111111111111111111111111111111111111111111110111111
C:                   131 0x0000000000000083 0000000000000000000000000000000000000000000000000000000010000011
                                                                                                     ^
                                                          In this example, bit 6 is already cleared: |

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
<CNTL-D>
UNIX>
```
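Here is the same idea as a C++11 sketch: `clear_bit()` (a hypothetical name) ANDs *v* with `~(1ULL << x)`, a mask with every bit set except bit *x*, and assumes *x* is less than 64.

```cpp
#include <cassert>

// Return v with bit x cleared (a hypothetical helper, not part of ba_helper.cpp).
unsigned long long clear_bit(unsigned long long v, unsigned x) {
  assert(x < 64);            // shifting by 64 or more bits is undefined
  v &= (~(1ULL << x));       // the mask has every bit set except bit x
  return v;
}
```

`clear_bit(0x64, 6)` returns 0x24, and `clear_bit(0x83, 6)` returns 0x83, matching the two cases in the sessions above.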

**Checking to see if bit x is set**

To check whether bit *x* is set in number *v*, you do:

if (v & (1ULL << x)) ...

If the bit isn't set, then the AND will result in all zero bits, which is **false**. If it is set, then the
AND will equal `(1ULL << x)`, which is not equal to zero. In C and C++, integer expressions that are not equal to zero
are treated as **true**.
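As a C++11 sketch (the name `bit_is_set()` is my own), the test can be written with an explicit `!= 0`. Note the parentheses around the AND: in C and C++, `!=` and `==` bind more tightly than `&`, so `v & (1ULL << x) != 0` would compute `v & ((1ULL << x) != 0)`, which is not what you want. This is one example of the odd operator precedence mentioned above.

```cpp
#include <cassert>

// Return true if bit x of v is set (a hypothetical helper).
// The parentheses around the AND are required: != binds more tightly than &.
bool bit_is_set(unsigned long long v, unsigned x) {
  assert(x < 64);            // shifting by 64 or more bits is undefined
  return (v & (1ULL << x)) != 0;
}
```

With v = 12 (1100 in binary), `bit_is_set(12, 2)` and `bit_is_set(12, 3)` are true, while bits 0, 1 and 4 test false.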

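To extract the lowest *x* bits of *v* (bits 0 through *x*-1), you AND *v* with a mask that has those bits set and all other bits clear. Since `(1ULL << x)` has only bit *x* set, subtracting one from it gives exactly that mask:

extracted_bits = v & ( (1ULL << x) - 1);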

You may want to do some examples to convince yourself. Recall that `(1 << 6)` is 0x40, which equals 64. That means
that 63, which equals 0x3f, will have bits 0 through 5 set, and the rest clear. That is our mask.
In this example, we AND that with
0x827364, which extracts the lowest six bits from the number:

```
UNIX> ba_helper
When entering numbers, you can enter:
 A normal decimal number as big as 2^{64}-1.
 A number in hex up to 16 digits, starting with 0x.
 A number in binary up to 64 digits, starting with B.
Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x827364 AND 0x3f

Operator: AND
A:               8549220 0x0000000000827364 0000000000000000000000000000000000000000100000100111001101100100
B:                    63 0x000000000000003f 0000000000000000000000000000000000000000000000000000000000111111
C:                    36 0x0000000000000024 0000000000000000000000000000000000000000000000000000000000100100
                                                                                                      ^^^^^^
                                              Here we are extracting the lowest six bits of 0x827364: ||||||

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
<CNTL-D>
UNIX>
```
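A minimal C++11 sketch of extracting the lowest *x* bits, with `low_bits()` as a hypothetical name. It assumes *x* < 64, because `(1ULL << 64)` is undefined; if you want all 64 bits, you can just use *v* itself.

```cpp
#include <cassert>

// Return the lowest x bits of v (a hypothetical helper).
unsigned long long low_bits(unsigned long long v, unsigned x) {
  assert(x < 64);                              // (1ULL << 64) would be undefined
  unsigned long long mask = (1ULL << x) - 1;   // bits 0 through x-1 set, all others clear
  return v & mask;
}
```

`low_bits(0x827364, 6)` returns 0x24, as in the session above.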

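To extract bits *x* through *y* of *v*, where *x* ≤ *y*, you first extract the lowest *y*+1 bits, and then right-shift the result by *x*, so that bit *x* ends up as bit 0:

extracted_bits = ( ( v & ( (1ULL << (y+1)) - 1) ) >> x);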

Let's use an example of extracting bits 2 through 5 of 0x827364. As in our previous examples, `(1 << (5+1))` equals
0x40, so `( (1ULL << (y+1)) - 1)` equals 0x3f. The last example above shows that 0x827364 AND 0x3f is 0x24 (100100),
and our final action is to right shift that by two bits, to get (1001) or 9. Let's just show this in the original
number 0x827364:

```
A:               8549220 0x0000000000827364 0000000000000000000000000000000000000000100000100111001101100100
                                                                                                      ^^^^
                                            Below we are extracting bits 2 through 5, which are 1001. ||||
```

And below, we'll show the two operations:

```
Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x827364 AND 0x3f

Operator: AND
A:               8549220 0x0000000000827364 0000000000000000000000000000000000000000100000100111001101100100
B:                    63 0x000000000000003f 0000000000000000000000000000000000000000000000000000000000111111
C:                    36 0x0000000000000024 0000000000000000000000000000000000000000000000000000000000100100

Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:
0x24 RS 2

Operator: RS
A:                    36 0x0000000000000024 0000000000000000000000000000000000000000000000000000000000100100
B:                     2 0x0000000000000002 0000000000000000000000000000000000000000000000000000000000000010
C:                     9 0x0000000000000009 0000000000000000000000000000000000000000000000000000000000001001
```
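The two operations above can be combined into one C++11 function. `extract_bits()` is a hypothetical name; it assumes *x* ≤ *y* ≤ 63, and it handles *y* = 63 separately, because computing `(1ULL << (y+1))` would then shift by 64 bits, which is undefined.

```cpp
#include <cassert>

// Return bits x through y of v (x <= y), shifted down so that bit x becomes bit 0.
unsigned long long extract_bits(unsigned long long v, unsigned x, unsigned y) {
  assert(x <= y && y < 64);
  // Mask with bits 0 through y set. When y == 63, (1ULL << 64) would be
  // undefined, so an all-ones mask is used instead.
  unsigned long long mask = (y == 63) ? ~0ULL : (1ULL << (y + 1)) - 1;
  return (v & mask) >> x;
}
```

`extract_bits(0x827364, 2, 5)` returns 9, the same result as the two **ba_helper** steps above.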

Bits are often used to represent a set of small non-negative integers: the integer *i* is in the set exactly when bit *i* is set. For example, you can represent the set { 1, 3, 6, 7 } with the number 0xca. In binary, that number is 11001010, which, as you can see, has bits 1, 3, 6 and 7 set. When you represent sets in this way, you use the methods above to set a bit (adding an element to a set), clear a bit (removing an element from a set) and test whether a bit is set (testing to see if an element is in the set).

This is often *much* faster than using a **set** to represent the set. That is because a **set**
is implemented with a balanced binary tree data structure, which uses a lot of memory for each element. With
bit arithmetic, as long as the elements are between 0 and 63, all it takes is one **unsigned long long** (or one
**unsigned int**, if the elements are between 0 and 31). That is a big savings! With bits,
you can implement set intersection with a single bitwise **AND**, set union with a single bitwise **OR**,
and take the complement of a set (relative to all of the possible elements) with **NOT**.
How cool is that?
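Here is a minimal C++11 sketch of sets of the integers 0 through 63 stored in a single 64-bit word. All of the names (`BitSet`, `bits_add()`, `bits_union()` and so on) are hypothetical, chosen for this illustration; each set operation is one or two bit operations. `bits_elements()` loops over all 64 bits, so it is the only operation here whose cost grows with the word size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A set of integers from 0 to 63: element i is in the set exactly when bit i is set.
typedef std::uint64_t BitSet;

BitSet bits_add(BitSet s, unsigned i)      { assert(i < 64); return s | (BitSet(1) << i); }
BitSet bits_remove(BitSet s, unsigned i)   { assert(i < 64); return s & ~(BitSet(1) << i); }
bool   bits_contains(BitSet s, unsigned i) { assert(i < 64); return (s & (BitSet(1) << i)) != 0; }

BitSet bits_union(BitSet a, BitSet b)        { return a | b; }
BitSet bits_intersection(BitSet a, BitSet b) { return a & b; }
BitSet bits_complement(BitSet a)             { return ~a; }   // relative to {0, ..., 63}

// Build a set from a list of elements.
BitSet bits_from(const std::vector<unsigned>& elements) {
  BitSet s = 0;
  for (unsigned e : elements) s = bits_add(s, e);
  return s;
}

// List the elements of a set in increasing order.
std::vector<unsigned> bits_elements(BitSet s) {
  std::vector<unsigned> out;
  for (unsigned i = 0; i < 64; i++)
    if (bits_contains(s, i)) out.push_back(i);
  return out;
}
```

`bits_from({1, 3, 6, 7})` returns 0xca, and intersecting it with `bits_from({3, 4, 7})` gives the set { 3, 7 }.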

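Now let's look at the code in **ba_helper.cpp**, starting with the class that holds a number:

```cpp
/* Headers used by the code below. */

#include <iostream>
#include <string>
#include <cstdio>
#include <cstdlib>
using namespace std;

/* This is how we're holding a number. The class facilitates printing out the number. */

class Number {
  public:
    unsigned long long d;   /* The number. */
    string hex;             /* Its representation in hex (16 hex digits with a 0x in front). */
    string binary;          /* Its representation in binary */
    string To_String();     /* This creates a bigger string, which is kind of formatted. */
};
```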

Because I want to deal with 64-bit integers, I use the type **unsigned long long**. The "unsigned"
part means that it goes from 0 to 2^{64}-1. I keep two string representations, and I have
a method called **To_String()** which returns a formatted string containing all three representations of the number.
When I create a **Number**, I make sure to set both string representations at that time.

Let's look at **To_String()**:

```cpp
/* This returns a "formatted" string for a number. */

string Number::To_String()
{
  char buf[200];
  string s;

  sprintf(buf, "%21llu %s %s", d, hex.c_str(), binary.c_str());
  s = buf;
  return s;
}
```

The only thing here that is remotely subtle is the "%21llu": this is how you specify that you want to
print an **unsigned long long** in decimal, right-justified in a field that is 21 characters wide.

Now, I've written two procedures that create **Number** instances. The first is called
**number_from_ull()**, and it creates an instance of **Number**
from an **unsigned long long**. It calls **new** to create the instance and
sets the **d** field. Next,
it sets the **hex** string using **sprintf()**, with a format string
of "0x%016llx". That says to start with "0x", then print the **unsigned long
long** as a 16-digit hex number with leading zeros.

Finally, it creates the binary string by running through the digits from 0
to 63, checking to see if the digit is set in *v* using the same technique
as I describe above in "**Checking to see if bit x is set**", and
if a bit is set, its corresponding character in the string is set to '1':

```cpp
Number *number_from_ull(unsigned long long v)
{
  Number *n;
  int i;
  char buf[200];

  /* Create the Number class instance, and set the strings. */

  n = new Number;
  n->d = v;

  /* Set the hexadecimal using sprintf. */

  sprintf(buf, "0x%016llx", v);
  n->hex = buf;

  /* For the binary, examine each bit by doing AND with one left-shifted
     the proper number of bits.  "1ULL" forces the compiler to treat one
     as an unsigned long long.  Otherwise, one would be an int, and shifting
     it by 32 or more bits would be undefined behavior. */

  n->binary.resize(64, '0');
  for (i = 0; i < 64; i++) if (v & (1ULL << i)) n->binary[64-i-1] = '1';
  return n;
}
```

The second procedure that creates **Number** instances is
**number_from_string()**, and it creates a number from a string that is in any of the three
formats described above. It does this by converting the string to an **unsigned long long**
named *v*, and then calling **number_from_ull()** on *v*.

I'm showing the code below up to the point where the procedure reads the number from a binary string beginning with 'B':

```cpp
/* This creates a number from a string, which is either decimal, hexadecimal
   (starting with 0x), or binary (starting with B).  It creates all of the
   string representations. */

Number *number_from_string(string &s)
{
  unsigned long long v;
  unsigned long long i;
  int b;
  Number *n;
  char buf[100];

  v = 0;
  if (s.size() == 0) return NULL;

  /* Convert from binary if the string begins with 'B' */

  if (s[0] == 'B') {
    if (s.size() == 1) return NULL;
    if (s.size() > 65) return NULL;
    for (i = 0; i < s.size()-1; i++) {
      b = s[s.size()-i-1];
      if (b != '0' && b != '1') return NULL;
      if (b == '1') v |= (1ULL << i);    /* Set bit i, if the corresponding character is '1' */
    }
```

Take a look at the **for** loop -- that loops through the digits, where *i* is the number of the
digit. In the binary string, digit 0 is the last digit of the string, so it is **s[s.size()-1]**.
Digit 1 is the digit before that one, so it is
**s[s.size()-2]**. And so on -- this is why we set **b** to be **s[s.size()-i-1]**.

When **b** is equal to '1', that means that bit *i* should be set, and I set it exactly as described
above in "**Setting a bit**":

```cpp
if (b == '1') v |= (1ULL << i);
```

In the **for** loop, I stop at **i < s.size()-1** instead of **s.size()**, because I want to ignore the
initial 'B' character.

Now, the next block of code reads the string if it is specified in hex, and if not, it tries to read
it in decimal. This code is pretty straightforward, except we use "%llx" to read an **unsigned long long**
in hex, and "%llu" to read an **unsigned long long** as a decimal. At the end, it calls
**number_from_ull()**.

```cpp
    /* Convert from hex if the string begins with "0x" */

  } else if (s.substr(0, 2) == "0x") {
    if (s.size() == 2 || s.size() > 18) return NULL;
    if (sscanf(s.c_str(), "0x%llx", &v) != 1) return NULL;

    /* Attempt to convert from decimal. */

  } else {
    if (sscanf(s.c_str(), "%llu", &v) != 1) return NULL;
  }
  return number_from_ull(v);
}
```

Finally, this last code block implements the **main()**, which reads the user
input and prints the output. I don't say anything more about this code except
for what's in the comments -- this is straightforward code, but it's good code
for you to read, because I think it is laid out well, and is easy to read,
despite the fact that it handles input errors pretty cleanly:

```cpp
int main()
{
  Number *A, *B, *C;
  string sa, sb, sop;
  int error;

  printf("When entering numbers, you can enter:\n");
  printf(" A normal decimal number as big as 2^{64}-1.\n");
  printf(" A number in hex up to 16 digits, starting with 0x.\n");
  printf(" A number in binary up to 64 digits, starting with B.\n");

  while (1) {
    error = 0;
    C = NULL;

    /* Grab A, B and the operator. */

    printf("Enter a problem: number AND|OR|XOR|LS|RS|ANDNOT number:\n");
    fflush(stdout);
    if (! (cin >> sa >> sop >> sb)) exit(1);

    /* Convert A and B to instances of the Number class, and error check. */

    A = number_from_string(sa);
    B = number_from_string(sb);
    if (A == NULL) { printf("Bad format for the first number.\n"); error = 1; }
    if (B == NULL) { printf("Bad format for the second number.\n"); error = 1; }

    /* Shifting a 64-bit number by 64 or more bits is undefined, so reject it. */

    if (error == 0 && (sop == "LS" || sop == "RS") && B->d >= 64) {
      printf("The shift amount must be less than 64.\n");
      error = 1;
    }

    /* Do the operation if we haven't had an error so far. */

    if (error == 0) {
      if (sop == "AND") {
        C = number_from_ull(A->d & B->d);
      } else if (sop == "OR") {
        C = number_from_ull(A->d | B->d);
      } else if (sop == "XOR") {
        C = number_from_ull(A->d ^ B->d);
      } else if (sop == "LS") {
        C = number_from_ull(A->d << B->d);
      } else if (sop == "RS") {
        C = number_from_ull(A->d >> B->d);
      } else if (sop == "ANDNOT") {
        C = number_from_ull(A->d & (~B->d));
      } else {
        printf("Bad operator.\n");
        error = 1;
      }
    }

    /* If everything was successful, print the results. */

    if (error == 0) {
      printf("\n");
      printf("Operator: %s\n", sop.c_str());
      printf("A: %s\n", A->To_String().c_str());
      printf("B: %s\n", B->To_String().c_str());
      printf("C: %s\n", C->To_String().c_str());
      printf("\n");
    }

    /* Free up memory: Call delete on anything that you created with new. */

    if (A != NULL) delete A;
    if (B != NULL) delete B;
    if (C != NULL) delete C;
  }
  exit(0);
}
```