Leetcode.com problem 393: "UTF-8-Validation"

James S. Plank

Tue Sep 13 00:24:26 EDT 2022

Hints

This is a good practice problem for bit arithmetic. What I did was maintain a variable called state. You start with state equal to zero. Then: When you run out of data, make sure state is zero.

If you're uncertain about your bit arithmetic, here is an example:

UNIX> echo 0x65 0xc4 0xbb 0xef 0xaf 0x84 0xf6 0x9b 0x8c 0xad 0x00 | ./a.out
true
UNIX> 

state  input-hex  input-binary    action
  0       0x65      0110 0101     It's a legal, one-byte value. Keep state at zero.
  0       0xc4      1100 0100     It's a legal, two-byte value. Set state to one.
  1       0xbb      1011 1011     That's a legal second byte. Decrement state to zero.
  0       0xef      1110 1111     It's a legal, three-byte value. Set state to two.
  2       0xaf      1010 1111     That's a legal second byte. Decrement state to one.
  1       0x84      1000 0100     That's a legal third byte. Decrement state to zero.
  0       0xf6      1111 0110     It's a legal, four-byte value. Set state to three.
  3       0x9b      1001 1011     That's a legal second byte. Decrement state to one.
  2       0x8c      1000 1100     That's a legal third byte. Decrement state to one.
  1       0xad      1010 1101     That's a legal fourth byte. Decrement state to zero.
  0       0x00      0000 0000     It's a legal, one-byte value. Keep state at zero

At the end, the state is 0, so return true.

If you're uncertain about how to check for bit patterns that contain zeros and ones, here's what I did to read the byte when state ≠ 0. Let's suppose the byte is b. I want to see if the first two bits of b are 10. What I did was XOR b with 0x40. That way, if its second bit was a 0, it's now a 1. Then I tested to see if the first two bits are one. The code was:

if (((b ^ 0x40) & 0xc0) == 0xc0)

See if you can figure out the rest!