Hints for SRM 682, D2, 550-Pointer (TopBiologist)
James S. Plank
Writeup: Mon Feb 29 14:35:24 EST 2016
Problem Statement.
My thoughts went instantly to enumeration here. If you enumerate DNA strings, how many will
you have to enumerate until you get one that's not in the sequence?
- There are 4 one-letter strings.
- There are 16 two-letter strings.
- There are 64 three-letter strings.
- There are 256 four-letter strings.
- There are 1024 five-letter strings.
- There are 4096 six-letter strings.
Since sequence is limited to 2000 characters, it can hold a maximum of 1996 five-letter
substrings and 1995 six-letter substrings. There are 4096 six-letter strings, so at least
one of them has to be missing. So, if you enumerate strings in order of length, you'll
probably stop during the five-letter strings, and definitely during the six-letter strings.
In the worst case, you enumerate the 1364 strings of one to five letters, plus at most 1996
six-letter strings, and call find() on a 2000-character string each time -- that
should fall well within topcoder's limits.
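The counting argument above can be checked with a few lines of C++. This is just a sketch:
the function name guaranteedLength is hypothetical (it is not part of the problem), and it
only computes the length at which a missing string is guaranteed, not the answer to the
problem itself.

```cpp
#include <cassert>

// Hypothetical helper (not from the problem statement): returns the smallest
// length l for which there are more l-letter DNA strings (4^l) than there are
// l-letter substrings in a sequence of n characters (n - l + 1). At that
// length, at least one string has to be missing from the sequence.
int guaranteedLength(int n)
{
  assert(n >= 1);
  long long count = 4;                    // 4^l, starting at l = 1.
  for (int l = 1; ; l++) {
    if (count > n - l + 1) return l;      // More strings than substrings.
    count *= 4;
  }
}
```

With n = 2000, the loop rejects l = 5 (1024 is not greater than 1996) and stops at l = 6
(4096 is greater than 1995).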
How do you enumerate strings? I'd recommend the following strategy. You keep a vector of all
the strings that you have found in sequence so far, and initialize it with "". Then, for each
element e in the vector, create four strings by appending each of the four characters to e.
Look in sequence for each string as you create it, and if it's not there, you're done.
If it is, then append it to the vector, and keep going. Because shorter strings always
enter the vector before longer ones, the first string that you fail to find is a shortest one.
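Here is a minimal sketch of that strategy, assuming C++11 and the four letters A, C, G and T.
The function name shortestMissing is hypothetical; it is not the method name from the
problem statement, so you would wrap it in whatever class and method topcoder asks for.

```cpp
#include <string>
#include <vector>

// Hypothetical helper name; not taken from the problem statement.
std::string shortestMissing(const std::string &sequence)
{
  const std::string letters = "ACGT";
  std::vector<std::string> all;
  all.push_back("");                      // Start with the empty string.

  // The vector grows while we walk it, so use an index, not an iterator.
  for (std::size_t i = 0; i < all.size(); i++) {
    for (char c : letters) {
      std::string s = all[i] + c;         // Extend element i by one letter.
      if (sequence.find(s) == std::string::npos) return s;   // Not there: done.
      all.push_back(s);                   // It's there, so extend it later.
    }
  }
  return "";   // Never reached: a finite sequence always misses some string.
}
```

Since the letters are tried in the order A, C, G, T, this returns the alphabetically
first of the shortest missing strings. For example, "AGGTCTA" contains all four letters
but not "AA", so the sketch returns "AA" for it.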
Can you do this more efficiently when your return value is five or six letters? Yes -- for
each length l, starting at one and incrementing, use substr() to grab each
substring of sequence with l characters, and insert each one into a set. You can
now look for the enumerated l-letter strings in the set rather than by using find() on the
string. You don't have to do this, but it would improve upon the running time.
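A sketch of the set-based variation, again assuming C++11 and the letters A, C, G and T.
The function name shortestMissingWithSet is hypothetical, and processing the strings one
length at a time (rather than in a single growing vector) is my own arrangement of the idea.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper name; not taken from the problem statement.
std::string shortestMissingWithSet(const std::string &sequence)
{
  const std::string letters = "ACGT";
  std::vector<std::string> prev(1, "");   // Every string of length len-1.

  for (std::size_t len = 1; ; len++) {
    // Collect every substring of sequence that has exactly len characters.
    std::set<std::string> present;
    for (std::size_t i = 0; i + len <= sequence.size(); i++) {
      present.insert(sequence.substr(i, len));
    }

    // Extend each shorter string by one letter, and look it up in the set.
    std::vector<std::string> cur;
    for (const std::string &e : prev) {
      for (char c : letters) {
        std::string s = e + c;
        if (present.find(s) == present.end()) return s;   // Missing: done.
        cur.push_back(s);
      }
    }
    prev = cur;                           // Move on to the next length.
  }
}
```

The loop always returns, because eventually there are more strings of length len than
there are substrings of that length in sequence. Each lookup is now a search in a set
of short strings rather than a scan of the whole sequence.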