SRM 683, D2, 250-Pointer (EqualSubstrings2)
James S. Plank
Wed Jan 27 16:58:02 EST 2021
In case Topcoder's servers are down
Here is a summary of the problem:
- You are givin a string s composed exclusively of lower-case letters.
- Your job is to compute and return the number of non-overlapping, non-empty,
identical substrings of s
- Topcoder limits the size of s 50, but I'm going to update that to
400 to make it a little more challenging.
The examples
# S Answer
0 "aa" 1 - the only matching substrings are "a" and "a"
1 "abcd" 0
2 "aba" 1 - again, "a" and "a" are the only matching substrings.
3 "abab" 3 - "a" & "a", "b" & "b", "ab" & "ab".
4 "aaaab" 7 - 6 different pairs of "a", one pair of "aa".
5 "onetwothreeonetwothree" 86
Testing yourself
Like the Cryptography Problem,
I have a shell script tests.sh, that you can use to test your program. When you
run tests.sh, your answer should be identical to
answers.txt
Hints
Let's evaluate the big-O of a brain-dead answer:
- Enumerate all substrings x of s. (O(n2)).
- For each of these, enumerate all substrings that start after x and that
are the same size as x. That's O(n).
- For each of these, compare the substring to x, and increment a counter if they
match. This is O(n) as well, although when the strings don't match, the
comparison can exit early.
That's looking like a whopping O(n4), but with n capped at 50,
it's fine. Even with n at 400, it finishes within the Topcoder limits when compiler
optimization is used.
You can make this a little faster if, instead of generating substrings and comparing them,
you simply use the C library routine strncmp(). Now, you don't need to make copies
of any substrings -- it compares them all in place.
A final way of making it faster is call s.find(s1, k), where s1 is the first
substring, and k is the index for the second substring. If it returns string::npos,
then you don't have to look any more. Otherwise, you increment the total, and set k to
be one after the return value.
I've programmed all of these, and their timings on tests.sh are as follows:
- Simplest approach: generating and comparing substrings: 15.481 seconds.
- Using the find() method to find the first substring in s: 1.107 seconds.
- Using strncmp(): 3.284 seconds
This would be a great application of the
Rabin-Karp algorithm. Someday, I'll hack it up here and see what the speedup is. Until then, it would
be good practice for you!