SRM 689, D2, 250-Pointer (SimilarUserDetection)

James S. Plank

Thu Mar 30 13:17:56 EDT 2017
Try this on your own. If the maximum length of a string is s, and the number of strings is n, then your solution should be O(sn log(n)).

I will give you more explanation below.

The simple way to attack this problem is to convert each string to a canonical form. If two strings are "similar," then their canonical forms should be identical. That makes it easy to test for similarity.

What I did was convert each character '0' to 'O', each '1' to 'l' and each 'I' to 'l'. That does the trick -- after the conversion, two strings that were similar will be the same. Simply insert each string into a set, and test the set's size. If there were duplicates, then the set will be smaller than the vector handles.

Can you calculate the running time I gave above? You are doing n insertions into a set. That will be O(n log(n)). Each comparison of two strings will take potentially s comparisons of letters. That gives you the factor of s: O(sn log(n)).