SRM 689, D2, 250-Pointer (SimilarUserDetection)

James S. Plank

Thu Mar 30 13:17:56 EDT 2017
Sat Oct 21 10:53:09 EDT 2023

Problem Statement.
A main() with the examples compiled in.
A skeleton that compiles with main.cpp.
tests.sh. MD5 hash: f6315d3c93e99e583a748cacbbff74b0
answers.txt. MD5 hash: 62254b9905a72e26c1e081e60e9977fc
Problem Given in Topcoder: April, 2016
Competitors who opened the problem: 488
Competitors who submitted a solution: 462
Number of correct solutions: 321
Accuracy (percentage correct vs those who opened): 65.8%
Average Correct Time: 15 minutes, 22 seconds.

In case Topcoder's servers are down

Here is a summary of the problem:

Topcoder usernames, or "handles", are composed of upper-case letters, lower-case letters, and numbers.
Two handles are considered "similar" if they are the same, with the following exceptions:
- If one has an '0' where the other one has an 'O', then they are similar.
- If one has an 'l', '1' or 'I' where the other one has an 'l', '1' or 'I', then they are similar.
For example, "II11ll00OO" is similar to "I1lI1l0O0O".
You are given a vector of strings called handles, which will be between 2 and 50 elements in size. Each of these is a string that is between 2 and 50 elements in size.
If any pair of strings in handles is "similar" to each other, return the string "Similar handles found".
Otherwise, return "Similar handles not found".

Examples

#  Input vector handles                             Return Value
-  ----- ------ -------                             ------ -----
0  {"top", "coder", "TOPCODER", "TOPC0DER"}         "Similar handles found"
1  {"Topcoder", "topcoder", "t0pcoder", "topcOder"} "Similar handles not found"
2  {"same", "same", "same", "different"}            "Similar handles found"
3  {"0123", "0I23", "0l23", "O123", "OI23", "Ol23"} "Similar handles found"
4  {"i23", "123", "456", "789", "000", "o", "O"}    "Similar handles not found"

Testing yourself

I have a shell script tests.sh , that you can use to test your program. When you run tests.sh, your answer should be identical to answers.txt.

Hints

Try this on your own. If the maximum length of a string is s, and the number of strings is n, then your solution should be O(sn).

I will give you more explanation below.

The simple way to attack this problem is to convert each string to a canonical form. If two strings are "similar," then their canonical forms should be identical. That makes it easy to test for similarity.

What I did was convert each character '0' to 'O', each '1' to 'l' and each 'I' to 'l'. That does the trick -- after the conversion, two strings that were similar will be the same. Simply insert each string into an unordered_set, and test the unordered_set's size. If there were duplicates, then the unordered_set will be smaller than the vector handles.

Can you calculate the running time I gave above? You are doing n insertions into an unordered_set. That will be O(n). Each comparison of two strings will take potentially s comparisons of letters. That gives you the factor of s: O(sn).