class Solution { public: vector <vector <int> > palindromePairs(vector<string>& words); }; |
You can give it a single command line argument. If that argument has an 's' character, then it will print the answer with a very slow algorithm before it runs yours. That lets you test your answers, at least for small inputs.
If the argument has a 'p', then it will print your answers.
If the argument has a 't', then it will show the timing of your solution.
So here are the first three examples, solved with the "slow" solution:
UNIX> g++ skeleton.cpp UNIX> cat ex1.txt abcd dcba lls s sssll UNIX> ./a.out s < ex1.txt 0 1 # abcddcba 1 0 # dcbaabcd 2 4 # llsssll 3 2 # slls UNIX> cat ex2.txt bat tab cat UNIX> ./a.out s < ex2.txt 0 1 1 0 UNIX> cat ex3.txt a - UNIX> ./a.out s < ex3.txt 0 1 1 0 UNIX>
vector <vector <int> > Solution::slow(vector<string>& words) { int i, j, k; string w; vector <vector <int> > rv; vector <int> p; bool ok; p.resize(2); for (i = 0; i < words.size(); i++) { // Enumerate all first words. for (j = 0; j < words.size(); j++) { // Enumerate all second words. if (j != i) { w = words[i] + words[j]; // Concatenate them. ok = true; for (k = 0; ok && k < w.size()/2; k++) { // Test to see if the result is a palindrome. ok = (w[k] == w[w.size()-k-1]); } if (ok) { p[0] = i; p[1] = j; rv.push_back(p); } } } } return rv; } |
Why is this too slow? Well, let's name the variables for the analysis:
That's not a difficult observation to come up with, but turning it into a solution is more difficult.
A data structure that we don't teach in our "Data Structures and Algorithms" courses is a "Trie". It's useful when it's useful. And it's not when it's not. So let's define it.
A trie is a multiway tree. Each node represents a character in a string, and its links to children represent subsequent characters in the string. I know that sounds pretty obtuse, so here's the trie for Example 1 above. I've put stars into nodes that end words.
The links are stored as an array of pointers with 26 entries, one for each character. Therefore, when you're using the trie to find a word, you move on from each node in O(1) time. Here's my node definition (I don't represent that a node ends a word here).
struct t_node { char c; // You actually don't need c or the parent. char parent; // However, they are useful for debugging bool eow; // True if this node ends a word (starred nodes above) vector <struct t_node *> children; }; |
How can we use the Trie? There are a bunch of ways. One way is the following -- for each node in the trie, maintain a vector indices. Process every word, and for every prefix of c characters in the word, if the resulting suffix is a palindrome, then put the word's index into the trie at character c.
I know that's a mouthful, so let's take a look at the trie for our example above.
Consider the string "abcd". When the prefix of this string is "a", then its suffix is "bcd", which is not a palindrome. That's why a's indices vector is empty. However, when the prefix is "abc", then the suffix is "d", which is a palindrome. That's why c's indices vector contains the value 0 ("abcd" is words[0]).
One we have this trie built, we can traverse the words, and process them from back to front. When you get to the end of a word, if you are at a node in the trie whose indices vector is non-empty, then you add answers to the return value.
Let's look at some examples with the trie above:
And let's go over processing the words, where we process each word from front to back. In case you're lost,
Now, for each trie, you traverse all of the words, and search the trie (either forward-to-backward or backward-to-forward). At each character, if the node is an ending node, then you test the word to see if its prefix/suffix is a palindrome, and if so, you have a pair of words for the return value. This should require both less storage and less palindrome checking.
Let's use words[4], "sssll", as an example. We'll traverse the first trie, and when we do so, we need to traverse "sssll" backward: l-l-s-s-s.
Then, it did one traversal of words, and basically mixed my previous solutions. I don't have time to go over it, so I won't, but it was indeed both fast and small enough to pass the test.
For the hash function, I use my favorite -- the "djb_hash" function from COSC202, but I store it into an unsigned long long rather than an int. That reduces my chance of collisions:
unsigned long long djb_hash(const string &s) { size_t i; unsigned long long h; h = 0x1505; for (i = 0; i < s.size(); i++) { h = (h << 5) + h + s[i]; } return h; } |
Here's the code that processes the words from back to front:
// For each word, go from back to front, and look up the hash // of the suffix in the map. If it's there, and the prefix // is a palindrome, then you have a match! for (i = 0; i < words.size(); i++) { string &s = words[i]; h = 0x1505; for (j = s.size()-1; j >= -1; j--) { mit = m.find(h); if (mit != m.end() && mit->second != i && ispal(s, 0, j)) { rv.resize(rvs+1); rv[rvs].push_back(mit->second); rv[rvs].push_back(i); rvs++; } if (j >= 0) h = (h << 5) + h + s[j]; } } |
If you need an example, let's use the one above, with "abcd", "dcba", "s", "lls" and "sssll".
0x000000017c93ee4f is the hash of abcd 0x000000017c95978f is the hash of dcba 0x000000000b888cb0 is the hash of lls 0x000000000002b618 is the hash of s 0x0000003110610936 is the hash of sssll |
Here's the pass where we process the strings backward.
Considering abcd 0x0000000000001505 is the hash of "" backward 0x000000000002b609 is the hash of "d" backward 0x000000000059778c is the hash of "cd" backward 0x000000000b88696e is the hash of "bcd" backward 0x000000017c95978f is the hash of "abcd" backward Found a match with dcba Considering dcba 0x0000000000001505 is the hash of "" backward 0x000000000002b606 is the hash of "a" backward 0x0000000000597728 is the hash of "ba" backward 0x000000000b885c8b is the hash of "cba" backward 0x000000017c93ee4f is the hash of "dcba" backward Found a match with abcd Considering lls 0x0000000000001505 is the hash of "" backward 0x000000000002b618 is the hash of "s" backward Found a match with s 0x0000000000597984 is the hash of "ls" backward 0x000000000b88aa70 is the hash of "lls" backward Considering s 0x0000000000001505 is the hash of "" backward 0x000000000002b618 is the hash of "s" backward Considering sssll 0x0000000000001505 is the hash of "" backward 0x000000000002b611 is the hash of "l" backward 0x000000000059789d is the hash of "ll" backward 0x000000000b888cb0 is the hash of "sll" backward Found a match with lls 0x000000017c9a2323 is the hash of "ssll" backward 0x000000310fde87f6 is the hash of "sssll" backward |
This approach gets faster scores on Leetcode (no collisions -- yay!), although it's slower on my ex-nasty.txt file. Go figure. I think I've beaten this problem to death sufficiently.