Leetcode.com problem 316: "Remove Duplicate Letters"

James S. Plank

Tue Sep 26 13:08:53 EDT 2023

Problem Statement.
Success Rate (as of September, 2023): 47.9%
This is a medium problem.
Skeleton.cpp - A version that compiles, but does not implement the class.
tests.sh - A testing script.
answers.txt - Proper output to the testing script.

I found this one tricky to think about. It took me a while to figure out the pattern for an answer.

I apologize, too, as this is a sketchy writeup. The pieces are all there, but you may need to walk through them slowly.

In case Leetcode's servers are down

Here is a summary of the problem:

You are to write the following method of the Solution class:

string removeDuplicateLetters(string s);

Here are details:

s will be composed of lower-case letters, and its size is between 0 and 10,000.
Your job is to return the lexicographically smallest string that you can generate by deleting characters of s, that contains all of the characters in s, such that there are no duplicates.

Examples:

#1: s = "bcabc". 
    Answer = "abc".  You get this by deleting the first b and the first c.  You'll note
                     you could delete the second b and the second c and get "bca", which
                     has all of the characters in s, but it is not lexicographically
                     smaller than "abc".

#2: s = "cbacdcbc".
    Answer = "acdb".  You get this with "..acd.b.", where the dots are the deleted characters.

Approaches that didn't work

My first thought was to place the smallest character, then the next, and then the next. For example, if there is an 'a' in the string, then you know that the leftmost 'a' won't get deleted, and all other a's will. Unfortunately, I couldn't build that into a solution that didn't feel exponential.

My next thought was to convert the string into a graph, which is directed and acyclic, so maybe topological sort will work on it. But no, nothing came of that.

So:

The approach that did work

Here's the approach that did work. Ask yourself the following question -- what's the first character going to be in the final string? Well, we know the following things about it:

It will be followed by all of the other characters in the string.
Of all characters that are followed by all of the other characters, it will be the smallest and the leftmost.

Let me give you an example. Suppose the string is "cbbcbdab". There are four characters -- a, b, c and d. Now, let's look at the string, and for each character in the string, look at the set of characters that follow it (including the character itself):

s[0] - c - { a, b, c, d }
s[1] - b - { a, b, c, d }
s[2] - b - { a, b, c, d }
s[3] - c - { a, b, c, d }
s[4] - b - { a, b, d }
s[5] - d - { a, b, d }
s[6] - a - { a, b }
s[7] - b - { b }

The first character of the result is going to be 'b' that is at s[1]. This is the leftmost, smallest character whose set of following characters is { a, b, c, d }.

Now, we can push this character onto a return value, remove it from the set of characters that we care about, then start at s[2]. In other words, s becomes "cda", and we keep going.

When we consider the running time of this, if there are u unique characters in s, and the length of s is n, then this will be O(un).

Representing the Unique Characters in S

There are only 26 potential characters, so we can use the bits of an integer as described in http://web.eecs.utk.edu/~jplank/plank/classes/cs302/Notes/Bits/. If you're doing this problem, start writing code, and have it calculate the bit set and print it out. Here are some inputs to try:

UNIX> echo a | a.out            # fe dcba
The Bitset: 0x1                 # 00 0001
UNIX> echo ab | a.out
The Bitset: 0x3                 # 00 0011
UNIX> echo abc | a.out
The Bitset: 0x7                 # 00 0111
UNIX> echo aabbccaabbcc | a.out
The Bitset: 0x7                 # 00 0111
UNIX> echo eabc | a.out
The Bitset: 0x17                # 01 0111
UNIX> echo fabc | a.out
The Bitset: 0x27                # 10 0111
UNIX>

Now, traverse s from back to front. Each time you read a character, add it to a set of characters read so far. When that set equals the set of all characters, then start to keep track of the smallest character, and its leftmost occurrence.

Here is that data on "cbbcbdab":

s[7] = 'b'.  Characters seen so far: 0x2.
s[6] = 'a'.  Characters seen so far: 0x3.
s[5] = 'd'.  Characters seen so far: 0xb.
s[4] = 'b'.  Characters seen so far: 0xb.
s[3] = 'c'.  Characters seen so far: 0xf.  Best leftmost character so far is this one.
s[2] = 'b'.  Characters seen so far: 0xf.  Best leftmost character so far is this one.
s[1] = 'b'.  Characters seen so far: 0xf.  Best leftmost character so far is this one.
s[0] = 'c'.  Characters seen so far: 0xf.

So -- we add 'b' to the return value, and then we remove it from the set of characters that we care about. Now, we traverse the string again from the back to s[1]. That's how we build up the return value. Here are all iterations on "cbbcbdab":

The Target set of Characters: 0xf
s[7] = 'b'.  Characters seen so far: 0x2.
s[6] = 'a'.  Characters seen so far: 0x3.
s[5] = 'd'.  Characters seen so far: 0xb.
s[4] = 'b'.  Characters seen so far: 0xb.
s[3] = 'c'.  Characters seen so far: 0xf.  Best leftmost character so far is this one.
s[2] = 'b'.  Characters seen so far: 0xf.  Best leftmost character so far is this one.
s[1] = 'b'.  Characters seen so far: 0xf.  Best leftmost character so far is this one.
s[0] = 'c'.  Characters seen so far: 0xf.
Pushing b onto the return value, and removing it from the Target Set
new return value: b
new target set:   0xd
lowest string index: 1

The Target set of Characters: 0xd
s[6] = 'a'.  Characters seen so far: 0x1.
s[5] = 'd'.  Characters seen so far: 0x9.
s[3] = 'c'.  Characters seen so far: 0xd.  Best leftmost character so far is this one.
Pushing c onto the return value, and removing it from the Target Set
new return value: bc
new target set:   0x9
lowest string index: 3

The Target set of Characters: 0x9
s[6] = 'a'.  Characters seen so far: 0x1.
s[5] = 'd'.  Characters seen so far: 0x9.  Best leftmost character so far is this one.
Pushing d onto the return value, and removing it from the Target Set
new return value: bcd
new target set:   0x1
lowest string index: 5

The Target set of Characters: 0x1
s[6] = 'a'.  Characters seen so far: 0x1.  Best leftmost character so far is this one.
Pushing a onto the return value, and removing it from the Target Set
new return value: bcda
new target set:   0x0
lowest string index: 6