Longest Substring with At Most K Distinct Characters

Rework later. Don't write code when you travel.

Question (LC.340)

Given a string, find the longest substring that contains at most K distinct characters.

Example

I: s = "eceba", k = 2
O: 3 because "ece"
I: s = "eceba", k = 3
O: 4 bacause "eceb"

Analysis

The question asks for a longest substring with a constraint. Typically, this is a substring/window problem.

Brute Force Approach

From the start of every string, have a hashset, stops when size exceeds k. Store an int to keep track of the longest substring under this constraint.

Here is the pseudo-code:

func longestSubKD(str, k)
    if s == null || len(s) == 0
        return 0

    int maxLenght = 0
    Set<Character> charSet = new HashSet

    for i from 0 to len(s)
        charSet.clear()
        for j from i to len(s) and charSet.size() <= k // comment
            charSet.add(str.charAt(j))
        end for
        maxLength = Math.max(maxLength, j-i) 
        // since the for loop breaks after the condition becomes false
        // (one step beyond the constraint), j-i+1 the +1 is omitted
    end for

    return maxLength
end func

The pseudo-code looks good but there's one logical error.

How For Loop Works

int i = 0;
int j = i;

for (i = 0; i < 5 && (j - 1) <= 3; i++) {
    j = i;
}
System.out.println("j: " + j + " i: " + i);

for (j = 0; j - 1 <= 3; j++) {
}
System.out.println("the other j: " + j);

What is the output for i, j, and the other j? They should all output 5 right? Noooo ʕº̫͡ºʔ In fact, i = 5, j = 4!, the other j = 5. How so? Let's examine how a for loop actually works

Init: i = 0, j = i
Condition: i < 5 && (j - 1) <= 3
Conditional Code: j = i
Increment: i++

             --F-> end loop
             |
Init ----> Condition --T-> ConCode ----> Increment
             /                               |
             |                               /
             --------------------------------

You see we get to increment i = 5, then the condition becomes false and the loop ends. We never get to the conditional code in this case.

Brute Force Code

Very brute force O(n^3)



# generate every substring 

# then calcualte the longest substring that satisfies the condition 


def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
    
    n = len(s)
    
    max_length = 0 
    
    if n == 0 or k <= 0:
        return max_length
    
    
    for i in range(n):
        for j in range(i+1, n+1):
            current_substr = s[i:j]
            current_set = set(current_substr)
            if len(current_set) > k:
                break
            if len(current_substr) > max_length:
                max_length = len(current_substr)
    
    return max_length 
    

brute force O(n^2)

def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
    
    n = len(s)
    
    max_size = 0 
    
    if n == 0 or k <= 0:
        return max_size
    

    
    for i in range(n):
        
        current_set = set()
        
        for j in range(i, n):
            
            current_set.add(s[j])
            
            if len(current_set) > k:
                break
            
            current_window_size = j - i + 1 
            
            if current_window_size > max_size:
                max_size = current_window_size
    
    return max_size 
    
public int lengthOfLongestSubstringKDistinct(String str, int k) {
    if (k <= 0 || str == null || str.length() == 0) {
        return 0;
    }

    int maxLength = 0;
    Set<Character> charSet = new HashSet<>();

    int i, j;
    for (i = 0; i < str.length(); i++) {
        charSet.clear();
        for (j = i; j < str.length(); j++) {
            charSet.add(str.charAt(j));
            if (charSet.size() > k) {
                break;
            }
        }
        maxLength = Math.max(maxLength, j-i);
    }

    return maxLength;
}

The time complexity is O(n^2). A relatively long string will result in Time Limit Exceeded.

Sliding Window Approach

Observe the following:

I: s = "eceba", k = 3
S  e c e b a
L1 i---->j
L2   i---->j
L3     i-->j
L4       i->j

Moving j back to i is not necessary. Just extend the window as we go.

Set is not enough. We need a map for the frequency of each character.

Sliding Window Code

Extend - add new key to map or increment the value
Maintain - check the current window is the max length
Shrink - decrement the value or evict the key if value is 0
public int lengthOfLongestSubstringKDistinct(String s, int k) {
    // validate input
    if (s == null || k < 0) {
        return -1;
    }
    // init map, key is character in s, value is its occurence
    Map<Character, Integer> charMap = new HashMap<>();
    // sliding window
    int maxLength = 0;
    int j = 0;
    for (int i = 0; i < s.length(); i++) {
        // extend
        char curChar;
        while (j < s.length()) {
            curChar = s.charAt(j);
            if (charMap.containsKey(curChar)) {
                charMap.put(curChar, charMap.get(curChar) + 1);
            } else {
                if (charMap.size() < k) {
                    charMap.put(curChar, 1);
                } else {
                    break; // and condition in the while loop
                }
            }
            j++;
        }
        // update
        maxLength = Math.max(maxLength, j - i);
        // shrink
        char removeChar = s.charAt(i);
        if (charMap.get(removeChar) != null && charMap.get(removeChar) > 1) {
            charMap.put(removeChar, charMap.get(removeChar) - 1);
        } else {
            charMap.remove(removeChar);
        }
    }
    return maxLength;
}
# because you have to extend to get a longer answer 
# compute the substrings in between is not going to yield a better solution  

def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
    
    n = len(s)
    
    max_size = 0 
    
    if n == 0 or k <= 0:
        return max_size
    
    # char is the just str with len 1 
    letter_dict: Dict[str, int] = {}
    j = 0
    
    for i in range(n):
        
        # extend 
        # exit when there is one over 
        while j < n:
            current_letter = s[j]
            if current_letter in letter_dict:
                letter_dict[current_letter] = letter_dict[current_letter] + 1
                j += 1                    
            else:
                if len(letter_dict) == k:
                    break
                
                letter_dict[current_letter] = 1
                j += 1
        
        # check
        current_window_size = j - i
        max_size = max(max_size, current_window_size)
        
        # shrink 
        first_letter = s[i]
        if letter_dict[first_letter] > 1:
            letter_dict[first_letter] = letter_dict[first_letter] - 1
        else:
            letter_dict.pop(first_letter)
    
    return max_size 
    

Complexity

Only slide the window one way so O(n) time. O(k) space.

Follow Up

Data stream

Last updated