Given a string, find the longest substring that contains at most K distinct characters.
Example
I: s = "eceba", k = 2
O: 3 because "ece"
I: s = "eceba", k = 3
O: 4 bacause "eceb"
Analysis
The question asks for a longest substring with a constraint. Typically, this is a substring/window problem.
Brute Force Approach
From the start of every string, have a hashset, stops when size exceeds k. Store an int to keep track of the longest substring under this constraint.
Here is the pseudo-code:
func longestSubKD(str, k)
if s == null || len(s) == 0
return 0
int maxLenght = 0
Set<Character> charSet = new HashSet
for i from 0 to len(s)
charSet.clear()
for j from i to len(s) and charSet.size() <= k // comment
charSet.add(str.charAt(j))
end for
maxLength = Math.max(maxLength, j-i)
// since the for loop breaks after the condition becomes false
// (one step beyond the constraint), j-i+1 the +1 is omitted
end for
return maxLength
end func
The pseudo-code looks good but there's one logical error.
How For Loop Works
int i = 0;
int j = i;
for (i = 0; i < 5 && (j - 1) <= 3; i++) {
j = i;
}
System.out.println("j: " + j + " i: " + i);
for (j = 0; j - 1 <= 3; j++) {
}
System.out.println("the other j: " + j);
What is the output for i, j, and the other j? They should all output 5 right? Noooo ʕº̫͡ºʔ
In fact, i = 5, j = 4!, the other j = 5. How so? Let's examine how a for loop actually works
Init: i = 0, j = i
Condition: i < 5 && (j - 1) <= 3
Conditional Code: j = i
Increment: i++
--F-> end loop
|
Init ----> Condition --T-> ConCode ----> Increment
/ |
| /
--------------------------------
You see we get to increment i = 5, then the condition becomes false and the loop ends. We never get to the conditional code in this case.
Brute Force Code
Very brute force O(n^3)
# generate every substring
# then calcualte the longest substring that satisfies the condition
def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
n = len(s)
max_length = 0
if n == 0 or k <= 0:
return max_length
for i in range(n):
for j in range(i+1, n+1):
current_substr = s[i:j]
current_set = set(current_substr)
if len(current_set) > k:
break
if len(current_substr) > max_length:
max_length = len(current_substr)
return max_length
brute force O(n^2)
def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
n = len(s)
max_size = 0
if n == 0 or k <= 0:
return max_size
for i in range(n):
current_set = set()
for j in range(i, n):
current_set.add(s[j])
if len(current_set) > k:
break
current_window_size = j - i + 1
if current_window_size > max_size:
max_size = current_window_size
return max_size
public int lengthOfLongestSubstringKDistinct(String str, int k) {
if (k <= 0 || str == null || str.length() == 0) {
return 0;
}
int maxLength = 0;
Set<Character> charSet = new HashSet<>();
int i, j;
for (i = 0; i < str.length(); i++) {
charSet.clear();
for (j = i; j < str.length(); j++) {
charSet.add(str.charAt(j));
if (charSet.size() > k) {
break;
}
}
maxLength = Math.max(maxLength, j-i);
}
return maxLength;
}
The time complexity is O(n^2). A relatively long string will result in Time Limit Exceeded.
Sliding Window Approach
Observe the following:
I: s = "eceba", k = 3
S e c e b a
L1 i---->j
L2 i---->j
L3 i-->j
L4 i->j
Moving j back to i is not necessary. Just extend the window as we go.
Set is not enough. We need a map for the frequency of each character.
Sliding Window Code
Extend - add new key to map or increment the value
Maintain - check the current window is the max length
Shrink - decrement the value or evict the key if value is 0
public int lengthOfLongestSubstringKDistinct(String s, int k) {
// validate input
if (s == null || k < 0) {
return -1;
}
// init map, key is character in s, value is its occurence
Map<Character, Integer> charMap = new HashMap<>();
// sliding window
int maxLength = 0;
int j = 0;
for (int i = 0; i < s.length(); i++) {
// extend
char curChar;
while (j < s.length()) {
curChar = s.charAt(j);
if (charMap.containsKey(curChar)) {
charMap.put(curChar, charMap.get(curChar) + 1);
} else {
if (charMap.size() < k) {
charMap.put(curChar, 1);
} else {
break; // and condition in the while loop
}
}
j++;
}
// update
maxLength = Math.max(maxLength, j - i);
// shrink
char removeChar = s.charAt(i);
if (charMap.get(removeChar) != null && charMap.get(removeChar) > 1) {
charMap.put(removeChar, charMap.get(removeChar) - 1);
} else {
charMap.remove(removeChar);
}
}
return maxLength;
}
# because you have to extend to get a longer answer
# compute the substrings in between is not going to yield a better solution
def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
n = len(s)
max_size = 0
if n == 0 or k <= 0:
return max_size
# char is the just str with len 1
letter_dict: Dict[str, int] = {}
j = 0
for i in range(n):
# extend
# exit when there is one over
while j < n:
current_letter = s[j]
if current_letter in letter_dict:
letter_dict[current_letter] = letter_dict[current_letter] + 1
j += 1
else:
if len(letter_dict) == k:
break
letter_dict[current_letter] = 1
j += 1
# check
current_window_size = j - i
max_size = max(max_size, current_window_size)
# shrink
first_letter = s[i]
if letter_dict[first_letter] > 1:
letter_dict[first_letter] = letter_dict[first_letter] - 1
else:
letter_dict.pop(first_letter)
return max_size
Complexity
Only slide the window one way so O(n) time. O(k) space.