Word Break

Question (LC.139)

Given a non-empty string and a dictionary of words, determine if the string can be segmented into a sequence of words.

Example

I: "helloworld", ["hello", "world"]
O: true
I: "hellow", ["hello", "world"]
O: false

This is a doability question (can be reframed as an optimizaiton problem) Overlapping subproblems - [__][_][_][___]_____ Start from start find a word then keep search, we can clearly see some subtrees can be the same

Single Sequence DP

1. Define optimal subproblem
BK[n] = is [0:n] a separatable string

2. Solve by prefix
for i from 0 to n-1
    if BK[i] and dict.contains(S[i:n])
        BK[n] = true
otherwise by default BK[n] = false
remember array index is 1 behind dp index

3. Base case
BK[0] = true

4. Topo order
for i from 1 to n

5. Final answer
return BK[n]
public boolean wordBreak(String str, List<String> wordList) {
    // preprocessing
    Set<String> dict = new HashSet<>();
    for (String word : wordList) {
        dict.add(word);
    }
    int n = str.length();
    // create memo table
    boolean[] BK = new boolean[n+1];
    // init base case
    BK[0] = true;
    // topo order
    for (int i = 1; i <= n; i++) {
        for (int j = 0; j < i; j++) {
            if (BK[j] && dict.contains(str.substring(j+1-1,i+1-1))) {
                BK[i] = true; // break;
            }
        }
    }
    // final answer
    return BK[n];
}

Time & Space Complexity

Time right now is O(n^2). We can improve the performance by adding a break statement once BK[i] is set to true. And get the maximum word length in the preprocessing and add an additional check in the inner loop. Space complexity needs to be linear since the dependencies are all of the previous subproblems.

Follow Up (LC.140)

Instead of checking the breakability, we want to return a list of strings on all the possibles ways to segment the input string.

Example

Example:
I: s = "catsanddog"
O: ["cat sand dog", "cats and dog"]
I: s = "catsanddogg"
O: []
find the first word to break
then recursively to break the suffix
memoize the already searched subtree to prune searches

Code

public List<String> wordBreak(String input, List<String> wordList) {
    // prepossessing
    Set<String> dict = new HashSet<>();
    for (String word : wordList) {
        dict.add(word);
    }
    // kick off the recurrence
    Map<String, List<String>> dp = new HashMap<>();
    return breakHelper(input, dict, dp);
}

private List<String> breakHelper(String string, 
                                 Set<String> dict, 
                                 Map<String, List<String>> dp) {
    // prune the repetitive subtrees
    if (dp.containsKey(string)) {
        return dp.get(string);
    }
    // base case
    List<String> result = new ArrayList<>();
    if (string.length() == 0) {
        return result;
    }
    // last word case no space after
    if (dict.contains(string)) {
        result.add(string);
    }
    // find the first word to break
    // then search on suffix
    for (int i = 1; i < string.length(); i++) {
        String word = string.substring(0, i);
        String suffix = string.substring(i);
        if (dict.contains(word)) {
            List<String> swordList = breakHelper(suffix, dict, dp);
            for (String swords : swordList) {
                // need concatenate the prefix with each list of suffix words
                result.add(word + " " + swords);
            }
        }
    }
    // when backtrack memoize the subtree
    dp.put(string, result);
    return result;
}

Time Complexity

O(len(wordDict) ^ len(n / minWordLength))

Last updated