Wildcard Matching

Wildcard Matching (LI.192)

Match a given string with a wildcard pattern
'?' Matches any single character
'*' Matches any sequence of characters (including empty)

Example

isMatch("ab", "?*") => true
isMatch("aa","a") => false
isMatch("aabceef","*a*") => true
isMatch("minutes","m?nu*s") => true

Brute Force Search

Search by prefix

base case
if (i == 0 && j == 0) return true
if either one is not 0 return false

search
if B[j] == '*'
    match 1,2,3,4,...
    skip
if B[j] == '.'
    match 1
if A[i] == B[j]
    match 1
else
    don't match return false

Incorrect DP

From the usage of '*', we can clearly see there are overlapping problems. Also this problem is asking doability so 0 or 1 edge can translate into an optimization problem. But what's wrong with this DP solution?

1. Define optimal subproblem
RX[i,j] = whether A[0:i] matches B[0:j]

2. Solve by prefix
Case 1: 
if B[j] == '*'
    WD[i,j] = WD[i-p, j-1] for all p from [0 to i-j] // match 1,2,3,4,...
    WD[i,j] = WD[i,j-1] // skip
Case 2:
if B[j] == '?'
    WD[i,j] = WD[i-1,j-1] // match 1

Case 3:
if A[i] == B[j]
    WD[i,j] = WD[i-1,j-1] // match 1
else
    WD[i,j] = false

3. Init base case
// first row
for j from 0 to n
    WD[0,j] = true

// first col
for i from 0 to m
    WD[i,0] = true

4. Topo order
row by row
for i from 1 to m
    for j from 1 to n

5. Final answer
RX[m,n]

The way it initializes the base cases obviously doesn't match with the definition of the subproblem. Yes, in 2D DP we normally initialize the first col and the first row. But always draw an example to made sure if it actually makes sense in the context of the question. So WD[0][0] should be true because an empty string matches up with an empty string. Then the rest of the first column should be all false because empty string cannot match with anything else. The first row is more interesting. Consider case "" and "******". The first row should be all true in this set up. We want to keep the first row true until we hit the first non-star character.

A second issue is the recurrence when B[j] is star. Think about the definition of the subproblem. Consider this example "lol" and "**". The current recurrence only gives WD[1,0]. The intention of the current recurrence is clear - it tries to match 1 or more characters with one * symbol. But we are doing DP. We are not solving these subproblems. They are already solved. All the dependencies are cached and ready to be reused. We only care about the current step (or the last step if we solve by prefix). Then, what are the possibilities for the * in the current step? 1. It can stand by do nothing pass l to rely on its previous buddy to match it 2. Or I can match with the l to get the job done and call it a day (just as the average Joe ?) 3. I need to match with l and might need to match more in the future so you need to pay me overtime. (Or in the context of edit distance, we are deleting the character from s1) The third option might be hard to come up right away. Solving by prefix guarantees us that any previous subproblems are solved and taken care of, but it doesn't guarantee us any subproblems afterward.

Correct DP

1. Define optimal subproblem
RX[i,j] = whether A[0:i] matches with B[0:j]

2. Solve by prefix
Case 1: 
if B[j] == '*'
    RX[i,j] = RX[i,j-1] // skip (horizontal)
    RX[i,j] = RX[i-1,j-1] // one and done (diagonal)
    RX[i,j] = RX[i-1,j] // match more than one (vertical)
Case 2:
if B[j] == '?'
    RX[i,j] = RX[i-1,j-1] // match one and done

Case 3:
if A[i] == B[j]
    RX[i,j] = RX[i-1,j-1] // match one and done
else
    RX[i,j] = false

3. Init base case
RX[0,0] = true
// first row
for j from 1 to n
    if (B[j-1] != *'') break;
    RX[0,j] = true // if all star
// first col auto init to false

4. Topo order
row by row
for i from 1 to m
    for j from 1 to n

5. Final answer
RX[m,n]

Code

public boolean isMatch(String string, String pattern) {     
    char[] A = string.toCharArray(), B = pattern.toCharArray();
    int m = A.length, n = B.length;
    // create memo table
    boolean[][] WD = new boolean[m+1][n+1];
    // init base case
    WD[0][0] = true;
    // first row
    for (int j = 1; j <= n; j++) {
        if (B[j-1] != '*') break;
        WD[0][j] = true;
    }
    // topo order
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (B[j-1] == '*') {
                WD[i][j] = 
                    WD[i][j-1] || // skip
                    WD[i-1][j-1] || // one and done
                    WD[i-1][j]; // match multiple
            } else if (B[j-1] == '?') {
                WD[i][j] = WD[i-1][j-1];
            } else {
                if (A[i-1] == B[j-1]) {
                    WD[i][j] = WD[i-1][j-1];
                } 
                // else { WD[i][j] = false; }
            }
        }
    }
    // final answer
    return WD[m][n];
}

Time & Space Complexity

Time and space are both O(mn). Space can be further reduced to O(m).

PreviousRegex Matching NextInterleaving String

Last updated 5 years ago