Regex Matching

Question (LC.10)

Match a given string with a regex pattern '.' Matches any single character '*' Matches zero or more of the preceding element

Example

("a", "ab*") => true
("", "b****") => invalid regex assume never happens
("aabceef", "*a*") => false
("minutes","m?nu*s") => false

DP Approach

This question is pretty much wildcard matching with a change of power on *. Here * is no longer a wildcard but a count of a capture group. For example, if we want to capture two lower case letters, we can do [a-z][a-z]. If we want to capture 0 to many lower case letters, we can do [a-z]*.

1. Define optimal subproblem
RX[i,j] = whether A[0:i] matches with B[0:j]

2. Solve by prefix
Case 1 a-z: 
if A[i] == B[j]
    RX[i,j] = RX[i-1,j-1] // match one and done
else
    RX[i,j] = false
Case 2 .:
if B[j] == '.'
    RX[i,j] = RX[i-1,j-1] // match one and done
Case 3:
if B[j] == '*'
    Case 3.1:
    if A[i] == B[j-1] || A[i-1] == '.'
        RX[i,j] = RX[i,j-2] // skip ex. A* = ""
        RX[i,j] = RX[i,j-1] // match one and done ex. A* = A
        RX[i,j] = RX[i-1,j] // match more than one ex. A* = AAA
    Case 3.2:
    if A[i] != B[j-1]
        RX[i,j] = RX[i,j-1] // skip

3. Init base case
RX[0,0] = true
Since each '*' can eliminate the charter before it, 
the first row needs to be updated by the one before previous.
for j from 2 to n // j = 1 will be false
    RX[0][j] = RX[0][j-2] and B[j-1] == '*'
// first col auto init to false

4. Topo order
row by row
for i from 1 to m
    for j from 1 to n

5. Final answer
RX[m,n]

DP Code

public boolean isMatch(String string, String regex) {     
    char[] A = string.toCharArray(), B = regex.toCharArray();
    int m = A.length, n = B.length;
    // create memo table
    boolean[][] RX = new boolean[m+1][n+1];
    // init base cases
    RX[0][0] = true;
    // first row where [a-z]* can be empty
    for (int j = 2; j <= n; j++) {
        if (RX[0][j-2] && B[j-1] == '*') {
            RX[0][j] = true;
        }
    }
    // topo order
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if (B[j-1] == '*') {
                if (A[i-1] == B[j-2] || B[j-2] == '.') {
                    RX[i][j] = 
                        RX[i][j-2] || // skip
                        RX[i][j-1] || // one and done (A* => "")
                        RX[i-1][j]; // match multiple                        
                } else {
                    RX[i][j] = RX[i][j-2]; // skip
                }
            } else if (B[j-1] == '.') {
                RX[i][j] = RX[i-1][j-1];
            } else {
                if (A[i-1] == B[j-1]) {
                    RX[i][j] = RX[i-1][j-1];
                } 
                // else { RX[i][j] = false; }
            }
        }
    }
    // final answer
    return RX[m][n];
}

Time & Space Complexity

O(n^2)

Last updated