Regex Matching

Question (LC.10)

Match a given string with a regex pattern '.' Matches any single character '*' Matches zero or more of the preceding element

Example

("a", "ab*") => true
("", "b****") => invalid regex assume never happens
("aabceef", "*a*") => false
("minutes","m?nu*s") => false

DP Approach

This question is pretty much wildcard matching with a change of power on *. Here * is no longer a wildcard but a count of a capture group. For example, if we want to capture two lower case letters, we can do [a-z][a-z]. If we want to capture 0 to many lower case letters, we can do [a-z]*.

1. Define optimal subproblem
RX[i,j] = whether A[0:i] matches with B[0:j]

2. Solve by prefix
Case 1 a-z: 
if A[i] == B[j]
    RX[i,j] = RX[i-1,j-1] // match one and done
else
    RX[i,j] = false
Case 2 .:
if B[j] == '.'
    RX[i,j] = RX[i-1,j-1] // match one and done
Case 3:
if B[j] == '*'
    Case 3.1:
    if A[i] == B[j-1] || A[i-1] == '.'
        RX[i,j] = RX[i,j-2] // skip ex. A* = ""
        RX[i,j] = RX[i,j-1] // match one and done ex. A* = A
        RX[i,j] = RX[i-1,j] // match more than one ex. A* = AAA
    Case 3.2:
    if A[i] != B[j-1]
        RX[i,j] = RX[i,j-1] // skip

3. Init base case
RX[0,0] = true
Since each '*' can eliminate the charter before it, 
the first row needs to be updated by the one before previous.
for j from 2 to n // j = 1 will be false
    RX[0][j] = RX[0][j-2] and B[j-1] == '*'
// first col auto init to false

4. Topo order
row by row
for i from 1 to m
    for j from 1 to n

5. Final answer
RX[m,n]

DP Code

Time & Space Complexity

O(n^2)

Last updated