Edit Distance
Question (LC.72)
Given two strings, find the minimum number of edits to transform string1 to string2.
Example
I: "Saturday", "Sunday"
O: 4
I: "More", "Less"Brute Force Search
The most brute force search is just to enumerate all the possible matches for #edits = 1, 2, 3, ..., n until we find a match. Or we can do a smarter search that starts either from the beginning or the end, comparing the corresponding characters.
If we start from the beginning, then the minimum number of edits for s1 to match s2
if s1[m] == s2[n]
return editDistance(s1, s2, m+1, n+1)
else
return 1 + Math.min(
editDistance(s1, s2, m, n + 1), // insertion
editDistance(s1, s2, m + 1, n), // deletion
editDistance(s1, s2, m + 1, n + 1) // substitution
) ;The recurrence above should capture all the possibilities. Inside the else statement, the presumed equivalent cost of edit 1 + the min of the rest is guaranteed to be optimal. What if spending one edit is actually more cost efficient then having a match in the if statement? We can get a formal proof by reordering steps in an arbitrary optimum solution. But this example should give some good intuition. Consider SabSunday and Sunday. If we match the S in the beginning, we just have to delete the second S when we get there. The minimum number of edits still remain as 3.
Brute Force Code
Search works. The exponential time complexity doesn't. A relative short example like dinitrophenylhydrazine and acetylphenylhydrazine will give Time Limit Exceeded. See the induction proof in Rod Cutting.
Bottom Up DP
Observe the recursion for the following example
If we define a subproblem as ED(i,j), then ED(i,j) represents the optimal solution at using the suffix s1[i:m] to fully match the suffix s2[j:n]. From the recursion tree, we can find quite a few subtrees can be reused if they are memoized. Dynamic programming is the way to go.
Solve by Suffix
Solve by Prefix
Defining the subproblem by prefix might be easier to write sometimes.
Prefix vs Suffix
The tabulation process for prefix is going down (removing from the end so tabulate down) and for suffix is going up (removing from the beginning so tabulate up).
By Prefix
By Suffix


Time & Space Complexity
Time is O(m*n) but space is trickier. We are using O(m*n) space right now for better understanding. But we can get to constant space with careful implementation since the dependencies are only two layers.
Last updated