10.9
(a) Longest Common Subsequence
Given two strings T (length m) and P (length n) build a table C[0‥m][0‥n]:
for i = 0..m: C[i][0] = 0
for j = 0..n: C[0][j] = 0
for i = 1..m:
for j = 1..n:
if T[i-1] == P[j-1]:
C[i][j] = C[i-1][j-1] + 1
else:
C[i][j] = max(C[i-1][j], C[i][j-1])
C[m][n] is |LCS|; back-tracking from (m,n) yields the subsequence itself.
Complexity: O(mn) time, O(min(m,n)) space with two-row optimisation.
Shortest Common Supersequence
Use the same table and merge the strings while tracing back:
i = m, j = n, S = ""
while i>0 or j>0:
if i>0 and j>0 and T[i-1] == P[j-1]:
S = T[i-1] + S; i--; j--
elif j == 0 or (i>0 and C[i-1][j] ≥ C[i][j-1]):
S = T[i-1] + S; i--
else:
S = P[j-1] + S; j--
Length formula: [math]|SCS(T,P)| = m + n − |LCS(T,P)|], so the walk above is O(m+n).
(b) Edit distance without substitutions
Let L be any common subsequence of T and P.
• Delete from T the m−|L| symbols not in L.
• Insert into the result the n−|L| symbols that appear in P but not in L.
This edit script costs m + n − 2|L|.
Choosing L to be the longest common subsequence minimises the cost, giving
[math]d(T,P)=|T|+|P|−2|LCS(T,P)|.[/math]
For the supersequence, every symbol common to both strings is written once, every other symbol once again, hence
[math]|SCS(T,P)| = |T| + |P| − |LCS(T,P)|.[/math]
Subtracting the two equalities we obtain the desired identity
[math]d(T,P) = |SCS(T,P)| − |LCS(T,P)|.[/math]
The constructions above therefore provide optimal algorithms for LCS, SCS and insertion/deletion edit distance.
Back to
Chapter 10