5.11

From The Algorithm Design Manual Solution Wiki
Jump to navigation Jump to search

Idea. In the standard convolution-based matcher we build, for every alphabet symbol σ, two 0-1 vectors
 Tσ[i] = 1 iff text T[i] = σ, 0 otherwise
 Pσ[j] = 1 iff pattern P[m−1−j] = σ, 0 otherwise (pattern is reversed).
The pattern occurs at text position i when the sum of m convolutions
 ∑σ(Tσ*Pσ)[i+m−1] (1)
equals m (the pattern length).

Wildcard “*”. Let W be the set of pattern indices j whose symbol is “*”. At those indices we do not care which text character appears, so they must never contribute to a mismatch and must not be counted on the right-hand side of (1).
Therefore
1. For every σ and every wildcard position j∈W, set Pσ[j]=0.
2. Let k = m − |W| be the number of non-wildcard symbols.
3. After step 1 the convolutions are computed exactly as before; a text position i is a match iff the sum in (1) equals k, not m.

Pseudocode.

matchWildcardFFT(T, P): m ← |P|, n ← |T| W ← { j : P[j] = '*' } k ← m - |W| for each σ in alphabet: build Tσ[0..n-1] # 1 where T[i]=σ build Pσ[0..m-1] = 0 for j = 0 .. m-1: if j ∉ W and P[j] = σ: Pσ[m-1-j] ← 1 # reverse Cσ ← FFTconvolve(Tσ, Pσ) # length n+m-1 S ← element-wise sum of all Cσ answer ← { i : 0 ≤ i ≤ n-m and S[i+m-1] = k } return answer
Proof of correctness. • If the pattern (with * wildcards) matches T starting at i, every non-wildcard P[j] equals T[i+j]; exactly those positions contribute 1 to some convolution, giving S[i+m−1]=k.
• If S[i+m−1]=k, each of the k non-wildcard positions contributed 1, hence their characters coincide; wildcards impose no restriction, so the pattern matches.

Complexity. O((|Σ|+1)(n + m) log(n+m)) time for |Σ| FFTs and one summation, and O(|Σ|(n+m)) space. When the alphabet is constant this is O((n+m) log(n+m)), the same as the original algorithm.

Example. T = “shots”, P = “sh*t”. Vectors for σ=s: Ts = 1 0 0 0 1, Ps = 0 1 0 0 σ=h: Th = 0 1 0 0 0, Ph = 0 0 1 0 σ=t: Tt = 0 0 0 1 0, Pt = 1 0 0 0 Wildcard ‘*’ at j=2 removes that column from every Pσ. Summing convolutions yields S[4]=3=k → match starts at position 0 (“shot”).


Back to Chapter 5