5.11
Idea. In the standard convolution-based matcher we build, for every alphabet symbol σ, two 0-1 vectors
Tσ[i] = 1 iff text T[i] = σ, 0 otherwise
Pσ[j] = 1 iff pattern P[m−1−j] = σ, 0 otherwise (pattern is reversed).
The pattern occurs at text position i when the sum of m convolutions
∑σ(Tσ*Pσ)[i+m−1] (1)
equals m (the pattern length).
Wildcard “*”. Let W be the set of pattern indices j whose symbol is “*”. At those indices we do not care which text
character appears, so they must never contribute to a mismatch and must not be counted on the right-hand side of (1).
Therefore
1. For every σ and every wildcard position j∈W, set Pσ[j]=0.
2. Let k = m − |W| be the number of non-wildcard symbols.
3. After step 1 the convolutions are computed exactly as before; a text position i is a match iff the sum in (1) equals k, not m.
Pseudocode.
• If S[i+m−1]=k, each of the k non-wildcard positions contributed 1, hence their characters coincide; wildcards impose no restriction, so the pattern matches.
Complexity. O((|Σ|+1)(n + m) log(n+m)) time for |Σ| FFTs and one summation, and O(|Σ|(n+m)) space. When the alphabet is constant this is O((n+m) log(n+m)), the same as the original algorithm.
Example. T = “shots”, P = “sh*t”. Vectors for σ=s: Ts = 1 0 0 0 1, Ps = 0 1 0 0 σ=h: Th = 0 1 0 0 0, Ph = 0 0 1 0 σ=t: Tt = 0 0 0 1 0, Pt = 1 0 0 0 Wildcard ‘*’ at j=2 removes that column from every Pσ. Summing convolutions yields S[4]=3=k → match starts at position 0 (“shot”).
Back to
Chapter 5