5.11

Idea. In the standard convolution-based matcher we build, for every alphabet symbol σ, two 0-1 vectors
T_σ[i] = 1 iff text T[i] = σ, 0 otherwise
P_σ[j] = 1 iff pattern P[m−1−j] = σ, 0 otherwise (pattern is reversed).
The pattern occurs at text position i when the sum of m convolutions
∑_σ(T_σ*P_σ)[i+m−1] (1)
equals m (the pattern length).

Wildcard “*”. Let W be the set of pattern indices j whose symbol is “*”. At those indices we do not care which text character appears, so they must never contribute to a mismatch and must not be counted on the right-hand side of (1).
Therefore
1. For every σ and every wildcard position j∈W, set P_σ[j]=0.
2. Let k = m − |W| be the number of non-wildcard symbols.
3. After step 1 the convolutions are computed exactly as before; a text position i is a match iff the sum in (1) equals k, not m.

Pseudocode.

matchWildcardFFT(T, P):
    m ← |P|, n ← |T|
    W ← { j : P[j] = '*' }
    k ← m - |W|
    for each σ in alphabet:
        build Tσ[0..n-1]            # 1 where T[i]=σ
        build Pσ[0..m-1] = 0
        for j = 0 .. m-1:
            if j ∉ W and P[j] = σ:
                 Pσ[m-1-j] ← 1      # reverse
        Cσ ← FFTconvolve(Tσ, Pσ)     # length n+m-1
    S ← element-wise sum of all Cσ
    answer ← { i : 0 ≤ i ≤ n-m  and  S[i+m-1] = k }
    return answer

Proof of correctness. • If the pattern (with * wildcards) matches T starting at i, every non-wildcard P[j] equals T[i+j]; exactly those positions contribute 1 to some convolution, giving S[i+m−1]=k.
• If S[i+m−1]=k, each of the k non-wildcard positions contributed 1, hence their characters coincide; wildcards impose no restriction, so the pattern matches.

Complexity. O((|Σ|+1)(n + m) log(n+m)) time for |Σ| FFTs and one summation, and O(|Σ|(n+m)) space. When the alphabet is constant this is O((n+m) log(n+m)), the same as the original algorithm.

Example. T = “shots”, P = “sh*t”. Vectors for σ=s: T_s = 1 0 0 0 1, P_s = 0 1 0 0 σ=h: T_h = 0 1 0 0 0, P_h = 0 0 1 0 σ=t: T_t = 0 0 0 1 0, P_t = 1 0 0 0 Wildcard ‘*’ at j=2 removes that column from every P_σ. Summing convolutions yields S[4]=3=k → match starts at position 0 (“shot”).

Back to Chapter 5

5.11

Navigation menu

Personal tools

Namespaces

Variants

Views

Navigation