(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Let's use the example given, for words say <i>A, B, C</i>, in the problem but not get too tied to the specific values. It helps to think about sorting the search string words by their indexes in the document:

<pre> A C B A A C * * B B * * * * C 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 </pre>

In the code example this is just a call to sort but since each word index list is sorted we could use <code>MergeSort</code> which means this portion of the code is $\mathcal{O}(n)$ for $n$ indexes.

Once we have the list we add push each element into a hash with a separate key for each word and update our estimate of the snippet <i>span</i> that includes all the words.

Here's python code for one simple implementation.

<pre> import sys

def smallest_snippet(*args):


args -- K lists of word positions for K words

>>> smallest_snippet([1,10], [2,20], [3,30])
[1, 3]
>>> smallest_snippet([1,9,27], [6,10,19], [8,12,14])
[8, 10]
>>> smallest_snippet([1,4,11,27], [3,6,10,19], [5,8,12,14])
[3, 5]
>>> smallest_snippet([1,4,5], [3,9,10], [2,6,15])
[1, 3]


   master = []
for i in range(len(args)):
master.extend(map(lambda j: (i,j), args[i])) # ith word, jth index
master.sort(lambda x,y: cmp(x[1], y[1]))         # TODO: mergesort
tops = {}                                        # { word i: index j }
best = [master[0][1], master[-1][1]]
minspan = best[-1] - best[0] + 1

   # update span after each new index tuple
for (i,j) in master:
tops[i] = j
if len(tops) == len(args):
curr = [min(tops.values()), max(tops.values())]
if curr[1] - curr[0] < minspan:
minspan = curr[1] - curr[0]
best = curr

return best


if __name__ == "__main__":

   import doctest
doctest.testmod()

   sys.exit()


</pre>