mardi 12 juillet 2011

Minimal length universal words

Let's start off with a little definition of a universal word. For a language $\mathcal{L}$ over an alphabet $\mathcal{A}$, a word $w \in \mathcal{A}^*$ is universal if any word of $\mathcal{L}$ is a subsequence of $w$:

\[x \in \mathcal{L} \Rightarrow x \in w\]

For example consider $\mathcal{L}=\{0, 1\}^2$, the language formed by the words that have exactly two symbols over a binary alphabet. A universal word for $\mathcal{L}$ is 00011011.

It is clear that universal words always exist for any language. Now, we are interested in minimal length universal words. A minimal length universal word for the language above is 11001.

Let's have a look at a friendly family of languages for which there are minimal universal words with known length: the languages made of all the possible fixed $n$-size words over an alphabet of $k$ symbols. You can prove that for such languages:
  1. any universal word of size $k^n+n-1$ is minimal,
  2. there exists a minimal length universal word of size $k^n+n-1$,
  3. constructing a minimal length universal word with any prefix of length $n$ can be done in $O(k^n)$ time.
Here's such a word for $n=8, k=2$:
00000000111111110111111001111101011111000111101101111010011110010111100001110111011001110101011101000111001101110010011100010111000001101101011011000110101001101001011010000110011001010110010001100010011000010110000001010101000101001001010000010010000100010000000
I'm joining a simple and not too efficient python program for generating these, which basically is a constructive proof of existence. The obvious application of these are in bruteforcing digital locks that match the password with the last $n$ keys pressed every time a key is pressed. You only have to press an average of $(k^n+n-1)/2$ keys instead of $n k^n/2$, and this is optimal.

EDIT: Thanks Antony for pointing to the fact that these are called de Bruijn sequences.