mardi 12 juillet 2011

Minimal length universal words

Let's start off with a little definition of a universal word. For a language $\mathcal{L}$ over an alphabet $\mathcal{A}$, a word $w \in \mathcal{A}^*$ is universal if any word of $\mathcal{L}$ is a subsequence of $w$:

\[x \in \mathcal{L} \Rightarrow x \in w\]

For example consider $\mathcal{L}=\{0, 1\}^2$, the language formed by the words that have exactly two symbols over a binary alphabet. A universal word for $\mathcal{L}$ is 00011011.

It is clear that universal words always exist for any language. Now, we are interested in minimal length universal words. A minimal length universal word for the language above is 11001.

Let's have a look at a friendly family of languages for which there are minimal universal words with known length: the languages made of all the possible fixed $n$-size words over an alphabet of $k$ symbols. You can prove that for such languages:
  1. any universal word of size $k^n+n-1$ is minimal,
  2. there exists a minimal length universal word of size $k^n+n-1$,
  3. constructing a minimal length universal word with any prefix of length $n$ can be done in $O(k^n)$ time.
Here's such a word for $n=8, k=2$:
00000000111111110111111001111101011111000111101101111010011110010111100001110111011001110101011101000111001101110010011100010111000001101101011011000110101001101001011010000110011001010110010001100010011000010110000001010101000101001001010000010010000100010000000
I'm joining a simple and not too efficient python program for generating these, which basically is a constructive proof of existence. The obvious application of these are in bruteforcing digital locks that match the password with the last $n$ keys pressed every time a key is pressed. You only have to press an average of $(k^n+n-1)/2$ keys instead of $n k^n/2$, and this is optimal.

EDIT: Thanks Antony for pointing to the fact that these are called de Bruijn sequences.

mercredi 13 octobre 2010

The Social Network: a flawed voting scheme

As it happens, I saw The Social Network. For my defense, I should mention that this is solely the unfortunate result of my friends' perfidy: a moment of inattention and bim! you're suddenly learning about the romantic life of Mark Zuckerberg - or whatever. Nothing really made me go "wait a second", except in the very beginning of the film when our protagonist happily codes a voting scheme for ranking Harvard's girls - by "hotness" of course. As the story goes, he calls his best friend over for him to explain "the chess playing algorithm". I am clueless as to what chess has to do with this, but let's move on. (EDIT: They are probably talking about the Elo rating system which does indeed make some sense.)

Back at my university, we had the same thing on our local network periodically going on (at each beginning of the academic year - I let you guess why). The concept is fairly primitive. You are presented two girls' pictures and you click on which of the two you like best. Then two other girls are randomly paired and presented to you, and so on. In the end, after many many votes, you magically end up with some ranking of the girls. No comments on the social value of this. So, what is really going on behind the scene?

Let's get more precise. Basically, you have $n$ voters and $m$ choices. Each voter has a complete ordering over $\{1,...,m\}$, meaning that for any two choices $x,y$, one has either $x < y$ or $x > y$. Is complete ordering a realistic assumption? Well, it does not matter, as long as you are only given the $<$ or $>$ alternative in the voting scheme, so we will restict ourselves to this case. There are exactly $2^{\frac{m(m-1)}{2}$ possible distinct complete orderings, some of them are not transitive: you can have $a>b$ and $b>c$ but $a < c$. But is it possible that if I prefer $a$ to $b$ and $b$ to $c$, then I actually prefer $c$ to $a$? The answer is negative if I am being self-consistent. Thus, an opinion is a transitive complete ordering, i.e. a permutation. Therefore, voters' opinions are reasonably accounted for by permutations over the $m$ choices, and there is exactly $m!$ possible distinct opinions.

Thus, if $S_m$ denotes the group of permutations of size $m$, the voting scheme is adequately represented by a function $f$ which type is:

$$f : S_m^n \rightarrow S_m$$
Functions such as $f$ are called aggregation functions, because their aim is to aggregate voters' information into a single, hopefully fair and representative global choice. Now, unpleasant things start happening here. Even without knowing how $f$ works, we have some important limitations on what can $f$ achieve at best. Perhaps the simplest thing to think of is Condorcet's voting paradox: there exist configurations of voters' rankings such that $f$ will contradict a majority of voters' preferences.

Take 3 voters having personal preferences $a>b>c$, $b>c>a$ and $c>a>b$. We see that a strict majority of 2 voters prefer $a$ to $b$, $b$ to $c$ and $c$ to $a$. Thus, the majority ordering is not commutative, but $f$ is restricted to deliver a permutation. Thus any resulting aggregated ranking will contradict at least one of the 3 majority preferences.
Arrow's impossibility theorem extends and precises this puzzling result: basically there exists no fair (several natural requirements) aggregating functions for this problem. So, whatever the actual algorithm does, it can not claim to find the majority ranking of candidates - because no such thing exists in general. That's fairly poor for something supposed to give the general opinion of voters, right?

But in all honesty, we did not need Arrow's theorem to figure out that the idea does not work so well. Basically, the easiest way to implement this stuff is to keep track of the number of clicks on each candidate and to rank them accordingly. Thus, cheating is extremely easy by repeatedly clicking on someone's face. And the more you click, the stronger your voice is. Well, from my perspective, it seems that this is the number one reason why after a short initial enthusiastic phase, such applications invariably die out. And that's for the best :-)