How to check if a sequence is random?

Question

When I was thinking about various types of pseudo-randomness, the following question struck me:

Suppose that a sequence $a_n \in \{0,1\}$ is given. Is there a way to check if it is genuinely random?

More concretely, suppose that we know in advance that $a_n$ are either:

1. all chosen at random, according to the same distribution, and independently, or
2. produced by some unknown algorithm (i.e., there exists a deterministic Turing machine which, given $n$ as input, produces $a_n$ as output).

Assume additionally that the distribution is not too close to degenerate (say, $0.001 < \Pr(a_n = 1) < 0.999$).

Does there exist an algorithm which gets the (infinite) sequence $a_n$ as input (i.e., is given access to a "black box" which produces $a_n$ given $n$) such that:

1. if the sequence $a_n$ is random, then the algorithm keeps going indefinitely with probability $> 1 - \varepsilon$;
2. if the sequence is deterministic, the algorithm always outputs YES in finite time?

Note that the restriction that the distribution is not too degenerate is necessary; otherwise, in case 2, we don't know when to output YES for the constant sequence $0,0,0,\dots$. Also, obviously, in case 2 there is no hope of determining the Turing machine which produces $a_n$. The question only makes sense if the sequence $a_n$ is infinite; otherwise every sequence is deterministic (as pointed out in the comments).

There is a related question, but it asks for a practical test. What I am interested in is what the situation is "in principle". It seems to me that such an algorithm should exist, but I am wondering if I am missing something.

The question is kind of ill-posed. Any sequence is deterministic: it is the output of the Turing machine that writes that sequence. But one way to say how "random" a sequence is (in the sense of Kolmogorov) is to run a compression algorithm and compare the sizes of the original sequence and the compressed one. — Jack D'Aurizio, Sep 12 '14 at 23:40
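A minimal sketch of the compression idea from the comment above, assuming a finite 0/1 prefix packed 8 bits per byte and Python's standard `zlib` as the compressor (both choices are illustrative, not from the comment). Compression only bounds Kolmogorov complexity from above, so a ratio well below 1 is evidence of structure, while a ratio near 1 proves nothing.

```python
import random
import zlib

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 bits per byte (the last byte may be short)."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

def compression_ratio(bits):
    """Compressed size / packed size, using zlib at its maximum level.
    Well below 1: zlib found structure (evidence against randomness).
    Near or above 1: zlib found none, which does not show the bits are random."""
    packed = pack_bits(bits)
    return len(zlib.compress(packed, 9)) / len(packed)

constant = [0] * 80_000
rng = random.Random(42)
noisy = [rng.getrandbits(1) for _ in range(80_000)]
print(compression_ratio(constant))  # tiny: a long run of zeros compresses extremely well
print(compression_ratio(noisy))     # close to 1: zlib finds no structure
```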
When I said "a sequence" I meant "an infinite sequence". They are not deterministic in this sense. — Jakub Konieczny, Sep 12 '14 at 23:42

score 3 · Answer 1 · answered Sep 12 '14 at 23:44


No, you can't. There are many tests for randomness: having roughly half the bits be zero, having the right number of runs of $0$'s of various lengths, having the right proportion of each eight-bit chunk, etc. If you search for "randomness test" you can read about many of them. Generally they can prove a string is not random (at a certain confidence level), but not that it is random. For example, suppose I gave you the string created by XORing the binary expansion of $e$ starting from the millionth bit with that of $\pi$ starting from the billionth. We would expect this string to pass all the statistical tests you might try, but it has a very simple rule behind it. Unless you recognize the string somehow, you are unlikely to figure out the rule.
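As an illustration of the first test mentioned above ("roughly half the bits be zero"), here is a sketch in the style of the frequency (monobit) test from NIST SP 800-22. It assumes fair bits, $\Pr(a_n = 1) = 1/2$, which is stronger than the question's assumption. A small p-value rejects randomness; a large one only means this particular test saw nothing.

```python
import math

def monobit_p_value(bits):
    """Frequency (monobit) test p-value under the hypothesis of fair, independent bits."""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)  # +1 for each 1, -1 for each 0
    s_obs = abs(s) / math.sqrt(n)     # normalized excess of ones over zeros
    return math.erfc(s_obs / math.sqrt(2))

print(monobit_p_value([0] * 100))                    # essentially 0: rejected
print(monobit_p_value([n % 2 for n in range(100)]))  # 1.0: passes, yet clearly not random
```

The alternating sequence $0,1,0,1,\dots$ scores a perfect p-value of $1.0$ on this test, which is exactly the answer's point: passing a test never certifies randomness.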

Ross Millikan

Well, I can think of one statistical test I might try that discovers this sequence. Namely, just produce the sequence you mentioned, and check whether $a_n$ is equal to the $n$-th term of your sequence. Of course, this is a rather obscure test, but I can afford to iterate over similarly obscure tests and see what happens. – Jakub Konieczny Sep 13 '14 at 00:06
Yes, but you have to think of all the ways I might generate a "random" sequence for that to work. Maybe my sequence is none of the ones you thought to try, but some easily describable sequence nonetheless. Again, you can prove my sequence not random by finding the description, but you really can't prove it random. – Ross Millikan Sep 13 '14 at 00:12
I don't actually want to prove a sequence random; I am well aware that this would be impossible. What puzzles me is whether I can prove a sequence fails to be random whenever it fails to be random. Imagine for a moment that I have another black box, which tells me whether a given Turing machine produces a 0/1-valued sequence. If I had that, I could devise an algorithm which iterates over all such Turing machines. For each Turing machine I wait until either $a_n$ differs from the output, or I am reasonably sure $a_n$ agrees with the output more than a random sequence would. – Jakub Konieczny Sep 13 '14 at 00:20
Of course, I do not have such a black box, but if I did, I think this would result in the type of algorithm described in the question. At least, I would be able to account for all "random" sequences you can think of (as long as they are computable).
In more down-to-earth terms, I could also do the same without the black box, instead iterating over all Turing machines corresponding to "simple" mathematical constructions (i.e., ones that are bound to halt for "simple" reasons).
– Jakub Konieczny Sep 13 '14 at 00:24
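A sketch of the procedure in the comments above, restricted to a finite, given list of total candidate generators (the `candidates` below are hypothetical stand-ins for the "simple" constructions). For the $k$-th candidate it demands $N_k$ leading agreements, with $0.999^{N_k} \le \varepsilon / 2^{k+1}$. If the $a_n$ are independent with $\Pr(a_n = c) < 0.999$ for each bit $c$, a random sequence matches a fixed candidate on $N_k$ terms with probability at most $0.999^{N_k}$, so by the union bound the chance of a false YES is below $\sum_k \varepsilon/2^{k+1} < \varepsilon$. With a finite list the sketch returns `None` instead of running forever.

```python
import math
import random

def required_agreements(k, eps, q=0.999):
    """Leading terms the k-th candidate (k = 0, 1, ...) must match: smallest N with q**N <= eps / 2**(k+1)."""
    return math.ceil(math.log(eps / 2 ** (k + 1)) / math.log(q))

def detect_candidate(oracle, candidates, eps=0.01, q=0.999):
    """Return the index of the first candidate agreeing with the oracle on its
    required number of leading terms, or None if no candidate does."""
    for k, candidate in enumerate(candidates):
        need = required_agreements(k, eps, q)
        # all() stops at the first disagreement, so a wrong candidate is rejected quickly
        if all(oracle(n) == candidate(n) for n in range(need)):
            return k
    return None

# Hypothetical "simple" candidates: constant 0, alternating 0101..., constant 1
candidates = [lambda n: 0, lambda n: n % 2, lambda n: 1]

print(detect_candidate(lambda n: n % 2, candidates))  # 1

rng = random.Random(7)
bits = [rng.getrandbits(1) for _ in range(10_000)]
print(detect_candidate(bits.__getitem__, candidates))  # None
```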

score 2 · Accepted Answer · answered Sep 14 '14 at 02:27

I think you're essentially uncovering the foundations of algorithmic randomness.

The question is closely related to Kolmogorov complexity (which is at the basis of that field); intuitively, if some finite prefix of the sequence has small Kolmogorov complexity, then at least that part of the sequence is not very random.
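For reference, the standard way to make this intuition precise for infinite sequences is the Levin–Schnorr theorem (a known result, not part of the answer). Using prefix-free Kolmogorov complexity $K$, an infinite binary sequence $X = x_1 x_2 x_3 \dots$ is Martin-Löf random if and only if its prefixes are incompressible up to a constant:

$$X \text{ is Martin-Löf random} \iff \exists c \;\forall n:\; K(x_1 x_2 \cdots x_n) \ge n - c.$$

The prefix-free version matters: with plain complexity, no infinite sequence satisfies this bound for all $n$.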

This also leads to the answer being basically "no", we cannot check if the sequence is random, because Kolmogorov complexity is incomputable. So even checking whether a finite-length string is random is incomputable. Intuitively, it's very closely related to the halting problem and computing busy-beaver numbers.
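To see why no practical compressor can stand in for Kolmogorov complexity here, consider a sketch (the seed is arbitrary and hypothetical) where the sequence is fully determined by a few bytes of seed plus the fixed source of Python's Mersenne Twister, so its true complexity is tiny, yet `zlib` finds essentially nothing to remove. Any real compressor only gives an upper bound on $K$, and that bound can be wildly loose.

```python
import random
import zlib

def prng_bytes(seed, n_bytes):
    """Deterministic bytes from Python's Mersenne Twister; fully described by (seed, n_bytes)."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n_bytes))

data = prng_bytes(2014, 10_000)
compressed = zlib.compress(data, 9)
# zlib's output is roughly as long as its input, despite the short description above
print(len(data), len(compressed))
```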
