Boring numbers, complexity and Chaitin's incompleteness theorem

Informally, Chaitin’s incompleteness theorem states that there is a constant LL, such that we can’t prove the Kolmogorov complexity of any specific bit string to be larger than LL. We can of course prove that there are infinitely many bit strings with higher complexity than LL — but we can’t name a single one!

John Baez calls this constant LL the complexity barrier. And surprisingly, he argues that it is probably very low (on the order of a few thousand bits for reasonable encoding schemes)!

At least to me, this is a pretty amazing fact. Consider all the material available on the internet for instance: everything ever written online, all the videos and images, and binaries for every computer program out there. “Obviously” we can’t just write a Python program of a few kilobytes that outputs all of this, … right? Well, I’m pretty sure we can’t, but somewhat incredibly, there’s no proof of that!

Take a moment to be properly astonished by this result because the aim of this post is to make it as obvious as possible. We’ll get there soon enough, but first let’s look at some fun paradoxa.

There are no boring numbers …

The follwing “paradox” is quite famous:

Assume there was an uninteresting natural number. Then the smallest such number would be interesting — because it’s the smallest uninteresting number, that’s quite an interesting property! This is a contradiction, so there can be no uninteresting natural numbers.

We can formalize it as follows: we have some boolean “boringness” property, call it PP, defined over natural numbers. So P(n)P(n) just means ”nn is boring”. Being the lowest boring number is itself interesting:

P(minP(n)n)is falseP\left(\min_{P(n)} n\right)\quad \text{is false}

This is self-contradictory if there are any nn such that P(n)P(n), so no natural numbers can have property PP.

… and every number can be described in 13 words or less

There’s a arguably more interesting variation of this paradox: let Pk(n)P_k(n) mean ”nn cannot be described in fewer than kk words”. Consider then the description “the smallest natural number which cannot be described in fewer than 14 words”. In our notation, this would be

minP14(n)n\min_{P_{14}(n)} n

However, this description is only 13 words long, so

P14(minP14(n)n)is falseP_{14}\left(\min_{P_{14}(n)} n\right)\quad \text{is false}

which is a contradiction if such a number exists. Therefore, every number can be described in at most 13 words.

This seems suspicious: while there are many 13-word phrases, there’s still only a finite number of them1. So there aren’t enough short descriptions to go around for each natural number to get one.

There are arbitrarily complex strings …

Of course the problem is that “cannot be described in fewer than kk words” is not a well-defined property because there is no unambiguous mapping from English descriptions to numbers.

But what if we replace English by a formal language to circumvent this issue? For example, a Turing machine without any input either halts and outputs a number (in the form of a bit string), or it runs forever. If we fix an encoding for Turing machines, any Turing machine has a length, so we can define Pk(n)P_k(n) as ”nn is not the output of any halting Turing machine with length less than kk”. Or more briefly: “the Kolmogorov complexity of nn is at least kk“.

So let’s repeat our argument with Turing machines instead of natural language. We need to define a program that outputs minPk(n)n\min_{P_k(n)} n while being itself shorter than kk. Given a subroutine that checks whether Pk(n)P_k(n) for arbitrary nn, we could simply iterate over n=1,2,n = 1, 2, \ldots until we find one such that Pk(n)P_k(n) and then output that. If such an nn existed, this program would output minPk(n)n\min_{P_k(n)} n, so by the same argument as before, there can’t be any nn with complexity higher than kk

… which can’t be right, because just as with natural language descriptions, there are only finitely many programs of length k\leq k, but infinitely many bit strings.

This time, the issue is the phrase “Given a subroutine …“: PkP_k is undecidable, so this subroutine unfortunately doesn’t exist2. Or fortunately, if you value the consistency of mathematics.

In essense, the problem is this: in English, the smallest number with a simple property can be described in few words because you only need to describe the property and a few additional words. But the same is not true for Kolmogorov complexity if the property isn’t decidable.

For any decidable property, the argument works: the smallest number with that property will have low Kolmogorov complexity (where “low” means “not much larger than the complexity of the property”). Let’s see what we can get by applying that insight.

… but not provably complex ones!

This post is supposed to be about Chaitin’s incompleteness theorem, so we’d better connect all of this talk about paradoxa and complexity to that. The only missing ingredient is to consider provably large complexities, rather than just large complexities as before.

This means Pk(n)P_k(n) will now be ”nn encodes a proof that some specific bit string has Kolmogorov complexity higher than kk”. We need some system of logic to make “proof” well-defined and an encoding scheme of proofs as natural numbers but we’ll ignore that since our goal is just to gain intuition.

This new PkP_k is clearly decidable: we just need to check whether nn is a valid encoding of a proof and whether that proof shows that some specific number has Kolmogorov complexity higher than kk.

This program has length logk\log k (to encode kk) plus some constant. As we saw, this means that minPk(n)n\min_{P_k(n)} n also has Kolmogorov complexity at most logk\log k (up to a constant): we can iterate over nn and return the first one for which Pk(n)P_k(n) holds.

So far there’s no contradiction: nn proves that the Kolmogorov complexity of some specific number, call if M(n)M(n), is larger than kk. And we’ve only seen that the Kolmogorov complexity of minPk(n)n\min_{P_k(n)} n is low. But of course MM is itself computable and in fact by a pretty short program (which just looks at what nn proves). So Mk:=M(minPk(n)n)M_k := M\left(\min_{P_k(n)} n\right) also has a small complexity, more precisely:

K(Mk)logk+constK(M_k) \leq \log k + \text{const}

But by construction, K(Mk)>kK(M_k) > k (that’s what minPk(n)n\min_{P_k(n)} n encodes a proof of). So we get

k<K(Mk)<logk+constk < K(M_k) < \log k + \text{const}

which obviously can’t hold if kk is sufficiently large. So for kk greater than some constant LL, we run into a paradox … if any nn encoding such a proof exists. If there is no nn encoding a proof that some specific bit string has complexity higher than kk, then there is no smallest such nn, and we can’t define MkM_k. This proves Chaitin’s Incompleteness Theorem:

There is some constant LL such that for any given bit string, we can’t prove it has Kolmogorov complexity higher than LL.

Further reading

  • The “fact” that every number can be described in at most 13 words is known as the Berry paradox
  • You might like John Baez’s post that I already linked above. In addition to a discussion of Chaitin’s incompleteness theorem, he talks about how another famous paradox — the surprise examination paradox — motivates a proof of Gödel’s second incompleteness theorem! Or you could read the paper by Kritchman and Raz that introduced that proof.
  • Chaitin himself briefly points out the similarity of his impossibility result to Berry’s paradoxon in Gödel’s Theorem and Information (1982)

  1. At least in languages where you can’t just create an arbitrary number of new words by composing existing ones. If you can, then one word is enough to “describe” any number
  2. It might seem like we’ve in fact proven that PkP_k is undecidable, because if it was decidable, we’d get a contradiction. But actuallly, we’ve only shown that PkP_k can’t be computed by any program of length less than kk. A longer program doesn’t immediately lead to a contradiction.