On the Turing-Completeness of C

On November 17th, 2012, a link to a Brainfuck interpreter written using the C preprocessor language was posted to Hacker News. In the necessary flurry of HN comments that followed, the question of CPP’s Turing Completeness came up, to be answered negatively: the proposed interpreter needs to be extended manually for each possible loop upper bound, which makes it barely equivalent to Hofstadter’s BlooP language from Gödel, Escher, Bach.

However a gem was hidden in the midst of these comments: is the C language Turing-Complete?

Despite the appearance of controversy, with some commenters leaning on the side of “yes” and some other on the side of “no,” there is only one correct answer from the perspective of the language specification. But it’s not intuitive.

EDIT (Nov 20th): following advice from both Scott and commenter E.G. Daylight, I have followed up with a “cleanup post” that summarizes the argument and focuses the discussion below.

❦❦❦

As commenter Scott Schneider puts it, the answer can only phrased after one disentangles the language from the implementation:

Turing completeness is about separating the expression of a computation from the machinery that carries out the computation. If your expression language can be mapped to the idealized Turing machine, then your expression language can express all computations that can be computed. (That is, it is Turing Complete.) This allows us to reason about the expression of computation outside of the machinery that carries it out - because the machinery will always have limits.

The subtlety is: where do we draw the line between the expression and the machinery?

That is, assuming the question is “is the C preprocessor Turing Complete?” I answer no. Because implied in that question is that the C preprocessor language is the expression language we are interested in, and the C preprocessor is the machinery. The reason why I say that the C preprocessor expression language is not Turing complete is simple: you cannot actually express recursion or unbounded loops. […] Why? Because the simulated loops are bounded, always. If you can’t express arbitrary loops or recursion, then you can’t compute everything that can be computed.

Note that I stressed express. Yes, an actual C program is clearly limited in how much it can recurse, but the language itself can express unbounded recursion.

I followed and agreed to his reasoning so far. C can indeed express unbounded recursion, and therefore is strictly more expressive than the C preprocessor language.

However, is the ability to express unbounded recursions sufficient to be Turing-Complete? Another commenter worries that the fixed width of pointers in C amounts to fixing the amount of storage reachable from any C program, and thus losing Turing Completeness. Scott follows up with:

You don’t need to reason about C’s pointers to consider [it’s] Turing Complete. That it allows arbitrary storage*, it has conditionals and allows full looping (either unbounded loops or full recursion) is enough. That is, a language that had C’s semantics sans pointers would still be Turing Complete.

* emphasis mine

This argument seems valid at face value. If the language does indeed allow arbitrary storage, it would become possible to implement a bignum library on top of it using only regular recursion and conditionals, and then use integer and pointer types expressed as bignums.

An obstacle remains, however: how do we know that C indeed allows arbitrary storage?

Based on my extensive analysis of the ISO 9899:1999 and 9899:2011 language standards, I would like to highlight this is not trivial to determine.

First of all, the notion of “arbitrary storage” cannot be extracted from the use of C arrays and pointers. The reason is that all pointer types are defined in the language to have a static, fixed width, and so have array indices. This excludes using large arrays or dynamic invocations of “malloc” at run-time to grab storage up to arbitrary sizes.

Arbitrary storage, if such a thing indeed exists in C, would need to come from the ability to define an arbitrary large number of separate objects, each equipped with its own storage. (For clarity, I have summarized what constitutes an object and the distinction with storage in appendix F of this book).

Obviously, we cannot reserve arbitrarily more space by adding extra object definitions in the program’s source. This would not be satisfying: we would need to modify the program every time we need more storage; i.e. for a given program specification the amount of storage would be finite. Turing Completeness can thus only be attained if there exists a way to allocate an arbitrary large number of new objects at run-time starting from a static program specification.

The only mechanism available in C that enables dynamically and arbitrarily growing storage is the automatic storage allocation of local variables in unbounded recursions. But even this is not satisfactory: the local variables in one function activation are not directly visible from another function activation, e.g. the next recursion step. To access one level from the next, one would need to take the address of the local variable and pass a pointer to it. However, since pointers have a statically fixed width, this would in turn again limit the number of different storage slots that can be used/visible at any level.

Instead, possibly, if C was extended with primitives to navigate variables, at run-time, between function activations without using pointers, then one could claim that C supports arbitrary storage and thus Turing Completeness. Unfortunately, that is not the case with standard C.

I have read and heard arguments that the limitation could be “easily lifted” by simply stating that pointers have dynamically varying sizes. Unfortunately, due to the way aggregation is defined in C, the resulting type system would inductively require that all objects have dynamically varying sizes, and thus that all allocations must happen dynamically. The resulting language would be so different from C that I would deem it quite misleading to keep the same name.

❦❦❦

EDIT (Nov 19th): Following an e-mail conversation with Scott, I now realize a shortcoming of my perspective. Scott highlights that the compiler is part of the “machine,” and thus any compiler-specific constants such as the widths of types are not limiting factors to determine Turing-Completeness.

His argument is as follows: suppose one has written a given terminating program which may require, during its execution, up to some arbitrary amount of storage. Without changing that program, it is possible a posteriori to implement a piece of computer hardware and its C compiler that provide enough storage to accommodate this computation. This may require widening the widths of char (via CHAR_BITS) and/or pointers (via size_t), but the program would not need to be modified. Because this is possible, C is indeed Turing-Complete for terminating programs.

The tricky part of this argument is that it only works when considering terminating programs. Terminating programs have this nice property that they have a static upper bound to their storage requirement, which one can determine experimentally by running the program over the desired input with increasing storage sizes, until it “fits.”

The reason why I was mislead in my train of thoughts it that I was considering the wider class of “useful” non-terminating programs, e.g. interactive programs with non-deterministic input. Let us consider, for example, a C program that repeatedly reads a value N from the computer’s keyboard and repeatedly computes the fibonacci sequence and stores it in memory up to N iterations. By definition, this program does not terminate and uses up to N memory slots, with N arbitrarily large at run-time. This program would obviously eventually fail for some inputs on any real computer with a finite memory. But it would work for all possible inputs on a real (theoretical) Turing Machine because a real Turing Machine has an infinite tape.

The question is thus whether one could write such a program in C and compile it to a theoretical Turing Machine. The answer, according to my previous argument, is no: even if we were compiling C to a machine with infinite memory, the finiteness of C’s address space would get in the way.

So the abstract machine of C is equivalent to a Turing Machine when considering only terminating programs, but is not when including also non-terminating programs.