## On the Turing-Completeness of C

On November 17th, 2012, a link to a Brainfuck interpreter written in the C preprocessor language was posted to Hacker News. In the inevitable flurry of HN comments that followed, the question of CPP’s Turing-Completeness came up, and was answered negatively: the proposed interpreter must be extended manually for each possible loop upper bound, which makes it merely equivalent to Hofstadter’s BlooP language from Gödel, Escher, Bach.

However, a gem was hidden in the midst of these comments: is the C language itself Turing-Complete?

Despite the appearance of controversy, with some commenters leaning toward “yes” and others toward “no,” there is only one correct answer from the perspective of the language specification. But it is not intuitive.

**EDIT (Nov 20th):** following advice from both Scott and commenter E.G. Daylight, I have followed up with a “cleanup post” that summarizes the argument and focuses the discussion below.

❦❦❦

As commenter Scott Schneider puts it, the answer can only be phrased after one disentangles the language from the implementation:

> Turing completeness is about separating the *expression* of a computation from the *machinery* that carries out the computation. If your expression language can be mapped to the idealized Turing machine, then your expression language can express all computations that can be computed. (That is, it is Turing Complete.) This allows us to reason about the expression of computation outside of the machinery that carries it out - because the machinery will always have limits. The subtlety is: where do we draw the line between the expression and the machinery?
>
> That is, assuming the question is “is the C preprocessor Turing Complete?” I answer no. Because implied in that question is that the C preprocessor language is the expression language we are interested in, and the C preprocessor is the machinery. The reason why I say that the C preprocessor expression language is not Turing complete is simple: you cannot actually express recursion or unbounded loops. […] Why? Because the simulated loops are bounded, always. If you can’t *express* arbitrary loops or recursion, then you can’t compute everything that can be computed.
>
> Note that I stressed *express*. Yes, an actual C program is clearly limited in how much it can recurse, but the language itself can *express* unbounded recursion.

I followed and agreed with his reasoning so far. C can indeed express unbounded recursion, and is therefore strictly more expressive than the C preprocessor language.

However, is the ability to express unbounded recursion sufficient for Turing-Completeness? Another commenter worries that the fixed width of pointers in C amounts to fixing the amount of storage reachable from any C program, and thus to losing Turing Completeness. Scott follows up with:

> You don’t need to reason about C’s pointers to consider [it’s] Turing Complete. That it *allows arbitrary storage*\*, it has conditionals and allows full looping (either unbounded loops or full recursion) is enough. That is, a language that had C’s semantics sans pointers would still be Turing Complete.

* emphasis mine

This argument seems valid at face value. If the language does indeed allow arbitrary storage, it would become possible to implement a bignum library on top of it using only regular recursion and conditionals, and then use integer and pointer types expressed as bignums.

An obstacle remains, however: **how do we know that C indeed allows
arbitrary storage?**

Based on my extensive analysis of the ISO 9899:1999 and 9899:2011 language standards, I would like to highlight that this is not trivial to determine.

First of all, the notion of “arbitrary storage” cannot be extracted from
the use of C arrays and pointers. The reason is that all pointer types
are *defined in the language* to have a static, fixed width, and so do
array indices. This excludes using large arrays or dynamic invocations
of `malloc` at run-time to grab arbitrarily large amounts of storage.

Arbitrary storage, if such a thing indeed exists in C, would need to
come from the ability to define an arbitrarily large number of separate
*objects*, each equipped with its own storage. (For clarity, I have
summarized what constitutes an object and its distinction from storage
in appendix F of this
book).

Obviously, we cannot reserve arbitrarily more space by adding extra object definitions to the program’s source. This would not be satisfying: we would need to modify the program every time we needed more storage; i.e. for a given program specification, the amount of storage would be finite. Turing Completeness can thus only be attained if there exists a way to allocate an arbitrarily large number of new objects at run-time, starting from a static program specification.

The only mechanism available in C that enables dynamically and
arbitrarily growing storage is the automatic storage allocation of local
variables in unbounded recursions. But even this is not satisfactory:
the local variables in one function activation are *not directly
visible* from another function activation, e.g. the next recursion step.
To access one level from the next, one would need to take the address of
the local variable and pass a pointer to it. However, since pointers
have a statically fixed width, this would in turn again limit the number
of different storage slots that can be used/visible at any level.

Instead, if *C were extended with primitives to navigate variables, at
run-time, between function activations without using pointers*, then
one could claim that C supports arbitrary storage and thus Turing
Completeness. Unfortunately, that is not the case in standard C.

I have read and heard arguments that the limitation could be “easily
lifted” by simply stating that *pointers have dynamically varying
sizes.* Unfortunately, due to the way aggregation is defined in C, the
resulting type system would inductively require that all objects have
dynamically varying sizes, and thus that all allocations must happen
dynamically. The resulting language would be so different from C that I
would deem it quite misleading to keep the same name.

❦❦❦

EDIT (Nov 19th): Following an e-mail conversation with Scott, I now realize a shortcoming of my perspective. Scott highlights that the compiler is part of the “machine,” and thus any compiler-specific constants such as the widths of types are not limiting factors to determine Turing-Completeness.

His argument is as follows: suppose one has written a given terminating
program which may require, during its execution, up to some arbitrary
amount of storage. Without changing that program, it is possible *a
posteriori* to implement a piece of computer hardware and a C compiler
for it that provide enough storage to accommodate this computation.
This may require widening char (via CHAR_BIT) and/or pointers (via
size_t), but the program itself would not need to be modified. Because
this is possible, C is indeed Turing-Complete for terminating programs.

The tricky part of this argument is that it only works for terminating programs. Terminating programs have the nice property that they have a static upper bound on their storage requirement, which one can determine experimentally by running the program on the desired input with increasing storage sizes, until it “fits.”

The reason why I was misled in my train of thought is that I was
considering the wider class of “useful” *non-terminating programs*, e.g.
interactive programs with non-deterministic input. Consider, for
example, a C program that repeatedly reads a value N from the computer’s
keyboard, computes the Fibonacci sequence up to N iterations, and stores
it in memory. By design, this program does not terminate, and it uses up
to N memory slots, with N arbitrarily large at run-time. This program
would obviously fail eventually for *some* inputs on any real computer
with finite memory. But it would work for all possible inputs on a
(theoretical) Turing Machine, because a Turing Machine has an infinite
tape.

The question is thus whether one could write such a program in C and compile it to a theoretical Turing Machine. The answer, according to my previous argument, is no: even if we were compiling C to a machine with infinite memory, the finiteness of C’s address space would get in the way.

So the abstract machine of C is equivalent to a Turing Machine when considering only terminating programs, but not when non-terminating programs are also included.
