How did early compilers overcome the need for a compiler to compile itself?

bhobba · Jun 13, 2022

This came up as a question on another forum - how can you compile the code of a C compiler when you need a C++ compiler to do it? How was the first C compiler made? How is it done when a new CPU comes out?

To cut a long story short, it’s done by bootstrapping. We now have more sophisticated tools, so I will first describe how it was done in the early days. It was not exactly how I am going to describe it. It used the old language BCPL, but the principles are the same. First, you write an assembler in direct bytecode. It is a simple assembler - nothing fancy at all. The easiest to write is what I call a Forth Assembler.

Forth would have to be the easiest language there is. You simply read a string of words and numbers separated by whitespace (ie spaces). If it is a word, you look up a dictionary to find the address of the associated subroutine. If it is a number, push the number onto the Forth stack. If it can't do either, create an error. The executable is so easy it can be put on the boot sector of a USB disk. It can contain a routine to read the rest of the disk into memory, execute the code, and save the code you write. Then you have a word for each computer instruction, so you have a simple assembler. You could, if you wanted, create a whole Forth-based sophisticated operating system. But most stuff these days is written in C++, so the first thing to try is to get a C compiler working.

Added Later: The above is a bit sketchy. I have located a site giving more detail:
https://niedzejkob.p4.team/bootstrap/

Probably the simplest C is TinyC which is written in C. To see a simplistic version of its code, search for KartikTalwar/3095780. I originally gave the code, but IMHO it distracted more than helped.

The full vesion does some pretty cool things:
https://nova.disfarm.unimi.it/manual/plugins/tcc-doc.htm

There is a PDF on exactly how it works (also see the book - A Retargetable C Compiler Design and Implementation):
https://www.researchgate.net/publication/2810190_A_retargetable_compiler_for_ANSI_C

The hard part is you have to rewrite that code in your simple Forth Assembler. It will be no easy task, but a determined person can do it with perseverance and the information above.

Then you use your FORTH C compiler to compile the TinyC code. But of course, the executable is the crappy executable your Forth C compiler produces. To fix that, you recompile it to produce a nice TinyC executable. This process is called bootstrapping a compiler.

Today we have much better tools, such as LLVM Lite, which allows a compiler to be written in Python. You write the compiler in Python and LLVM. Then you write the compiler in that language and bootstrap it as before. The Linux kernel is written in C. You could install the Linux kernel and a full Linux operating system.

These days the best C compilers (in fact, the best compilers in general) are written in C++. So if you want the best compiler, you need a C++ to C converter (EMMTRX makes one). You take the compiler you want, written in C++, for C++; translate it to C, then compile it. The executable is used to recompile the C++ compiler to get the best C++ executable. Bootstrapping again.

Someone developed a very efficient and small (114K) version of Python written in C++ instead of C. This version of Python was very easy to interface to programs written in C++. Since it is so small, instead of having the C programs as DLLs (you can if you want), it created a special version of the Python virtual machine with the C++ programs linked in. So what was done was to write a very efficient version of the python interpreter integrated with C++ in C++ and a Python preprocessor. It’s called Python++. They then used it to write an even better version of Python called TPython++:
https://gitlab.com/hartsantler/tpythonpp

Again bootstrapping at work.

I hope this has shown just how powerful the bootstrapping concept is.

Thanks
Bill

Halc · Jun 13, 2022

bhobba said:

how can you compile the code of a C compiler when you need a C++ compiler to do it?

A C compiler need not be written in C++. I worked for somebody who wrote one by hand in assembler. It had to be able to run in 64k of memory, and the low level language allowed some interesting space saving tricks.

bhobba said:

How is it done when a new CPU comes out?

Usually a cross compiler is used, which can have tested code ready even before the new CPU prototype exists. Most of the compiler code already exists and is tested, and it just needs to be altered for the new code set. The bootstrap process is at some point necessary to solve the chicken and egg problem.

More to the point, at some point the first assembler had to be written without aid of being able to write it in assembly code.

bhobba · Jun 13, 2022

Halc said:

A C compiler need not be written in C++.

Of course. I described a process where it was written in my quick and dirty Forth Assembler. Then bootstrapped to Tinyc.

Halc said:

More to the point, at some point the first assembler had to be written without aid of being able to write it in assembly code.

You could write the quick and dirty Forth assembler in bytecode. That would in itself involve some steps. For example, you need some bytecode to store and load an executable and a simple editor to modify the bytecode.

Thanks
Bill

jedishrfu · Jun 13, 2022

If you really want to talk about bootstrapping. The first tool is the assembler. Compilers would spit out assembler code that would be processed by the assembler to machine code. A linker would take that machine code and construct a machine image that actually got loaded.

In the case of C++ in the early days, there was the "compiler" cfront which converted C++ code to C code. The C compiler would do its magic to make assembler then to machine code for the linking process.

Newer versions of the compilers generate machine code directly skipping the assembler step.

I'm sure we can make these steps even more precise and with how its done across various platforms...

Here's a discusion on the technique:

https://www.geeksforgeeks.org/bootstrapping-in-compiler-design/

bhobba · Jun 14, 2022

jedishrfu said:

Here's a discusion on the technique:
https://www.geeksforgeeks.org/bootstrapping-in-compiler-design/

My post was rather specific and old-fashioned. The above is general and more modern. It uses the T-Diagrams, which some may not be familiar with:

Thanks
Bill

Vanadium 50 · Jun 14, 2022

jedishrfu said:

Newer versions of the compilers generate machine code directly skipping the assembler step.

Well...sort of, They don't write out assembly, but they do "think it".

As part of the history of bootstraps, there are also cross-assemblers (in analogy to cross-compilers). Back in the day, you could buy an assembler that took 8080 assembler input, and produced 8086 output. However, you were (obviously) restricted to 8-bit operations and 64k addressing.

anorlunda · Jun 14, 2022

In 1967 I worked on a project with the assembler run on a different machine (GE 225) than the target machine (GE 4020). We assembled and printed source listings weekly. Developers did machine code patches the rest of the week.

For the next cycle, the week's patches were hand translated to assembler.

The first version of the assembler source was generated by a FORTRAN compiler on yet another machine (GE 625) and hand translated from 625 assembler to 225 assembler. Our friend and mentor, @jedishrfu also worked with that same GE 625.

The point is that any and all kind of bootstrapping you can imagine was used somewhere.

How did early compilers overcome the need for a compiler to compile itself?

FAQ: How did early compilers overcome the need for a compiler to compile itself?

What is a compiler?

Why was there a need for a compiler to compile itself?

How did early compilers overcome the need for a compiler to compile itself?

What were the benefits of compilers being able to compile themselves?

Are modern compilers still able to compile themselves?

Similar threads

Hot Threads

Recent Insights