How did early compilers overcome the need for a compiler to compile itself?

  • Thread starter bhobba
  • Start date
In summary, bootstrapping is the process of creating a compiler or assembler using a simpler version of itself. This was commonly done in the early days of programming, with the first C compiler being written in BCPL and then bootstrapped to create a more sophisticated version. Nowadays, tools like LLVM Lite allow for compilers to be written in higher-level languages like Python, which can then be used to compile the original language's code. This process is also used when a new CPU is released, with a cross compiler being used to create tested code for the new architecture. The first assembler was also created using a similar bootstrapping process.
  • #1
10,824
3,690
This came up as a question on another forum - how can you compile the code of a C compiler when you need a C++ compiler to do it? How was the first C compiler made? How is it done when a new CPU comes out?

To cut a long story short, it’s done by bootstrapping. We now have more sophisticated tools, so I will first describe how it was done in the early days. It was not exactly how I am going to describe it. It used the old language BCPL, but the principles are the same. First, you write an assembler in direct bytecode. It is a simple assembler - nothing fancy at all. The easiest to write is what I call a Forth Assembler.

Forth would have to be the easiest language there is. You simply read a string of words and numbers separated by whitespace (ie spaces). If it is a word, you look up a dictionary to find the address of the associated subroutine. If it is a number, push the number onto the Forth stack. If it can't do either, create an error. The executable is so easy it can be put on the boot sector of a USB disk. It can contain a routine to read the rest of the disk into memory, execute the code, and save the code you write. Then you have a word for each computer instruction, so you have a simple assembler. You could, if you wanted, create a whole Forth-based sophisticated operating system. But most stuff these days is written in C++, so the first thing to try is to get a C compiler working.

Added Later: The above is a bit sketchy. I have located a site giving more detail:
https://niedzejkob.p4.team/bootstrap/

Probably the simplest C is TinyC which is written in C. To see a simplistic version of its code, search for KartikTalwar/3095780. I originally gave the code, but IMHO it distracted more than helped.

The full vesion does some pretty cool things:
https://nova.disfarm.unimi.it/manual/plugins/tcc-doc.htm

There is a PDF on exactly how it works (also see the book - A Retargetable C Compiler Design and Implementation):
https://www.researchgate.net/publication/2810190_A_retargetable_compiler_for_ANSI_C

The hard part is you have to rewrite that code in your simple Forth Assembler. It will be no easy task, but a determined person can do it with perseverance and the information above.

Then you use your FORTH C compiler to compile the TinyC code. But of course, the executable is the crappy executable your Forth C compiler produces. To fix that, you recompile it to produce a nice TinyC executable. This process is called bootstrapping a compiler.

Today we have much better tools, such as LLVM Lite, which allows a compiler to be written in Python. You write the compiler in Python and LLVM. Then you write the compiler in that language and bootstrap it as before. The Linux kernel is written in C. You could install the Linux kernel and a full Linux operating system.

These days the best C compilers (in fact, the best compilers in general) are written in C++. So if you want the best compiler, you need a C++ to C converter (EMMTRX makes one). You take the compiler you want, written in C++, for C++; translate it to C, then compile it. The executable is used to recompile the C++ compiler to get the best C++ executable. Bootstrapping again.

Someone developed a very efficient and small (114K) version of Python written in C++ instead of C. This version of Python was very easy to interface to programs written in C++. Since it is so small, instead of having the C programs as DLLs (you can if you want), it created a special version of the Python virtual machine with the C++ programs linked in. So what was done was to write a very efficient version of the python interpreter integrated with C++ in C++ and a Python preprocessor. It’s called Python++. They then used it to write an even better version of Python called TPython++:
https://gitlab.com/hartsantler/tpythonpp

Again bootstrapping at work.

I hope this has shown just how powerful the bootstrapping concept is.

Thanks
Bill
 
Last edited:
  • Like
Likes Filip Larsen, Mark44, FactChecker and 1 other person
Technology news on Phys.org
  • #2
bhobba said:
how can you compile the code of a C compiler when you need a C++ compiler to do it?
A C compiler need not be written in C++. I worked for somebody who wrote one by hand in assembler. It had to be able to run in 64k of memory, and the low level language allowed some interesting space saving tricks.

bhobba said:
How is it done when a new CPU comes out?
Usually a cross compiler is used, which can have tested code ready even before the new CPU prototype exists. Most of the compiler code already exists and is tested, and it just needs to be altered for the new code set. The bootstrap process is at some point necessary to solve the chicken and egg problem.

More to the point, at some point the first assembler had to be written without aid of being able to write it in assembly code.
 
  • Like
Likes Vanadium 50, berkeman, bhobba and 1 other person
  • #3
Halc said:
A C compiler need not be written in C++.

Of course. I described a process where it was written in my quick and dirty Forth Assembler. Then bootstrapped to Tinyc.

Halc said:
More to the point, at some point the first assembler had to be written without aid of being able to write it in assembly code.

You could write the quick and dirty Forth assembler in bytecode. That would in itself involve some steps. For example, you need some bytecode to store and load an executable and a simple editor to modify the bytecode.

Thanks
Bill
 
  • #4
If you really want to talk about bootstrapping. The first tool is the assembler. Compilers would spit out assembler code that would be processed by the assembler to machine code. A linker would take that machine code and construct a machine image that actually got loaded.

In the case of C++ in the early days, there was the "compiler" cfront which converted C++ code to C code. The C compiler would do its magic to make assembler then to machine code for the linking process.

Newer versions of the compilers generate machine code directly skipping the assembler step.

I'm sure we can make these steps even more precise and with how its done across various platforms...

Here's a discusion on the technique:

https://www.geeksforgeeks.org/bootstrapping-in-compiler-design/
 
  • Like
Likes bhobba
  • #6
jedishrfu said:
Newer versions of the compilers generate machine code directly skipping the assembler step.
Well...sort of, They don't write out assembly, but they do "think it".

As part of the history of bootstraps, there are also cross-assemblers (in analogy to cross-compilers). Back in the day, you could buy an assembler that took 8080 assembler input, and produced 8086 output. However, you were (obviously) restricted to 8-bit operations and 64k addressing.
 
  • Like
Likes bhobba
  • #7
In 1967 I worked on a project with the assembler run on a different machine (GE 225) than the target machine (GE 4020). We assembled and printed source listings weekly. Developers did machine code patches the rest of the week.

For the next cycle, the week's patches were hand translated to assembler.

The first version of the assembler source was generated by a FORTRAN compiler on yet another machine (GE 625) and hand translated from 625 assembler to 225 assembler. Our friend and mentor, @jedishrfu also worked with that same GE 625.

The point is that any and all kind of bootstrapping you can imagine was used somewhere.
 
  • Informative
Likes bhobba

FAQ: How did early compilers overcome the need for a compiler to compile itself?

What is a compiler?

A compiler is a type of computer program that translates source code written in a programming language into a lower-level language, typically machine code. This allows the computer to understand and execute the instructions written by the programmer.

Why was there a need for a compiler to compile itself?

Early compilers were written in assembly language, which was time-consuming and tedious. By having a compiler compile itself, it could be translated into a higher-level language, making it easier to maintain and modify.

How did early compilers overcome the need for a compiler to compile itself?

Early compilers used a process known as bootstrapping, where a small part of the compiler was written in assembly language and then used to compile the rest of the compiler written in a higher-level language. This allowed for the creation of more efficient and maintainable compilers.

What were the benefits of compilers being able to compile themselves?

By being able to compile itself, the compiler became more portable and could be easily adapted to different computer systems. It also reduced the time and effort required to develop and maintain the compiler.

Are modern compilers still able to compile themselves?

Yes, modern compilers are still able to compile themselves using more advanced techniques and technologies. This allows for continuous improvement and optimization of the compiler itself.

Back
Top