Storing program source as relations in a database instead of text file

PeterDonis · Dec 15, 2021

elcaro said:

Databases are, by definition, used for storing relations.

They can store relations, but that is not the only thing they can store. The term "relational database" does not mean that the only thing the database can store is relations. It just means the database engine is optimized for dealing with relations.

elcaro said:

The only thing we can argue about is whether or not that is the case in this particular case, storing program source.

I'm still confused about the point of this thread. By your OP and the title of the thread, I had understood you to be proposing that program source be stored as relations in a database. Is that what you're proposing, or not? Does the title of the thread need to be changed?

valenumr · Dec 15, 2021

PeterDonis said:

They can store relations, but that is not the only thing they can store. The term "relational database" does not mean that the only thing the database can store is relations. It just means the database engine is optimized for dealing with relations.

I'm still confused about the point of this thread. By your OP and the title of the thread, I had understood you to be proposing that program source be stored as relations in a database. Is that what you're proposing, or not? Does the title of the thread need to be changed?

I was under the impression that the question is whether a DBMS can execute an arbitrary program. Otherwise, we have things like UML to model many of the concepts being referenced, and those models can certainly be stored in a database.

PeterDonis · Dec 15, 2021

valenumr said:

I was under the impression that the question is whether a DBMS can execute an arbitrary program.

I don't know where you're getting that impression, since it is not what either the thread title or the thread OP says.

valenumr · Dec 15, 2021

PeterDonis said:

I don't know where you're getting that impression, since it is not what either the thread title or the thread OP says.

Maybe a leap of logic. If the system can fully describe a program, it isn't a huge stretch to think that it can interpret it, but perhaps that's a stretch too far.

Mark44 · Dec 15, 2021

elcaro said:

I gave some arguments why it could still be useful for that purpose (for generating automatic build/make scripts, for example), without requiring the program source to be broken down into relations at the deepest level.

But the thread title is "Storing program source as relations in a database instead of text file." Since you seem to have given up on that idea, it would probably be best to give this thread a different title.

valenumr · Dec 15, 2021

Here is where it gets a little weird to me. If one can store a program description, I think it follows that one should also be able to retrieve the program and all the information it contains. And if one can extract all the information of the program as stored, one should be able to interpret its intent.

FactChecker · Dec 15, 2021

valenumr said:

Here is where it gets a little weird to me. If one can store a program description, I think it follows that one should also be able to retrieve the program and all the information it contains. And if one can extract all the information of the program as stored, one should be able to interpret its intent.

IMO, it is certainly impractical to store all the information in a relational database. I won't say it is impossible, but it is definitely beyond my ability to envision.

Tom.G · Dec 15, 2021

Rhetorical Question:
How does the initial proposal differ from a flow chart?

synch · Dec 18, 2021

elcaro said:

...
But intrinsically a program can also be seen as a collection of relations between different objects, which can be stored as tuples in a relational database system.
...

The relations between parts of a program's structure could no doubt be stored as tuples in a relational database using SQL. If I remember correctly, the relational model insists that the mechanisms for database maintenance are themselves implemented and stored relationally, so I guess the usual RDBMS is already a sort of example. BUT... most other tasks and programs have to use extensive sequential and procedural logic, which is not a database strength; in fact, even storing derived detail is frowned on in RDBMS analysis unless it is a speed optimisation of some sort.
I can see the logic in the idea, but in practice it would be a nightmare. Programs would have to be written in SQL. E.g., to add two variables, the SQL program would have to apply read locks, access the variables' current value in tables and use the + operator in the call, yada yada yada ... the SQL would be horrendous after a while and it would run very slowly, since most RDBMSs are not exactly optimised for non-standard use. And that is avoiding the question of defining what operators do in which contexts, and so on.
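To make the objection concrete, here is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical `variables` table that holds every program variable as a row. The one-line statement `c = a + b` becomes two subqueries and an update inside a transaction (SQLite takes its locks per transaction rather than through explicit lock statements).

```python
import sqlite3

# Hypothetical layout: each program variable is one row in a "variables" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variables (name TEXT PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO variables VALUES (?, ?)", [("a", 2), ("b", 3), ("c", 0)])
conn.commit()

# The equivalent of "c = a + b": read two rows and write a third in one transaction.
with conn:  # commits on success, rolls back if an exception escapes
    conn.execute(
        """
        UPDATE variables
        SET value = (SELECT value FROM variables WHERE name = 'a')
                  + (SELECT value FROM variables WHERE name = 'b')
        WHERE name = 'c'
        """
    )

c = conn.execute("SELECT value FROM variables WHERE name = 'c'").fetchone()[0]
print(c)  # 5
```

Every arithmetic step here goes through SQL parsing and table lookups, which is roughly the overhead being described.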

However the logic certainly has value - I have found Codd's excellent rules for RDBMS are also surprisingly good at promoting good program structure, when (loosely) applied to writing program code not anywhere near RDBMS :) !
E.g., thinking of a line of code like a row of fields in a table, and of transactions (as far as is realistic): a line should not have dependencies within the line, a line of code should achieve one action only, sequential lines of a composite action should be in a defined block that can be tried as a single action or rolled back, and so on. (Not surprising when you think about it - just IMHO)
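The transaction rule carries over to ordinary code with no database involved. A minimal sketch in plain Python, assuming program state kept in a dict and a hypothetical `run_block` helper: the steps of a composite action either all take effect, or the state is restored from a snapshot.

```python
import copy

def run_block(state, steps):
    """Run the steps of a composite action as one unit.

    `state` is a dict; each step is a callable that mutates it and may raise.
    Returns True if every step succeeded; otherwise restores the original
    state (a rollback) and returns False.
    """
    snapshot = copy.deepcopy(state)
    try:
        for step in steps:
            step(state)
    except Exception:
        state.clear()
        state.update(snapshot)
        return False
    return True

def credit_b(s):
    s["b"] += 5

def debit_a(s):
    s["a"] -= 5
    if s["a"] < 0:
        raise ValueError("a would go negative")

state = {"a": 3, "b": 0}
ok = run_block(state, [credit_b, debit_a])
print(ok, state)  # False {'a': 3, 'b': 0}
```

The first step had already run when the second one failed, and the snapshot undoes it, which is what "tried as a single action or rolled back" asks for.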

jack action · Dec 19, 2021

synch said:

the SQL program would have to apply read locks, access the variables' current value in tables and use the + operator in the call, yada yada yada ...

I must admit that I was interested in the OP's idea and had trouble following this discussion, as I wasn't sure I understood other people's criticisms. But I never got from the OP that the program would be run by SQL, just as no program is run directly from a text file.

What I understood was that instead of saving a text file, you save everything as objects stored in tables, with names like 'variables', 'operator', 'function', 'control_structure', 'classes', 'namespace', 'expression', all of them having some attributes or requirements. Of course, the program will still have to be compiled before being executed. The compiler will have to understand that, for example, using the operator with operator_ID = 1 means that it must perform an addition (or whatever). That is no different from a compiler reading a text file and understanding that a '+' means it must perform an addition.
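A minimal sketch of that idea using Python's sqlite3 module. The `operators` and `expressions` tables and their columns are hypothetical; the expression `(2 + 3) * 4` is stored as rows, and a small tree-walking evaluator stands in for the compiler, mapping `operator_id = 1` to addition the way a parser maps the character '+'.

```python
import operator
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE operators (operator_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE expressions (
    expr_id     INTEGER PRIMARY KEY,
    operator_id INTEGER REFERENCES operators(operator_id),  -- NULL for a literal
    left_id     INTEGER REFERENCES expressions(expr_id),
    right_id    INTEGER REFERENCES expressions(expr_id),
    literal     INTEGER
);
INSERT INTO operators VALUES (1, '+'), (2, '*');
-- (2 + 3) * 4 stored as rows instead of text
INSERT INTO expressions VALUES
    (1, NULL, NULL, NULL, 2),
    (2, NULL, NULL, NULL, 3),
    (3, 1, 1, 2, NULL),
    (4, NULL, NULL, NULL, 4),
    (5, 2, 3, 4, NULL);
""")

# The "compiler" knows what each operator_id means.
SEMANTICS = {1: operator.add, 2: operator.mul}

def evaluate(expr_id):
    """Recursively evaluate the expression tree rooted at expr_id."""
    op_id, left, right, literal = conn.execute(
        "SELECT operator_id, left_id, right_id, literal FROM expressions WHERE expr_id = ?",
        (expr_id,),
    ).fetchone()
    if op_id is None:
        return literal
    return SEMANTICS[op_id](evaluate(left), evaluate(right))

print(evaluate(5))  # 20
```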

The interesting advantages cited in the OP are that you can:

follow the relations and enforce the rules before compiling
automate certain processes (e.g., if function A is used, pull in library X)
study relations in complex programs more easily
let programmers choose their own language (or maybe even their own grammar; you could choose that an expression is ended by a semicolon or a new line) when converting from/to a human-readable text file (maybe even reading/writing flow charts)
etc... (as I'm basically restating the OP)
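For the first point, a database schema can already reject some errors before anything is compiled. A minimal sketch with Python's sqlite3 module, assuming hypothetical `functions` and `calls` tables: a foreign key makes it impossible to record a call to a function that was never defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
conn.executescript("""
CREATE TABLE functions (func_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE calls (
    caller_id INTEGER NOT NULL REFERENCES functions(func_id),
    callee_id INTEGER NOT NULL REFERENCES functions(func_id)
);
INSERT INTO functions VALUES (1, 'main'), (2, 'parse');
INSERT INTO calls VALUES (1, 2);  -- main calls parse: accepted
""")

def add_call(caller_id, callee_id):
    """Try to record a call; the schema rejects calls to functions that do not exist."""
    try:
        with conn:
            conn.execute("INSERT INTO calls VALUES (?, ?)", (caller_id, callee_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_call(1, 2))   # True
print(add_call(1, 99))  # False: there is no function 99
```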

It doesn't affect the execution speed of the program since, in the end, the SQL file must still be compiled to a binary file to be executed.

I guess the basic concept is that the code would be stored in the state it is in just before code optimization in a compiler, maybe even just before machine code generation. So preprocessing, lexical analysis, parsing, and semantic analysis are already done before the file is saved. This would reduce file size and also reduce compilation time. Think of JavaScript programs that are compiled (and maybe even downloaded) every time a web page is opened.
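As a rough illustration of storing code after parsing, Python's standard `ast` module can parse a statement, and the resulting tree can be flattened into rows. The `(node_id, parent_id, node_type)` layout and the `ast_rows` helper are hypothetical, and a real format would also need node attributes such as names and literal values (node type names are those of Python 3.8+).

```python
import ast

def ast_rows(source):
    """Flatten a Python AST into (node_id, parent_id, node_type) tuples,
    in depth-first source order, like rows a parsed program could be stored as."""
    tree = ast.parse(source)
    rows = []
    stack = [(tree, None)]
    while stack:
        node, parent_id = stack.pop()
        node_id = len(rows)
        rows.append((node_id, parent_id, type(node).__name__))
        # Push children in reverse so they come off the stack in source order.
        for child in reversed(list(ast.iter_child_nodes(node))):
            stack.append((child, node_id))
    return rows

for row in ast_rows("x = a + 1"):
    print(row)
```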

FactChecker · Dec 19, 2021

jack action said:

The interesting advantages cited in the OP are that you can:

follow the relations and enforce the rules before compiling

automate certain processes (e.g., if function A is used, pull in library X)

study relations in complex programs more easily

let programmers choose their own language (or maybe even their own grammar; you could choose that an expression is ended by a semicolon or a new line) when converting from/to a human-readable text file (maybe even reading/writing flow charts)

etc... (as I'm basically restating the OP)

It doesn't affect the execution speed of the program since, in the end, the SQL file must still be compiled to a binary file to be executed.

My main complaint is that it seems to suppress the order of execution, which I feel is usually the primary information in the code text. And if the sequence of execution is the main "sort order" in a database, I don't know if I would still call it a relational database.
Of course, it is still possible to build relational databases that are used as part of the compilation process. For all I know, that may already be done.

jack action · Dec 19, 2021

FactChecker said:

it seems to suppress the order of execution, which I feel is usually the primary information in the code text. And if the sequence of execution is the main "sort order" in a database, I don't know if I would still call it a relational database.

The order of execution wouldn't be different. It is about how the information is stored (which includes the order of execution).

Imagine each library stored as a database. Looking for information across the different libraries becomes much easier and can be done in a million different ways. You can easily find out whether a library (database) is no longer used in a project. You can easily search a library you already have for something you want to execute, instead of loading (or creating) a new library. You don't look for text (which could be written in a million different ways), you look for relations.
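The "is this library still used?" question becomes a single query once calls are stored as rows. A minimal sketch with Python's sqlite3 module; the `libraries`, `functions` and `calls` tables are hypothetical, and "unused" is taken to mean that no recorded call lands on any of the library's functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE libraries (lib_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- lib_id is NULL for the project's own functions
CREATE TABLE functions (func_id INTEGER PRIMARY KEY, name TEXT,
                        lib_id INTEGER REFERENCES libraries(lib_id));
CREATE TABLE calls (caller_id INTEGER REFERENCES functions(func_id),
                    callee_id INTEGER REFERENCES functions(func_id));
INSERT INTO libraries VALUES (1, 'mathlib'), (2, 'oldlib');
INSERT INTO functions VALUES (1, 'main', NULL), (2, 'sqrt', 1), (3, 'legacy_fmt', 2);
INSERT INTO calls VALUES (1, 2);  -- main calls sqrt; nothing calls legacy_fmt
""")

# A library is unused if no recorded call targets any of its functions.
unused = [name for (name,) in conn.execute("""
    SELECT l.name FROM libraries AS l
    WHERE NOT EXISTS (
        SELECT 1 FROM calls AS c
        JOIN functions AS f ON f.func_id = c.callee_id
        WHERE f.lib_id = l.lib_id
    )
    ORDER BY l.name
""")]
print(unused)  # ['oldlib']
```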

The 'relational' lies in the storage, not in the execution.

FactChecker said:

Of course, it is still possible to build relational databases that are used as part of the compilation process. For all I know, that may already be done.

What the OP says is: «Why not save the programs as those relational databases instead of the original text files? Most likely the file sizes and the compilation time will be smaller, and access to the information will be more flexible.»

Mark44 · Dec 19, 2021

jack action said:

Imagine each library stored as a database.

This is already done with compiled languages like C and C++. The database has the form of a lookup table whose keys are the function names (C) or function signatures (C++). Associated with each function name/function signature is the compiled machine code for the function.
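A toy model of that lookup table in Python, with strings standing in for machine code (the dictionaries and the `lookup` helper are hypothetical). It shows why C can key on the function name alone while C++ needs the signature: overloads share a name.

```python
# Strings stand in for the compiled machine code of each function.
c_library = {
    "abs": "<code for int abs(int)>",
}

# C++ allows overloading, so the name alone is not a unique key;
# the parameter types become part of the key.
cpp_library = {
    ("abs", ("int",)): "<code for int abs(int)>",
    ("abs", ("double",)): "<code for double abs(double)>",
}

def lookup(library, name, param_types=None):
    """Find a function's code by name (C-style) or by name plus parameter types (C++-style)."""
    key = name if param_types is None else (name, tuple(param_types))
    return library[key]

print(lookup(c_library, "abs"))                # <code for int abs(int)>
print(lookup(cpp_library, "abs", ["double"]))  # <code for double abs(double)>
```

Real toolchains fold the signature into the symbol name instead (name mangling); under the Itanium C++ ABI used by GCC and Clang, a free function `double abs(double)` becomes the symbol `_Z3absd`.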

jack action said:

Looking for information through the different libraries is much easier and can be done in a million different ways. You can find easily in a project if a library (database) is not used anymore. You may easily search a library that you already have for something you want to execute, instead of loading (or creating) a new library.

There already are tools and techniques that can be used to determine which functions a library exports, so I don't see how the suggested changes do anything that isn't already done.

jack action said:

You don't look for text (which could be written in a million different ways), you look for relations.

For the compiled languages I'm talking about, the library (either a static library or dynamic link library) doesn't contain text -- it contains just the machine code associated with the functions plus any other objects that the library exports.

FactChecker · Dec 19, 2021

jack action said:

The order of execution wouldn't be different. It is about how the information is stored (which includes the order of execution).

So, would the order of execution be stored as a number in some of the entries, or as a pointer to the next line to execute, or what? IMO, it would not be a relational database because the most important information for execution is stored as a linked list. It just seems hopelessly confusing and hard to interpret to save such information that way.
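Both options in that question can be sketched with Python's sqlite3 module (the `stmts_seq` and `stmts_next` tables are hypothetical). A position column turns execution order into a plain `ORDER BY`; a next-pointer needs a recursive query to walk, which lends some support to the view that the linked-list form is awkward to work with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1: each statement carries its position.
CREATE TABLE stmts_seq (stmt_id INTEGER PRIMARY KEY, seq INTEGER, code TEXT);
INSERT INTO stmts_seq VALUES (10, 2, 'y = x * 2'), (11, 1, 'x = 1'), (12, 3, 'print(y)');

-- Option 2: each statement points at the next one (a linked list).
CREATE TABLE stmts_next (stmt_id INTEGER PRIMARY KEY, code TEXT, next_id INTEGER);
INSERT INTO stmts_next VALUES (10, 'y = x * 2', 12), (11, 'x = 1', 10), (12, 'print(y)', NULL);
""")

by_seq = [code for (code,) in conn.execute("SELECT code FROM stmts_seq ORDER BY seq")]

# Following the pointers needs a recursive query that starts at the statement
# nobody points to and follows next_id until it reaches NULL.
by_pointer = [code for (code,) in conn.execute("""
    WITH RECURSIVE chain(stmt_id, code, next_id, pos) AS (
        SELECT stmt_id, code, next_id, 0 FROM stmts_next
        WHERE stmt_id NOT IN (SELECT next_id FROM stmts_next WHERE next_id IS NOT NULL)
        UNION ALL
        SELECT s.stmt_id, s.code, s.next_id, chain.pos + 1
        FROM stmts_next AS s JOIN chain ON s.stmt_id = chain.next_id
    )
    SELECT code FROM chain ORDER BY pos
""")]

print(by_seq)                # ['x = 1', 'y = x * 2', 'print(y)']
print(by_pointer == by_seq)  # True
```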

suremarc · Dec 20, 2021

A friend of mine used to work at NYSE. The lead software architect there (Alexei Lebedev) created a system that generates what is essentially a zero-overhead in-memory database in C++. You define your data in the form of tabular relations, and it knows how to generate efficient code for algorithms and data structures from it. I have not used this system myself, but from what I have heard they use it extensively to encode highly cross-referenced data types as well as business entities (for example, other exchanges and their attributes, such as MICs), and for case-driven testing.

An open-source implementation can be found at github.com/alexeilebedec/openacr. Even if you are not going to use it, I think the problem it attempts to solve is interesting and the author wrote a decent amount explaining his rationale. I recommend the read.

Jarvis323 · Dec 20, 2021

elcaro said:

Summary: Most programming languages and environments store program source as text in source (and accompanying) files. But a program can be seen as a collection of relations between different objects, and thus these relations could also be stored in a database. For example: every variable has a relation to a type, every function call implies a relation between caller and callee, etc. Has there ever been an attempt to store program source in the form of relations in a database?

Almost all (compiled or interpreted) programming languages store the program source as a series of bytes (using an encoding like ASCII or UTF-8) in a text file, and enforce the grammar of the language using a parser (as part of compiling or interpreting the source text).

But intrinsically a program can also be seen as a collection of relations between different objects, which can be stored as tuples in a relational database system.

For example:

A program Variable needs to have a definition, which creates both a relation between the variable and a type, and also a location within another object (for example a function and/or module) in which the definition takes place.

A function call creates a relation between the calling function or module and a called function, a relation between its return value and a type, and a relation between the function and the object or module in which it was defined.

Etc.
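A minimal sketch of those relations as tables, using Python's sqlite3 module. All table and column names are hypothetical. Names live once in a `symbols` table and everything else refers to them by id, so the caller/callee relation can be read back by joins, and a rename touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbols   (sym_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE types     (type_id INTEGER PRIMARY KEY,
                        sym_id INTEGER REFERENCES symbols(sym_id));
CREATE TABLE functions (func_id INTEGER PRIMARY KEY,
                        sym_id INTEGER REFERENCES symbols(sym_id),
                        return_type INTEGER REFERENCES types(type_id));
-- variable -> type, and variable -> the function it is defined in
CREATE TABLE variables (var_id INTEGER PRIMARY KEY,
                        sym_id INTEGER REFERENCES symbols(sym_id),
                        type_id INTEGER REFERENCES types(type_id),
                        defined_in INTEGER REFERENCES functions(func_id));
-- caller -> callee
CREATE TABLE calls     (caller INTEGER REFERENCES functions(func_id),
                        callee INTEGER REFERENCES functions(func_id));

INSERT INTO symbols   VALUES (1, 'int'), (2, 'main'), (3, 'count_words'), (4, 'n');
INSERT INTO types     VALUES (1, 1);
INSERT INTO functions VALUES (1, 2, 1), (2, 3, 1);
INSERT INTO variables VALUES (1, 4, 1, 1);
INSERT INTO calls     VALUES (1, 2);
""")

def call_edges():
    """Return (caller name, callee name) pairs by joining through the symbol table."""
    return conn.execute("""
        SELECT s1.name, s2.name FROM calls
        JOIN functions AS f1 ON f1.func_id = calls.caller
        JOIN symbols   AS s1 ON s1.sym_id  = f1.sym_id
        JOIN functions AS f2 ON f2.func_id = calls.callee
        JOIN symbols   AS s2 ON s2.sym_id  = f2.sym_id
    """).fetchall()

print(call_edges())  # [('main', 'count_words')]

# Renaming changes one row; every relation refers to sym_id, not to the text.
with conn:
    conn.execute("UPDATE symbols SET name = 'word_count' WHERE name = 'count_words'")
print(call_edges())  # [('main', 'word_count')]
```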

No programming language or environment (with the possible exception of a language/programming environment like Smalltalk), however, stores these relations as such; most languages store them as text in a source file.

The advantages of storing program source as relations in a database are multiple, like for instance:

One can enforce strong typing rules for all objects.

One can produce many useful insights into the program source, like dependencies between objects (which function is called by which function, which modules or objects use which variables, etc.).

Compilation could be done piecewise (per object) as part of the editing process (saving an object also pre-compiles it and shows any errors encountered during compilation; the source is stored anyway).

Each variable or object name needs to be stored only once, so renaming an object would take place in one place only. Optionally, each programmer could name objects in their native language and script.

The program itself can be stored in the form of an abstract syntax tree, simplifying the process of object code and executable generation.

Refactoring a program, or moving functions between modules while retaining all dependency relations (like header files), could be done much more simply and less error-prone than in a textual environment.

Build/make information can be easily extracted from the relations already stored in the database, using the dependency relations already stored.

Optionally, the versioning system itself could also be implemented as part of the programming development system, storing version information as relations in a database too.

For compatibility with the usual work environment and programming development tools, easy import and export functionality could be provided for such a programming development environment.
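The point about build/make information can be sketched with Python's sqlite3 module and the standard `graphlib` module (Python 3.9+). The `modules` and `module_deps` tables are hypothetical; the dependency rows are read back and topologically sorted so that every module comes after the modules it depends on. `graphlib` raises `CycleError` if the stored dependencies are circular.

```python
import sqlite3
from graphlib import TopologicalSorter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE modules (mod_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- module_deps(a, b) means module a uses module b, so b must be built first
CREATE TABLE module_deps (mod_id INTEGER REFERENCES modules(mod_id),
                          depends_on INTEGER REFERENCES modules(mod_id));
INSERT INTO modules VALUES (1, 'app'), (2, 'parser'), (3, 'lexer'), (4, 'util');
INSERT INTO module_deps VALUES (1, 2), (2, 3), (2, 4), (3, 4);
""")

# Map each module to the set of modules it depends on.
graph = {}
for name, dep in conn.execute("""
        SELECT m.name, d.name FROM module_deps
        JOIN modules AS m ON m.mod_id = module_deps.mod_id
        JOIN modules AS d ON d.mod_id = module_deps.depends_on"""):
    graph.setdefault(name, set()).add(dep)

build_order = list(TopologicalSorter(graph).static_order())
print(build_order)  # ['util', 'lexer', 'parser', 'app']
```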

I guess it's probably not exactly what you're thinking about, but logic programming languages are based on sets of relations, Prolog being a good example.

The program logic is expressed in terms of relations, represented as facts and rules. A computation is initiated by running a query over these relations.

https://en.m.wikipedia.org/wiki/Prolog
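A rough sketch of facts, rules and a query, written in Python rather than Prolog. The `calls` facts and the `calls_indirectly` rule are hypothetical examples, shown in Prolog notation in the comments; the loop evaluates the rules bottom-up until no new facts appear, which is closer to how Datalog is evaluated (a Prolog engine instead answers queries top-down with backtracking).

```python
# Facts, in Prolog notation:  calls(main, parse).  calls(parse, lex).  ...
# Rules:
#   calls_indirectly(X, Y) :- calls(X, Y).
#   calls_indirectly(X, Z) :- calls(X, Y), calls_indirectly(Y, Z).
calls = {("main", "parse"), ("parse", "lex"), ("lex", "read_char"), ("main", "report")}

def calls_indirectly(facts):
    """Apply both rules repeatedly until no new (X, Z) pairs can be derived."""
    derived = set(facts)  # first rule: every direct call is also an indirect call
    while True:
        # second rule: X calls Y directly and Y reaches Z, so X reaches Z
        new = {(x, z) for (x, y) in facts for (y2, z) in derived if y == y2}
        if new <= derived:
            return derived
        derived |= new

closure = calls_indirectly(calls)

# The "query": which functions can main reach?
print(sorted(z for (x, z) in closure if x == "main"))
# ['lex', 'parse', 'read_char', 'report']
```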
