Assembly language programming vs Other programming languages

In summary: As an example, my boss was a fantastic assembly code programmer. He would do things like reuse the area for program initialization as a buffer for reading in data from files. This saved memory at the expense of not being able to easily debug the program, because a section of code had been overwritten with data. His rationale was that you wouldn't necessarily need to see that code, since it had already run and was no longer needed. As part of initialization, he would set up a long string of load and store commands to build a list of numbers:
lda #1
sta tbl+0
lda #2
sta tbl+1
...
When I saw the code, I told him I could make it more compact using a loop and he said I should
  • #106
TMT said:
Consider a machine with 16 registers (as on IBM). When your program branches into a subroutine, you save the registers before the subroutine starts and restore them before returning to the caller. Tell me, how many programmers would count and mark which registers are altered in the subroutine and write code to save and restore only those registers? (The time cost of saving and restoring registers depends on the number of registers involved.) But if your high-level-language compiler has the intelligence to consider this, it will save and restore only the altered registers and optimize the code accordingly. Please take this simple example only as an expression of my intention. Since we can embed some logic in the compiler, we can let the compiler generate more optimal code than a human can (especially if you accept that not every programmer will be as smart as a well-versed one). In a high-level-language compiler you can even preprocess the written code, mark some parts as optimizable, and apply a specific process to optimize code generation. You cannot train all your staff to be well-versed assembler programmers. But you can use a high-quality high-level-language compiler (with optimization intelligence embedded) to produce faster code.
Yes, of course to this as well. It's not a surprise that a dim-witted assembly programmer would likely produce slower code than code written in a high-level language and using a compiler that produces highly optimized object code. No surprise here.
 
  • #107
TMT said:
Consider a machine with 16 registers (as on IBM) ...
Once again Mark has beaten me to it in responding and, again, I agree with him. You keep making valid points that are beside the main thrust of this thread. I think we're going to have to just agree to disagree on this one.
 
  • #108
TMT said:
Most people think assembly is speedier than high-level languages. But I'm saying that if a high-level language is configured more intelligently, it may be faster than assembler.
Where a compiler can emit assembly code, that assembly code is equivalent to the high-level program it was compiled from. In some cases, assembly programmers will have a compiler produce assembly code in order to look for "clever" code, typically when working with a processor that is new to them. As others have posted here, the thread isn't about intelligent compilers versus dumb assembly programmers.

See post #21 for example cases where assembly code is still being used today.
 
  • #109
Is it possible to know whether your program (written in a high-level language) is slow due to the compiler's reinterpretation of your input (i.e. the assembler output could be improved by human intelligence) or if you screwed up with the algorithm and you have to look for a better one?

For example, the code below could be improved in speed, I guess, by doing the multiplication at the end:
Code:
int sum = 0.0;
for(int i=1; i<=10; i++){
    sum+= 2*i;
}

improved:
Code:
int sum = 0.0;
for(int i=1; i<=10; i++){
    sum+=i;
}
sum *= 2;
or even better use the bitwise operations (sum<<1 ??)... at which point would any delay come down to the inefficiency of the compiler vs human input?
 
  • #110
With the exception of fairly specialized circumstances, it's unlikely that a compiler will generate code that could be really significantly improved by twiddling the machine code but it's very easy to code an algorithm in a way that is very inefficient. Just as a trivial example, you could use a bubble sort instead of a quick sort.
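To make the bubble sort versus quick sort remark concrete, here is a minimal sketch. The function names are made up for illustration, and std::sort stands in for quick sort (the standard library does not promise that exact algorithm, only O(n log n) comparisons).

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// O(n^2) bubble sort: repeatedly swaps adjacent out-of-order elements.
// (Hypothetical name, written only for this comparison.)
void bubble_sort(std::vector<int>& v) {
    for (std::size_t pass = 0; pass + 1 < v.size(); ++pass) {
        bool swapped = false;
        for (std::size_t j = 0; j + 1 < v.size() - pass; ++j) {
            if (v[j] > v[j + 1]) {
                std::swap(v[j], v[j + 1]);
                swapped = true;
            }
        }
        if (!swapped) break;  // no swaps in this pass: already sorted
    }
}

// O(n log n) alternative: the standard library sort, used here in place of
// a hand-written quick sort.
void library_sort(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
}
```

Both give the same result; the difference only shows up in running time as the input grows, and no amount of tweaking the machine code of bubble_sort changes its quadratic behaviour.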
 
  • #111
It's rarely the compiler's fault.
A colleague of mine loves to build long strings one character at a time, e.g.
C:
std::string toBeLong = "";
for (int i = 0; i < 300; i++)
  toBeLong += static_cast<char>(std::getchar());  // append one input character at a time
Depending on implementation, this could mean 300 reallocs are performed, plus the memory gets fragmented. I would allocate some memory in advance, but it's a few lines longer, and he argues that shorter code is better for maintenance.

He also uses a HashTable where I would use a simple array with a lookup function - again, more coding on my side.
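For illustration only, here is one way the "simple array with a lookup function" could look. The table contents and the names Entry, kTable and lookup are invented for this sketch.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical table entry: a name and the code it maps to.
struct Entry {
    const char* name;
    int code;
};

// A small fixed table; the contents are made up for this example.
static const Entry kTable[] = {{"add", 1}, {"sub", 2}, {"mul", 3}};

// Linear search over the array; returns -1 when the name is not present.
int lookup(const char* name) {
    for (std::size_t i = 0; i < sizeof(kTable) / sizeof(kTable[0]); ++i) {
        if (std::strcmp(kTable[i].name, name) == 0) {
            return kTable[i].code;
        }
    }
    return -1;
}
```

For a handful of entries a linear scan like this avoids hashing and heap allocation; for large tables a hash table would likely win.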

He uses Exceptions a lot; I catch them at the first possible place and try not to raise any.

It is choices like these that you face most of the time. You'll use the best algorithm anyway, so there is not much room for improvement there.
And rewriting a piece of code into another language is pretty rare, kind of a "desperate measure", and often not even possible (JavaScript & co.)

Knowing assembly can help you estimate how fast/slow a piece of code will be, even if you never write assembly code.
 
  • #112
Well, moving the multiplication by 2 outside the for loop, for example, is both an algorithmic change and an improvement to the output assembler... I tried the simple example I gave above, and the contents of the for loop in the two cases are:
Code:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   f:   c7 45 f8 01 00 00 00    movl   $0x1,-0x8(%rbp)
  16:   eb 0c                   jmp    24 <main+0x24>
  18:   8b 45 f8                mov    -0x8(%rbp),%eax
  1b:   01 c0                   add    %eax,%eax       <<<<<<<<<<<<<<<<<<<<
  1d:   01 45 fc                add    %eax,-0x4(%rbp)
  20:   83 45 f8 01             addl   $0x1,-0x8(%rbp)
  24:   83 7d f8 0a             cmpl   $0xa,-0x8(%rbp)
  28:   7e ee                   jle    18 <main+0x18>

Improved:

Code:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   f:   c7 45 f8 01 00 00 00    movl   $0x1,-0x8(%rbp)
  16:   eb 0a                   jmp    22 <main+0x22>
  18:   8b 45 f8                mov    -0x8(%rbp),%eax
  1b:   01 45 fc                add    %eax,-0x4(%rbp)
  1e:   83 45 f8 01             addl   $0x1,-0x8(%rbp)
  22:   83 7d f8 0a             cmpl   $0xa,-0x8(%rbp)
  26:   7e f0                   jle    18 <main+0x18>

I marked with arrows the line that is removed (the one that doubles what was in eax),
and then I may say that the first jump to 22 is unnecessary; I think it is a feature of the for loop, there to check whether the loop should be entered at all... but I trust it will be entered, since I manually wrote i=1 and not i=11 or 40 or ...
Also, one interesting thing is that when the multiplication is done within the for loop, the assembler code does the doubling with add... when it happens outside the for loop it does it with a left shift (and so *=2 and <<=1 in my case are equivalent).
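The three ways of doubling mentioned here can be written side by side. This is only a sketch with made-up function names, restricted to non-negative values small enough not to overflow, since shifting negative ints is not portable in older C++ standards.

```cpp
// Three spellings of "double x" for non-negative x that does not overflow.
// (Hypothetical names, for illustration only.)
int double_by_mul(int x) { return x * 2; }     // what the source code says
int double_by_add(int x) { return x + x; }     // what the in-loop assembly did (add %eax,%eax)
int double_by_shift(int x) { return x << 1; }  // shift left by one bit
```

All three return the same value for such inputs, which is why a compiler is free to pick whichever form it likes.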
 
  • #113
What compiler flags (and compiler) did you use?
 
  • #114
glappkaeft said:
What compiler flags (and compiler) did you use?
hmm... so I wrote a program in AStester.cxx
then used gcc -c AStester.cxx to produce the .o file (so I think no flags?)
and looked at the contents with objdump -D AStester.o
 
  • #115
ChrisVer said:
Is it possible to know whether your program (written in a high-level language) is slow due to the compiler's reinterpretation of your input (i.e. the assembler output could be improved by human intelligence) or if you screwed up with the algorithm and you have to look for a better one?

For example, the code below could be improved in speed, I guess, by doing the multiplication at the end:
Code:
int sum = 0.0;
for(int i=1; i<=10; i++){
    sum+= 2*i;
}

improved:
Code:
int sum = 0.0;
for(int i=1; i<=10; i++){
    sum+=i;
}
sum *= 2;
or even better use the bitwise operations (sum<<1 ??)... at which point would any delay come down to the inefficiency of the compiler vs human input?
The second example is quite a bit better, as it takes the multiplication out of the loop. Multiplication is a lot more expensive than addition in terms of processor time, although an optimizing compiler would probably replace 2 * i with a shift.
 
  • #116
Mark44 said:
The second example is quite a bit better, as it takes the multiplication out of the loop. Multiplication is a lot more expensive than addition in terms of processor time, although an optimizing compiler would probably replace 2 * i with a shift.
Actually only the weakest CPUs have slow multiplication. Pretty much any ARM or Pentium and above only take 1 cycle per nonzero bit (or even less), that is, multiplying by 1001001 (binary) only takes 3 cycles.
And, pretty much any compiler does indeed replace multiplication by powers of 2 with shifts.
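As a small illustration of the "one step per nonzero bit" idea: 1001001 in binary is 73 = 64 + 8 + 1, so multiplying by it can be written as two shifts and two adds. The function names are hypothetical, and the sketch shows only the arithmetic identity; it says nothing about how many cycles a given CPU actually needs.

```cpp
#include <cstdint>

// Plain multiplication by 73 (binary 1001001).
std::uint32_t times73_mul(std::uint32_t x) { return x * 73u; }

// The same product built from one term per nonzero bit: 64x + 8x + x.
// Unsigned arithmetic wraps modulo 2^32 in both versions, so they always agree.
std::uint32_t times73_shift_add(std::uint32_t x) {
    return (x << 6) + (x << 3) + x;
}
```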
 
  • #117
SlowThinker said:
Actually only the weakest CPUs have slow multiplication. Pretty much any ARM or Pentium and above only take 1 cycle per nonzero bit (or even less), that is, multiplying by 1001001 (binary) only takes 3 cycles.
And, pretty much any compiler does indeed replace multiplication by powers of 2 with shifts
Well, when I tested the above two code snippets and compared times, the "slow" one took ~120,000 and the "optimized" one took ~42,000 (tests: 10M iterations)... One extra interesting part was that I also tried the bitwise operation instead of the multiplication: for up to 10M iterations the bit-shifting operation was faster, while above 10M the two became comparable (I couldn't tell the difference).
 
  • #118
ChrisVer said:
Well when I tested the above 2 code snippets and compared times the "slow" one took ~120,000 and the "optimized" one took ~42,000 (tests : 10M iterations)...
It's interesting that by shortening the loop from 6 to 5 instructions (and taking out the fastest one), the time was cut to about 1/3.
This tells me
1) The speed of short loops depends on things like the actual address where the loop lands,
2) Too much optimization is a waste of time because you can't predict that.
You can try the -O3 flag, or the Microsoft or Intel compiler, optimized for speed. They insert NOPs at various places to fix the alignment; then the results might be more comparable.
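For anyone who wants to repeat this kind of measurement, here is a rough timing sketch using std::chrono. The function names and the time_ms helper are made up for this example, and this is not the test program used in the thread. Absolute numbers will depend heavily on the compiler flags, alignment and machine, as noted above.

```cpp
#include <chrono>
#include <cstdint>

// Multiplication inside the loop (the "slow" variant).
std::int64_t sum_mul_inside(std::int64_t n) {
    std::int64_t sum = 0;
    for (std::int64_t i = 1; i <= n; i++) sum += 2 * i;
    return sum;
}

// Multiplication hoisted out of the loop (the "optimized" variant).
std::int64_t sum_mul_outside(std::int64_t n) {
    std::int64_t sum = 0;
    for (std::int64_t i = 1; i <= n; i++) sum += i;
    return sum * 2;
}

// Hypothetical helper: runs f(n), stores the result, returns elapsed milliseconds.
template <typename F>
double time_ms(F f, std::int64_t n, std::int64_t& result) {
    auto start = std::chrono::steady_clock::now();
    result = f(n);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling time_ms(sum_mul_inside, n, r) and time_ms(sum_mul_outside, n, r) with n read at run time gives two timings to compare. Keeping and printing the result matters, because otherwise an optimizing compiler may drop the loop altogether.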
 
  • #119
Consider a machine with 16 registers (as on IBM). When your program branches into a subroutine, you save the registers before the subroutine starts and restore them before returning to the caller

Reference https://www.physicsforums.com/threa...gramming-languages.912679/page-6#post-5772249

Smart hardware guys can help too.
There's the venerable TMS9900
where the workspace pointer defines where your registers live
and can point to any location in memory
so you can context switch with just one save...
 
  • #120
ChrisVer said:
hmm... so I wrote a program in AStester.cxx
then used gcc -c AStester.cxx to produce the .o file (so I think no flags?)
and looked at the contents with objdump -D AStester.o

Then the code is likely entirely unoptimized. An optimizing compiler would probably remove the code entirely if sum is not used later (dead code elimination) or replace your loop with sum = 110 (constant folding).

Edit: Note that it is considered bad practice to initialize an int with a double value.
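One way to convince yourself that the whole loop collapses to the constant 110 is to ask the compiler to evaluate it at compile time. This sketch uses constexpr, which is a language feature rather than the optimizer's constant folding, and it assumes C++14 or later; the function name is made up.

```cpp
// The same loop, marked constexpr so it may be evaluated during compilation.
// (Hypothetical name; requires C++14 or later for the loop inside constexpr.)
constexpr int doubled_sum() {
    int sum = 0;
    for (int i = 1; i <= 10; i++) {
        sum += 2 * i;
    }
    return sum;
}

// Compilation fails here if the loop does not evaluate to 110.
static_assert(doubled_sum() == 110, "2*(1+2+...+10) should be 110");
```

The static_assert passing shows the value is fully determined before the program ever runs, which is what lets an optimizer replace the loop with a single constant.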
 
  • #121
glappkaeft said:
Edit: Note that it is considered bad practice to initialize an int with a double value.
hmm yeah, I had started with it as a double, but then it complained about the usage of << . I changed it to an integer but I forgot to remove the .0 ...

Hm, so that means the assembler code would be something like:
movl $0x6e,-0x4(%rbp)
(sum is stored at -0x4(%rbp) and 110 is moved into it)
without a for loop?
Does that happen because the compiler runs the code and gets the result before producing the output?
 
  • #122
ChrisVer said:
without a for loop?
Does that happen because the compiler runs the code and gets the result before producing the output?
It doesn't really run the code, but it optimizes it.

Code:
int chrisver()
{
    int sum = 0;
    for(int i=1; i<=10; i++)
    {
        sum+= 2*i;
    }
    return sum;
0FB94090  mov         eax,6Eh
}
0FB94095  ret
 
  • #123
ChrisVer said:
Does that happen because the compiler runs the code and gets the result before producing the output?

Since it doesn't really run the code, it is usually said that the compiler evaluates the code at compile time. If you are interested in these things, the compiler optimization section on Wikipedia is a pretty good place to start. For instance, constant folding is here: https://en.wikipedia.org/wiki/Constant_folding

Added: If you want to see what the compiler does with your code in a real world scenario you need to make sure the number of times the for-loop is repeated is not known at compile time, e.g. depends on user keyboard input, data from a file or similar.
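A minimal sketch of that suggestion, with the loop bound passed in rather than written as a literal. The names sum_doubled and sum_doubled_from_text are hypothetical, and the text-parsing helper simply assumes the input is a valid number (std::stol throws otherwise).

```cpp
#include <string>

// n is only known at run time, so the result cannot be folded into one constant.
long sum_doubled(long n) {
    long sum = 0;
    for (long i = 1; i <= n; i++) {
        sum += 2 * i;
    }
    return sum;
}

// Hypothetical helper: take the bound from text, e.g. a line typed by the user.
long sum_doubled_from_text(const std::string& text) {
    return sum_doubled(std::stol(text));
}
```

Feeding it a line read from the keyboard and then inspecting the optimized disassembly should show real loop code, although a clever optimizer may still rewrite the loop into a closed-form expression in n.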
 