Diff: AssemblyLanguage - Waikato Linux Users Group

Differences between current version and revision by previous author of AssemblyLanguage.

Newer page:	version 12	Last edited on Saturday, October 7, 2006 6:37:17 pm	by AristotlePagaltzis
Older page:	version 11	Last edited on Thursday, July 1, 2004 11:05:27 am	by JohnMcPherson

@@ -5,69 +5,76 @@

AssemblyLanguage code is not portable across different [CPU] architectures, of which there are many: Intel [x86], [MIPS], and the Motorola m68000 series, to name but a few. Early versions of [Unix] were written in assembler, and when BellLabs got new machines, they re-wrote their operating system for the new MachineCode, until they finally re-wrote most of it in [C] in 1973.

AssemblyLanguage code is difficult to understand and maintain. It is usually easier to start from scratch than to debug faulty code.

-A Compiler such as [GCC] normally hides the AssemblyLanguage code it generates on the way to its object files and executables. You can, however, tell it to emit that AssemblyLanguage code by passing it the __ -S__ CommandLine option.

+A Compiler such as [GCC] normally hides the AssemblyLanguage code it generates on the way to its object files and executables. You can, however, tell it to emit that AssemblyLanguage code by passing it the <tt> -S</tt> CommandLine option.

Here is an example. First, the [C] code:

- ~~int main(void) {~~

- ~~int i;~~

- i = 5;

- i = i * 3;

- printf("%d\n",i);

- i = 0xff;

- return i;

- }

+<verbatim>

+#include <stdio.h>

+

+int main(void) {

+ int i;

+

+ i = 5;

+ i = i * 3;

+ printf("%d\n",i);

+ i = 0xff;

+ return i;

+}

+</verbatim>

Now you can translate this to assembler. If I do this on an [x86] (i.e. [Intel]) machine, I get:

- ~~__$ gcc -S x.c ; cat x.s__~~

- ~~.file "x.c"~~

- ~~.version "01.01"~~

- ~~gcc2_compiled.:~~

- ~~.section .rodata~~

- ~~.LC0:~~

- ~~.string "%d\n"~~

- ~~.text~~

- ~~.align 4~~

- ~~.globl main~~

- ~~.type main,@function~~

- ~~main:~~

- ~~pushl %ebp~~

- ~~movl %esp,%ebp~~

- ~~subl $24,%esp~~

- ~~movl $5,-4(%ebp)~~

- ~~movl -4(%ebp),%eax~~

- ~~movl %eax,%edx~~

- ~~addl %edx,%edx~~

- ~~leal (%eax,%edx),%ecx~~

- ~~movl %ecx,-4(%ebp)~~

- ~~addl $-8,%esp~~

- ~~movl -4(%ebp),%eax~~

- ~~pushl %eax~~

- ~~pushl $.LC0~~

- ~~call printf~~

- ~~addl $16,%esp~~

- ~~movl $255,-4(%ebp)~~

- ~~movl -4(%ebp),%edx~~

- ~~movl %edx,%eax~~

- ~~jmp .L2~~

- ~~.p2align 4,,7~~

- ~~.L2:~~

- ~~leave~~

- ~~ret~~

- ~~.Lfe1:~~

- ~~.size main,.Lfe1-main~~

- ~~.ident "GCC: (GNU) 2.95.3 20010315 (release)"~~

-__~~movl~~ __, __ jmp__ , __ addl__ , etc are mnemonics for individual [CPU] instruction OpCodes. __ %esp__ , ~~___~~ %ebp__ etc are mnemonics for registers. For example, __ %esp__ is the [Stack] Pointer - it points to the top of the current process's [Stack]. The first __ movl__ copies the value in __ %esp__ into __ %ebp__ , then the __ subl__ subtracts 24 off __ %esp__ , so that the [Stack] has grown by 24 bytes. The next __ movl__ copies the value 5 into [Stack], 4 bytes below its end. This address is where the variable __ i__ is being stored, so all accesses to __ i__ in the [C] code become references to this memory location in MachineCode. We can also witness an optimization here: instead of doing i*3, it does i+(i+i). That's the __ addl__ and __ leal__ instructions. Below that, it puts some pointers (to __ printf__ 's arguments) on the stack and calls __ printf__ , which pulls its arguments from the stack.

+<pre>

+ __$ gcc -S x.c && cat x.s __

+ .file "x.c"

+ .section .rodata

+.LC0:

+ .string "%d\n"

+ .text

+.globl main

+ .type main , @function

+main:

+ pushl %ebp

+ movl %esp, %ebp

+ subl $8, %esp

+ andl $-16, %esp

+ movl $0, %eax

+ addl $15, %eax

+ shrl $4, %eax

+ sall $4, %eax

+ subl %eax, %esp

+ movl $5, -4(%ebp)

+ movl -4(%ebp), %edx

+ movl %edx, %eax

+ addl %eax, %eax

+ addl %edx, %eax

+ movl %eax, -4(%ebp)

+ subl $8, %esp

+ pushl -4(%ebp)

+ pushl $.LC0

+ call printf

+ addl $16, %esp

+ movl $255, -4(%ebp)

+ movl -4(%ebp), %eax

+ leave

+ ret

+ .size main, .-main

+ .section .note.GNU-stack,"",@progbits

+ .ident "GCC: (GNU) 3.4.6"

+</pre>

+

+<tt>movl</tt>, <tt>jmp</tt>, <tt>addl</tt>, etc. are mnemonics for individual [CPU] instruction OpCodes. <tt>%esp</tt>, <tt>%ebp</tt>, etc. are names of registers. For example, <tt>%esp</tt> is the [Stack] Pointer: it points to the top of the current process's [Stack]. The first <tt>movl</tt> copies the value in <tt>%esp</tt> into <tt>%ebp</tt>, then the <tt>subl</tt> subtracts 8 from <tt>%esp</tt>, so that the [Stack] has grown by 8 bytes. After some stack-alignment bookkeeping (starting with the <tt>andl</tt>), <tt>movl $5, -4(%ebp)</tt> stores the value 5 on the [Stack], 4 bytes below the address held in <tt>%ebp</tt>. This address is where the variable <tt>i</tt> is being stored, so all accesses to <tt>i</tt> in the [C] code become references to this memory location in MachineCode. We can also witness an optimization here: instead of doing i*3, it does (i+i)+i. That's the two consecutive <tt>addl</tt> instructions. Below that, it pushes <tt>printf</tt>'s arguments (the value of <tt>i</tt> and a pointer to the format string) onto the stack and calls <tt>printf</tt>, which reads its arguments from the stack.
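
The multiply-by-three rewrite can also be spelled out back in [C]. The following is only a rough sketch: the function names <tt>triple_by_mul</tt>, <tt>triple_by_adds</tt> and <tt>check_triple</tt> are made up for illustration, and the comments map the steps onto the two <tt>addl</tt> instructions in the GCC 3.4.6 listing. A different compiler version or optimisation level may well pick other instructions.

```c
#include <assert.h>

/* The multiplication as written in the C source. */
int triple_by_mul(int i)
{
    return i * 3;
}

/* The same result using only additions, mirroring the listing:
 * double the value first, then add the original once more. */
int triple_by_adds(int i)
{
    int doubled = i + i;   /* corresponds to: addl %eax, %eax */
    return doubled + i;    /* corresponds to: addl %edx, %eax */
}

/* Sanity check for a few small values (hypothetical helper).
 * Large values are avoided on purpose, since signed overflow
 * is undefined behaviour in C. */
void check_triple(void)
{
    int i;

    for (i = -100; i <= 100; i++)
        assert(triple_by_adds(i) == triple_by_mul(i));
}
```

Compiling this with <tt>gcc -S</tt> at different optimisation levels is one way to see whether your own compiler treats the two functions identically.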

As you can see, explaining what AssemblyLanguage code is doing line by line is tedious. This is how programmers used to write code, and it is often said that AssemblyLanguage programmers get paid more per line of code than those who hack away in higher level languages.

We can also note that it is extremely bad for your health to rely on the [GCC] output of some [C] code when learning [x86] AssemblyLanguage. [GCC] generates extremely horrid code on occasion, especially when working with multiplication and division, because [x86] multiplication and division instructions are restricted in the registers they can use.

-However, the output of [GCC] can be a tremendously useful resource when optimising [C] code. Especially when mixing different sizes of integers (char, int, long), the resulting MachineCode is sometimes flooded with unexpected typecasting instructions. While concealed at the [C] level, these extra instructions are quite obvious in the AssemblyLanguage (lots of __ and__ instructions and often additional __ mov__ ).

+However, the output of [GCC] can be a tremendously useful resource when optimising [C] code. Especially when mixing different sizes of integers (char, int, long), the resulting MachineCode is sometimes flooded with unexpected typecasting instructions. While concealed at the [C] level, these extra instructions are quite obvious in the AssemblyLanguage (lots of <tt> and</tt> instructions and often additional <tt> mov</tt> ).
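
As a hedged illustration of the integer-size point, here are two hypothetical functions (<tt>sum_narrow</tt> and <tt>sum_wide</tt> are invented names, not from any library) that compute the same 8-bit checksum. In the first, every addition is promoted to <tt>int</tt> and then cut back to a <tt>char</tt>-sized value. In the second, the narrowing happens once at the end. Whether the first version really produces extra masking or move instructions depends on your compiler version and flags, so treat this as something to try with <tt>gcc -S</tt> rather than a guaranteed result.

```c
#include <assert.h>

/* Accumulate in an unsigned char: each "total + buf[i]" is computed
 * as an int (integer promotion) and then truncated back to 8 bits. */
unsigned char sum_narrow(const unsigned char *buf, int n)
{
    unsigned char total = 0;
    int i;

    for (i = 0; i < n; i++)
        total = (unsigned char)(total + buf[i]);
    return total;
}

/* Accumulate in an unsigned int and truncate only once, on return.
 * Unsigned arithmetic wraps around, so the low 8 bits are the same. */
unsigned char sum_wide(const unsigned char *buf, int n)
{
    unsigned int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += buf[i];
    return (unsigned char)total;
}
```

Both functions return the same value for the same input, so any difference you see in the generated AssemblyLanguage comes purely from the choice of accumulator type.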

-Another sample piece of [ AssemblyLanguage] code for [Linux] can be found in the [ HelloWorld] page.

+Another sample piece of AssemblyLanguage code for [Linux] can be found in the HelloWorld page.

----

CategoryProgrammingLanguages