Diff: AssemblyLanguage - Waikato Linux Users Group

Differences between version 5 and predecessor to the previous major change of AssemblyLanguage.


Newer page: version 5, last edited on Sunday, August 17, 2003 12:58:13 pm by StuartYeates
Older page: version 2, last edited on Wednesday, September 11, 2002 12:37:40 pm by JohnMcPherson

@@ -1,11 +1,15 @@

+AssemblyLanguage is a 1:1 translation of MachineCode into a human readable form.

+

+----

+

The Art of AssemblyLanguage Programming is a delicate topic.

There are many processor Architectures, with different instruction sets.

-A large list of different architectures can be found somewhere on the [gcc(1)] page, but here are a couple: Intel x86, [MIPS], and the Motorola m68000 series.

+A large list of different architectures can be found somewhere on the [gcc(1)] page, but here are a couple: Intel [x86], [MIPS], and the Motorola m68000 series.

AssemblyLanguage is a language constructed of instructions which correlate to MachineCode on a 1 for 1 basis. Thus each AssemblyLanguage instruction is a MachineCode instruction.

-The most common form of AssemblyLanguage programming is done on the x86 Architecture. A sample piece of [AssemblyLanguage] code for Linux can be found in the [HelloWorld] section.

+The most common form of AssemblyLanguage programming is done on the [x86] Architecture. A sample piece of [AssemblyLanguage] code for Linux can be found in the [HelloWorld] section.

It is a common fact that AssemblyLanguage programmers get paid more per line of code than those who hack away in higher level languages.

AssemblyLanguage programming has the following advantages:

@@ -29,9 +33,9 @@

i=0xff;

return i;

}

-Now you can translate this to assembler. If I do this on an ~~ix86~~ (ie [Intel] machine), I get:

+Now you can translate this to assembler. If I do this on an [x86] machine (i.e. an [Intel] machine), I get:

$ gcc -S x.c ; cat x.s

.file "x.c"

.version "01.01"

gcc2_compiled.:

@@ -68,6 +72,13 @@

ret

.Lfe1:

.size main,.Lfe1-main

.ident "GCC: (GNU) 2.95.3 20010315 (release)"

+

+The commands movl, jmp, addl, etc. are [OpCodes]; that is, they are mnemonics for individual [CPU] instructions.

The %esp, %ebp etc. are registers. For example, %esp is the Stack Pointer: it points to the top of the current process's memory stack, which on x86 is the lowest address in use because the stack grows downwards. The first "movl" copies the value in %esp into %ebp, then the "subl" subtracts 24 from %esp, so that the stack has grown by 24 bytes. The next "movl" copies the value 5 onto the stack, 4 bytes below the end of the stack. This address is where the variable i is being stored, so all accesses to i in the C code become references to this memory location in assembler. As you can see, explaining what assembler is doing line-by-line is tediously boring. Instead of doing i*3, it does i+(i+i). That's the "addl" and "leal" instructions. Below that, it puts some pointers (to printf's arguments) on the stack and calls printf, which gets its arguments off the stack. This is how programmers used to write code. Early versions of [Unix] were written in assembler - when BellLabs got new machines, they re-wrote their operating system for the new machine code, until they re-wrote it in [C] in 1973.

+

+From this we can also note that it is extremely bad for your health to rely on the [gcc(1)] output of some [C] code when learning [AssemblyLanguage]. [gcc(1)] generates some extremely horrid code on occasion, especially when working with multiplication and division. This is because the [x86] multiplication and division instructions can tie up three of the four general purpose registers for their inputs and results: %eax and %edx are used implicitly, and the remaining operand may occupy a third.

+

+--

+See also OpCodes