Differences between version 7 and previous revision of AssemblyLanguage.
Newer page: | version 7 | Last edited on Sunday, October 26, 2003 8:36:14 am | by AristotlePagaltzis |
Older page: | version 6 | Last edited on Wednesday, October 8, 2003 9:47:21 pm | by AristotlePagaltzis |
@@ -1,28 +1,13 @@
-AssemblyLanguage is 1:1 translation of MachineCode into a human readable form
.
+AssemblyLanguage is a 1:1 translation of MachineCode into English mnemonics
.
-----
-
-
The Art of AssemblyLanguage Programming is a delicate topic.
-There are many processor Architectures, with different instruction sets.
-A large list of different architectures
can be found somewhere on the [gcc(1)] page, but here are a couple: Intel [x86], [MIPS],
and the Motorola m68000 series.
-
-AssemblyLanguage
is a language constructed of instructions which correlate
to MachineCode on a 1 for 1 basis
. Thus each
AssemblyLanguage instruction
is a MachineCode instruction
.
-
-The most common form
of AssemblyLanguage programming is done on
the [x86] Architecture. A sample piece of [AssemblyLanguage] code
for Linux can be found in
the [HelloWorld] section.
-
-It is a common fact that AssemblyLanguage programmers get paid more per line
of code than those who hack away in higher level languages
.
+The Art of AssemblyLanguage Programming is a delicate topic. By programming in AssemblyLanguage you
can hand optimize code
and achieve efficiency that
is difficult if not impossible to duplicate in
a higher level
language. However, current computers are fast enough
that most code can be written in less efficient higher level languages
. AssemblyLanguage is still used for embedded systems (where space and CPU speed are limited), and in parts of an OperatingSystem that are run very frequently or must run fast
. Some parts
of the GNU C library are also written in assembly
for the same reasons (for example, some
of the maths functions)
.
-AssemblyLanguage programming has the following advantages:
-* The hacker is able to HandOptimize
code as it
is being written.
-* It is very difficult, if
not impossible
, to create code in
a higher level language which will execute faster than hand
-optimized AssemblyLanguage
.
+AssemblyLanguage code is not portable across different [CPU] architectures, of which there are many: Intel [x86], [MIPS], and the Motorola m68000 series
, to name but
a few. Early versions of [Unix] were written in assembler, and when BellLabs got new machines, they re-wrote
their operating system for the new MachineCode, until they finally re-wrote most of it in [C] in 1973
.
-Disadvantages of
AssemblyLanguage programming:
-* The
code is very
difficult to read, especially when having to
maintain somebody elses code
.
-*
It is usually easier to start from scratch than to debug faulty code.
-* Due to the above two reasons, debugging is rarely done. Especially on hand-optimized
code.
+AssemblyLanguage code is difficult to understand and
maintain. It is usually easier to start from scratch than to debug faulty code.
-A Compiler such as [
gcc(1)]
will hide it's
generation of AssemblyLanguage code from you as it generates it's
object files and the executables.
It is however possible to tell it to generate the AssemblyLanguage code for you by passing it the -S CommandLine option
+A Compiler such as gcc(1) will hide its
generation of AssemblyLanguage code from you as it generates its
object files and the executables. It is however possible to tell it to generate the AssemblyLanguage code for you by passing it the __-S__
CommandLine option.
Here is an example. First, the [C] code:
int main(void) {
int i;
@@ -34,9 +19,9 @@
return i;
}
Now you can translate this to assembler. If I do this on an [x86] (i.e. an [Intel] machine), I get:
-
$ gcc -S x.c ; cat x.s
+ __
$ gcc -S x.c ; cat x.s__
.file "x.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
@@ -73,12 +58,14 @@
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.95.3 20010315 (release)"
-The commands
movl, jmp, addl, etc are [OpCodes] - that is they are mnemonics for individual
[CPU
] instructions.
+__
movl__
, __
jmp__
, __
addl__
, etc are mnemonics for individual
[CPU] instruction
OpCodes. __%esp__, __%ebp__ etc are mnemonics for registers. For example, __%esp__ is the [Stack
] Pointer:
it points to the top of the current process's [Stack]. The first __movl__ copies the value in __%esp__ into __%ebp__, then the __subl__ subtracts 24 from __%esp__, so
that the [Stack] has grown by 24 bytes. The next __movl__ copies the value 5 into the [Stack], 4 bytes below its end. This address
is where the variable __i__ is being stored, so all accesses to __i__ in the
[C
] code become references to this memory location in MachineCode. We can also witness an optimization here: instead of doing i*3, it does i+(i+i). That's the __addl__ and __leal__
instructions. Below that, it puts some pointers (to __printf__'s arguments) on the stack and calls __printf__, which pulls its arguments from the stack
.
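The i*3 to i+(i+i) rewrite mentioned above can be checked from the [C] side. What follows is only a sketch: the function names are made up for illustration and do not appear in the gcc(1) output, and the range tested is kept small on purpose so that no signed overflow can occur.

```c
#include <assert.h>

/* Hypothetical helper: the multiplication as it is written in the C source. */
int triple_mul(int i) {
    return i * 3;
}

/* Hypothetical helper: the additions-only form described above, i + (i + i). */
int triple_add(int i) {
    int twice = i + i;   /* first addition gives 2*i */
    return i + twice;    /* second addition gives 3*i */
}

/* Check that both forms agree over a small range that cannot overflow. */
void check_triple_equivalence(void) {
    for (int i = -1000; i <= 1000; i++) {
        assert(triple_mul(i) == triple_add(i));
    }
}
```

Calling check_triple_equivalence() from a main function runs the comparison. Whether a given compiler actually picks additions, a single __leal__, or a multiply instruction depends on the compiler version and its optimization flags.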
-The %esp, %ebp etc are registers. For example, %esp is the Stack Pointer - it points to the base(?) of the current process's memory stack. The first "movl" copies the value in %esp into %ebp, then the "subl" subtracts 24 off %esp, so that the stack has grown by 24 bytes. The next "movl" copies the value 5 into stack, 4 bytes below end of the stack. This address is where the variable i is being stored, so all accesses to i in the C code become references to this memory location in assembler.
As you can see, explaining what assembler
is doing line-by-line is tediously boring. Instead of doing i*3, it does i+(i+i). That's the "addl" and "leal" instructions. Below that, it puts some pointers (to printf's arguments) on the stack and calls printf, which gets it's arguments off the stack
. This is how programmers used to write code. Early versions
of [Unix] were written in assembler - when BellLabs got new machines, they re-wrote their operating system for the new machine
code, until they re-wrote it in [C]
in 1973
.
+As you can see, explaining what AssemblyLanguage code
is doing line-by-line is tediously boring. This is how programmers used to write code, and it is a common fact that AssemblyLanguage programmers get paid more per line
of code than those who hack away
in higher level languages
.
-From this we
can also note that it is extremely bad for your health to rely on the [
gcc(1)]
output of some [C] code when learning [AssemblyLanguage
]. [
gcc(1)]
generates some
extremely horrid code on occassions
especially when working with multiplication and division. This is due to the fact that the AssemblyLanguage
multiplication and division instructions use three of
the four general purpose
registers for their inputs and results
.
+We
can also note that it is extremely bad for your health to rely on the gcc(1) output of some [C] code when learning [x86
] AssemblyLanguage
. gcc(1) generates extremely horrid code on occasion,
especially when working with multiplication and division, because [x86]
multiplication and division instructions are restricted in
the registers they can use
.
---
-See also OpCodes
+Another sample piece of [AssemblyLanguage] code for [Linux] can be found in the [HelloWorld] page.
+
+--
--
+CategoryProgrammingLanguages