How compiling works under Linux (and Unices in general)

Note that this is the long-winded approach to compiling a C program. gcc(1) is smart enough to do most of these steps for you automagically. :)

  1. cpp(1) takes a .c file and the .h files it includes, preprocesses the source, and generates a .i file.

    This file is the ultimate SourceCode, with all the #includes expanded out and all the #defines replaced.

  2. gcc(1) (run as gcc -S) then takes the .i file and generates a .s file (assembler source).
  3. as(1) then takes the .s file and generates a .o file (object file).

    These are fragments of MachineCode with unresolved symbols. This means that the addresses of various variables and subroutines are not yet known, and any CPU instructions that refer to these unknown addresses must be filled in later by the linker.

  4. ld(1) then takes the .o file(s), links them with any libraries, resolves the symbols, and generates an executable (or a shared library).

    a.out is the default name given to a program if none was specified with the -o switch. The name is short for "assembler output": on early Unix the assembler wrote its result straight to a.out (before separate linking was used), and the name stuck.

    .a files are libraries of .o files. They are kinda like TarBalls of .o files. ar(1) is the tool to create and manage them; ld(1) only reads them. If you run ranlib(1) over such an archive it will create an index of all the symbols, making your links faster. GNU ar(1) keeps the symbol table up to date whenever it modifies the archive, and its s modifier does the same job as ranlib(1), so a separate ranlib(1) run is normally not required.

  5. strip(1) can then optionally remove any unneeded information in the executable (such as debugging information) to reduce its size.

Alternatively, gcc foo.c baz.c -o baz will sort the entire thing out for you. :)


libtool(1) is a program to manage libraries in a CrossPlatform manner under Unix.

Instead of making a BinaryExecutable, you can make a SharedLibrary by linking with -shared and compiling with -fPIC. The latter means to create PositionIndependentCode. Whether it is optional depends on the platform: on x86-64 Linux, for example, the linker generally refuses to build a shared library from objects compiled without it. Where non-PIC code is accepted, the dynamic linker has to relocate the symbols when the library is loaded, which writes to the memory used by the library's code; because of CopyOnWrite, those pages then can't be shared between processes. I don't know why you wouldn't want to use PositionIndependentCode if your platform supports it, so use it. :)

gcc foo.c -shared -fPIC -o libfoo.so

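You can open dynamically loaded modules at run time using dlopen(3). You can also link against a shared library at build time, using the same -L and -l flags as for a static library; the dynamic linker then has to be able to find the library when the program starts.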

If you want to make a library that is statically linked into a program, then build it as a .a file called libthenameofyourlibrary.a and put it in some directory. Then, when compiling your main program, use -L/path/to/the/libraries to make the linker search that directory and put -lthenameofyourlibrary on the command line after the files that use it. E.g.:
gcc foo.c -c -o foo.o
ar rcs libs/libfoo.a foo.o
ranlib libs/libfoo.a
gcc baz.c -Llibs/ -lfoo -o baz

See also