You might also be interested in the CommonProgrammingBugs page. Beware of HeisenBugs. If you aren't a programmer, you can help by WritingBugReports.
Debugging under Linux is done mostly by using gdb(1).
If you want to debug a program:
For a little more info see http://wand.net.nz/iam4/208/gdb.html
strace(1) shows what a program is doing at a coarse level: the system calls it makes and the signals it receives. If you think strace(1) is too quiet, ltrace(1), which traces library calls, may be for you. For the BSD users amongst us, I believe the equivalents are called truss(1) and sotruss(1). Darwin (MacOSX) has ptrace and ktrace (and kdump to read the file that ktrace creates).
The command for this is:
strace ''programname''
If the program is already running:
strace -p ''pid''
will also work.
If your program hangs, you can press Ctrl-\ in its terminal to send it a SIGQUIT and force it to dump core. You can also force a program to dump core with the command:
kill -QUIT ''programpid''
To allow crashing programs to create CoreDumps you have to lift the core file size limit that ulimit(1) imposes on them. This can be done with the command:
ulimit -c unlimited
Note that this applies only to the current shell and all its children.
By default core files are placed in the working directory (often the same directory the executable is in). This may not be ideal for you if the executable is on a read-only file system. To change this behaviour you can use the following command (as root):
echo /var/cores/core.%e.%p >/proc/sys/kernel/core_pattern
%e is replaced by the executable name and %p is replaced by the pid of the process. For more possible replacements see core(5), or fs/exec.c in your nearest kernel source.
gdb(1) can also do postmortem analysis on core files like so:
gdb ./''program'' ./''corefile''
If you run gdb(1) on your program and it displays the names of the functions but doesn't display their types (e.g. what arguments they have, or line number information), you probably didn't compile it with "-g".
You can use gdb to attach to a currently running process. For example, to change where its stderr is going:
$ gdb <executable> <process_id>
(gdb) call close(2)
$1 = 0
(gdb) call open("/tmp/prog-debug", 0101, 0644)
$2 = 2
(gdb) cont
Note that the octal 0101 stands for O_CREAT|O_WRONLY (0100 + 01); the number is used because gdb will otherwise complain that it has no debugging symbols for resolving those names. The third argument, 0644, is the permission mode that open(2) expects whenever O_CREAT is given. Check the flag values against your /usr/include files; the C library in Debian testing, at least, has these definitions in /usr/include/bits/fcntl.h.
ddd(1) appears to be a reasonable GUI front end to gdb(1) for those who are afraid of CommandLines.
Insight is another.
Use assert(3) everywhere in your source code. It is much better at catching bugs close to where they actually hide.
Note: When using gdb(1) to debug a threaded program, gdb(1) catches two signals (SIGPWR & SIGXCPU) which are used internally by pthreads on Linux. Use
(gdb) handle SIGPWR pass nostop noprint
(gdb) handle SIGXCPU pass nostop noprint
to stop gdb halting on receiving these signals.
Other neat tools for diagnosing memory errors are: