Penguin
Diff: HowToKernelAnalysisHOWTO
EditPageHistoryDiffInfoLikePages

Differences between current version and predecessor to the previous major change of HowToKernelAnalysisHOWTO.

Other diffs: Previous Revision, Previous Author, or view the Annotated Edit History

Newer page: version 3 Last edited on Tuesday, October 26, 2004 10:08:39 am by AristotlePagaltzis
Older page: version 2 Last edited on Friday, June 7, 2002 1:06:52 am by perry Revert
@@ -1,4755 +1 @@
-  
-  
-  
-!KernelAnalysis-HOWTO  
-  
-  
-  
-----  
-  
-!!!!KernelAnalysis-HOWTO  
-  
-!!Roberto Arcomano v0.61 - June 2, 2002  
-  
-  
-----  
-''This document tries to explain some things about the Linux Kernel, such  
-as the most important components, how they work, and so on. This HOWTO should  
-help prevent the reader from needing to browse all the kernel source files  
-searching for the"right function," declaration, and definition, and then linking  
-each to the other. You can find the latest version of this document at  
-http://bertolinux.fatamorgana.com If  
-you have suggestions to help make this document better, please submit your  
-ideas to me at the following address:  
-berto@fatamorgana.com''  
-----  
-  
-  
-  
-  
-!!1. Introduction  
-  
-  
-*1.1 Introduction  
-  
-*1.2 Copyright  
-  
-*1.3 Translations  
-  
-*1.4 Credits  
-  
-  
-  
-  
-  
-!!2. Syntax used  
-  
-  
-*2.1 Function Syntax  
-  
-*2.2 Indentation  
-  
-*2.3 !InterCallings Analysis  
-  
-  
-  
-  
-  
-!!3. Fundamentals  
-  
-  
-*3.1 What is the kernel?  
-  
-*3.2 What is the difference between User Mode and Kernel Mode?  
-  
-*3.3 Switching from User Mode to Kernel Mode  
-  
-*3.4 Multitasking  
-  
-*3.5 Microkernel vs Monolithic OS  
-  
-*3.6 Networking  
-  
-*3.7 Virtual Memory  
-  
-  
-  
-  
-  
-!!4. Linux Startup  
-  
-  
-  
-  
-!!5. Linux Peculiarities  
-  
-  
-*5.1 Overview  
-  
-*5.2 Pagination only  
-  
-*5.3 Softirq  
-  
-*5.4 Kernel Threads  
-  
-*5.5 Kernel Modules  
-  
-*5.6 Proc directory  
-  
-  
-  
-  
-  
-!!6. Linux Multitasking  
-  
-  
-*6.1 Overview  
-  
-*6.2 Timeslice  
-  
-*6.3 Scheduler  
-  
-*6.4 Bottom Half, Task Queues. and Tasklets  
-  
-*6.5 Very low level routines  
-  
-*6.6 Task Switching  
-  
-*6.7 Fork  
-  
-  
-  
-  
-  
-!!7. Linux Memory Management  
-  
-  
-*7.1 Overview  
-  
-*7.2 Specific i386 implementation  
-  
-*7.3 Memory Mapping  
-  
-*7.4 Low level memory allocation  
-  
-*7.5 Swap  
-  
-  
-  
-  
-  
-!!8. Linux Networking  
-  
-  
-*8.1 How Linux networking is managed?  
-  
-*8.2 TCP example  
-  
-  
-  
-  
-  
-!!9. Linux File System  
-  
-  
-  
-  
-!!10. Useful Tips  
-  
-  
-*10.1 Stack and Heap  
-  
-*10.2 Application vs Process  
-  
-*10.3 Locks  
-  
-*10.4 Copy_on_write  
-  
-  
-  
-  
-  
-!!11. 80386 specific details  
-  
-  
-*11.1 Boot procedure  
-  
-*11.2 80386 (and more) Descriptors  
-  
-  
-  
-  
-  
-!!12. IRQ  
-  
-  
-*12.1 Overview  
-  
-*12.2 Interaction schema  
-  
-  
-  
-  
-  
-!!13. Utility functions  
-  
-  
-*13.1 list_entry [[include/linux/list.h ]  
-  
-*13.2 Sleep  
-  
-  
-  
-  
-  
-!!14. Static variables  
-  
-  
-*14.1 Overview  
-  
-*14.2 Main variables  
-  
-  
-  
-  
-  
-!!15. Glossary  
-  
-  
-  
-  
-!!16. Links  
-----  
-  
-!!1. Introduction  
-  
-!!1.1 Introduction  
-  
-  
-  
-This HOWTO tries to define how parts of the__ __Linux Kernel work, what are  
-the main functions and data structures used, and how the "wheel spins". You can  
-find the latest version of this document at  
-http://www.fatamorgana.com/bertolinux If you have suggestions to help  
-make this document better, please submit your ideas to me at the following  
-address:  
-berto@fatamorgana.comCode used within this document refers to the Linux Kernel version  
-2.4.x, which is the last stable kernel version at time of writing this HOWTO.  
-  
-!!1.2 Copyright  
-  
-  
-  
-Copyright (C) 2000,2001,2002 Roberto Arcomano. This document is free; you  
-can redistribute it and/or modify it under the terms of the GNU General Public  
-License as published by the Free Software Foundation; either version 2 of the  
-License, or (at your option) any later version. This document is distributed  
-in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even  
-the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
-See the GNU General Public License for more details. You can get a copy of  
-the GNU GPL  
- here  
-!!1.3 Translations  
-  
-  
-  
-If you want to translate this document you are free to do so. However,  
-you will need to do the following:  
-  
-  
-  
-  
-  
-#Check that another version of the document doesn't already exist at your  
-local LDP  
-#  
-  
-#Maintain all 'Introduction' sections (including 'Introduction', 'Copyright',  
-'Translations' , 'Credits').  
-#  
-  
-  
-  
-Warning! You don't have to translate TXT or HTML file, you have to modify  
-LYX or SGML file, so that it is possible to convert it all other formats (TXT,  
-HTML, RIFF, etc.).  
-  
-  
-No need to ask me to translate! You just have to let me know (if you want)  
-about your translation.  
-  
-  
-Thank you for your translation!  
-  
-!!1.4 Credits  
-  
-  
-  
-Thanks to  
-Linux Documentation Project for publishing and uploading my document quickly.  
-----  
-  
-!!2. Syntax used  
-  
-!!2.1 Function Syntax  
-  
-  
-  
-When speaking about a function, we write:  
-  
-  
-  
-  
-"function_name [[ file location . extension ]"  
-  
-  
-  
-For example:  
-  
-  
-  
-  
-"schedule [[kernel/sched.c]"  
-  
-  
-  
-tells us that we talk about  
-  
-  
-"schedule"  
-  
-  
-function retrievable from file  
-  
-  
-[[ kernel/sched.c ]  
-  
-  
-Note: We also assume /usr/src/linux as the starting directory.  
-  
-!!2.2 Indentation  
-  
-  
-  
-Indentation in source code is 3 blank characters.  
-  
-!!2.3 !InterCallings Analysis  
-  
-  
-!Overview  
-  
-  
-We use the"!InterCallings Analysis "(ICA) to see (in an indented fashion)  
-how kernel functions call each other.  
-  
-  
-For example, the sleep_on command is described in ICA below:  
-  
-  
-  
-  
-|sleep_on  
-|init_waitqueue_entry --  
-|__add_wait_queue | enqueuing request  
-|list_add |  
-|__list_add --  
-|schedule --- waiting for request to be executed  
-|__remove_wait_queue --  
-|list_del | dequeuing request  
-|__list_del --  
-sleep_on ICA  
-  
-  
-  
-The indented ICA is followed by functions' locations:  
-  
-  
-  
-  
-  
-*sleep_on [[kernel/sched.c]  
-*  
-  
-*init_waitqueue_entry [[include/linux/wait.h]  
-*  
-  
-*__add_wait_queue  
-*  
-  
-*list_add [[include/linux/list.h]  
-*  
-  
-*__list_add  
-*  
-  
-*schedule [[kernel/sched.c]  
-*  
-  
-*__remove_wait_queue [[include/linux/wait.h]  
-*  
-  
-*list_del [[include/linux/list.h]  
-*  
-  
-*__list_del  
-*  
-  
-  
-  
-Note: We don't specify anymore file location, if specified just before.  
-  
-!Details  
-  
-  
-In an ICA a line like looks like the following  
-  
-  
-  
-  
-function1 -> function2  
-  
-  
-  
-means that < function1 > is a generic pointer to another function.  
-In this case < function1 > points to < function2 >.  
-  
-  
-When we write:  
-  
-  
-  
-  
-function:  
-  
-  
-  
-it means that < function > is not a real function. It is a label  
-(typically assembler label).  
-  
-  
-In many sections we may report a ''C'' code or a ''pseudo-code''. In real  
-source files, you could use ''assembler'' or ''not structured'' code. This  
-difference is for learning purposes.  
-  
-!PROs of using ICA  
-  
-  
-The advantages of using ICA (!InterCallings Analysis) are many:  
-  
-  
-  
-  
-  
-*You get an overview of what happens when you call a kernel function  
-*  
-  
-*Function locations are indicated after the function, so ICA could also  
-be considered as a little ''function reference''  
-*  
-  
-*!InterCallings Analysis (ICA) is useful in sleep/awake mechanisms, where  
-we can view what we do before sleeping, the proper sleeping action, and what  
-we'll do after waking up (after schedule).  
-*  
-  
-  
-!CONTROs of using ICA  
-  
-  
-  
-  
-  
-*Some of the disadvantages of using ICA are listed below:  
-*  
-  
-  
-  
-As all theoretical models, we simplify reality avoiding many details, such  
-as real source code and special conditions.  
-  
-  
-  
-  
-  
-*Additional diagrams should be added to better represent stack conditions,  
-data values, and so on.  
-*  
-  
-----  
-  
-!!3. Fundamentals  
-  
-!!3.1 What is the kernel?  
-  
-  
-  
-The kernel is the "core" of any computer system: it is the "software" which  
-allows users to share computer resources.  
-  
-  
-The kernel can be thought ofas the main software of the OS (Operating System),  
-which may also include graphics management.  
-  
-  
-For example, under Linux (like other Unix-like OSs), the XWindow environment  
-doesn't belong to the Linux Kernel, because it manages only graphical operations  
-(it uses user mode I/O to access video card devices).  
-  
-  
-By contrast, Windows environments (Win9x, WinME, WinNT, Win2K, WinXP, and  
-so on) are a mix between a graphical environment and kernel.  
-  
-!!3.2 What is the difference between User Mode and Kernel Mode?  
-  
-  
-!Overview  
-  
-  
-Many years ago, when computers were as big as a room, users ran their applications  
-with much difficulty and, sometimes, their applications crashed the computer.  
-  
-  
-  
-  
-!Operative modes  
-  
-  
-To avoid having applications that constantly crashed, newer OSs were designed  
-with 2 different operative modes:  
-  
-  
-  
-  
-  
-#Kernel Mode: the machine operates with critical data structure, direct  
-hardware (IN/OUT or memory mapped), direct memory, IRQ, DMA, and so on.  
-#  
-  
-#User Mode: users can run applications.  
-#  
-  
-  
-  
-  
-  
-| Applications /|\  
-| ______________ |  
-| | User Mode | |  
-| ______________ |  
-| | |  
-Implementation | _______ _______ | Abstraction  
-Detail | | Kernel Mode | |  
-| _______________ |  
-| | |  
-| | |  
-| | |  
-\|/ Hardware |  
-  
-  
-  
-Kernel Mode "prevents" User Mode applications from damaging the system or  
-its features.  
-  
-  
-Modern microprocessors implement in hardware at least 2 different states.  
-For example under Intel, 4 states determine the PL (Privilege Level). It is  
-possible to use ,1,2,3 states, with 0 used in Kernel Mode.  
-  
-  
-Unix OS requires only 2 privilege levels, and we will use such a paradigm  
-as point of reference.  
-  
-!!3.3 Switching from User Mode to Kernel Mode  
-  
-  
-!When do we switch?  
-  
-  
-Once we understand that there are 2 different modes, we have to know when  
-we switch from one to the other.  
-  
-  
-Typically, there are 2 points of switching:  
-  
-  
-  
-  
-  
-#When calling a System Call: after calling a System Call, the task voluntary  
-calls pieces of code living in Kernel Mode  
-#  
-  
-#When an IRQ (or exception) comes: after the IRQ an IRQ handler (or exception  
-handler) is called, then control returns back to the task that was interrupted  
-like nothing was happened.  
-#  
-  
-  
-!System Calls  
-  
-  
-System calls are like special functions that manage OS routines which live  
-in Kernel Mode.  
-  
-  
-A system call can be called when we:  
-  
-  
-  
-  
-  
-*access an I/O device or a file (like read or write)  
-*  
-  
-*need to access privileged information (like pid, changing scheduling policy  
-or other information)  
-*  
-  
-*need to change execution context (like forking or executing some other  
-application)  
-*  
-  
-*need to execute a particular command (like ''chdir'', ''kill", ''brk'',  
-or ''signal'')  
-*  
-  
-  
-  
-  
-  
-| |  
-------->| System Call i | (Accessing Devices)  
-| | | | [[sys_read()] |  
-| ... | | | |  
-| system_call(i) |-------- | |  
-| [[read()] | | |  
-| ... | | |  
-| system_call(j) |-------- | |  
-| [[get_pid()] | | | |  
-| ... | ------->| System Call j | (Accessing kernel data structures)  
-| | | [[sys_getpid()]|  
-| |  
-USER MODE KERNEL MODE  
-Unix System Calls Working  
-  
-  
-  
-System calls are almost the only interface used by User Mode to talk with  
-low level resources (hardware). The only exception to this statement is when  
-a process uses ''ioperm'' system call. In this case a device can be accessed  
-directly by User Mode process (IRQs cannot be used).  
-  
-  
-NOTE: Not every ''C'' function is a system call, only some of them.  
-  
-  
-Below is a list of System Calls under Linux Kernel 2.4.17, from [[  
-arch/i386/kernel/entry.S ]  
-  
-  
-  
-  
-.long SYMBOL_NAME(sys_ni_syscall) /* 0 - old "setup()" system call*/  
-.long SYMBOL_NAME(sys_exit)  
-.long SYMBOL_NAME(sys_fork)  
-.long SYMBOL_NAME(sys_read)  
-.long SYMBOL_NAME(sys_write)  
-.long SYMBOL_NAME(sys_open) /* 5 */  
-.long SYMBOL_NAME(sys_close)  
-.long SYMBOL_NAME(sys_waitpid)  
-.long SYMBOL_NAME(sys_creat)  
-.long SYMBOL_NAME(sys_link)  
-.long SYMBOL_NAME(sys_unlink) /* 10 */  
-.long SYMBOL_NAME(sys_execve)  
-.long SYMBOL_NAME(sys_chdir)  
-.long SYMBOL_NAME(sys_time)  
-.long SYMBOL_NAME(sys_mknod)  
-.long SYMBOL_NAME(sys_chmod) /* 15 */  
-.long SYMBOL_NAME(sys_lchown16)  
-.long SYMBOL_NAME(sys_ni_syscall) /* old break syscall holder */  
-.long SYMBOL_NAME(sys_stat)  
-.long SYMBOL_NAME(sys_lseek)  
-.long SYMBOL_NAME(sys_getpid) /* 20 */  
-.long SYMBOL_NAME(sys_mount)  
-.long SYMBOL_NAME(sys_oldumount)  
-.long SYMBOL_NAME(sys_setuid16)  
-.long SYMBOL_NAME(sys_getuid16)  
-.long SYMBOL_NAME(sys_stime) /* 25 */  
-.long SYMBOL_NAME(sys_ptrace)  
-.long SYMBOL_NAME(sys_alarm)  
-.long SYMBOL_NAME(sys_fstat)  
-.long SYMBOL_NAME(sys_pause)  
-.long SYMBOL_NAME(sys_utime) /* 30 */  
-.long SYMBOL_NAME(sys_ni_syscall) /* old stty syscall holder */  
-.long SYMBOL_NAME(sys_ni_syscall) /* old gtty syscall holder */  
-.long SYMBOL_NAME(sys_access)  
-.long SYMBOL_NAME(sys_nice)  
-.long SYMBOL_NAME(sys_ni_syscall) /* 35 */ /* old ftime syscall holder */  
-.long SYMBOL_NAME(sys_sync)  
-.long SYMBOL_NAME(sys_kill)  
-.long SYMBOL_NAME(sys_rename)  
-.long SYMBOL_NAME(sys_mkdir)  
-.long SYMBOL_NAME(sys_rmdir) /* 40 */  
-.long SYMBOL_NAME(sys_dup)  
-.long SYMBOL_NAME(sys_pipe)  
-.long SYMBOL_NAME(sys_times)  
-.long SYMBOL_NAME(sys_ni_syscall) /* old prof syscall holder */  
-.long SYMBOL_NAME(sys_brk) /* 45 */  
-.long SYMBOL_NAME(sys_setgid16)  
-.long SYMBOL_NAME(sys_getgid16)  
-.long SYMBOL_NAME(sys_signal)  
-.long SYMBOL_NAME(sys_geteuid16)  
-.long SYMBOL_NAME(sys_getegid16) /* 50 */  
-.long SYMBOL_NAME(sys_acct)  
-.long SYMBOL_NAME(sys_umount) /* recycled never used phys() */  
-.long SYMBOL_NAME(sys_ni_syscall) /* old lock syscall holder */  
-.long SYMBOL_NAME(sys_ioctl)  
-.long SYMBOL_NAME(sys_fcntl) /* 55 */  
-.long SYMBOL_NAME(sys_ni_syscall) /* old mpx syscall holder */  
-.long SYMBOL_NAME(sys_setpgid)  
-.long SYMBOL_NAME(sys_ni_syscall) /* old ulimit syscall holder */  
-.long SYMBOL_NAME(sys_olduname)  
-.long SYMBOL_NAME(sys_umask) /* 60 */  
-.long SYMBOL_NAME(sys_chroot)  
-.long SYMBOL_NAME(sys_ustat)  
-.long SYMBOL_NAME(sys_dup2)  
-.long SYMBOL_NAME(sys_getppid)  
-.long SYMBOL_NAME(sys_getpgrp) /* 65 */  
-.long SYMBOL_NAME(sys_setsid)  
-.long SYMBOL_NAME(sys_sigaction)  
-.long SYMBOL_NAME(sys_sgetmask)  
-.long SYMBOL_NAME(sys_ssetmask)  
-.long SYMBOL_NAME(sys_setreuid16) /* 70 */  
-.long SYMBOL_NAME(sys_setregid16)  
-.long SYMBOL_NAME(sys_sigsuspend)  
-.long SYMBOL_NAME(sys_sigpending)  
-.long SYMBOL_NAME(sys_sethostname)  
-.long SYMBOL_NAME(sys_setrlimit) /* 75 */  
-.long SYMBOL_NAME(sys_old_getrlimit)  
-.long SYMBOL_NAME(sys_getrusage)  
-.long SYMBOL_NAME(sys_gettimeofday)  
-.long SYMBOL_NAME(sys_settimeofday)  
-.long SYMBOL_NAME(sys_getgroups16) /* 80 */  
-.long SYMBOL_NAME(sys_setgroups16)  
-.long SYMBOL_NAME(old_select)  
-.long SYMBOL_NAME(sys_symlink)  
-.long SYMBOL_NAME(sys_lstat)  
-.long SYMBOL_NAME(sys_readlink) /* 85 */  
-.long SYMBOL_NAME(sys_uselib)  
-.long SYMBOL_NAME(sys_swapon)  
-.long SYMBOL_NAME(sys_reboot)  
-.long SYMBOL_NAME(old_readdir)  
-.long SYMBOL_NAME(old_mmap) /* 90 */  
-.long SYMBOL_NAME(sys_munmap)  
-.long SYMBOL_NAME(sys_truncate)  
-.long SYMBOL_NAME(sys_ftruncate)  
-.long SYMBOL_NAME(sys_fchmod)  
-.long SYMBOL_NAME(sys_fchown16) /* 95 */  
-.long SYMBOL_NAME(sys_getpriority)  
-.long SYMBOL_NAME(sys_setpriority)  
-.long SYMBOL_NAME(sys_ni_syscall) /* old profil syscall holder */  
-.long SYMBOL_NAME(sys_statfs)  
-.long SYMBOL_NAME(sys_fstatfs) /* 100 */  
-.long SYMBOL_NAME(sys_ioperm)  
-.long SYMBOL_NAME(sys_socketcall)  
-.long SYMBOL_NAME(sys_syslog)  
-.long SYMBOL_NAME(sys_setitimer)  
-.long SYMBOL_NAME(sys_getitimer) /* 105 */  
-.long SYMBOL_NAME(sys_newstat)  
-.long SYMBOL_NAME(sys_newlstat)  
-.long SYMBOL_NAME(sys_newfstat)  
-.long SYMBOL_NAME(sys_uname)  
-.long SYMBOL_NAME(sys_iopl) /* 110 */  
-.long SYMBOL_NAME(sys_vhangup)  
-.long SYMBOL_NAME(sys_ni_syscall) /* old "idle" system call */  
-.long SYMBOL_NAME(sys_vm86old)  
-.long SYMBOL_NAME(sys_wait4)  
-.long SYMBOL_NAME(sys_swapoff) /* 115 */  
-.long SYMBOL_NAME(sys_sysinfo)  
-.long SYMBOL_NAME(sys_ipc)  
-.long SYMBOL_NAME(sys_fsync)  
-.long SYMBOL_NAME(sys_sigreturn)  
-.long SYMBOL_NAME(sys_clone) /* 120 */  
-.long SYMBOL_NAME(sys_setdomainname)  
-.long SYMBOL_NAME(sys_newuname)  
-.long SYMBOL_NAME(sys_modify_ldt)  
-.long SYMBOL_NAME(sys_adjtimex)  
-.long SYMBOL_NAME(sys_mprotect) /* 125 */  
-.long SYMBOL_NAME(sys_sigprocmask)  
-.long SYMBOL_NAME(sys_create_module)  
-.long SYMBOL_NAME(sys_init_module)  
-.long SYMBOL_NAME(sys_delete_module)  
-.long SYMBOL_NAME(sys_get_kernel_syms) /* 130 */  
-.long SYMBOL_NAME(sys_quotactl)  
-.long SYMBOL_NAME(sys_getpgid)  
-.long SYMBOL_NAME(sys_fchdir)  
-.long SYMBOL_NAME(sys_bdflush)  
-.long SYMBOL_NAME(sys_sysfs) /* 135 */  
-.long SYMBOL_NAME(sys_personality)  
-.long SYMBOL_NAME(sys_ni_syscall) /* for afs_syscall */  
-.long SYMBOL_NAME(sys_setfsuid16)  
-.long SYMBOL_NAME(sys_setfsgid16)  
-.long SYMBOL_NAME(sys_llseek) /* 140 */  
-.long SYMBOL_NAME(sys_getdents)  
-.long SYMBOL_NAME(sys_select)  
-.long SYMBOL_NAME(sys_flock)  
-.long SYMBOL_NAME(sys_msync)  
-.long SYMBOL_NAME(sys_readv) /* 145 */  
-.long SYMBOL_NAME(sys_writev)  
-.long SYMBOL_NAME(sys_getsid)  
-.long SYMBOL_NAME(sys_fdatasync)  
-.long SYMBOL_NAME(sys_sysctl)  
-.long SYMBOL_NAME(sys_mlock) /* 150 */  
-.long SYMBOL_NAME(sys_munlock)  
-.long SYMBOL_NAME(sys_mlockall)  
-.long SYMBOL_NAME(sys_munlockall)  
-.long SYMBOL_NAME(sys_sched_setparam)  
-.long SYMBOL_NAME(sys_sched_getparam) /* 155 */  
-.long SYMBOL_NAME(sys_sched_setscheduler)  
-.long SYMBOL_NAME(sys_sched_getscheduler)  
-.long SYMBOL_NAME(sys_sched_yield)  
-.long SYMBOL_NAME(sys_sched_get_priority_max)  
-.long SYMBOL_NAME(sys_sched_get_priority_min) /* 160 */  
-.long SYMBOL_NAME(sys_sched_rr_get_interval)  
-.long SYMBOL_NAME(sys_nanosleep)  
-.long SYMBOL_NAME(sys_mremap)  
-.long SYMBOL_NAME(sys_setresuid16)  
-.long SYMBOL_NAME(sys_getresuid16) /* 165 */  
-.long SYMBOL_NAME(sys_vm86)  
-.long SYMBOL_NAME(sys_query_module)  
-.long SYMBOL_NAME(sys_poll)  
-.long SYMBOL_NAME(sys_nfsservctl)  
-.long SYMBOL_NAME(sys_setresgid16) /* 170 */  
-.long SYMBOL_NAME(sys_getresgid16)  
-.long SYMBOL_NAME(sys_prctl)  
-.long SYMBOL_NAME(sys_rt_sigreturn)  
-.long SYMBOL_NAME(sys_rt_sigaction)  
-.long SYMBOL_NAME(sys_rt_sigprocmask) /* 175 */  
-.long SYMBOL_NAME(sys_rt_sigpending)  
-.long SYMBOL_NAME(sys_rt_sigtimedwait)  
-.long SYMBOL_NAME(sys_rt_sigqueueinfo)  
-.long SYMBOL_NAME(sys_rt_sigsuspend)  
-.long SYMBOL_NAME(sys_pread) /* 180 */  
-.long SYMBOL_NAME(sys_pwrite)  
-.long SYMBOL_NAME(sys_chown16)  
-.long SYMBOL_NAME(sys_getcwd)  
-.long SYMBOL_NAME(sys_capget)  
-.long SYMBOL_NAME(sys_capset) /* 185 */  
-.long SYMBOL_NAME(sys_sigaltstack)  
-.long SYMBOL_NAME(sys_sendfile)  
-.long SYMBOL_NAME(sys_ni_syscall) /* streams1 */  
-.long SYMBOL_NAME(sys_ni_syscall) /* streams2 */  
-.long SYMBOL_NAME(sys_vfork) /* 190 */  
-.long SYMBOL_NAME(sys_getrlimit)  
-.long SYMBOL_NAME(sys_mmap2)  
-.long SYMBOL_NAME(sys_truncate64)  
-.long SYMBOL_NAME(sys_ftruncate64)  
-.long SYMBOL_NAME(sys_stat64) /* 195 */  
-.long SYMBOL_NAME(sys_lstat64)  
-.long SYMBOL_NAME(sys_fstat64)  
-.long SYMBOL_NAME(sys_lchown)  
-.long SYMBOL_NAME(sys_getuid)  
-.long SYMBOL_NAME(sys_getgid) /* 200 */  
-.long SYMBOL_NAME(sys_geteuid)  
-.long SYMBOL_NAME(sys_getegid)  
-.long SYMBOL_NAME(sys_setreuid)  
-.long SYMBOL_NAME(sys_setregid)  
-.long SYMBOL_NAME(sys_getgroups) /* 205 */  
-.long SYMBOL_NAME(sys_setgroups)  
-.long SYMBOL_NAME(sys_fchown)  
-.long SYMBOL_NAME(sys_setresuid)  
-.long SYMBOL_NAME(sys_getresuid)  
-.long SYMBOL_NAME(sys_setresgid) /* 210 */  
-.long SYMBOL_NAME(sys_getresgid)  
-.long SYMBOL_NAME(sys_chown)  
-.long SYMBOL_NAME(sys_setuid)  
-.long SYMBOL_NAME(sys_setgid)  
-.long SYMBOL_NAME(sys_setfsuid) /* 215 */  
-.long SYMBOL_NAME(sys_setfsgid)  
-.long SYMBOL_NAME(sys_pivot_root)  
-.long SYMBOL_NAME(sys_mincore)  
-.long SYMBOL_NAME(sys_madvise)  
-.long SYMBOL_NAME(sys_getdents64) /* 220 */  
-.long SYMBOL_NAME(sys_fcntl64)  
-.long SYMBOL_NAME(sys_ni_syscall) /* reserved for TUX */  
-.long SYMBOL_NAME(sys_ni_syscall) /* Reserved for Security */  
-.long SYMBOL_NAME(sys_gettid)  
-.long SYMBOL_NAME(sys_readahead) /* 225 */  
-  
-  
-!IRQ Event  
-  
-  
-When an IRQ comes, the task that is running is interrupted in order to  
-service the IRQ Handler.  
-  
-  
-After the IRQ is handled, control returns backs exactly to point of interrupt,  
-like nothing happened.  
-  
-  
-  
-  
-Running Task  
-|-----------| (3)  
-NORMAL | | | [[break execution] IRQ Handler  
-EXECUTION (1)| | | ------------->|---------|  
-| \|/ | | | does |  
-IRQ (2)---->| .. |-----> | some |  
-| | |<----- | work |  
-BACK TO | | | | | ..(4). |  
-NORMAL (6)| \|/ | <-------------|_________|  
-EXECUTION |___________| [[return to code]  
-(5)  
-USER MODE KERNEL MODE  
-User->Kernel Mode Transition caused by IRQ event  
-  
-  
-  
-The numbered steps below refer to the sequence of events in the diagram  
-above:  
-  
-  
-  
-  
-  
-#Process is executing  
-#  
-  
-#IRQ comes while the task is running.  
-#  
-  
-#Task is interrupted to call an "Interrupt handler".  
-#  
-  
-#The "Interrupt handler" code is executed.  
-#  
-  
-#Control returns back to task user mode (as if nothing happened)  
-#  
-  
-#Process returns back to normal execution  
-#  
-  
-  
-  
-Special interest has the Timer IRQ, coming every TIMER ms to manage:  
-  
-  
-  
-  
-  
-#Alarms  
-#  
-  
-#System and task counters (used by schedule to decide when stop a process  
-or for accounting)  
-#  
-  
-#Multitasking based on wake up mechanism after TIMESLICE time.  
-#  
-  
-  
-!!3.4 Multitasking  
-  
-  
-!Mechanism  
-  
-  
-The key point of modern OSs is the "Task". The Task is an application running  
-in memory sharing all resources (included CPU and Memory) with other Tasks.  
-  
-  
-This "resource sharing" is managed by the "Multitasking Mechanism". The Multitasking  
-Mechanism switches from one task to another after a "timeslice" time. Users have  
-the "illusion" that they own all resources. We can also imagine a single user  
-scenario, where a user can have the "illusion" of running many tasks at the same  
-time.  
-  
-  
-To implement this multitasking, the task uses "the state" variable, which  
-can be:  
-  
-  
-  
-  
-  
-#READY, ready for execution  
-#  
-  
-#BLOCKED, waiting for a resource  
-#  
-  
-  
-  
-The task state is managed by its presence in a relative list: READY list  
-and BLOCKED list.  
-  
-!Task Switching  
-  
-  
-The movement from one task to another is called ''Task Switching''. many  
-computers have a hardware instruction which automatically performs this operation.  
-Task Switching occurs in the following cases:  
-  
-  
-  
-  
-  
-#After Timeslice ends: we need to schedule a "Ready for execution" task and  
-give it access.  
-#  
-  
-#When a Task has to wait for a device: we need to schedule a new task and  
-switch to it *  
-#  
-  
-  
-  
-* We schedule another task to prevent "Busy Form Waiting", which occurs  
-when we are waiting for a device instead performing other work.  
-  
-  
-Task Switching is managed by the "Schedule" entity.  
-  
-  
-  
-  
-Timer | |  
-IRQ | | Schedule  
-| | | ________________________  
-|----->| Task 1 |<------------------>|(1)Chooses a Ready Task |  
-| | | |(2)Task Switching |  
-| |___________| |________________________|  
-| | | /|\  
-| | | |  
-| | | |  
-| | | |  
-| | | |  
-|----->| Task 2 |<-------------------------------|  
-| | | |  
-| |___________| |  
-. . . . .  
-. . . . .  
-. . . . .  
-| | | |  
-| | | |  
------->| Task N |<--------------------------------  
-| |  
-|___________|  
-Task Switching based on !TimeSlice  
-  
-  
-  
-A typical Timeslice for Linux is about 10 ms.  
-  
-  
-  
-  
-| |  
-| | Resource _____________________________  
-| Task 1 |----------->|(1) Enqueue Resource request |  
-| | Access |(2) Mark Task as blocked |  
-| | |(3) Choose a Ready Task |  
-|___________| |(4) Task Switching |  
-|_____________________________|  
-|  
-|  
-| | |  
-| | |  
-| Task 2 |<-------------------------  
-| |  
-| |  
-|___________|  
-Task Switching based on Waiting for a Resource  
-  
-  
-!!3.5 Microkernel vs Monolithic OS  
-  
-  
-!Overview  
-  
-  
-Until now we viewed so called Monolithic OS, but there is also another  
-kind of OS: ''Microkernel''.  
-  
-  
-A Microkernel OS uses Tasks, not only for user mode processes, but also  
-as a real kernel manager, like Floppy-Task, HDD-Task, Net-Task and so on. Some  
-examples are Amoeba, and Mach.  
-  
-!PROs and CONTROs of Microkernel OS  
-  
-  
-PROS:  
-  
-  
-  
-  
-  
-*OS is simpler to maintain because each Task manages a single kind of operation.  
-So if you want to modify networking, you modify Net-Task (ideally, if it is  
-not needed a structural update).  
-*  
-  
-  
-  
-CONS:  
-  
-  
-  
-  
-  
-*Performances are worse than Monolithic OS, because you have to add 2*TASK_SWITCH  
-times (the first to enter the specific Task, the second to go out from it).  
-*  
-  
-  
-  
-My personal opinion is that, Microkernels are a good didactic example (like  
-Minix) but they are not ''optimal'', so not really suitable. Linux uses a few  
-Tasks, called "Kernel Threads" to implement a little microkernel structure (like  
-kswapd, which is used to retrieve memory pages from mass storage). In this  
-case there are no problems with perfomance because swapping is a very slow  
-job.  
-  
-!!3.6 Networking  
-  
-  
-!ISO OSI levels  
-  
-  
-Standard ISO-OSI describes a network architecture with the following levels:  
-  
-  
-  
-  
-  
-#Physical level (examples: PPP and Ethernet)  
-#  
-  
-#Data-link level (examples: PPP and Ethernet)  
-#  
-  
-#Network level (examples: IP, and X.25)  
-#  
-  
-#Transport level (examples: TCP, UDP)  
-#  
-  
-#Session level (SSL)  
-#  
-  
-#Presentation level (FTP binary-ascii coding)  
-#  
-  
-#Application level (applications like Netscape)  
-#  
-  
-  
-  
-The first 2 levels listed above are often implemented in hardware. Next  
-levels are in software (or firmware for routers).  
-  
-  
-Many protocols are used by an OS: one of these is TCP/IP (the most important  
-living on 3-4 levels).  
-  
-!What does the kernel?  
-  
-  
-The kernel doesn't know anything (only addresses) about first 2 levels  
-of ISO-OSI.  
-  
-  
-In RX it:  
-  
-  
-  
-  
-  
-#Manages handshake with low levels devices (like ethernet card or modem)  
-receiving "frames" from them.  
-#  
-  
-#Builds TCP/IP "packets" from "frames" (like Ethernet or PPP ones),  
-#  
-  
-#Convers ''packets'' in ''sockets'' passing them to the right application  
-(using port number) or  
-#  
-  
-#Forwards packets to the right queue  
-#  
-  
-  
-  
-  
-  
-frames packets sockets  
-NIC ---------> Kernel ----------> Application  
-| packets  
---------------> Forward  
-- RX -  
-  
-  
-  
-In TX stage it:  
-  
-  
-  
-  
-  
-#Converts sockets or  
-#  
-  
-#Queues datas into TCP/IP ''packets''  
-#  
-  
-#Splits ''packets" into "frames" (like Ethernet or PPP ones)  
-#  
-  
-#Sends ''frames'' using HW drivers  
-#  
-  
-  
-  
-  
-  
-sockets packets frames  
-Application ---------> Kernel ----------> NIC  
-packets /|\  
-Forward -------------------  
-- TX -  
-  
-  
-!!3.7 Virtual Memory  
-  
-  
-!Segmentation  
-  
-  
-Segmentation is the first method to solve memory allocation problems: it  
-allows you to compile source code without caring where the application will  
-be placed in memory. As a matter of fact, this feature helps applications developers  
-to develop in a independent fashion from the OS e also from the hardware.  
-  
-  
-  
-  
-| Stack |  
-| | |  
-| \|/ |  
-| Free |  
-| /|\ | Segment <---> Process  
-| | |  
-| Heap |  
-| Data uninitialized |  
-| Data initialized |  
-| Code |  
-|____________________|  
-Segment  
-  
-  
-  
-We can say that a segment is the logical entity of an application, or the  
-image of the application in memory.  
-  
-  
-When programming, we don't care where our data is put in memory, we only  
-care about the offset inside our segment (our application).  
-  
-  
-We use to assign a Segment to each Process and vice versa. In Linux this  
-is not true. Linux uses only 4 segments for either Kernel and all Processes.  
-  
-!Problems of Segmentation  
-  
-  
-  
-  
-____________________  
------>| |----->  
-| IN | Segment A | OUT  
-____________________ | |____________________|  
-| |____| | |  
-| Segment B | | Segment B |  
-| |____ | |  
-|____________________| | |____________________|  
-| | Segment C |  
-| |____________________|  
------>| Segment D |----->  
-IN |____________________| OUT  
-Segmentation problem  
-  
-  
-  
-In the diagram above, we want to get exit processes A, and D and enter  
-process B. As we can see there is enough space for B, but we cannot split it  
-in 2 pieces, so we CANNOT load it (memory out).  
-  
-  
-The reason this problem occurs is because pure segments are continuous  
-areas (because they are logical areas) and cannot be split.  
-  
-!Pagination  
-  
-  
-  
-  
-____________________  
-| Page 1 |  
-|____________________|  
-| Page 2 |  
-|____________________|  
-| .. | Segment <---> Process  
-|____________________|  
-| Page n |  
-|____________________|  
-| |  
-|____________________|  
-| |  
-|____________________|  
-Segment  
-  
-  
-  
-Pagination splits memory in "n" pieces, each one with a fixed  
-length.  
-  
-  
-A process may be loaded in one or more Pages. When memory is freed, all  
-pages are freed (see Segmentation Problem, before).  
-  
-  
-Pagination is also used for another important purpose, "Swapping". If a page  
-is not present in physical memory then it generates an EXCEPTION, that will  
-make the Kernel search for a new page in storage memory. This mechanism allow  
-OS to load more applications than the ones allowed by physical memory only.  
-  
-!Pagination Problem  
-  
-  
-  
-  
-____________________  
-Page X | Process Y |  
-|____________________|  
-| |  
-| WASTE |  
-| SPACE |  
-|____________________|  
-Pagination Problem  
-  
-  
-  
-In the diagram above, we can see what is wrong with the pagination policy:  
-when a Process Y loads into Page X, ALL memory space of the Page is allocated,  
-so the remaining space at the end of Page is wasted.  
-  
-!Segmentation and Pagination  
-  
-  
-How can we solve segmentation and pagination problems? Using either 2 policies.  
-  
-  
-  
-  
-| .. |  
-|____________________|  
------>| Page 1 |  
-| |____________________|  
-| | .. |  
-____________________ | |____________________|  
-| | |---->| Page 2 |  
-| Segment X | ----| |____________________|  
-| | | | .. |  
-|____________________| | |____________________|  
-| | .. |  
-| |____________________|  
-|---->| Page 3 |  
-|____________________|  
-| .. |  
-  
-  
-  
-Process X, identified by Segment X, is split in 3 pieces and each of one  
-is loaded in a page.  
-  
-  
-We do not have:  
-  
-  
-  
-  
-  
-#Segmentation problem: we allocate per Pages, so we also free Pages and  
-we manage free space in an optimized way.  
-#  
-  
-#Pagination problem: only last page wastes space, but we can decide to use  
-very small pages, for example 4096 bytes length (losing at maximum 4096*N_Tasks  
-bytes) and manage hierarchical paging (using 2 or 3 levels of paging)  
-#  
-  
-  
-  
-  
-  
-| | | |  
-| | Offset2 | Value |  
-| | /|\| |  
-Offset1 | |----- | | |  
-/|\ | | | | | |  
-| | | | \|/| |  
-| | | ------>| |  
-\|/ | | | |  
-Base Paging Address ---->| | | |  
-| ....... | | ....... |  
-| | | |  
-Hierarchical Paging  
-  
-----  
-  
-!!4. Linux Startup  
-  
-  
-We start the Linux kernel first from C code executed from ''startup_32:''  
-asm label:  
-  
-  
-  
-  
-|startup_32:  
-|start_kernel  
-|lock_kernel  
-|trap_init  
-|init_IRQ  
-|sched_init  
-|softirq_init  
-|time_init  
-|console_init  
-|#ifdef CONFIG_MODULES  
-|init_modules  
-|#endif  
-|kmem_cache_init  
-|sti  
-|calibrate_delay  
-|mem_init  
-|kmem_cache_sizes_init  
-|pgtable_cache_init  
-|fork_init  
-|proc_caches_init  
-|vfs_caches_init  
-|buffer_init  
-|page_cache_init  
-|signals_init  
-|#ifdef CONFIG_PROC_FS  
-|proc_root_init  
-|#endif  
-|#if defined(CONFIG_SYSVIPC)  
-|ipc_init  
-|#endif  
-|check_bugs  
-|smp_init  
-|rest_init  
-|kernel_thread  
-|unlock_kernel  
-|cpu_idle  
-  
-  
-  
-  
-  
-  
-*startup_32 [[arch/i386/kernel/head.S]  
-*  
-  
-*start_kernel [[init/main.c]  
-*  
-  
-*lock_kernel [[include/asm/smplock.h]  
-*  
-  
-*trap_init [[arch/i386/kernel/traps.c]  
-*  
-  
-*init_IRQ [[arch/i386/kernel/i8259.c]  
-*  
-  
-*sched_init [[kernel/sched.c]  
-*  
-  
-*softirq_init [[kernel/softirq.c]  
-*  
-  
-*time_init [[arch/i386/kernel/time.c]  
-*  
-  
-*console_init [[drivers/char/tty_io.c]  
-*  
-  
-*init_modules [[kernel/module.c]  
-*  
-  
-*kmem_cache_init [[mm/slab.c]  
-*  
-  
-*sti [[include/asm/system.h]  
-*  
-  
-*calibrate_delay [[init/main.c]  
-*  
-  
-*mem_init [[arch/i386/mm/init.c]  
-*  
-  
-*kmem_cache_sizes_init [[mm/slab.c]  
-*  
-  
-*pgtable_cache_init [[arch/i386/mm/init.c]  
-*  
-  
-*fork_init [[kernel/fork.c]  
-*  
-  
-*proc_caches_init  
-*  
-  
-*vfs_caches_init [[fs/dcache.c]  
-*  
-  
-*buffer_init [[fs/buffer.c]  
-*  
-  
-*page_cache_init [[mm/filemap.c]  
-*  
-  
-*signals_init [[kernel/signal.c]  
-*  
-  
-*proc_root_init [[fs/proc/root.c]  
-*  
-  
-*ipc_init [[ipc/util.c]  
-*  
-  
-*check_bugs [[include/asm/bugs.h]  
-*  
-  
-*smp_init [[init/main.c]  
-*  
-  
-*rest_init  
-*  
-  
-*kernel_thread [[arch/i386/kernel/process.c]  
-*  
-  
-*unlock_kernel [[include/asm/smplock.h]  
-*  
-  
-*cpu_idle [[arch/i386/kernel/process.c]  
-*  
-  
-  
-  
-The last function ''rest_init'' does the following:  
-  
-  
-  
-  
-  
-#launches the kernel thread ''init''  
-#  
-  
-#calls unlock_kernel  
-#  
-  
-#makes the kernel run cpu_idle routine, that will be the idle loop executing  
-when nothing is scheduled  
-#  
-  
-  
-  
-In fact the start_kernel procedure never ends. It will execute cpu_idle  
-routine endlessly.  
-  
-  
-Follows ''init'' description, which is the first Kernel Thread:  
-  
-  
-  
-  
-|init  
-|lock_kernel  
-|do_basic_setup  
-|mtrr_init  
-|sysctl_init  
-|pci_init  
-|sock_init  
-|start_context_thread  
-|do_init_calls  
-|(*call())-> kswapd_init  
-|prepare_namespace  
-|free_initmem  
-|unlock_kernel  
-|execve  
-  
-----  
-  
-!!5. Linux Peculiarities  
-  
-!!5.1 Overview  
-  
-  
-  
-Linux has some peculiarities that distinguish it from other OSs. These  
-peculiarities include:  
-  
-  
-  
-  
-  
-#Pagination only  
-#  
-  
-#Softirq  
-#  
-  
-#Kernel threads  
-#  
-  
-#Kernel modules  
-#  
-  
-#''Proc'' directory  
-#  
-  
-  
-!Flexibility Elements  
-  
-  
-Points 4 and 5 give system administrators an enormous flexibility on system  
-configuration from user mode allowing them to solve also critical kernel bugs  
-or specific problems without have to reboot the machine. For example, if you  
-needed to change something on a big server and you didn't want to make a reboot,  
-you could prepare the kernel to talk with a module, that you'll write.  
-  
-!!5.2 Pagination only  
-  
-  
-  
-Linux doesn't use segmentation to distinguish Tasks from each other; it  
-uses pagination. (Only 2 segments are used for all Tasks, CODE and DATA/STACK)  
-  
-  
-  
-  
-  
-We can also say that an interTask page fault never occurs, because each  
-Task uses a set of Page Tables that are different for each Task. These tables  
-cannot point to the same physical addresses.  
-  
-!Linux segments  
-  
-  
-Under the Linux kernel only 4 segments exist:  
-  
-  
-  
-  
-  
-#Kernel Code [[0x10]  
-#  
-  
-#Kernel Data / Stack [[0x18]  
-#  
-  
-#User Code [[0x23]  
-#  
-  
-#User Data / Stack [[0x2b]  
-#  
-  
-  
-  
-[[syntax is ''Purpose [[Segment]'']  
-  
-  
-Under Intel architecture, the segment registers used are:  
-  
-  
-  
-  
-  
-*CS for Code Segment  
-*  
-  
-*DS for Data Segment  
-*  
-  
-*SS for Stack Segment  
-*  
-  
-*ES for Alternative Segment (for example used to make a memory copy between  
-2 different segments)  
-*  
-  
-  
-  
-So, every Task uses 0x23 for code and 0x2b for data/stack.  
-  
-!Linux pagination  
-  
-  
-Under Linux 3 levels of pages are used, depending on the architecture.  
-Under Intel only 2 levels are supported. Linux also supports Copy on Write  
-mechanisms (please see Cap.10 for more information).  
-  
-!Why don't interTasks address conflicts exist?  
-  
-  
-The answer is very very simple: interTask address conflicts cannot exist  
-because they are impossible. Linear -> physical mapping is done by "Pagination",  
-so it just needs to assign physical pages in an univocal fashion.  
-  
-!Do we need to defragment memory?  
-  
-  
-No. Page assigning is a dynamic process. We need a page only when a Task  
-asks for it, so we choose it from free memory paging in an ordered fashion.  
-When we want to release the page, we only have to add it to the free pages  
-list.  
-  
-!What about Kernel Pages?  
-  
-  
-Kernel pages have a problem: they can be allocated in a dynamic fashion  
-but we cannot have a guarantee that they are in contiguous area allocation,  
-because linear kernel space is equivalent to physical kernel space.  
-  
-  
-For Code Segment there is no problem. Boot code is allocated at boot time  
-(so we have a fixed amount of memory to allocate), and on modules we only have  
-to allocate a memory area which could contain module code.  
-  
-  
-The real problem is the stack segment because each Task uses some kernel  
-stack pages. Stack segments must be contiguous (according to stack definition),  
-so we have to establish a maximum limit for each Task's stack dimension. If  
-we exceed this limit bad things happen. We overwrite kernel mode process data  
-structures.  
-  
-  
-The structure of the Kernel helps us, because kernel functions are never:  
-  
-  
-  
-  
-  
-*recursive  
-*  
-  
-*intercalling more than N times.  
-*  
-  
-  
-  
-Once we know N, and we know the average of static variables for all kernel  
-functions, we can estimate a stack limit.  
-  
-  
-If you want to try the problem out, you can create a module with a function  
-inside calling itself many times. After a fixed number of times, the kernel  
-module will hang because of a page fault exception handler (typically write  
-to a read-only page).  
-  
-!!5.3 Softirq  
-  
-  
-  
-When an IRQ comes, task switching is deferred until later to get better  
-performance. Some Task jobs (that could have to be done just after the IRQ  
-and that could take much CPU in interrupt time, like building up a TCP/IP packet)  
-are queued and will be done at scheduling time (once a time-slice will end).  
-  
-  
-In recent kernels (2.4.x) the softirq mechanisms are given to a kernel_thread:  
-''ksoftirqd_CPUn''. n stands for the number of CPU executing kernel_thread  
-(in a monoprocessor system ''ksoftirqd_CPU0'' uses PID 3).  
-  
-!Preparing Softirq  
-  
-!Enabling Softirq  
-  
-  
-''cpu_raise_softirq'' is a routine that will wake_up ''ksoftirqd_CPU0''  
-kernel thread, to let it manage the enqueued job.  
-  
-  
-  
-  
-|cpu_raise_softirq  
-|__cpu_raise_softirq  
-|wakeup_softirqd  
-|wake_up_process  
-  
-  
-  
-  
-  
-  
-*cpu_raise_softirq [[kernel/softirq.c]  
-*  
-  
-*__cpu_raise_softirq [[include/linux/interrupt.h]  
-*  
-  
-*wakeup_softirq [[kernel/softirq.c]  
-*  
-  
-*wake_up_process [[kernel/sched.c]  
-*  
-  
-  
-  
-''__cpu_raise_softirq'' routine will set right bit in the vector describing  
-softirq pending.  
-  
-  
-''wakeup_softirq'' uses ''wakeup_process'' to wake up ''ksoftirqd_CPU0''  
-kernel thread.  
-  
-!Executing Softirq  
-  
-  
-TODO: describing data structures involved in softirq mechanism.  
-  
-  
-When kernel thread ''ksoftirqd_CPU0'' has been woken up, it will execute  
-queued jobs  
-  
-  
-The code of ''ksoftirqd_CPU0'' is (main endless loop):  
-  
-  
-  
-  
-for (;;) {  
-if (!softirq_pending(cpu))  
-schedule();  
-__set_current_state(TASK_RUNNING);  
-while (softirq_pending(cpu)) {  
-do_softirq();  
-if (current->need_resched)  
-schedule  
-}  
-__set_current_state(TASK_INTERRUPTIBLE)  
-}  
-  
-  
-  
-  
-  
-  
-*ksoftirqd [[kernel/softirq.c]  
-*  
-  
-  
-  
-  
-  
-  
-  
-  
-!!5.4 Kernel Threads  
-  
-  
-  
-Even though Linux is a monolithic OS, a few ''kernel threads'' exist to  
-do housekeeping work.  
-  
-  
-These Tasks don't utilize USER memory; they share KERNEL memory. They also  
-operate at the highest privilege (RING 0 on a i386 architecture) like any other  
-kernel mode piece of code.  
-  
-  
-Kernel threads are created by ''kernel_thread [[arch/i386/kernel/process]''  
-function, which calls ''clone'' [[arch/i386/kernel/process.c] system  
-call from assembler (which is a ''fork'' like system call):  
-  
-  
-  
-  
-int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)  
-{  
-long retval, d0;  
-__asm__ __volatile__(  
-"movl %%esp,%%esi\n\t"  
-"int $0x80\n\t" /* Linux/i386 system call */  
-"cmpl %%esp,%%esi\n\t" /* child or parent? */  
-"je 1f\n\t" /* parent - jump */  
-/* Load the argument into eax, and push it. That way, it does  
-* not matter whether the called function is compiled with  
-* -mregparm or not. */  
-"movl %4,%%eax\n\t"  
-"pushl %%eax\n\t"  
-"call *%5\n\t" /* call fn */  
-"movl %3,%\n\t" /* exit */  
-"int $0x80\n"  
-"1:\t"  
-:"=&a" (retval), "=&S" (d0)  
-:"" (__NR_clone), "i" (__NR_exit),  
-"r" (arg), "r" (fn),  
-"b" (flags | CLONE_VM)  
-: "memory");  
-return retval;  
-}  
-  
-  
-  
-Once called, we have a new Task (usually with very low PID number, like  
-2,3, etc.) waiting for a very slow resource, like swap or usb event. A very  
-slow resource is used because we would have a task switching overhead otherwise.  
-  
-  
-Below is a list of most common kernel threads (from ''ps x'' command):  
-  
-  
-  
-  
-PID COMMAND  
-1 init  
-2 keventd  
-3 kswapd  
-4 kreclaimd  
-5 bdflush  
-6 kupdated  
-7 kacpid  
-67 khubd  
-  
-  
-  
-'init' kernel thread is the first process created, at boot time. It will  
-call all other User Mode Tasks (from file /etc/inittab) like console daemons,  
-tty daemons and network daemons (''rc'' scripts).  
-  
-!Example of Kernel Threads: kswapd [[mm/vmscan.c].  
-  
-  
-''kswapd'' is created by ''clone() [[arch/i386/kernel/process.c]''  
-  
-  
-Initialisation routines:  
-  
-  
-  
-  
-|do_initcalls  
-|kswapd_init  
-|kernel_thread  
-|syscall fork (in assembler)  
-  
-  
-  
-do_initcalls [[init/main.c]  
-  
-  
-kswapd_init [[mm/vmscan.c]  
-  
-  
-kernel_thread [[arch/i386/kernel/process.c]  
-  
-!!5.5 Kernel Modules  
-  
-  
-!Overview  
-  
-  
-Linux Kernel modules are pieces of code (examples: fs, net, and hw driver)  
-running in kernel mode that you can add at runtime.  
-  
-  
-The Linux core cannot be modularized: scheduling and interrupt management  
-or core network, and so on.  
-  
-  
-Under "/lib/modules/KERNEL_VERSION/" you can find all the modules installed  
-on your system.  
-  
-!Module loading and unloading  
-  
-  
-To load a module, type the following:  
-  
-  
-  
-  
-insmod MODULE_NAME parameters  
-example: insmod ne io=0x300 irq=9  
-  
-  
-  
-NOTE: You can use modprobe in place of insmod if you want the kernel automatically  
-search some parameter (for example when using PCI driver, or if you have specified  
-parameter under /etc/conf.modules file).  
-  
-  
-To unload a module, type the following:  
-  
-  
-  
-  
-rmmod MODULE_NAME  
-  
-  
-!Module definition  
-  
-  
-A module always contains:  
-  
-  
-  
-  
-  
-#"init_module" function, executed at insmod (or modprobe) command  
-#  
-  
-#"cleanup_module" function, executed at rmmod command  
-#  
-  
-  
-  
-If these functions are not in the module, you need to add 2 macros to specify  
-what functions will act as init and exit module:  
-  
-  
-  
-  
-  
-#module_init(FUNCTION_NAME)  
-#  
-  
-#module_exit(FUNCTION_NAME)  
-#  
-  
-  
-  
-NOTE: a module can "see" a kernel variable only if it has been exported (with  
-macro EXPORT_SYMBOL).  
-  
-!A useful trick for adding flexibility to your kernel  
-  
-  
-  
-  
-// kernel sources side  
-void (*foo_function_pointer)(void *);  
-if (foo_function_pointer)  
-(foo_function_pointer)(parameter);  
-// module side  
-extern void (*foo_function_pointer)(void *);  
-void my_function(void *parameter) {  
-//My code  
-}  
-int init_module() {  
-foo_function_pointer = &my_function;  
-}  
-int cleanup_module() {  
-foo_function_pointer = NULL;  
-}  
-  
-  
-  
-This simple trick allows you to have very high flexibility in your Kernel,  
-because only when you load the module you'll make "my_function" routine execute.  
-This routine will do everything you want to do: for example ''rshaper'' module,  
-which controls bandwidth input traffic from the network, works in this kind  
-of matter.  
-  
-  
-Notice that the whole module mechanism is possible thanks to some global  
-variables exported to modules, such as head list (allowing you to extend the  
-list as much as you want). Typical examples are fs, generic devices (char,  
-block, net, telephony). You have to prepare the kernel to accept your new module;  
-in some cases you have to create an infrastructure (like telephony one, that  
-was recently created) to be as standard as possible.  
-  
-!!5.6 Proc directory  
-  
-  
-  
-Proc fs is located in the /proc directory, which is a special directory  
-allowing you to talk directly with kernel.  
-  
-  
-Linux uses ''proc'' directory to support direct kernel communications:  
-this is necessary in many cases, for example when you want see main processes  
-data structures or enable ''proxy-arp'' feature on one interface and not in  
-others, you want to change max number of threads, or if you want to debug some  
-bus state, like ISA or PCI, to know what cards are installed and what I/O addresses  
-and IRQs are assigned to them.  
-  
-  
-  
-  
-|-- bus  
-| |-- pci  
-| | |-- 00  
-| | | |-- 00.  
-| | | |-- 01.  
-| | | |-- 07.  
-| | | |-- 07.1  
-| | | |-- 07.2  
-| | | |-- 07.3  
-| | | |-- 07.4  
-| | | |-- 07.5  
-| | | |-- 09.  
-| | | |-- 0a.  
-| | | `-- 0f.  
-| | |-- 01  
-| | | `-- 00.  
-| | `-- devices  
-| `-- usb  
-|-- cmdline  
-|-- cpuinfo  
-|-- devices  
-|-- dma  
-|-- dri  
-| `--  
-| |-- bufs  
-| |-- clients  
-| |-- mem  
-| |-- name  
-| |-- queues  
-| |-- vm  
-| `-- vma  
-|-- driver  
-|-- execdomains  
-|-- filesystems  
-|-- fs  
-|-- ide  
-| |-- drivers  
-| |-- hda -> ide0/hda  
-| |-- hdc -> ide1/hdc  
-| |-- ide0  
-| | |-- channel  
-| | |-- config  
-| | |-- hda  
-| | | |-- cache  
-| | | |-- capacity  
-| | | |-- driver  
-| | | |-- geometry  
-| | | |-- identify  
-| | | |-- media  
-| | | |-- model  
-| | | |-- settings  
-| | | |-- smart_thresholds  
-| | | `-- smart_values  
-| | |-- mate  
-| | `-- model  
-| |-- ide1  
-| | |-- channel  
-| | |-- config  
-| | |-- hdc  
-| | | |-- capacity  
-| | | |-- driver  
-| | | |-- identify  
-| | | |-- media  
-| | | |-- model  
-| | | `-- settings  
-| | |-- mate  
-| | `-- model  
-| `-- via  
-|-- interrupts  
-|-- iomem  
-|-- ioports  
-|-- irq  
-| |--  
-| |-- 1  
-| |-- 10  
-| |-- 11  
-| |-- 12  
-| |-- 13  
-| |-- 14  
-| |-- 15  
-| |-- 2  
-| |-- 3  
-| |-- 4  
-| |-- 5  
-| |-- 6  
-| |-- 7  
-| |-- 8  
-| |-- 9  
-| `-- prof_cpu_mask  
-|-- kcore  
-|-- kmsg  
-|-- ksyms  
-|-- loadavg  
-|-- locks  
-|-- meminfo  
-|-- misc  
-|-- modules  
-|-- mounts  
-|-- mtrr  
-|-- net  
-| |-- arp  
-| |-- dev  
-| |-- dev_mcast  
-| |-- ip_fwchains  
-| |-- ip_fwnames  
-| |-- ip_masquerade  
-| |-- netlink  
-| |-- netstat  
-| |-- packet  
-| |-- psched  
-| |-- raw  
-| |-- route  
-| |-- rt_acct  
-| |-- rt_cache  
-| |-- rt_cache_stat  
-| |-- snmp  
-| |-- sockstat  
-| |-- softnet_stat  
-| |-- tcp  
-| |-- udp  
-| |-- unix  
-| `-- wireless  
-|-- partitions  
-|-- pci  
-|-- scsi  
-| |-- ide-scsi  
-| | `--  
-| `-- scsi  
-|-- self -> 2069  
-|-- slabinfo  
-|-- stat  
-|-- swaps  
-|-- sys  
-| |-- abi  
-| | |-- defhandler_coff  
-| | |-- defhandler_elf  
-| | |-- defhandler_lcall7  
-| | |-- defhandler_libcso  
-| | |-- fake_utsname  
-| | `-- trace  
-| |-- debug  
-| |-- dev  
-| | |-- cdrom  
-| | | |-- autoclose  
-| | | |-- autoeject  
-| | | |-- check_media  
-| | | |-- debug  
-| | | |-- info  
-| | | `-- lock  
-| | `-- parport  
-| | |-- default  
-| | | |-- spintime  
-| | | `-- timeslice  
-| | `-- parport0  
-| | |-- autoprobe  
-| | |-- autoprobe0  
-| | |-- autoprobe1  
-| | |-- autoprobe2  
-| | |-- autoprobe3  
-| | |-- base-addr  
-| | |-- devices  
-| | | |-- active  
-| | | `-- lp  
-| | | `-- timeslice  
-| | |-- dma  
-| | |-- irq  
-| | |-- modes  
-| | `-- spintime  
-| |-- fs  
-| | |-- binfmt_misc  
-| | |-- dentry-state  
-| | |-- dir-notify-enable  
-| | |-- dquot-nr  
-| | |-- file-max  
-| | |-- file-nr  
-| | |-- inode-nr  
-| | |-- inode-state  
-| | |-- jbd-debug  
-| | |-- lease-break-time  
-| | |-- leases-enable  
-| | |-- overflowgid  
-| | `-- overflowuid  
-| |-- kernel  
-| | |-- acct  
-| | |-- cad_pid  
-| | |-- cap-bound  
-| | |-- core_uses_pid  
-| | |-- ctrl-alt-del  
-| | |-- domainname  
-| | |-- hostname  
-| | |-- modprobe  
-| | |-- msgmax  
-| | |-- msgmnb  
-| | |-- msgmni  
-| | |-- osrelease  
-| | |-- ostype  
-| | |-- overflowgid  
-| | |-- overflowuid  
-| | |-- panic  
-| | |-- printk  
-| | |-- random  
-| | | |-- boot_id  
-| | | |-- entropy_avail  
-| | | |-- poolsize  
-| | | |-- read_wakeup_threshold  
-| | | |-- uuid  
-| | | `-- write_wakeup_threshold  
-| | |-- rtsig-max  
-| | |-- rtsig-nr  
-| | |-- sem  
-| | |-- shmall  
-| | |-- shmmax  
-| | |-- shmmni  
-| | |-- sysrq  
-| | |-- tainted  
-| | |-- threads-max  
-| | `-- version  
-| |-- net  
-| | |-- 802  
-| | |-- core  
-| | | |-- hot_list_length  
-| | | |-- lo_cong  
-| | | |-- message_burst  
-| | | |-- message_cost  
-| | | |-- mod_cong  
-| | | |-- netdev_max_backlog  
-| | | |-- no_cong  
-| | | |-- no_cong_thresh  
-| | | |-- optmem_max  
-| | | |-- rmem_default  
-| | | |-- rmem_max  
-| | | |-- wmem_default  
-| | | `-- wmem_max  
-| | |-- ethernet  
-| | |-- ipv4  
-| | | |-- conf  
-| | | | |-- all  
-| | | | | |-- accept_redirects  
-| | | | | |-- accept_source_route  
-| | | | | |-- arp_filter  
-| | | | | |-- bootp_relay  
-| | | | | |-- forwarding  
-| | | | | |-- log_martians  
-| | | | | |-- mc_forwarding  
-| | | | | |-- proxy_arp  
-| | | | | |-- rp_filter  
-| | | | | |-- secure_redirects  
-| | | | | |-- send_redirects  
-| | | | | |-- shared_media  
-| | | | | `-- tag  
-| | | | |-- default  
-| | | | | |-- accept_redirects  
-| | | | | |-- accept_source_route  
-| | | | | |-- arp_filter  
-| | | | | |-- bootp_relay  
-| | | | | |-- forwarding  
-| | | | | |-- log_martians  
-| | | | | |-- mc_forwarding  
-| | | | | |-- proxy_arp  
-| | | | | |-- rp_filter  
-| | | | | |-- secure_redirects  
-| | | | | |-- send_redirects  
-| | | | | |-- shared_media  
-| | | | | `-- tag  
-| | | | |-- eth0  
-| | | | | |-- accept_redirects  
-| | | | | |-- accept_source_route  
-| | | | | |-- arp_filter  
-| | | | | |-- bootp_relay  
-| | | | | |-- forwarding  
-| | | | | |-- log_martians  
-| | | | | |-- mc_forwarding  
-| | | | | |-- proxy_arp  
-| | | | | |-- rp_filter  
-| | | | | |-- secure_redirects  
-| | | | | |-- send_redirects  
-| | | | | |-- shared_media  
-| | | | | `-- tag  
-| | | | |-- eth1  
-| | | | | |-- accept_redirects  
-| | | | | |-- accept_source_route  
-| | | | | |-- arp_filter  
-| | | | | |-- bootp_relay  
-| | | | | |-- forwarding  
-| | | | | |-- log_martians  
-| | | | | |-- mc_forwarding  
-| | | | | |-- proxy_arp  
-| | | | | |-- rp_filter  
-| | | | | |-- secure_redirects  
-| | | | | |-- send_redirects  
-| | | | | |-- shared_media  
-| | | | | `-- tag  
-| | | | `-- lo  
-| | | | |-- accept_redirects  
-| | | | |-- accept_source_route  
-| | | | |-- arp_filter  
-| | | | |-- bootp_relay  
-| | | | |-- forwarding  
-| | | | |-- log_martians  
-| | | | |-- mc_forwarding  
-| | | | |-- proxy_arp  
-| | | | |-- rp_filter  
-| | | | |-- secure_redirects  
-| | | | |-- send_redirects  
-| | | | |-- shared_media  
-| | | | `-- tag  
-| | | |-- icmp_echo_ignore_all  
-| | | |-- icmp_echo_ignore_broadcasts  
-| | | |-- icmp_ignore_bogus_error_responses  
-| | | |-- icmp_ratelimit  
-| | | |-- icmp_ratemask  
-| | | |-- inet_peer_gc_maxtime  
-| | | |-- inet_peer_gc_mintime  
-| | | |-- inet_peer_maxttl  
-| | | |-- inet_peer_minttl  
-| | | |-- inet_peer_threshold  
-| | | |-- ip_autoconfig  
-| | | |-- ip_conntrack_max  
-| | | |-- ip_default_ttl  
-| | | |-- ip_dynaddr  
-| | | |-- ip_forward  
-| | | |-- ip_local_port_range  
-| | | |-- ip_no_pmtu_disc  
-| | | |-- ip_nonlocal_bind  
-| | | |-- ipfrag_high_thresh  
-| | | |-- ipfrag_low_thresh  
-| | | |-- ipfrag_time  
-| | | |-- neigh  
-| | | | |-- default  
-| | | | | |-- anycast_delay  
-| | | | | |-- app_solicit  
-| | | | | |-- base_reachable_time  
-| | | | | |-- delay_first_probe_time  
-| | | | | |-- gc_interval  
-| | | | | |-- gc_stale_time  
-| | | | | |-- gc_thresh1  
-| | | | | |-- gc_thresh2  
-| | | | | |-- gc_thresh3  
-| | | | | |-- locktime  
-| | | | | |-- mcast_solicit  
-| | | | | |-- proxy_delay  
-| | | | | |-- proxy_qlen  
-| | | | | |-- retrans_time  
-| | | | | |-- ucast_solicit  
-| | | | | `-- unres_qlen  
-| | | | |-- eth0  
-| | | | | |-- anycast_delay  
-| | | | | |-- app_solicit  
-| | | | | |-- base_reachable_time  
-| | | | | |-- delay_first_probe_time  
-| | | | | |-- gc_stale_time  
-| | | | | |-- locktime  
-| | | | | |-- mcast_solicit  
-| | | | | |-- proxy_delay  
-| | | | | |-- proxy_qlen  
-| | | | | |-- retrans_time  
-| | | | | |-- ucast_solicit  
-| | | | | `-- unres_qlen  
-| | | | |-- eth1  
-| | | | | |-- anycast_delay  
-| | | | | |-- app_solicit  
-| | | | | |-- base_reachable_time  
-| | | | | |-- delay_first_probe_time  
-| | | | | |-- gc_stale_time  
-| | | | | |-- locktime  
-| | | | | |-- mcast_solicit  
-| | | | | |-- proxy_delay  
-| | | | | |-- proxy_qlen  
-| | | | | |-- retrans_time  
-| | | | | |-- ucast_solicit  
-| | | | | `-- unres_qlen  
-| | | | `-- lo  
-| | | | |-- anycast_delay  
-| | | | |-- app_solicit  
-| | | | |-- base_reachable_time  
-| | | | |-- delay_first_probe_time  
-| | | | |-- gc_stale_time  
-| | | | |-- locktime  
-| | | | |-- mcast_solicit  
-| | | | |-- proxy_delay  
-| | | | |-- proxy_qlen  
-| | | | |-- retrans_time  
-| | | | |-- ucast_solicit  
-| | | | `-- unres_qlen  
-| | | |-- route  
-| | | | |-- error_burst  
-| | | | |-- error_cost  
-| | | | |-- flush  
-| | | | |-- gc_elasticity  
-| | | | |-- gc_interval  
-| | | | |-- gc_min_interval  
-| | | | |-- gc_thresh  
-| | | | |-- gc_timeout  
-| | | | |-- max_delay  
-| | | | |-- max_size  
-| | | | |-- min_adv_mss  
-| | | | |-- min_delay  
-| | | | |-- min_pmtu  
-| | | | |-- mtu_expires  
-| | | | |-- redirect_load  
-| | | | |-- redirect_number  
-| | | | `-- redirect_silence  
-| | | |-- tcp_abort_on_overflow  
-| | | |-- tcp_adv_win_scale  
-| | | |-- tcp_app_win  
-| | | |-- tcp_dsack  
-| | | |-- tcp_ecn  
-| | | |-- tcp_fack  
-| | | |-- tcp_fin_timeout  
-| | | |-- tcp_keepalive_intvl  
-| | | |-- tcp_keepalive_probes  
-| | | |-- tcp_keepalive_time  
-| | | |-- tcp_max_orphans  
-| | | |-- tcp_max_syn_backlog  
-| | | |-- tcp_max_tw_buckets  
-| | | |-- tcp_mem  
-| | | |-- tcp_orphan_retries  
-| | | |-- tcp_reordering  
-| | | |-- tcp_retrans_collapse  
-| | | |-- tcp_retries1  
-| | | |-- tcp_retries2  
-| | | |-- tcp_rfc1337  
-| | | |-- tcp_rmem  
-| | | |-- tcp_sack  
-| | | |-- tcp_stdurg  
-| | | |-- tcp_syn_retries  
-| | | |-- tcp_synack_retries  
-| | | |-- tcp_syncookies  
-| | | |-- tcp_timestamps  
-| | | |-- tcp_tw_recycle  
-| | | |-- tcp_window_scaling  
-| | | `-- tcp_wmem  
-| | `-- unix  
-| | `-- max_dgram_qlen  
-| |-- proc  
-| `-- vm  
-| |-- bdflush  
-| |-- kswapd  
-| |-- max-readahead  
-| |-- min-readahead  
-| |-- overcommit_memory  
-| |-- page-cluster  
-| `-- pagetable_cache  
-|-- sysvipc  
-| |-- msg  
-| |-- sem  
-| `-- shm  
-|-- tty  
-| |-- driver  
-| | `-- serial  
-| |-- drivers  
-| |-- ldisc  
-| `-- ldiscs  
-|-- uptime  
-`-- version  
-  
-  
-  
-In the directory there are also all the tasks using PID as file names (you  
-have access to all Task information, like path of binary file, memory used,  
-and so on).  
-  
-  
-The interesting point is that you cannot only see kernel values (for example,  
-see info about any task or about network options enabled of your TCP/IP stack)  
-but you are also able to modify some of it, typically that ones under /proc/sys  
-directory:  
-  
-  
-  
-  
-/proc/sys/  
-acpi  
-dev  
-debug  
-fs  
-proc  
-net  
-vm  
-kernel  
-  
-  
-!/proc/sys/kernel  
-  
-  
-Below are very important and well-know kernel values, ready to be modified:  
-  
-  
-  
-  
-overflowgid  
-overflowuid  
-random  
-threads-max // Max number of threads, typically 16384  
-sysrq // kernel hack: you can view istant register values and more  
-sem  
-msgmnb  
-msgmni  
-msgmax  
-shmmni  
-shmall  
-shmmax  
-rtsig-max  
-rtsig-nr  
-modprobe // modprobe file location  
-printk  
-ctrl-alt-del  
-cap-bound  
-panic  
-domainname // domain name of your Linux box  
-hostname // host name of your Linux box  
-version // date info about kernel compilation  
-osrelease // kernel version (i.e. 2.4.5)  
-ostype // Linux!  
-  
-  
-!/proc/sys/net  
-  
-  
-This can be considered the most useful proc subdirectory. It allows you  
-to change very important settings for your network kernel configuration.  
-  
-  
-  
-  
-core  
-ipv4  
-ipv6  
-unix  
-ethernet  
-802  
-  
-  
-!/proc/sys/net/core  
-  
-  
-Listed below are general net settings, like "netdev_max_backlog" (typically  
-300), the length of all your network packets. This value can limit your network  
-bandwidth when receiving packets, Linux has to wait up to scheduling time to  
-flush buffers (due to bottom half mechanism), about 1000/HZ ms  
-  
-  
-  
-  
-300 * 100 = 30 000  
-packets HZ(Timeslice freq) packets/s  
-30 000 * 1000 = 30 M  
-packets average (Bytes/packet) throughput Bytes/s  
-  
-  
-  
-If you want to get higher throughput, you need to increase netdev_max_backlog,  
-by typing:  
-  
-  
-  
-  
-echo 4000 > /proc/sys/net/core/netdev_max_backlog  
-  
-  
-  
-Note: Warning for some HZ values: under some architecture (like alpha or  
-arm-tbox) it is 1000, so you can have 300 MBytes/s of average throughput.  
-  
-!/proc/sys/net/ipv4  
-  
-  
-"ip_forward", enables or disables ip forwarding in your Linux box. This is  
-a generic setting for all devices, you can specify each device you choose.  
-  
-!/proc/sys/net/ipv4/conf/interface  
-  
-  
-I think this is the most useful /proc entry, because it allows you to change  
-some net settings to support wireless networks (see  
-Wireless-HOWTO for more information).  
-  
-  
-Here are some examples of when you could use this setting:  
-  
-  
-  
-  
-  
-*"forwarding", to enable ip forwarding for your interface  
-*  
-  
-*"proxy_arp", to enable proxy arp feature. For more see Proxy arp HOWTO under  
-Linux Documentation Project and  
-Wireless-HOWTO for proxy arp use in Wireless networks.  
-*  
-  
-*"send_redirects" to avoid interface to send ICMP_REDIRECT (as before, see  
-Wireless-HOWTO for more).  
-*  
-  
-----  
-  
-!!6. Linux Multitasking  
-  
-!!6.1 Overview  
-  
-  
-  
-This section will analyze data structures--the mechanism used to manage  
-multitasking environment under Linux.  
-  
-!Task States  
-  
-  
-A Linux Task can be one of the following states (according to [[include/linux.h]):  
-  
-  
-  
-  
-  
-#TASK_RUNNING, it means that it is in the "Ready List"  
-#  
-  
-#TASK_INTERRUPTIBLE, task waiting for a signal or a resource (sleeping)  
-#  
-  
-#TASK_UNINTERRUPTIBLE, task waiting for a resource (sleeping), it is in  
-same "Wait Queue"  
-#  
-  
-#TASK_ZOMBIE, task child without father  
-#  
-  
-#TASK_STOPPED, task being debugged  
-#  
-  
-  
-!Graphical Interaction  
-  
-  
-  
-  
-______________ CPU Available ______________  
-| | ----------------> | |  
-| TASK_RUNNING | | Real Running |  
-|______________| <---------------- |______________|  
-CPU Busy  
-| /|\  
-Waiting for | | Resource  
-Resource | | Available  
-\|/ |  
-______________________  
-| |  
-| TASK_INTERRUPTIBLE / |  
-| TASK-UNINTERRUPTIBLE |  
-|______________________|  
-Main Multitasking Flow  
-  
-  
-!!6.2 Timeslice  
-  
-  
-!PIT 8253 Programming  
-  
-  
-Each 10 ms (depending on HZ value) an IRQ0 comes, which helps us in a multitasking  
-environment. This signal comes from PIC 8259 (in arch 386+) which is connected  
-to PIT 8253 with a clock of 1.19318 MHz.  
-  
-  
-  
-  
-_____ ______ ______  
-| CPU |<------| 8259 |------| 8253 |  
-|_____| IRQ0 |______| |___/|\|  
-|_____ CLK 1.193.180 MHz  
-// From include/asm/param.h  
-#ifndef HZ  
-#define HZ 100  
-#endif  
-// From include/asm/timex.h  
-#define CLOCK_TICK_RATE 1193180 /* Underlying HZ */  
-// From include/linux/timex.h  
-#define LATCH ((CLOCK_TICK_RATE + HZ/2) / HZ) /* For divider */  
-// From arch/i386/kernel/i8259.c  
-outb_p(0x34,0x43); /* binary, mode 2, LSB/MSB, ch 0 */  
-outb_p(LATCH & 0xff , 0x40); /* LSB */  
-outb(LATCH >> 8 , 0x40); /* MSB */  
-  
-  
-  
-So we program 8253 (PIT, Programmable Interval Timer) with LATCH = (1193180/HZ)  
-= 11931.8 when HZ=100 (default). LATCH indicates the frequency divisor factor.  
-  
-  
-LATCH = 11931.8 gives to 8253 (in output) a frequency of 1193180 / 11931.8  
-= 100 Hz, so period = 10ms  
-  
-  
-So Timeslice = 1/HZ.  
-  
-  
-With each Timeslice we temporarily interrupt current process execution  
-(without task switching), and we do some housekeeping work, after which we'll  
-return back to our previous process.  
-  
-!Linux Timer IRQ ICA  
-  
-  
-  
-  
-Linux Timer IRQ  
-IRQ 0 [[Timer]  
-|  
-\|/  
-|IRQ0x00_interrupt // wrapper IRQ handler  
-|SAVE_ALL ---  
-|do_IRQ | wrapper routines  
-|handle_IRQ_event ---  
-|handler() -> timer_interrupt // registered IRQ 0 handler  
-|do_timer_interrupt  
-|do_timer  
-|jiffies++;  
-|update_process_times  
-|if (--counter <= ) { // if time slice ended then  
-|counter = ; // reset counter  
-|need_resched = 1; // prepare to reschedule  
-|}  
-|do_softirq  
-|while (need_resched) { // if necessary  
-|schedule // reschedule  
-|handle_softirq  
-|}  
-|RESTORE_ALL  
-  
-  
-  
-Functions can be found under:  
-  
-  
-  
-  
-  
-*IRQ0x00_interrupt, SAVE_ALL [[include/asm/hw_irq.h]  
-*  
-  
-*do_IRQ, handle_IRQ_event [[arch/i386/kernel/irq.c]  
-*  
-  
-*timer_interrupt, do_timer_interrupt [[arch/i386/kernel/time.c]  
-*  
-  
-*do_timer, update_process_times [[kernel/timer.c]  
-*  
-  
-*do_softirq [[kernel/soft_irq.c]  
-*  
-  
-*RESTORE_ALL, while loop [[arch/i386/kernel/entry.S]  
-*  
-  
-  
-  
-Notes:  
-  
-  
-  
-  
-  
-#Function "IRQ0x00_interrupt" (like others IRQ0xXY_interrupt) is directly  
-pointed by IDT (Interrupt Descriptor Table, similar to Real Mode Interrupt  
-Vector Table, see Cap 11 for more), so EVERY interrupt coming to the processor  
-is managed by "IRQ0x#NR_interrupt" routine, where #NR is the interrupt  
-number. We refer to it as "wrapper irq handler".  
-#  
-  
-#wrapper routines are executed, like "do_IRQ","handle_IRQ_event" [[arch/i386/kernel/irq.c].  
-#  
-  
-#After this, control is passed to official IRQ routine (pointed by "handler()"),  
-previously registered with "request_irq" [[arch/i386/kernel/irq.c],  
-in this case "timer_interrupt" [[arch/i386/kernel/time.c].  
-#  
-  
-#"timer_interrupt" [[arch/i386/kernel/time.c] routine is executed  
-and, when it ends,  
-#  
-  
-#control backs to some assembler routines [[arch/i386/kernel/entry.S].  
-#  
-  
-  
-  
-Description:  
-  
-  
-To manage Multitasking, Linux (like every other Unix) uses a ''counter''  
-variable to keep track of how much CPU was used by the task. So, on each IRQ  
-, the counter is decremented (point 4) and, when it reaches , we need to  
-switch task to manage timesharing (point 4 "need_resched" variable is set to  
-1, then, in point 5 assembler routines control "need_resched" and call, if needed,  
-"schedule" [[kernel/sched.c]).  
-  
-!!6.3 Scheduler  
-  
-  
-  
-The scheduler is the piece of code that chooses what Task has to be executed  
-at a given time.  
-  
-  
-Any time you need to change running task, select a candidate. Below is  
-the ''schedule [[kernel/sched.c]'' function.  
-  
-  
-  
-  
-|schedule  
-|do_softirq // manages post-IRQ work  
-|for each task  
-|calculate counter  
-|prepare_to__switch // does anything  
-|switch_mm // change Memory context (change CR3 value)  
-|switch_to (assembler)  
-|SAVE ESP  
-|RESTORE future_ESP  
-|SAVE EIP  
-|push future_EIP *** push parameter as we did a call  
-|jmp __switch_to (it does some TSS work)  
-|__switch_to()  
-..  
-|ret *** ret from call using future_EIP in place of call address  
-new_task  
-  
-  
-!!6.4 Bottom Half, Task Queues. and Tasklets  
-  
-  
-!Overview  
-  
-  
-In classic Unix, when an IRQ comes (from a device), Unix makes "task switching"  
-to interrogate the task that requested the device.  
-  
-  
-To improve performance, Linux can postpone the non-urgent work until later,  
-to better manage high speed event.  
-  
-  
-This feature is managed since kernel 1.x by the "bottom half" (BH). The irq  
-handler "marks" a bottom half, to be executed later, in scheduling time.  
-  
-  
-In the latest kernels there is a "task queue"that is more dynamic than BH  
-and there is also a "tasklet" to manage multiprocessor environments.  
-  
-  
-BH schema is:  
-  
-  
-  
-  
-  
-#Declaration  
-#  
-  
-#Mark  
-#  
-  
-#Execution  
-#  
-  
-  
-!Declaration  
-  
-  
-  
-  
-#define DECLARE_TASK_QUEUE(q) LIST_HEAD(q)  
-#define LIST_HEAD(name) \  
-struct list_head name = LIST_HEAD_INIT(name)  
-struct list_head {  
-struct list_head *next, *prev;  
-};  
-#define LIST_HEAD_INIT(name) { &(name), &(name) }  
-''DECLARE_TASK_QUEUE'' [[include/linux/tqueue.h, include/linux/list.h]  
-  
-  
-  
-"DECLARE_TASK_QUEUE(q)" macro is used to declare a structure named "q" managing  
-task queue.  
-  
-!Mark  
-  
-  
-Here is the ICA schema for "mark_bh" [[include/linux/interrupt.h]  
-function:  
-  
-  
-  
-  
-|mark_bh(NUMBER)  
-|tasklet_hi_schedule(bh_task_vec + NUMBER)  
-|insert into tasklet_hi_vec  
-|__cpu_raise_softirq(HI_SOFTIRQ)  
-|soft_active |= (1 << HI_SOFTIRQ)  
-''mark_bh''[[include/linux/interrupt.h]  
-  
-  
-  
-For example, when an IRQ handler wants to "postpone" some work, it would  
-"mark_bh(NUMBER)", where NUMBER is a BH declarated (see section before).  
-  
-!Execution  
-  
-  
-We can see this calling from "schedule" [[kernel/sched.c] function:  
-  
-  
-  
-  
-/* Do "administrative" work here while we don't hold any locks */  
-|if (softirq_active(this_cpu) & softirq_mask(this_cpu))  
-|do_softirq();  
-|for (active=softirq_active & softirq_mask;active;active >>= 1)  
-|softirq_vec++->action();  
-''schedule'' [[kernel/sched.c]  
-  
-  
-!!6.5 Very low level routines  
-  
-  
-  
-set_intr_gate  
-  
-  
-set_trap_gate  
-  
-  
-set_task_gate (not used).  
-  
-  
-(*interrupt)[[NR_IRQS](void) = { IRQ0x00_interrupt, IRQ0x01_interrupt,  
-..}  
-  
-  
-NR_IRQS = 224 [[kernel 2.4.2]  
-  
-!!6.6 Task Switching  
-  
-  
-!When does Task switching occur?  
-  
-  
-Now we'll see how the Linux Kernel switchs from one task to another.  
-  
-  
-Task Switching is needed in many cases, such as the following:  
-  
-  
-  
-  
-  
-*when !TimeSlice ends, we need to give access to some other task  
-*  
-  
-*when a task decide to access a resource, it sleeps for it, so we have to  
-choose another task  
-*  
-  
-*when a task waits for a pipe, we have to give access to other task, which  
-would write to pipe  
-*  
-  
-  
-!Task Switching  
-  
-  
-  
-  
-TASK SWITCHING TRICK  
-#define switch_to(prev,next,last) do { \  
-asm volatile("pushl %%esi\n\t" \  
-"pushl %%edi\n\t" \  
-"pushl %%ebp\n\t" \  
-"movl %%esp,%\n\t" /* save ESP */ \  
-"movl %3,%%esp\n\t" /* restore ESP */ \  
-"movl $1f,%1\n\t" /* save EIP */ \  
-"pushl %4\n\t" /* restore EIP */ \  
-"jmp __switch_to\n" \  
-"1:\t" \  
-"popl %%ebp\n\t" \  
-"popl %%edi\n\t" \  
-"popl %%esi\n\t" \  
-:"=m" (prev->thread.esp),"=m" (prev->thread.eip), \  
-"=b" (last) \  
-:"m" (next->thread.esp),"m" (next->thread.eip), \  
-"a" (prev), "d" (next), \  
-"b" (prev)); \  
-} while ()  
-  
-  
-  
-Trick is here:  
-  
-  
-  
-  
-  
-#''pushl %4'' which puts future_EIP into the stack  
-#  
-  
-#''jmp __switch_to'' which execute ''__switch_to'' function, but in opposite  
-of ''call'' we will return to valued pushed in point 1 (so new Task!)  
-#  
-  
-  
-  
-  
-  
-U S E R M O D E K E R N E L M O D E  
-| | | | | | | |  
-| | | | Timer | | | |  
-| | | Normal | IRQ | | | |  
-| | | Exec |------>|Timer_Int.| | |  
-| | | | | | .. | | |  
-| | | \|/ | |schedule()| | Task1 Ret|  
-| | | | |_switch_to|<-- | Address |  
-|__________| |__________| | | | | |  
-| | |S | |  
-Task1 Data/Stack Task1 Code | | |w | |  
-| | T|i | |  
-| | a|t | |  
-| | | | | | s|c | |  
-| | | | Timer | | k|h | |  
-| | | Normal | IRQ | | |i | |  
-| | | Exec |------>|Timer_Int.| |n | |  
-| | | | | | .. | |g | |  
-| | | \|/ | |schedule()| | | Task2 Ret|  
-| | | | |_switch_to|<-- | Address |  
-|__________| |__________| |__________| |__________|  
-Task2 Data/Stack Task2 Code Kernel Code Kernel Data/Stack  
-  
-  
-!!6.7 Fork  
-  
-  
-!Overview  
-  
-  
-Fork is used to create another task. We start from a Task Parent, and we  
-copy many data structures to Task Child.  
-  
-  
-  
-  
-| |  
-| .. |  
-Task Parent | |  
-| | | |  
-| fork |---------->| CREATE |  
-| | /| NEW |  
-|_________| / | TASK |  
-/ | |  
---- / | |  
---- / | .. |  
-/ | |  
-Task Child /  
-| | /  
-| fork |<-/  
-| |  
-|_________|  
-Fork !SysCall  
-  
-  
-!What is not copied  
-  
-  
-New Task just created (''Task Child'') is almost equal to Parent (''Task  
-Parent''), there are only few differences:  
-  
-  
-  
-  
-  
-#obviously PID  
-#  
-  
-#child ''fork()'' will return , while parent ''fork()'' will return PID  
-of Task Child, to distinguish them each other in User Mode  
-#  
-  
-#All child data pages are marked ''READ + EXECUTE'', no "WRITE'' (while parent  
-has WRITE right for its own pages) so, when a write request comes, a ''Page  
-Fault'' exception is generated which will create a new independent page: this  
-mechanism is called ''Copy on Write'' (see Cap.10 for more).  
-#  
-  
-  
-!Fork ICA  
-  
-  
-  
-  
-|sys_fork  
-|do_fork  
-|alloc_task_struct  
-|__get_free_pages  
-|p->state = TASK_UNINTERRUPTIBLE  
-|copy_flags  
-|p->pid = get_pid  
-|copy_files  
-|copy_fs  
-|copy_sighand  
-|copy_mm // should manage !CopyOnWrite (I part)  
-|allocate_mm  
-|mm_init  
-|pgd_alloc -> get_pgd_fast  
-|get_pgd_slow  
-|dup_mmap  
-|copy_page_range  
-|ptep_set_wrprotect  
-|clear_bit // set page to read-only  
-|copy_segments // For LDT  
-|copy_thread  
-|childregs->eax =  
-|p->thread.esp = childregs // child fork returns  
-|p->thread.eip = ret_from_fork // child starts from fork exit  
-|retval = p->pid // parent fork returns child pid  
-|SET_LINKS // insertion of task into the list pointers  
-|nr_threads++ // Global variable  
-|wake_up_process(p) // Now we can wake up just created child  
-|return retval  
-fork ICA  
-  
-  
-  
-  
-  
-  
-*sys_fork [[arch/i386/kernel/process.c]  
-*  
-  
-*do_fork [[kernel/fork.c]  
-*  
-  
-*alloc_task_struct [[include/asm/processor.c]  
-*  
-  
-*__get_free_pages [[mm/page_alloc.c]  
-*  
-  
-*get_pid [[kernel/fork.c]  
-*  
-  
-*copy_files  
-*  
-  
-*copy_fs  
-*  
-  
-*copy_sighand  
-*  
-  
-*copy_mm  
-*  
-  
-*allocate_mm  
-*  
-  
-*mm_init  
-*  
-  
-*pgd_alloc -> get_pgd_fast [[include/asm/pgalloc.h]  
-*  
-  
-*get_pgd_slow  
-*  
-  
-*dup_mmap [[kernel/fork.c]  
-*  
-  
-*copy_page_range [[mm/memory.c]  
-*  
-  
-*ptep_set_wrprotect [[include/asm/pgtable.h]  
-*  
-  
-*clear_bit [[include/asm/bitops.h]  
-*  
-  
-*copy_segments [[arch/i386/kernel/process.c]  
-*  
-  
-*copy_thread  
-*  
-  
-*SET_LINKS [[include/linux/sched.h]  
-*  
-  
-*wake_up_process [[kernel/sched.c]  
-*  
-  
-  
-!Copy on Write  
-  
-  
-To implement Copy on Write for Linux:  
-  
-  
-  
-  
-  
-#Mark all copied pages as read-only, causing a Page Fault when a child tries  
-to write to them.  
-#  
-  
-#Page Fault handler creates a new page for the Task caused exception.  
-#  
-  
-  
-  
-  
-  
-| Page  
-| Fault  
-| Exception  
-|  
-|  
------------> |do_page_fault  
-|handle_mm_fault  
-|handle_pte_fault  
-|do_wp_page  
-|alloc_page // Allocate a new page  
-|break_cow  
-|copy_cow_page // Copy old page to new one  
-|establish_pte // reconfig Page Table pointers  
-|set_pte  
-Page Fault ICA  
-  
-  
-  
-  
-  
-  
-*do_page_fault [[arch/i386/mm/fault.c]  
-*  
-  
-*handle_mm_fault [[mm/memory.c]  
-*  
-  
-*handle_pte_fault  
-*  
-  
-*do_wp_page  
-*  
-  
-*alloc_page [[include/linux/mm.h]  
-*  
-  
-*break_cow [[mm/memory.c]  
-*  
-  
-*copy_cow_page  
-*  
-  
-*establish_pte  
-*  
-  
-*set_pte [[include/asm/pgtable-3level.h]  
-*  
-  
-----  
-  
-!!7. Linux Memory Management  
-  
-!!7.1 Overview  
-  
-  
-  
-Linux uses segmentation + pagination, which simplifies notation.  
-  
-!Segments  
-  
-  
-Linux uses only 4 segments:  
-  
-  
-  
-  
-  
-*2 segments (code and data/stack) for KERNEL SPACE from [[0xC000 0000]  
-(3 GB) to [[0xFFFF FFFF] (4 GB)  
-*  
-  
-*2 segments (code and data/stack) for USER SPACE from [[] (0 GB)  
-to [[0xBFFF FFFF] (3 GB)  
-*  
-  
-  
-  
-  
-  
-__  
-4 GB--->| | |  
-| Kernel | | Kernel Space (Code + Data/Stack)  
-| | __|  
-3 GB--->|----------------| __  
-| | |  
-| | |  
-2 GB--->| | |  
-| Tasks | | User Space (Code + Data/Stack)  
-| | |  
-1 GB--->| | |  
-| | |  
-|________________| __|  
-0x00000000  
-Kernel/User Linear addresses  
-  
-  
-!!7.2 Specific i386 implementation  
-  
-  
-  
-Again, Linux implements Pagination using 3 Levels of Paging, but in i386  
-architecture only 2 of them are really used:  
-  
-  
-  
-  
-------------------------------------------------------------------  
-L I N E A R A D D R E S S  
-------------------------------------------------------------------  
-\___/ \___/ \_____/  
-PD offset PF offset Frame offset  
-[[10 bits] [[10 bits] [[12 bits]  
-| | |  
-| | ----------- |  
-| | | Value |----------|---------  
-| | | | |---------| /|\ | |  
-| | | | | | | | |  
-| | | | | | | Frame offset |  
-| | | | | | \|/ |  
-| | | | |---------|<------ |  
-| | | | | | | |  
-| | | | | | | x 4096 |  
-| | | PF offset|_________|------- |  
-| | | /|\ | | |  
-PD offset |_________|----- | | | _________|  
-/|\ | | | | | | |  
-| | | | \|/ | | \|/  
-_____ | | | ------>|_________| PHYSICAL ADDRESS  
-| | \|/ | | x 4096 | |  
-| CR3 |-------->| | | |  
-|_____| | ....... | | ....... |  
-| | | |  
-Page Directory Page File  
-Linux i386 Paging  
-  
-  
-!!7.3 Memory Mapping  
-  
-  
-  
-Linux manages Access Control with Pagination only, so different Tasks will  
-have the same segment addresses, but different CR3 (register used to store  
-Directory Page Address), pointing to different Page Entries.  
-  
-  
-In User mode a task cannot overcome 3 GB limit (0 x C0 00 00 00), so only  
-the first 768 page directory entries are meaningful (768*4MB = 3GB).  
-  
-  
-When a Task goes in Kernel Mode (by System call or by IRQ) the other 256  
-pages directory entries become important, and they point to the same page files  
-as all other Tasks (which are the same as the Kernel).  
-  
-  
-Note that Kernel (and only kernel) Linear Space is equal to Kernel Physical  
-Space, so:  
-  
-  
-  
-  
-________________ _____  
-|Other !KernelData|___ | | |  
-|----------------| | |__| |  
-| Kernel |\ |____| Real Other |  
-3 GB --->|----------------| \ | Kernel Data |  
-| |\ \ | |  
-| __|_\_\____|__ Real |  
-| Tasks | \ \ | Tasks |  
-| __|___\_\__|__ Space |  
-| | \ \ | |  
-| | \ \|----------------|  
-| | \ |Real !KernelSpace|  
-|________________| \|________________|  
-Logical Addresses Physical Addresses  
-  
-  
-  
-Linear Kernel Space corresponds to Physical Kernel Space translated 3  
-GB down (in fact page tables are something like { "00000000", "00000001" },  
-so they operate no virtualization, they only report physical addresses they  
-take from linear ones).  
-  
-  
-Notice that you'll not have an "addresses conflict" between Kernel and User  
-spaces because we can manage physical addresses with Page Tables.  
-  
-!!7.4 Low level memory allocation  
-  
-  
-!Boot Initialization  
-  
-  
-We start from kmem_cache_init (launched by start_kernel [[init/main.c]  
-at boot up).  
-  
-  
-  
-  
-|kmem_cache_init  
-|kmem_cache_estimate  
-  
-  
-  
-kmem_cache_init [[mm/slab.c]  
-  
-  
-kmem_cache_estimate  
-  
-  
-Now we continue with mem_init (also launched by start_kernel[[init/main.c])  
-  
-  
-  
-  
-|mem_init  
-|free_all_bootmem  
-|free_all_bootmem_core  
-  
-  
-  
-mem_init [[arch/i386/mm/init.c]  
-  
-  
-free_all_bootmem [[mm/bootmem.c]  
-  
-  
-free_all_bootmem_core  
-  
-!Run-time allocation  
-  
-  
-Under Linux, when we want to allocate memory, for example during "copy_on_write"  
-mechanism (see Cap.10), we call:  
-  
-  
-  
-  
-|copy_mm  
-|allocate_mm = kmem_cache_alloc  
-|__kmem_cache_alloc  
-|kmem_cache_alloc_one  
-|alloc_new_slab  
-|kmem_cache_grow  
-|kmem_getpages  
-|__get_free_pages  
-|alloc_pages  
-|alloc_pages_pgdat  
-|__alloc_pages  
-|rmqueue  
-|reclaim_pages  
-  
-  
-  
-Functions can be found under:  
-  
-  
-  
-  
-  
-*copy_mm [[kernel/fork.c]  
-*  
-  
-*allocate_mm [[kernel/fork.c]  
-*  
-  
-*kmem_cache_alloc [[mm/slab.c]  
-*  
-  
-*__kmem_cache_alloc  
-*  
-  
-*kmem_cache_alloc_one  
-*  
-  
-*alloc_new_slab  
-*  
-  
-*kmem_cache_grow  
-*  
-  
-*kmem_getpages  
-*  
-  
-*__get_free_pages [[mm/page_alloc.c]  
-*  
-  
-*alloc_pages [[mm/numa.c]  
-*  
-  
-*alloc_pages_pgdat  
-*  
-  
-*__alloc_pages [[mm/page_alloc.c]  
-*  
-  
-*rm_queue  
-*  
-  
-*reclaim_pages [[mm/vmscan.c]  
-*  
-  
-  
-  
-TODO: Understand Zones  
-  
-!!7.5 Swap  
-  
-  
-!Overview  
-  
-  
-Swap is managed by the kswapd daemon (kernel thread).  
-  
-!kswapd  
-  
-  
-As other kernel threads, kswapd has a main loop that wait to wake up.  
-  
-  
-  
-  
-|kswapd  
-|// initialization routines  
-|for (;;) { // Main loop  
-|do_try_to_free_pages  
-|recalculate_vm_stats  
-|refill_inactive_scan  
-|run_task_queue  
-|interruptible_sleep_on_timeout // we sleep for a new swap request  
-|}  
-  
-  
-  
-  
-  
-  
-*kswapd [[mm/vmscan.c]  
-*  
-  
-*do_try_to_free_pages  
-*  
-  
-*recalculate_vm_stats [[mm/swap.c]  
-*  
-  
-*refill_inactive_scan [[mm/vmswap.c]  
-*  
-  
-*run_task_queue [[kernel/softirq.c]  
-*  
-  
-*interruptible_sleep_on_timeout [[kernel/sched.c]  
-*  
-  
-  
-!When do we need swapping?  
-  
-  
-Swapping is needed when we have to access a page that is not in physical  
-memory.  
-  
-  
-Linux uses ''kswapd'' kernel thread to carry out this purpose. When the  
-Task receives a page fault exception we do the following:  
-  
-  
-  
-  
-| Page Fault Exception  
-| cause by all these conditions:  
-| a-) User page  
-| b-) Read or write access  
-| c-) Page not present  
-|  
-|  
------------> |do_page_fault  
-|handle_mm_fault  
-|pte_alloc  
-|pte_alloc_one  
-|__get_free_page = __get_free_pages  
-|alloc_pages  
-|alloc_pages_pgdat  
-|__alloc_pages  
-|wakeup_kswapd // We wake up kernel thread kswapd  
-Page Fault ICA  
-  
-  
-  
-  
-  
-  
-*do_page_fault [[arch/i386/mm/fault.c]  
-*  
-  
-*handle_mm_fault [[mm/memory.c]  
-*  
-  
-*pte_alloc  
-*  
-  
-*pte_alloc_one [[include/asm/pgalloc.h]  
-*  
-  
-*__get_free_page [[include/linux/mm.h]  
-*  
-  
-*__get_free_pages [[mm/page_alloc.c]  
-*  
-  
-*alloc_pages [[mm/numa.c]  
-*  
-  
-*alloc_pages_pgdat  
-*  
-  
-*__alloc_pages  
-*  
-  
-*wakeup_kswapd [[mm/vmscan.c]  
-*  
-  
-----  
-  
-!!8. Linux Networking  
-  
-!!8.1 How Linux networking is managed?  
-  
-  
-  
-There exists a device driver for each kind of NIC. Inside it, Linux will  
-ALWAYS call a standard high level routing: "netif_rx [[net/core/dev.c]",  
-which will controls what 3 level protocol the frame belong to, and it will  
-call the right 3 level function (so we'll use a pointer to the function to  
-determine which is right).  
-  
-!!8.2 TCP example  
-  
-  
-  
-We'll see now an example of what happens when we send a TCP packet to Linux,  
-starting from ''netif_rx [[net/core/dev.c]'' call.  
-  
-!Interrupt management: "netif_rx"  
-  
-  
-  
-  
-|netif_rx  
-|__skb_queue_tail  
-|qlen++  
-|* simple pointer insertion *  
-|cpu_raise_softirq  
-|softirq_active(cpu) |= (1 << NET_RX_SOFTIRQ) // set bit NET_RX_SOFTIRQ in the BH vector  
-  
-  
-  
-Functions:  
-  
-  
-  
-  
-  
-*__skb_queue_tail [[include/linux/skbuff.h]  
-*  
-  
-*cpu_raise_softirq [[kernel/softirq.c]  
-*  
-  
-  
-!Post Interrupt management: "net_rx_action"  
-  
-  
-Once IRQ interaction is ended, we need to follow the next part of the frame  
-life and examine what NET_RX_SOFTIRQ does.  
-  
-  
-We will next call ''net_rx_action [[net/core/dev.c]'' according  
-to "net_dev_init [[net/core/dev.c]".  
-  
-  
-  
-  
-|net_rx_action  
-|skb = __skb_dequeue (the exact opposite of __skb_queue_tail)  
-|for (ptype = first_protocol; ptype < max_protocol; ptype++) // Determine  
-|if (skb->protocol == ptype) // what is the network protocol  
-|ptype->func -> ip_rcv // according to ''struct ip_packet_type [[net/ipv4/ip_output.c]''  
-**** NOW WE KNOW THAT PACKET IS IP ****  
-|ip_rcv  
-|NF_HOOK (ip_rcv_finish)  
-|ip_route_input // search from routing table to determine function to call  
-|skb->dst->input -> ip_local_deliver // according to previous routing table check, destination is local machine  
-|ip_defrag // reassembles IP fragments  
-|NF_HOOK (ip_local_deliver_finish)  
-|ipprot->handler -> tcp_v4_rcv // according to ''tcp_protocol [[include/net/protocol.c]''  
-**** NOW WE KNOW THAT PACKET IS TCP ****  
-|tcp_v4_rcv  
-|sk = __tcp_v4_lookup  
-|tcp_v4_do_rcv  
-|switch(sk->state)  
-*** Packet can be sent to the task which uses relative socket ***  
-|case TCP_ESTABLISHED:  
-|tcp_rcv_established  
-|__skb_queue_tail // enqueue packet to socket  
-|sk->data_ready -> sock_def_readable  
-|wake_up_interruptible  
-*** Packet has still to be handshaked by 3-way TCP handshake ***  
-|case TCP_LISTEN:  
-|tcp_v4_hnd_req  
-|tcp_v4_search_req  
-|tcp_check_req  
-|syn_recv_sock -> tcp_v4_syn_recv_sock  
-|__tcp_v4_lookup_established  
-|tcp_rcv_state_process  
-*** 3-Way TCP Handshake ***  
-|switch(sk->state)  
-|case TCP_LISTEN: // We received SYN  
-|conn_request -> tcp_v4_conn_request  
-|tcp_v4_send_synack // Send SYN + ACK  
-|tcp_v4_synq_add // set SYN state  
-|case TCP_SYN_SENT: // we received SYN + ACK  
-|tcp_rcv_synsent_state_process  
-tcp_set_state(TCP_ESTABLISHED)  
-|tcp_send_ack  
-|tcp_transmit_skb  
-|queue_xmit -> ip_queue_xmit  
-|ip_queue_xmit2  
-|skb->dst->output  
-|case TCP_SYN_RECV: // We received ACK  
-|if (ACK)  
-|tcp_set_state(TCP_ESTABLISHED)  
-  
-  
-  
-Functions can be found under:  
-  
-  
-  
-  
-  
-*net_rx_action [[net/core/dev.c]  
-*  
-  
-*__skb_dequeue [[include/linux/skbuff.h]  
-*  
-  
-*ip_rcv [[net/ipv4/ip_input.c]  
-*  
-  
-*NF_HOOK -> nf_hook_slow [[net/core/netfilter.c]  
-*  
-  
-*ip_rcv_finish [[net/ipv4/ip_input.c]  
-*  
-  
-*ip_route_input [[net/ipv4/route.c]  
-*  
-  
-*ip_local_deliver [[net/ipv4/ip_input.c]  
-*  
-  
-*ip_defrag [[net/ipv4/ip_fragment.c]  
-*  
-  
-*ip_local_deliver_finish [[net/ipv4/ip_input.c]  
-*  
-  
-*tcp_v4_rcv [[net/ipv4/tcp_ipv4.c]  
-*  
-  
-*__tcp_v4_lookup  
-*  
-  
-*tcp_v4_do_rcv  
-*  
-  
-*tcp_rcv_established [[net/ipv4/tcp_input.c]  
-*  
-  
-*__skb_queue_tail [[include/linux/skbuff.h]  
-*  
-  
-*sock_def_readable [[net/core/sock.c]  
-*  
-  
-*wake_up_interruptible [[include/linux/sched.h]  
-*  
-  
-*tcp_v4_hnd_req [[net/ipv4/tcp_ipv4.c]  
-*  
-  
-*tcp_v4_search_req  
-*  
-  
-*tcp_check_req  
-*  
-  
-*tcp_v4_syn_recv_sock  
-*  
-  
-*__tcp_v4_lookup_established  
-*  
-  
-*tcp_rcv_state_process [[net/ipv4/tcp_input.c]  
-*  
-  
-*tcp_v4_conn_request [[net/ipv4/tcp_ipv4.c]  
-*  
-  
-*tcp_v4_send_synack  
-*  
-  
-*tcp_v4_synq_add  
-*  
-  
-*tcp_rcv_synsent_state_process [[net/ipv4/tcp_input.c]  
-*  
-  
-*tcp_set_state [[include/net/tcp.h]  
-*  
-  
-*tcp_send_ack [[net/ipv4/tcp_output.c]  
-*  
-  
-  
-  
-Description:  
-  
-  
-  
-  
-  
-*First we determine protocol type (IP, then TCP)  
-*  
-  
-*NF_HOOK (function) is a wrapper routine that first manages the network  
-filter (for example firewall), then it calls ''function''.  
-*  
-  
-*After we manage 3-way TCP Handshake which consists of:  
-*  
-  
-  
-  
-  
-  
-SERVER (LISTENING) CLIENT (CONNECTING)  
-SYN  
-<-------------------  
-SYN + ACK  
-------------------->  
-ACK  
-<-------------------  
-3-Way TCP handshake  
-  
-  
-  
-  
-  
-  
-*In the end we only have to launch "tcp_rcv_established [[net/ipv4/tcp_input.c]"  
-which gives the packet to the user socket and wakes it up.  
-*  
-  
-----  
-  
-!!9. Linux File System  
-  
-  
-TODO  
-----  
-  
-!!10. Useful Tips  
-  
-!!10.1 Stack and Heap  
-  
-  
-!Overview  
-  
-  
-Here we view how "stack" and "heap" are allocated in memory  
-  
-!Memory allocation  
-  
-  
-  
-  
-FF.. | | <-- bottom of the stack  
-/|\ | | |  
-higher | | | | stack  
-values | | | \|/ growing  
-| |  
-XX.. | | <-- top of the stack [[Stack Pointer]  
-| |  
-| |  
-| |  
-00.. |_________________| <-- end of stack [[Stack Segment]  
-Stack  
-  
-  
-  
-Memory address values start from 00.. (which is also where Stack Segment  
-begins) and they grow going toward FF.. value.  
-  
-  
-XX.. is the actual value of the Stack Pointer.  
-  
-  
-Stack is used by functions for:  
-  
-  
-  
-  
-  
-#global variables  
-#  
-  
-#local variables  
-#  
-  
-#return address  
-#  
-  
-  
-  
-For example, for a classical function:  
-  
-  
-  
-  
-|int foo_function (parameter_1, parameter_2, ..., parameter_n) {  
-|variable_1 declaration;  
-|variable_2 declaration;  
-..  
-|variable_n declaration;  
-|// Body function  
-|dynamic variable_1 declaration;  
-|dynamic variable_2 declaration;  
-..  
-|dynamic variable_n declaration;  
-|// Code is inside Code Segment, not Data/Stack segment!  
-|return (ret-type) value; // often it is inside some register, for i386 eax register is used.  
-|}  
-we have  
-| |  
-| 1. parameter_1 pushed | \  
-S | 2. parameter_2 pushed | | Before  
-T | ................... | | the calling  
-A | n. parameter_n pushed | /  
-C | ** Return address ** | -- Calling  
-K | 1. local variable_1 | \  
-| 2. local variable_2 | | After  
-| ................. | | the calling  
-| n. local variable_n | /  
-| |  
-... ... Free  
-... ... stack  
-| |  
-H | n. dynamic variable_n | \  
-E | ................... | | Allocated by  
-A | 2. dynamic variable_2 | | malloc & kmalloc  
-P | 1. dynamic variable_1 | /  
-|_______________________|  
-Typical stack usage  
-Note: variables order can be different depending on hardware architecture.  
-  
-  
-!!10.2 Application vs Process  
-  
-  
-!Base definition  
-  
-  
-We have to distinguish 2 concepts:  
-  
-  
-  
-  
-  
-*Application: that is the useful code we want to execute  
-*  
-  
-*Process: that is the IMAGE on memory of the application (it depends on  
-memory strategy used, segmentation and/or Pagination).  
-*  
-  
-  
-  
-Often Process is also called Task or Thread.  
-  
-!!10.3 Locks  
-  
-  
-!Overview  
-  
-  
-2 kind of locks:  
-  
-  
-  
-  
-  
-#intraCPU  
-#  
-  
-#interCPU  
-#  
-  
-  
-!!10.4 Copy_on_write  
-  
-  
-  
-Copy_on_write is a mechanism used to reduce memory usage. It postpones  
-memory allocation until the memory is really needed.  
-  
-  
-For example, when a task executes the "fork()" system call (to create another  
-task), we still use the same memory pages as the parent, in read only mode.  
-When the new task WRITES into the old page, it causes an exception and the  
-page is copied and marked "rw" (read, write).  
-  
-  
-  
-  
-1-) Page X is shared between Task Parent and Task Child  
-Task Parent  
-| | RW Access ______  
-| |---------->|Page X|  
-|_________| |______|  
-/|\  
-|  
-Task Child |  
-| | R Access |  
-| |----------------  
-|_________|  
-2-) Write request from Task Child  
-Task Parent  
-| | RW Access ______  
-| |---------->|Page X|  
-|_________| |______|  
-/|\  
-|  
-Task Child |  
-| | W Access |  
-| |----------------  
-|_________|  
-3-) Final Configuration: Task Parent and Task Child have an independent copy of the Page, X and Y  
-Task Parent  
-| | RW Access ______  
-| |---------->|Page X|  
-|_________| |______|  
-Task Child  
-| | RW Access ______  
-| |---------->|Page Y|  
-|_________| |______|  
-  
-----  
-  
-!!11. 80386 specific details  
-  
-!!11.1 Boot procedure  
-  
-  
-  
-  
-  
-bbootsect.s [[arch/i386/boot]  
-setup.S (+video.S)  
-head.S (+misc.c) [[arch/i386/boot/compressed]  
-start_kernel [[init/main.c]  
-  
-  
-!!11.2 80386 (and more) Descriptors  
-  
-  
-!Overview  
-  
-  
-Descriptors are data structure used by Intel microprocessor i386+ to virtualize  
-memory.  
-  
-!Kind of descriptors  
-  
-  
-  
-  
-  
-*GDT (Global Descriptor Table)  
-*  
-  
-*LDT (Local Descriptor Table)  
-*  
-  
-*IDT (Interrupt Descriptor Table)  
-*  
-  
-----  
-  
-!!12. IRQ  
-  
-!!12.1 Overview  
-  
-  
-  
-IRQ is an asyncronous signal sent to microprocessor to advertise a requested  
-work is completed  
-  
-!!12.2 Interaction schema  
-  
-  
-  
-  
-  
-|<--> IRQ() [[Timer]  
-|<--> IRQ(1) [[Device 1]  
-| ..  
-|<--> IRQ(n) [[Device n]  
-_____________________________|  
-/|\ /|\ /|\  
-| | |  
-\|/ \|/ \|/  
-Task(1) Task(2) .. Task(N)  
-IRQ - Tasks Interaction Schema  
-  
-  
-!What happens?  
-  
-  
-A typical O.S. uses many IRQ signals to interrupt normal process execution  
-and does some housekeeping work. So:  
-  
-  
-  
-  
-  
-#IRQ (i) occurs and Task(j) is interrupted  
-#  
-  
-#IRQ(i)_handler is executed  
-#  
-  
-#control backs to Task(j) interrupted  
-#  
-  
-  
-  
-Under Linux, when an IRQ comes, first the IRQ wrapper routine (named "interrupt0x??")  
-is called, then the "official" IRQ(i)_handler will be executed. This allows some  
-duties like timeslice preemption.  
-----  
-  
-!!13. Utility functions  
-  
-!!13.1 list_entry [[include/linux/list.h]  
-  
-  
-  
-Definition:  
-  
-  
-  
-  
-#define list_entry(ptr, type, member) \  
-((type *)((char *)(ptr)-(unsigned long)(&((type *))->member)))  
-  
-  
-  
-Meaning:  
-  
-  
-"list_entry" macro is used to retrieve a parent struct pointer, by using  
-only one of internal struct pointer.  
-  
-  
-Example:  
-  
-  
-  
-  
-struct __wait_queue {  
-unsigned int flags;  
-struct task_struct * task;  
-struct list_head task_list;  
-};  
-struct list_head {  
-struct list_head *next, *prev;  
-};  
-// and with type definition:  
-typedef struct __wait_queue wait_queue_t;  
-// we'll have  
-wait_queue_t *out list_entry(tmp, wait_queue_t, task_list);  
-// where tmp point to list_head  
-  
-  
-  
-So, in this case, by means of *tmp pointer [[list_head] we retrieve  
-an *out pointer [[wait_queue_t].  
-  
-  
-  
-  
-____________ <---- *out [[we calculate that]  
-|flags | /|\  
-|task *--> | |  
-|task_list |<---- list_entry  
-| prev * -->| | |  
-| next * -->| | |  
-|____________| ----- *tmp [[we have this]  
-  
-  
-!!13.2 Sleep  
-  
-  
-!Sleep code  
-  
-  
-Files:  
-  
-  
-  
-  
-  
-*kernel/sched.c  
-*  
-  
-*include/linux/sched.h  
-*  
-  
-*include/linux/wait.h  
-*  
-  
-*include/linux/list.h  
-*  
-  
-  
-  
-Functions:  
-  
-  
-  
-  
-  
-*interruptible_sleep_on  
-*  
-  
-*interruptible_sleep_on_timeout  
-*  
-  
-*sleep_on  
-*  
-  
-*sleep_on_timeout  
-*  
-  
-  
-  
-Called functions:  
-  
-  
-  
-  
-  
-*init_waitqueue_entry  
-*  
-  
-*__add_wait_queue  
-*  
-  
-*list_add  
-*  
-  
-*__list_add  
-*  
-  
-*__remove_wait_queue  
-*  
-  
-  
-  
-!InterCallings Analysis:  
-  
-  
-  
-  
-|sleep_on  
-|init_waitqueue_entry --  
-|__add_wait_queue | enqueuing request to resource list  
-|list_add |  
-|__list_add --  
-|schedule --- waiting for request to be executed  
-|__remove_wait_queue --  
-|list_del | dequeuing request from resource list  
-|__list_del --  
-  
-  
-  
-Description:  
-  
-  
-Under Linux each resource (ideally an object shared between many users  
-and many processes), , has a queue to manage ALL tasks requesting it.  
-  
-  
-This queue is called "wait queue" and it consists of many items we'll call  
-the"wait queue element":  
-  
-  
-  
-  
-*** wait queue structure [[include/linux/wait.h] ***  
-struct __wait_queue {  
-unsigned int flags;  
-struct task_struct * task;  
-struct list_head task_list;  
-}  
-struct list_head {  
-struct list_head *next, *prev;  
-};  
-  
-  
-  
-Graphic working:  
-  
-  
-  
-  
-*** wait queue element ***  
-/|\  
-|  
-<--[[prev *, flags, task *, next *]-->  
-*** wait queue list ***  
-/|\ /|\ /|\ /|\  
-| | | |  
---> <--[[task1]--> <--[[task2]--> <--[[task3]--> .... <--[[taskN]--> <--  
-| |  
-|__________________________________________________________________|  
-*** wait queue head ***  
-task1 <--[[prev *, lock, next *]--> taskN  
-  
-  
-  
-"wait queue head" point to first (with next *) and last (with prev *) elements  
-of the "wait queue list".  
-  
-  
-When a new element has to be added, "__add_wait_queue" [[include/linux/wait.h]  
-is called, after which the generic routine "list_add" [[include/linux/wait.h],  
-will be executed:  
-  
-  
-  
-  
-*** function list_add [[include/linux/list.h] ***  
-// classic double link list insert  
-static __inline__ void __list_add (struct list_head * new, \  
-struct list_head * prev, \  
-struct list_head * next) {  
-next->prev = new;  
-new->next = next;  
-new->prev = prev;  
-prev->next = new;  
-}  
-  
-  
-  
-To complete the description, we see also "__list_del" [[include/linux/list.h]  
-function called by "list_del" [[include/linux/list.h] inside "remove_wait_queue"  
-[[include/linux/wait.h]:  
-  
-  
-  
-  
-*** function list_del [[include/linux/list.h] ***  
-// classic double link list delete  
-static __inline__ void __list_del (struct list_head * prev, struct list_head * next) {  
-next->prev = prev;  
-prev->next = next;  
-}  
-  
-  
-!Stack consideration  
-  
-  
-A typical list (or queue) is usually managed allocating it into the Heap  
-(see Cap.10 for Heap and Stack definition and about where variables are allocated).  
-Otherwise here, we statically allocate Wait Queue data in a local variable  
-(Stack), then function is interrupted by scheduling, in the end, (returning  
-from scheduling) we'll erase local variable.  
-  
-  
-  
-  
-new task <----| task1 <------| task2 <------|  
-| | |  
-| | |  
-|..........| | |..........| | |..........| |  
-|wait.flags| | |wait.flags| | |wait.flags| |  
-|wait.task_|____| |wait.task_|____| |wait.task_|____|  
-|wait.prev |--> |wait.prev |--> |wait.prev |-->  
-|wait.next |--> |wait.next |--> |wait.next |-->  
-|.. | |.. | |.. |  
-|schedule()| |schedule()| |schedule()|  
-|..........| |..........| |..........|  
-|__________| |__________| |__________|  
-Stack Stack Stack  
-  
-----  
-  
-!!14. Static variables  
-  
-!!14.1 Overview  
-  
-  
-  
-Linux is written in ''C'' language, and as every application has:  
-  
-  
-  
-  
-  
-#Local variables  
-#  
-  
-#Module variables (inside the source file and relative only to that module)  
-#  
-  
-#Global/Static variables present in only 1 copy (the same for all modules)  
-#  
-  
-  
-  
-When a Static variable is modified by a module, all other modules will  
-see the new value.  
-  
-  
-Static variables under Linux are very important, cause they are the only  
-kind to add new support to kernel: they typically are pointers to the head  
-of a list of registered elements, which can be:  
-  
-  
-  
-  
-  
-*added  
-*  
-  
-*deleted  
-*  
-  
-*maybe modified  
-*  
-  
-  
-  
-  
-  
-_______ _______ _______  
-Global variable -------> |Item(1)| -> |Item(2)| -> |Item(3)| ..  
-|_______| |_______| |_______|  
-  
-  
-!!14.2 Main variables  
-  
-  
-!Current  
-  
-  
-  
-  
-________________  
-Current ----------------> | Actual process |  
-|________________|  
-  
-  
-  
-Current points to ''task_struct'' structure, which contains all data about  
-a process like:  
-  
-  
-  
-  
-  
-*pid, name, state, counter, policy of scheduling  
-*  
-  
-*pointers to many data structures like: files, vfs, other processes, signals...  
-*  
-  
-  
-  
-Current is not a real variable, it is  
-  
-  
-  
-  
-static inline struct task_struct * get_current(void) {  
-struct task_struct *current;  
-__asm__("andl %%esp,%; ":"=r" (current) : "" (~8191UL));  
-return current;  
-}  
-#define current get_current()  
-  
-  
-  
-Above lines just takes value of ''esp'' register (stack pointer) and get  
-it available like a variable, from which we can point to our task_struct structure.  
-  
-  
-From ''current'' element we can access directly to any other process (ready,  
-stopped or in any other state) kernel data structure, for example changing  
-STATE (like a I/O driver does), PID, presence in ready list or blocked list,  
-etc.  
-  
-  
-  
-  
-  
-  
-  
-!Registered filesystems  
-  
-  
-  
-  
-______ _______ ______  
-file_systems ------> | ext2 | -> | msdos | -> | ntfs |  
-[[fs/super.c] |______| |_______| |______|  
-  
-  
-  
-When you use command like ''modprobe some_fs'' you will add a new entry  
-to file systems list, while removing it (by using ''rmmod'') will delete it.  
-  
-!Mounted filesystems  
-  
-  
-  
-  
-______ _______ ______  
-mount_hash_table ---->| / | -> | /usr | -> | /var |  
-[[fs/namespace.c] |______| |_______| |______|  
-  
-  
-  
-When you use ''mount'' command to add a fs, the new entry will be inserted  
-in the list, while an ''umount'' command will delete the entry.  
-  
-!Registered Network Packet Type  
-  
-  
-  
-  
-______ _______ ______  
-ptype_all ------>| ip | -> | x25 | -> | ipv6 |  
-[[net/core/dev.c] |______| |_______| |______|  
-  
-  
-  
-For example, if you add support for IPv6 (loading relative module) a new  
-entry will be added in the list.  
-  
-!Registered Network Internet Protocol  
-  
-  
-  
-  
-______ _______ _______  
-inet_protocol_base ----->| icmp | -> | tcp | -> | udp |  
-[[net/ipv4/protocol.c] |______| |_______| |_______|  
-  
-  
-  
-Also others packet type have many internal protocols in each list (like  
-IPv6).  
-  
-  
-  
-  
-______ _______ _______  
-inet6_protos ----------->|icmpv6| -> | tcpv6 | -> | udpv6 |  
-[[net/ipv6/protocol.c] |______| |_______| |_______|  
-  
-  
-!Registered Network Device  
-  
-  
-  
-  
-______ _______ _______  
-dev_base --------------->| lo | -> | eth0 | -> | ppp0 |  
-[[drivers/core/Space.c] |______| |_______| |_______|  
-  
-  
-!Registered Char Device  
-  
-  
-  
-  
-______ _______ ________  
-chrdevs ---------------->| lp | -> | keyb | -> | serial |  
-[[fs/devices.c] |______| |_______| |________|  
-  
-  
-  
-''chrdevs'' is not a pointer to a real list, but it is a standard vector.  
-  
-!Registered Block Device  
-  
-  
-  
-  
-______ ______ ________  
-bdev_hashtable --------->| fd | -> | hd | -> | scsi |  
-[[fs/block_dev.c] |______| |______| |________|  
-  
-  
-  
-''bdev_hashtable'' is an hash vector.  
-----  
-  
-!!15. Glossary  
-----  
-  
-!!16 . Links  
-  
-  
-  
-Official Linux kernels and patches download site  
-  
-  
-Great documentation about Linux Kernel  
-  
-  
-Official Kernel Mailing list  
-  
-  
-----  
+Describe [HowToKernelAnalysisHOWTO ] here.