Differences between current version and previous revision of HowToUnixandInternetFundamentalsHOWTO.
Newer page: | version 3 | Last edited on Monday, October 25, 2004 4:08:33 am | by StuartYeates | |
Older page: | version 2 | Last edited on Friday, June 7, 2002 1:07:45 am | by perry | Revert |
@@ -1,1706 +1 @@
-!!!The Unix and Internet Fundamentals HOWTO
-!Eric Raymond
-
- esr@thyrsus.com
-
-
-
-__Revision History__
-Revision 2.5  2002-02-02  Revised by: esr  Corrected description of IP.
-Revision 2.4  2001-06-12  Revised by: esr  Where to find more.
-Revision 2.3  2001-05-21  Revised by: esr  Introduction to bus types. Polish translation link.
-Revision 2.2  2001-02-05  Revised by: esr  New section on how DNS is organized. Corrected for new location of document. Various copy-edit fixes.
-Revision 2.1  2000-11-29  Revised by: esr  Corrected explanation of twos-complement numbers. Various copy-edit fixes.
-Revision 2.0  2000-08-05  Revised by: esr  First !DocBook version. Detailed description of memory hierarchy.
-Revision 1.7  2000-03-06  Revised by: esr  Corrected and expanded the section on file permissions.
-Revision 1.4  1999-09-25  Revised by: esr  Be more precise about what the kernel does vs. what init does.
-Revision 1.3  1999-06-27  Revised by: esr  The sections `What happens when you log in?' and `File ownership, permissions and security'.
-Revision 1.2  1998-12-26  Revised by: esr  The section `How does my computer store things in memory?'.
-Revision 1.0  1998-10-29  Revised by: esr  Initial revision.
-
-
-
-
-
-This document describes the working basics of PC-class computers, Unix-like
-operating systems, and the Internet in non-technical language.
-
-
-
-
-
-
-----
-__Table of Contents__
-
-1. Introduction
-1.1. Purpose of this document
-1.2. New versions of this document
-1.3. Feedback and corrections
-1.4. Related resources
-2. Basic anatomy of your computer
-3. What happens when you switch on a computer?
-4. What happens when you log in?
-5. What happens when you run programs from the shell?
-6. How do input devices and interrupts work?
-7. How does my computer do several things at once?
-8. How does my computer keep processes from stepping on each other?
-8.1. Virtual memory: the simple version
-8.2. Virtual memory: the detailed version
-8.3. The Memory Management Unit
-9. How does my computer store things in memory?
-9.1. Numbers
-9.2. Characters
-10. How does my computer store things on disk?
-10.1. Low-level disk and file system structure
-10.2. File names and directories
-10.3. Mount points
-10.4. How a file gets looked up
-10.5. File ownership, permissions and security
-10.6. How things can go wrong
-11. How do computer languages work?
-11.1. Compiled languages
-11.2. Interpreted languages
-11.3. P-code languages
-12. How does the Internet work?
-12.1. Names and locations
-12.2. The Domain Name System
-12.3. Packets and routers
-12.4. TCP and IP
-12.5. HTTP, an application protocol
-13. To Learn More
-----
-!!!1. Introduction
-!!1.1. Purpose of this document
-
-This document is intended to help Linux and Internet users who are
-learning by doing. While this is a great way to acquire specific skills,
-sometimes it leaves peculiar gaps in one's knowledge of the basics -- gaps
-which can make it hard to think creatively or troubleshoot effectively,
-from lack of a good mental model of what is really going on.
-
-
-
-I'll try to describe in clear, simple language how it all works. The
-presentation will be tuned for people using Unix or Linux on PC-class
-hardware. Nevertheless, I'll usually refer simply to `Unix' here, as most
-of what I will describe is constant across platforms and across Unix
-variants.
-
-
-
-I'm going to assume you're using an Intel PC. The details differ
-slightly if you're running an Alpha or PowerPC or some other Unix box, but
-the basic concepts are the same.
-
-
-
-I won't repeat things, so you'll have to pay attention, but that
-also means you'll learn from every word you read. It's a good idea to just
-skim when you first read this; you should come back and reread it a few
-times after you've digested what you have learned.
-
-
-
-This is an evolving document. I intend to keep adding sections in
-response to user feedback, so you should come back and review it
-periodically.
-
-----
-!!1.2. New versions of this document
-
-New versions of the Unix and Internet Fundamentals HOWTO will be
-periodically posted to comp.os.linux.help and comp.os.linux.announce and news.answers. They will also be uploaded to various Linux WWW and
-FTP sites, including the LDP home page.
-
-
-
-You can view the latest version of this on the World Wide Web via the URL
-http://www.linuxdoc.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html.
-
-
-
-This document has been translated into Polish.
-
-----
-!!1.3. Feedback and corrections
-
-If you have questions or comments about this document, please feel
-free to mail Eric S. Raymond, at esr@thyrsus.com. I welcome any suggestions or criticisms. I
-especially welcome hyperlinks to more detailed explanations of individual
-concepts. If you find a mistake with this document, please let me know so
-I can correct it in the next version. Thanks.
-
-----
-!!1.4. Related resources
-
-If you're reading this in order to learn how to hack, you should also
-read the How To Become A Hacker FAQ. It has links to some other useful
-resources.
-
-----
-!!!2. Basic anatomy of your computer
-
-Your computer has a processor chip inside it that does the actual
-computing. It has internal memory (what DOS/Windows people call ``RAM''
-and Unix people often call ``core''; the Unix term is a folk memory from
-when RAM consisted of ferrite-core donuts). The processor and memory live
-on the
-''motherboard'',
-which is the heart of your computer.
-
-
-
-Your computer has a screen and keyboard. It has hard drives and
-floppy disks. Some of these devices are run by ''controller
-cards'' that plug into the motherboard and help the computer
-drive them; others are run by specialized chipsets directly on the
-motherboard that fulfill the same function as a controller card. Your
-keyboard is too simple to need a separate card; the controller is built
-into the keyboard chassis itself.
-
-
-
-We'll go into some of the details of how these devices work later. For
-now, here are a few basic things to keep in mind about how they work
-together:
-
-
-
-All the parts of your computer inside the case are connected by a
-''bus''.
-Physically, the bus is what you plug your controller cards into (the video
-card, the disk controller, a sound card if you have one). The bus is the
-data highway between your processor, your screen, your disk, and everything
-else.
-
-
-
-(If you've seen references to `ISA', `PCI', and `PCMCIA' in connection
-with PCs and have not understood them, these are bus types. ISA is, except
-in minor details, the same bus that was used on IBM's original PCs in 1980;
-it is passing out of use now. PCI, for Peripheral Component
-Interconnect, is the bus used on most modern PCs, and on modern
-Macintoshes as well. PCMCIA is a variant of ISA with smaller physical
-connectors used on laptop computers.)
-
-
-
-The processor, which makes everything else go, can't actually see any of
-the other pieces directly; it has to talk to them over the bus. The only
-other subsystem that it has really fast, immediate access to is memory (the
-core). In order for programs to run, then, they have to be ''in
-core'' (in memory).
-
-
-
-When your computer reads a program or data off the disk, what actually
-happens is that the processor uses the bus to send a disk read request
-to your disk controller. Some time later the disk controller uses the
-bus to signal the processor that it has read the data and put it in a
-certain location in memory. The processor can then use the bus to look
-at that data.
-
-
-
-Your keyboard and screen also communicate with the processor via the
-bus, but in simpler ways. We'll discuss those later on. For now, you know
-enough to understand what happens when you turn on your computer.
-
-----
-!!!3. What happens when you switch on a computer?
-
-A computer without a program running is just an inert hunk of
-electronics. The first thing a computer has to do when it is turned on is
-start up a special program called an ''operating
-system''. The operating system's job is to help other computer
-programs to work by handling the messy details of controlling the
-computer's hardware.
-
-
-
-The process of bringing up the operating system is called ''booting'' (originally this was
-''bootstrapping'' and alluded to the process of pulling
-yourself up ``by your bootstraps''). Your computer knows how to boot
-because instructions for booting are built into one of its chips, the BIOS
-(or Basic Input/Output System) chip.
-
-
-
-The BIOS chip tells it to look in a fixed place, usually on the
-lowest-numbered hard disk (the ''boot disk'') for a
-special program called a ''boot loader'' (under Linux the
-boot loader is called LILO). The boot loader is pulled into memory and
-started. The boot loader's job is to start the real operating
-system.
-
-
-
-The loader does this by looking for a
-''kernel'',
-loading it into memory, and starting it. When you boot Linux and see
-"LILO" on the screen followed by a bunch of dots, it is loading the kernel.
-(Each dot means it has loaded another ''disk
-block'' of kernel code.)
-
-
-
-(You may wonder why the BIOS doesn't load the kernel directly -- why the
-two-step process with the boot loader? Well, the BIOS isn't very smart.
-In fact it's very stupid, and Linux doesn't use it at all after boot time.
-It was originally written for primitive 8-bit PCs with tiny disks, and
-literally can't access enough of the disk to load the kernel directly. The
-boot loader step also lets you start one of several operating systems off
-different places on your disk, in the unlikely event that Unix isn't good
-enough for you.)
-
-
-
-Once the kernel starts, it has to look around, find the rest of the
-hardware, and get ready to run programs. It does this by poking not at
-ordinary memory locations but rather at ''I/O ports'' --
-special bus addresses that are likely to have device controller cards
-listening at them for commands. The kernel doesn't poke at random; it has
-a lot of built-in knowledge about what it's likely to find where, and how
-controllers will respond if they're present. This process is called
-''autoprobing''.
-
-
-
-Most of the messages you see at boot time are the kernel autoprobing
-your hardware through the I/O ports, figuring out what it has available to
-it and adapting itself to your machine. The Linux kernel is extremely good
-at this, better than most other Unixes and ''much'' better
-than DOS or Windows. In fact, many Linux old-timers think the cleverness
-of Linux's boot-time probes (which made it relatively easy to install) was
-a major reason it broke out of the pack of free-Unix experiments to attract
-a critical mass of users.
-
-
-
-But getting the kernel fully loaded and running isn't the end of the
-boot process; it's just the first stage (sometimes called ''run
-level 1''). After this first stage, the kernel hands control to a
-special process called `init' which spawns several housekeeping
-processes.
-
-
-
-The init process's first job is usually to check to make sure your disks
-are OK. Disk file systems are fragile things; if they've been damaged by a
-hardware failure or a sudden power outage, there are good reasons to take
-recovery steps before your Unix is all the way up. We'll go into some of
-this later on when we talk about how file systems can
-go wrong.
-
-
-
-Init's next step is to start several ''daemons''. A
-daemon is a program like a print spooler, a mail listener or a WWW server
-that lurks in the background, waiting for things to do. These special
-programs often have to coordinate several requests that could conflict.
-They are daemons because it's often easier to write one program that runs
-constantly and knows about all requests than it would be to try to make
-sure that a flock of copies (each processing one request and all running at
-the same time) don't step on each other. The particular collection of
-daemons your system starts may vary, but will almost always include a print
-spooler (a gatekeeper daemon for your printer).
-
-
-
-The next step is to prepare for users. Init starts a copy of a
-program called __getty__ to watch your console (and maybe
-more copies to watch dial-in serial ports). This program is what issues
-the __login__ prompt to your console. Once all daemons and
-getty processes for each terminal are started, we're at ''run level
-2''. At this level, you can log in and run programs.
-
-
-
-But we're not done yet. The next step is to start up various daemons
-that support networking and other services. Once that's done, we're at
-''run level 3'' and the system is fully ready for
-use.
-
-----
-!!!4. What happens when you log in?
-
-When you log in (give a name to __getty__) you
-identify yourself to the computer. It then runs a program called
-(naturally enough) __login__, which takes your password and
-checks to see if you are authorized to be using the machine. If you
-aren't, your login attempt will be rejected. If you are, login does a few
-housekeeping things and then starts up a command interpreter, the
-''shell''. (Yes, __getty__ and
-__login__ could be one program. They're separate for
-historical reasons not worth going into here.)
-
-
-
-Here's a bit more about what the system does before giving you a shell
-(you'll need to know this later when we talk about file permissions).
-You identify yourself with a login name and password. That login name is
-looked up in a file called /etc/passwd, which is a sequence of lines each
-describing a user account.
-
-
-
-One of these fields is an encrypted version of the account password
-(sometimes the encrypted fields are actually kept in a second /etc/shadow
-file with tighter permissions; this makes password cracking harder). What
-you enter as an account password is encrypted in exactly the same way, and
-the __login__ program checks to see if they match. The
-security of this method depends on the fact that, while it's easy to go
-from your clear password to the encrypted version, the reverse is very
-hard. Thus, even if someone can see the encrypted version of your
-password, they can't use your account. (It also means that if you forget
-your password, there's no way to recover it, only to change it to something
-else you choose.)
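The one-way check described above can be sketched in a few lines. This is a minimal illustration, not Unix's actual scheme: real systems use crypt(3)-style salted hashes, and the SHA-256 stand-in, the salt value, and the sample password here are all assumptions for the example:

```python
import hashlib

def hash_password(clear, salt):
    """One-way hash: easy to compute forward, very hard to invert.
    (Illustrative stand-in; real Unix uses crypt(3) variants.)"""
    return hashlib.sha256((salt + clear).encode()).hexdigest()

# What gets stored in /etc/passwd or /etc/shadow: never the clear password.
salt = "xz"                                 # hypothetical salt
stored = hash_password("opensesame", salt)  # hypothetical account password

def login_check(entered):
    """What login does: hash what you typed the same way and compare."""
    return hash_password(entered, salt) == stored

print(login_check("opensesame"))   # True
print(login_check("guess"))        # False
```

Note that even someone who can read `stored` can't run the hash backwards to recover "opensesame", which is exactly why a forgotten password can only be replaced, not recovered.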
-
-
-
-Once you have successfully logged in, you get all the privileges
-associated with the individual account you are using. You may also be
-recognized as part of a
-''group''.
-A group is a named collection of users set up by the system administrator.
-Groups can have privileges independently of their members' privileges. A
-user can be a member of multiple groups. (For details about how Unix
-privileges work, see the section below on permissions.)
-
-
-
-(Note that although you will normally refer to users and groups by
-name, they are actually stored internally as numeric IDs. The password
-file maps your account name to a user ID; the
-/etc/group
-file maps group names to numeric group IDs. Commands that deal with
-accounts and groups do the translation automatically.)
-
-
-
-Your account entry also contains your ''home
-directory'', the place in the Unix file system where
-your personal files will live. Finally, your account entry also sets your
-''shell'',
-the command interpreter that __login__ will start up to
-accept your commands.
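The account-entry fields described above can be seen by pulling apart one line in the /etc/passwd format. The entry shown is made up for illustration; real records have seven colon-separated fields in this order:

```python
def parse_passwd_line(line):
    """Split one /etc/passwd record into its seven colon-separated fields."""
    name, passwd, uid, gid, gecos, home, shell = line.strip().split(":")
    return {"name": name, "uid": int(uid), "gid": int(gid),
            "gecos": gecos, "home": home, "shell": shell}

# A hypothetical account entry; the `x' means the real hash lives in /etc/shadow.
entry = parse_passwd_line("esr:x:1000:1000:Eric Raymond:/home/esr:/bin/bash")
print(entry["uid"], entry["home"], entry["shell"])
```

The numeric uid and gid are what the system stores internally; the last two fields are the home directory and shell that login sets up for you.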
-
-----
-!!!5. What happens when you run programs from the shell?
-
-The shell is Unix's interpreter for the commands you type in; it's
-called a shell because it wraps around and hides the operating system
-kernel. It's an important feature of Unix that the shell and kernel are
-separate programs communicating through a small set of system calls.
-This makes it possible for there to be multiple shells, suiting different
-tastes in interfaces.
-
-
-
-The normal shell gives you the '$' prompt that you see after logging in
-(unless you've customized it to be something else). We won't talk about
-shell syntax and the easy things you can see on the screen here; instead
-we'll take a look behind the scenes at what's happening from the
-computer's point of view.
-
-
-
-After boot time and before you run a program, you can think of your
-computer as containing a zoo of processes that are all waiting for
-something to do. They're all waiting on ''events''. An
-event can be you pressing a key or moving a mouse. Or, if your machine is
-hooked to a network, an event can be a data packet coming in over that
-network.
-
-
-
-The kernel is one of these processes. It's a special one, because it
-controls when the other ''user processes'' can run, and it
-is normally the only process with direct access to the machine's hardware.
-In fact, user processes have to make requests to the kernel when they want
-to get keyboard input, write to your screen, read from or write to disk, or
-do just about anything other than crunching bits in memory. These requests
-are known as ''system calls''.
-
-
-
-Normally all I/O goes through the kernel so it can schedule the
-operations and prevent processes from stepping on each other. A few
-special user processes are allowed to slide around the kernel, usually by
-being given direct access to I/O ports. X servers (the programs that
-handle other programs' requests to do screen graphics on most Unix boxes)
-are the most common example of this. But we haven't gotten to an X server
-yet; you're looking at a shell prompt on a character console.
-
-
-
-The shell is just a user process, and not a particularly special one.
-It waits on your keystrokes, listening (through the kernel) to the keyboard
-I/O port. As the kernel sees them, it echoes them to your screen. When
-the kernel sees an `Enter' it passes your line of text to the shell. The
-shell tries to interpret those keystrokes as commands.
-
-
-
-Let's say you type `ls' and Enter to invoke the Unix directory
-lister. The shell applies its built-in rules to figure out that you want to
-run the executable command in the file `/bin/ls'. It makes a system call
-asking the kernel to start /bin/ls as a new ''child
-process'' and give it access to the screen and keyboard through
-the kernel. Then the shell goes to sleep, waiting for ls to finish.
-
-
-
-When __/bin/ls__ is done, it tells the kernel it's
-finished by issuing an ''exit'' system call. The kernel
-then wakes up the shell and tells it it can continue running. The shell
-issues another prompt and waits for another line of input.
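In Unix terms this dance is fork, exec, and wait. Here is a sketch using Python's bindings for those same system calls; the helper name run_command is ours, and the child's output is redirected to /dev/null just to keep the example quiet:

```python
import os

def run_command(path, argv):
    """Sketch of what the shell does: fork a child, exec the program in it,
    then go to sleep (wait) until the child issues its exit system call."""
    pid = os.fork()
    if pid == 0:
        # Child process: replace ourselves with the requested program.
        try:
            devnull = os.open(os.devnull, os.O_WRONLY)
            os.dup2(devnull, 1)           # quiet stdout for the example
            os.dup2(devnull, 2)           # quiet stderr too
            os.execv(path, argv)          # on success, this never returns
        finally:
            os._exit(127)                 # reached only if exec failed
    # Parent (the "shell") sleeps here until the child exits...
    _, status = os.waitpid(pid, 0)
    # ...then wakes up and collects the child's exit status.
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

print(run_command("/bin/ls", ["ls", "/"]))   # 0 on success
```

After waitpid returns, a real shell would print its next prompt; here we just report the exit status.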
-
-
-
-Other things may be going on while your `ls' is executing, however
-(we'll have to suppose that you're listing a very long directory). You
-might switch to another virtual console, log in there, and start a game of
-Quake, for example. Or, suppose you're hooked up to the Internet. Your
-machine might be sending or receiving mail while __/bin/ls__
-runs.
-
-----
-!!!6. How do input devices and interrupts work?
-
-Your keyboard is a very simple input device; simple because it
-generates small amounts of data very slowly (by a computer's standards).
-When you press or release a key, that event is signalled up the keyboard
-cable to raise a ''hardware
-interrupt''.
-
-
-
-It's the operating system's job to watch for such interrupts. For
-each possible kind of interrupt, there will be an ''interrupt
-handler'', a part of the operating system that stashes
-away any data associated with them (like your keypress/keyrelease value)
-until it can be processed.
-
-
-
-What the interrupt handler for your keyboard actually does is post the
-key value into a system area near the bottom of memory. There, it will
-be available for inspection when the operating system passes control to
-whichever program is currently supposed to be reading from the keyboard.
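The handler-plus-buffer arrangement can be sketched as a producer and a consumer. The function names here are illustrative, not kernel APIs:

```python
from collections import deque

key_buffer = deque()   # the "system area" holding not-yet-read key values

def keyboard_interrupt_handler(scancode):
    """What the handler does: stash the key value and return immediately."""
    key_buffer.append(scancode)

def read_key():
    """What the reading program does later, when it gets control."""
    return key_buffer.popleft() if key_buffer else None

# Two keys arrive while some other program happens to be running...
keyboard_interrupt_handler(ord("l"))
keyboard_interrupt_handler(ord("s"))
# ...and are picked up, in order, once the reader runs.
print(chr(read_key()), chr(read_key()))   # l s
```

The point of the buffer is that the handler finishes fast; the slow work of interpreting the keys happens whenever the reading program is next scheduled.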
-
-
-
-More complex input devices like disk or network cards work in a similar
-way. Earlier, I referred to a disk controller using the bus to signal that
-a disk request has been fulfilled. What actually happens is that the disk
-raises an interrupt. The disk interrupt handler then copies the retrieved
-data into memory, for later use by the program that made the request.
-
-
-
-Every kind of interrupt has an associated ''priority
-level''.
-Lower-priority interrupts (like keyboard events) have to wait on
-higher-priority interrupts (like clock ticks or disk events). Unix is
-designed to give high priority to the kinds of events that need to be
-processed rapidly in order to keep the machine's response smooth.
-
-
-
-In your operating system's boot-time messages, you may see references
-to ''IRQ''
-numbers. You may be aware that one of the common ways to misconfigure
-hardware is to have two different devices try to use the same IRQ, without
-understanding exactly why.
-
-
-
-Here's the answer. IRQ is short for "Interrupt Request". The operating
-system needs to know at startup time which numbered interrupts each
-hardware device will use, so it can associate the proper handlers with each
-one. If two different devices try to use the same IRQ, interrupts will
-sometimes get dispatched to the wrong handler. This will usually at least
-lock up the device, and can sometimes confuse the OS badly enough that it
-will flake out or crash.
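A toy dispatch table makes the conflict concrete. The handler functions are purely illustrative; IRQ numbers 1 and 14 happen to be the classic PC keyboard and IDE-disk lines, used here only as labels:

```python
# Sketch of the kernel's interrupt dispatch table: IRQ number -> handler.
handlers = {}

def register(irq, handler):
    """Associate a handler with an IRQ line. A second registration silently
    clobbers the first -- exactly the misconfiguration described above."""
    handlers[irq] = handler

def dispatch(irq):
    """What happens when the interrupt arrives: look up and run the handler."""
    return handlers[irq]("interrupt!")

register(1, lambda data: "keyboard handled " + data)
register(14, lambda data: "disk handled " + data)
print(dispatch(1))   # the keyboard handler gets the keyboard's interrupts

# Now a second device is (mis)configured onto IRQ 1:
register(1, lambda data: "modem handled " + data)
print(dispatch(1))   # the keyboard's interrupts now reach the wrong handler
```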
-
-----
-!!!7. How does my computer do several things at once?
-
-It doesn't, actually. Computers can only do one task (or
-''process'') at a time. But a computer can change tasks
-very rapidly, and fool slow human beings into thinking it's doing several
-things at once. This is called
-''timesharing''.
-
-
-
-One of the kernel's jobs is to manage timesharing. It has a part
-called the
-''scheduler''
-which keeps information inside itself about all the other (non-kernel)
-processes in your zoo. Every 1/60th of a second, a timer goes off in the
-kernel, generating a clock interrupt. The scheduler stops whatever process
-is currently running, suspends it in place, and hands control to another
-process.
-
-
-
-1/60th of a second may not sound like a lot of time. But on today's
-microprocessors it's enough to run tens of thousands of machine
-instructions, which can do a great deal of work. So even if you have many
-processes, each one can accomplish quite a bit in each of its
-timeslices.
-
-
-
-In practice, a program may not get its entire timeslice. If an
-interrupt comes in from an I/O device, the kernel effectively stops the
-current task, runs the interrupt handler, and then returns to the current
-task. A storm of high-priority interrupts can squeeze out normal
-processing; this misbehavior is called ''thrashing'' and
-is fortunately very hard to induce under modern Unixes.
-
-
-
-In fact, the speed of programs is only very seldom limited by the
-amount of machine time they can get (there are a few exceptions to this
-rule, such as sound or 3-D graphics generation). Much more often, delays
-are caused when the program has to wait on data from a disk drive or
-network connection.
-
-
-
-An operating system that can routinely support many simultaneous
-processes is called "multitasking". The Unix family of operating systems
-was designed from the ground up for multitasking and is very good at it --
-much more effective than Windows or the Mac OS, which have had multitasking
-bolted into them as an afterthought and do it rather poorly. Efficient,
-reliable multitasking is a large part of what makes Linux superior for
-networking, communications, and Web service.
-
-----
-!!!8. How does my computer keep processes from stepping on each other?
-
-The kernel's scheduler takes care of dividing processes in time.
-Your operating system also has to divide them in space, so that processes
-can't step on each others' working memory. Even if you assume that all
-programs are trying to be cooperative, you don't want a bug in one of them
-to be able to corrupt others. The things your operating system does to
-solve this problem are called ''memory
-management''.
-
-
-
-Each process in your zoo needs its own area of memory, as a place to
-run its code from and keep variables and results in. You can think of this
-set as consisting of a read-only ''code
-segment''
-(containing the process's instructions) and a writeable ''data
-segment''
-(containing all the process's variable storage). The data segment is truly
-unique to each process, but if two processes are running the same code Unix
-automatically arranges for them to share a single code segment as an
-efficiency measure.
-
-----
-!!8.1. Virtual memory: the simple version
-
-Efficiency is important, because memory is expensive. Sometimes you
-don't have enough to hold the entirety of all the programs the machine is
-running, especially if you are using a large program like an X server. To
-get around this, Unix uses a technique called
-''virtual memory''. It doesn't try to hold all the code and data
-for a process in memory. Instead, it keeps around only a relatively small
-''working set''; the rest of the process's state is left in a
-special ''swap space'' area on your hard disk.
-
-
-
-Note that the "Sometimes" in the last paragraph used to be
-"Almost always" -- the size of memory was typically small relative to the
-size of running programs, so swapping was frequent. Memory is far less
-expensive nowadays and even low-end machines have quite a lot of it. On
-modern single-user machines with 64MB of memory and up, it's possible to
-run X and a typical mix of jobs without ever swapping after they're
-initially loaded into core.
-
-----
-!!8.2. Virtual memory: the detailed version
-
-Actually, the last section oversimplified things a bit. Yes,
-programs see most of your memory as one big flat bank of addresses bigger
-than physical memory, and disk swapping is used to maintain that illusion.
-But your hardware actually has no fewer than five different kinds of memory
-in it, and the differences between them can matter a good deal when
-programs have to be tuned for maximum speed. To really understand what
-goes on in your machine, you should learn how all of them work.
-
-
-
-The five kinds of memory are these: processor registers, internal (or
-on-chip) cache, external (or off-chip) cache, main memory, and disk. And
-the reason there are so many kinds is simple: speed costs money. I have
-listed these kinds of memory in increasing order of access time and
-decreasing order of cost. Register memory is the fastest and most
-expensive and can be random-accessed about a billion times a second, while
-disk is the slowest and cheapest and can do about 100 random accesses a
-second.
-
-
-
-Here's a full list reflecting early-2000 speeds for a typical desktop
-machine. While speed and capacity will go up and prices will drop, you can
-expect these ratios to remain fairly constant -- and it's those ratios that
-shape the memory hierarchy.
-
-
-
-
-
-; Disk:
-
-Size: 13000MB Accesses: 100/sec
-
-; Main memory:
-
-Size: 256MB Accesses: 100M/sec
-
-; External cache:
-
-Size: 512KB Accesses: 250M/sec
-
-; Internal Cache:
-
-Size: 32KB Accesses: 500M/sec
-
-; Processor:
-
-Size: 28 bytes Accesses: 1000M/sec
-
-
-
-We can't build everything out of the fastest kinds of memory. It
-would be way too expensive -- and even if it weren't, fast memory is
-volatile. That is, it loses its marbles when the power goes off. Thus,
-computers have to have hard disks or other kinds of non-volatile storage
-that retains data when the power goes off. And there's a huge mismatch
-between the speed of processors and the speed of disks. The middle three
-levels of the memory hierarchy (''internal
-cache'', ''external
-cache'', and main memory) basically exist to bridge
-that gap.
-
-
-
-Linux and other Unixes have a feature called ''virtual
-memory''.
-What this means is that the operating system behaves as though it has much
-more main memory than it actually does. Your actual physical main memory
-behaves like a set of windows or caches on a much larger "virtual" memory
-space, most of which at any given time is actually stored on disk in a
-special zone called the ''swap
-area''. Out of
-sight of user programs, the OS is moving blocks of data (called "pages")
-between memory and disk to maintain this illusion. The end result is that
-your virtual memory is much larger but not too much slower than real
-memory.
-
-
-
-How much slower virtual memory is than physical depends on how well
-the operating system's swapping algorithms match the way your programs use
-virtual memory. Fortunately, memory reads and writes that are close
-together in time also tend to cluster in memory space. This tendency is
-called
-''locality'',
-or more formally ''locality of
-reference'' -- and it's a good thing. If memory
-references jumped around virtual space at random, you'd typically have to
-do a disk read and write for each new reference and virtual memory would be
-as slow as a disk. But because programs do actually exhibit strong
-locality, your operating system can do relatively few swaps per
-reference.
-
-
-
-It's been found by experience that the most effective method for a
-broad class of memory-usage patterns is very simple; it's called LRU or the
-"least recently used" algorithm. The virtual-memory system grabs disk
-blocks into its ''working set'' as it needs them. When it runs out of physical
-memory for the working set, it dumps the least-recently-used block. All
-Unixes, and most other virtual-memory operating systems, use minor
-variations on LRU.
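LRU is simple enough to sketch in a few lines. This toy working set holds three page frames; touching a fourth page evicts whichever page was least recently used. The class and field names are ours, not kernel code:

```python
from collections import OrderedDict

class WorkingSet:
    """Sketch of LRU page replacement: a fixed number of physical page
    frames caching a much larger virtual space kept on "disk"."""
    def __init__(self, frames):
        self.frames = frames
        self.pages = OrderedDict()   # page number -> contents, oldest first
        self.faults = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # hit: mark as recently used
        else:
            self.faults += 1                    # page fault: fetch from disk
            if len(self.pages) >= self.frames:
                self.pages.popitem(last=False)  # evict least-recently-used
            self.pages[page] = "data"
        return self.pages[page]

ws = WorkingSet(frames=3)
for p in [1, 2, 3, 1, 4]:    # touching page 4 evicts page 2, the LRU one
    ws.access(p)
print(ws.faults, list(ws.pages))   # 4 [3, 1, 4]
```

Notice that re-touching page 1 before the eviction saved it; that is locality of reference paying off, exactly as described above.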
-
-
-
-Virtual memory is the first link in the bridge between disk and
-processor speeds. It's explicitly managed by the OS. But there is still a
-major gap between the speed of physical main memory and the speed at which
-a processor can access its register memory. The external and internal
-caches address this, using a technique similar to virtual memory as I've
-described it.
-
-
-
-Just as the physical main memory behaves like a set of windows or
-caches on the disk's swap area, the external cache acts as windows on main
-memory. External cache is faster (250M accesses per sec, rather than 100M)
-and smaller. The hardware (specifically, your computer's memory
-controller) does the LRU thing in the external cache on blocks of data
-fetched from the main memory. For historical reasons, the unit of cache
-swapping is called a "line" rather than a page.
-
-
-
-But we're not done. The internal cache gives us the final step-up in
-effective speed by caching portions of the external cache. It is faster
-and smaller yet -- in fact, it lives right on the processor chip.
-
-
-
-If you want to make your programs really fast, it's useful to know
-these details. Your programs get faster when they have stronger locality,
-because that makes the caching work better. The easiest way to make
-programs fast is therefore to make them small. If a program isn't slowed
-down by lots of disk I/O or waits on network events, it will usually run at
-the speed of the smallest cache that it will fit inside.
-
-
-
-If you can't make your whole program small, some effort to tune the
-speed-critical portions so they have stronger locality can pay off.
-Details on techniques for doing such tuning are beyond the scope of this
-tutorial; by the time you need them, you'll be intimate enough with some
-compiler to figure out many of them yourself.
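You can see the idea of locality in miniature by comparing two traversal orders over the same matrix. Both compute the same sum, but in a C-like row-major layout the first touches memory in storage order while the second strides a whole row's width between touches, so the first makes much better use of the caches. (Python hides raw memory layout, so treat this as an illustration of the access pattern, not a benchmark.)

```python
N = 200
matrix = [[1] * N for _ in range(N)]

def sum_row_major(m):
    """Touch elements in the order they're stored: strong locality."""
    total = 0
    for row in m:
        for x in row:
            total += x
    return total

def sum_column_major(m):
    """Stride down the columns: same arithmetic, weak locality."""
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]
    return total

print(sum_row_major(matrix) == sum_column_major(matrix))   # True
```

Same answer either way; in a cache-sensitive language the row-major version is the one that runs at cache speed.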
-
-----
-!!8.3. The Memory Management Unit
-
-Even when you have enough physical core to avoid swapping, the part
-of the operating system called the ''memory manager''
-still has important work to do. It has to make sure that programs can only
-alter their own data segments -- that is, prevent erroneous or malicious
-code in one program from garbaging the data in another. To do this, it
-keeps a table of data and code segments. The table is updated whenever a
-process either requests more memory or releases memory (the latter usually
-when it exits).
-
-
-
-This table is used to pass commands to a specialized part of the
-underlying hardware called an
-''MMU'' or
-''memory management unit''. Modern processor chips have MMUs
-built right onto them. The MMU has the special ability to put fences
-around areas of memory, so an out-of-bound reference will be refused and
-cause a special interrupt to be raised.
-
-
-
-If you ever see a Unix message that says "Segmentation fault", "core
-dumped" or something similar, this is exactly what has happened; an attempt
-by the running program to access memory (core) outside its segment has
-raised a fatal interrupt. This indicates a bug in the program code; the
-''core dump'' it leaves behind is diagnostic information
-intended to help a programmer track it down.
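You can watch the death-by-signal mechanism from a shell prompt without writing a buggy program. A real segmentation fault is raised by the MMU, which makes the kernel deliver signal 11 (SIGSEGV) to the offending process; in this sketch we simply send that signal by hand to a throwaway shell and look at how it dies:

```shell
# Deliver SIGSEGV (signal 11) to a child shell by hand, the same signal
# the kernel sends when the MMU refuses an out-of-bounds memory access.
sh -c 'kill -11 $$'
echo "exit status: $?"    # 128 + 11 = 139 means "killed by SIGSEGV"
```

The 128-plus-signal-number convention is how the shell reports a process that was killed by a signal rather than exiting normally.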
-
-
-
-There is another aspect to protecting processes from each other besides
-segregating the memory they access. You also want to be able to control
-their file accesses so a buggy or malicious program can't corrupt critical
-pieces of the system. This is why Unix has file permissions which we'll discuss later.
-
-----
-!!!9. How does my computer store things in memory?
-
-You probably know that everything on a computer is stored as strings of
-bits (binary digits; you can think of them as lots of little on-off
-switches). Here we'll explain how those bits are used to represent the
-letters and numbers that your computer is crunching.
-
-
-
-Before we can go into this, you need to understand about the
-''word size'' of your computer. The word size is the
-computer's preferred size for moving units of information around;
-technically it's the width of your processor's
-''registers'',
-which are the holding areas your processor uses to do arithmetic and
-logical calculations. When people write about computers having bit sizes
-(calling them, say, ``32-bit'' or ``64-bit'' computers), this is what they
-mean.
-
-
-
-Most computers (including 386, 486, and Pentium PCs) have a word
-size of 32 bits. The old 286 machines had a word size of 16. Old-style
-mainframes often had 36-bit words. A few processors (like the Alpha from
-what used to be DEC and is now Compaq) have 64-bit words. The 64-bit word
-will become more common over the next five years; Intel is planning to
-replace the Pentium series with a 64-bit chip called the `Itanium'.
-
-
-
-The computer views your memory as a sequence of words numbered from
-zero up to some large value dependent on your memory size. That value is
-limited by your word size, which is why programs on older machines like
-286s had to go through painful contortions to address large amounts of
-memory. I won't describe them here; they still give older programmers
-nightmares.
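If you're curious what the word size of the machine you're sitting at is, you can ask (this assumes a POSIX `getconf`, which modern Unixes provide):

```shell
# Ask the system for its native word size, in bits.
# Prints 32 on a 386/486/Pentium-class machine, 64 on (say) an Alpha.
getconf LONG_BIT
```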
-
-----
-!!9.1. Numbers
-
-Integer numbers are represented as either words or pairs of words,
-depending on your processor's word size. One 32-bit machine word is the
-most common integer representation.
-
-
-
-Integer arithmetic is close to but not actually mathematical
-base-two. The low-order bit is 1, next 2, then 4 and so forth as in pure
-binary. But signed numbers are represented in
-''twos-complement''
-notation. The highest-order bit is a ''sign
-bit'' which
-makes the quantity negative, and every negative number can be obtained from
-the corresponding positive value by inverting all the bits and adding one.
-This is why integers on a 32-bit machine have the range -2^31 to 2^31 - 1
-(where ^ is the `power' operation, 2^3 = 8). That 32nd bit is being used
-for sign.
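You can check the invert-and-add-one rule right at a shell prompt; shell arithmetic is done in native machine words, so the `~` (bit-invert) operator behaves exactly as described:

```shell
# Two's complement: the negative of a number is all its bits inverted,
# then one added.
echo $(( ~13 + 1 ))    # prints -13
echo $(( ~0 ))         # a word with every bit set is -1, not a big number
```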
-
-
-
-Some computer languages give you access to ''unsigned
-arithmetic'' which is straight base 2 with zero and
-positive numbers only.
-
-
-
-Most processors and some languages can do operations in
-''floating-point''
-numbers (this capability is built into all recent processor chips).
-Floating-point numbers give you a much wider range of values than integers
-and let you express fractions. The ways in which this is done vary and are
-rather too complicated to discuss in detail here, but the general idea is
-much like so-called `scientific notation', where one might write (say)
-1.234 * 10^23; the encoding of the number is split into a
-''mantissa''
-(1.234) and the exponent part (23) for the power-of-ten multiplier (which
-means the number multiplied out would have 20 zeros on it, 23 minus the
-three decimal places).
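The `%e` conversion of printf(1) displays a number in exactly this mantissa-and-exponent form, so you can see the split for yourself:

```shell
# Scientific notation from the shell: printf splits the value into a
# mantissa and a power-of-ten exponent (assumes the C/POSIX locale).
printf '%e\n' 123400    # 1.234000e+05: mantissa 1.234, exponent 5
```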
-
-----
-!!9.2. Characters
-
-Characters are normally represented as strings of seven bits each in
-an encoding called ASCII (American Standard Code for Information
-Interchange). On modern machines, each of the 128 ASCII characters is the
-low seven bits of an
-''octet''
-or 8-bit byte; octets are packed into memory words so that (for example) a
-six-character string only takes up two memory words. For an ASCII code
-chart, type `man 7 ascii' at your Unix prompt.
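Besides the man page, printf(1) will show you the number behind any character, and take you back the other way:

```shell
# A character is just a small number. A leading quote makes printf %d
# report a character's code; an octal escape turns a code back into
# its character.
printf '%d\n' "'A"     # prints 65, the ASCII code for 'A'
printf '\101\n'        # 101 octal = 65 decimal, so this prints A
```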
-
-
-
-The preceding paragraph was misleading in two ways. The minor one is
-that the term `octet' is formally correct but seldom actually used; most
-people refer to an octet as
-''byte'' and
-expect bytes to be eight bits long. Strictly speaking, the term `byte' is
-more general; there used to be, for example, 36-bit machines with 9-bit
-bytes (though there probably never will be again).
-
-
-
-The major one is that not all the world uses ASCII. In fact, much of
-the world can't -- ASCII, while fine for American English, lacks many
-accented and other special characters needed by users of other languages.
-Even British English has trouble with the lack of a pound-currency
-sign.
-
-
-
-There have been several attempts to fix this problem. All use the extra
-high bit that ASCII doesn't, making it the low half of a 256-character set.
-The most widely-used of these is the so-called `Latin-1' character set
-(more formally called ISO 8859-1). This is the default character set for
-Linux, HTML, and X. Microsoft Windows uses a mutant version of Latin-1
-that adds a bunch of characters such as right and left double quotes in
-places proper Latin-1 leaves unassigned for historical reasons (for a
-scathing account of the trouble this causes, see the demoroniser
-page).
-
-
-
-Latin-1 handles western European languages, including English,
-French, German, Spanish, Italian, Dutch, Norwegian, Swedish, and Danish.
-However, this isn't good enough either, and as a result there is a whole
-series of Latin-2 through -9 character sets to handle things like Greek,
-Arabic, Hebrew, Esperanto, and Serbo-Croatian. For details, see the ISO alphabet soup page.
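You can see one of these character sets translated into another with the iconv(1) utility, if your system has it. In Latin-1 the accented letter e-acute is the single byte 0xE9; re-encoded as UTF-8 (the encoding Unicode systems usually use, discussed next) it becomes a two-byte sequence:

```shell
# Re-encode the Latin-1 byte 0xE9 (octal 351, the letter e-acute) as
# UTF-8, then dump the resulting bytes in hex (assumes iconv and od).
printf '\351' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
# prints: c3 a9  -- one character, two bytes in the new encoding
```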
-
-
-
-The ultimate solution is a huge standard called Unicode (and its
-identical twin ISO/IEC 10646-1:1993). Unicode is identical to Latin-1 in
-its lowest 256 slots. Above these in 16-bit space it includes Greek,
-Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Bengali, Gurmukhi,
-Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Georgian,
-Tibetan, Japanese Kana, the complete set of modern Korean Hangul, and a
-unified set of Chinese/Japanese/Korean (CJK) ideographs. For details, see
-the Unicode Home Page.
-
-----
-!!!10. How does my computer store things on disk?
-
-When you look at a hard disk under Unix, you see a tree of named
-directories and files. Normally you won't need to look any deeper than
-that, but it does become useful to know what's going on underneath if you
-have a disk crash and need to try to salvage files. Unfortunately, there's
-no good way to describe disk organization from the file level downwards, so
-I'll have to describe it from the hardware up.
-
-----
-!!10.1. Low-level disk and file system structure
-
-The surface area of your disk, where it stores data, is divided up
-something like a dartboard -- into circular tracks which are then
-pie-sliced into sectors. Because tracks near the outer edge have more area
-than those close to the spindle at the center of the disk, the outer tracks
-have more sector slices in them than the inner ones. Each sector (or
-''disk block'') has the same size, which under modern Unixes
-is generally 1 binary K (1024 8-bit words). Each disk block has a unique
-address or ''disk block number''.
-
-
-
-Unix divides the disk into ''disk
-partitions''. Each partition is a continuous span of
-blocks that's used separately from any other partition, either as a file
-system or as swap space. The original reasons for partitions had to do
-with crash recovery in a world of much slower and more error-prone disks;
-the boundaries between them reduce the fraction of your disk likely to
-become inaccessible or corrupted by a random bad spot on the disk.
-Nowadays, it's more important that partitions can be declared read-only
-(preventing an intruder from modifying critical system files) or shared
-over a network through various means we won't discuss here. The
-lowest-numbered partition on a disk is often treated specially, as a
-''boot partition'' where you can put a kernel to be
-booted.
-
-
-
-Each partition is either ''swap
-space'' (used
-to implement virtual memory) or a ''file system'' used to hold files. Swap-space partitions are
-just treated as a linear sequence of blocks. File systems, on the other
-hand, need a way to map file names to sequences of disk blocks. Because
-files grow, shrink, and change over time, a file's data blocks will not be
-a linear sequence but may be scattered all over its partition (from
-wherever the operating system can find a free block when it needs
-one). This scattering effect is called
-''fragmentation''.
-
-----
-!!10.2. File names and directories
-
-Within each file system, the mapping from names to blocks is handled
-through a structure called an
-''i-node''.
-There's a pool of these things near the ``bottom'' (lowest-numbered blocks)
-of each file system (the very lowest ones are used for housekeeping and
-labeling purposes we won't describe here). Each i-node describes one file.
-File data blocks (including directories) live above the i-nodes (in
-higher-numbered blocks).
-
-
-
-Every i-node contains a list of the disk block numbers in the file it
-describes. (Actually this is a half-truth, only correct for small files,
-but the rest of the details aren't important here.) Note that the i-node
-does ''not'' contain the name of the file.
-
-
-
-Names of files live in ''directory
-structures''. A directory structure just maps names to
-i-node numbers. This is why, in Unix, a file can have multiple true names
-(or ''hard links''); they're just multiple directory entries that
-happen to point to the same i-node.
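You can watch this happen with ln(1) and the `-i` option of ls(1), which prints i-node numbers (done here in a scratch directory so nothing real is disturbed):

```shell
# Two directory entries, one i-node: a hard link in action.
cd "$(mktemp -d)"               # scratch directory
echo hello > original
ln original hardlink            # create a second name for the same i-node
ls -i original hardlink         # both names show the same i-node number
[ original -ef hardlink ] && echo "same i-node"
```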
-
-----
-!!10.3. Mount points
-
-In the simplest case, your entire Unix file system lives in just one
-disk partition. While you'll see this arrangement on some small personal
-Unix systems, it's unusual. More typical is for it to be spread across
-several disk partitions, possibly on different physical disks. So, for
-example, your system may have one small partition where the kernel lives, a
-slightly larger one where OS utilities live, and a much bigger one where
-user home directories live.
-
-
-
-The only partition you'll have access to immediately after system
-boot is your ''root partition'',
-which is (almost always) the one you booted from. It holds the root
-directory of the file system, the top node from which everything else
-hangs.
-
-
-
-The other partitions in the system have to be attached to this root
-in order for your entire, multiple-partition file system to be accessible.
-About midway through the boot process, your Unix will make these non-root
-partitions accessible. It will
-''mount''
-each one onto a directory on the root partition.
-
-
-
-For example, if you have a Unix directory called `/usr', it is probably
-a mount point to a partition that contains many programs installed with
-your Unix but not required during initial boot.
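The df(1) command will tell you which partition any given directory actually lives on; its `Mounted on' column shows the nearest mount point above that directory:

```shell
# Which partition holds /usr? If /usr is its own mount point, the
# 'Mounted on' column says /usr; otherwise it says / (or whatever
# ancestor directory is the mount point).
df /usr
```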
-
-----
-!!10.4. How a file gets looked up
-
-Now we can look at the file system from the top down. When you open
-a file (such as, say,
-/home/esr/WWW/ldp/fundamentals.sgml) here is what
-happens:
-
-
-
-Your kernel starts at the root of your Unix file system (in the root
-partition). It looks for a directory there called `home'. Usually `home'
-is a mount point to a large user partition elsewhere, so it will go there.
-In the top-level directory structure of that user partition, it will look
-for an entry called `esr' and extract an i-node number. It will go to that
-i-node, notice that its associated file data blocks are a directory
-structure, and look up `WWW'. Extracting ''that'' i-node,
-it will go to the corresponding subdirectory and look up `ldp'. That will
-take it to yet another directory i-node. Opening that one, it will find an
-i-node number for `fundamentals.sgml'. That i-node is not a directory, but
-instead holds the list of disk blocks associated with the file.
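Each step of a lookup like this lands on a directory i-node, and you can peek at those i-node numbers with `ls -id` (the `-d` keeps ls from descending into the directories):

```shell
# The i-node number of each directory along a path; a lookup hops from
# one of these to the next.
ls -id / /usr /usr/bin
```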
-
-----
-!!10.5. File ownership, permissions and security
-
-To keep programs from accidentally or
-maliciously stepping on data they shouldn't, Unix has
-''permission''
-features. These were originally designed to support timesharing by
-protecting multiple users on the same machine from each other, back in the
-days when Unix ran mainly on expensive shared minicomputers.
-
-
-
-In order to understand file permissions, you need to recall the
-description of users and groups in the section What happens when you log in?. Each file has an owning user and an
-owning group. These are initially those of the file's creator; they can be
-changed with the programs
-chown(1) and
-chgrp(1).
-
-
-
-The basic permissions that can be associated with a file are `read'
-(permission to read data from it), `write' (permission to modify it) and
-`execute' (permission to run it as a program). Each file has three sets of
-permissions; one for its owning user, one for any user in its owning group,
-and one for everyone else. The `privileges' you get when you log in are
-just the ability to do read, write, and execute on those files for which
-the permission bits match your user ID or one of the groups you are
-in, or files that have been made accessible to the world.
-
-
-
-To see how these may interact and how Unix displays them, let's look
-at some file listings on a hypothetical Unix system. Here's one:
-
-
-snark:~$ ls -l notes
--rw-r--r-- 1 esr users 2993 Jun 17 11:00 notes
-
-This is an ordinary data file. The listing tells us that it's
-owned by the user `esr' and was created with the owning group `users'.
-Probably the machine we're on puts every ordinary user in this group by
-default; other groups you commonly see on timesharing machines are `staff',
-`admin', or `wheel' (for obvious reasons, groups are not very important
-on single-user workstations or PCs). Your Unix may use a different default
-group, perhaps one named after your user ID.
-
-
-
-The string `-rw-r--r--' represents the permission bits for the file. The
-very first dash is the position for the directory bit; it would show `d' if
-the file were a directory. After that, the first three places are user
-permissions, the second three group permissions, and the third are
-permissions for others (often called `world' permissions). On this file,
-the owning user `esr' may read or write the file, other people in the
-`users' group may read it, and everybody else in the world may read it.
-This is a pretty typical set of permissions for an ordinary data file.
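You can reproduce this listing yourself in a scratch directory; the octal argument to chmod(1) is just the nine permission bits written three to a digit:

```shell
# Recreate the -rw-r--r-- pattern shown above.
cd "$(mktemp -d)"
touch notes
chmod 644 notes              # 110 100 100 in binary: rw- r-- r--
ls -l notes | cut -c1-10     # prints -rw-r--r--
```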
-
-
-
-Now let's look at a file with very different permissions. This file
-is GCC, the GNU C compiler.
-
-
-snark:~$ ls -l /usr/bin/gcc
--rwxr-xr-x 3 root bin 64796 Mar 21 16:41 /usr/bin/gcc
-
-This file belongs to a user called `root' and a group called `bin';
-it can be written (modified) only by root, but read or executed by anyone.
-This is a typical ownership and set of permissions for a pre-installed
-system command. The `bin' group exists on some Unixes to group together
-system commands (the name is a historical relic, short for `binary'). Your
-Unix might use a `root' group instead (not quite the same as the `root'
-user!).
-
-
-
-The `root' user is the conventional name for numeric user ID 0, a
-special, privileged account that can override all privileges. Root access
-is useful but dangerous; a typing mistake while you're logged in as root
-can clobber critical system files that the same command executed from an
-ordinary user account could not touch.
-
-
-
-Because the root account is so powerful, access to it should be guarded
-very carefully. Your root password is the single most critical piece of
-security information on your system, and it is what any crackers and
-intruders who ever come after you will be trying to get.
-
-
-
-About passwords: Don't write them down -- and don't pick a password
-that can easily be guessed, like the first name of your
-girlfriend/boyfriend/spouse. This is an astonishingly common bad practice
-that helps crackers no end. In general, don't pick any word in the
-dictionary; there are programs called ''dictionary
-crackers'' that look for likely passwords by running through word
-lists of common choices. A good technique is to pick a combination
-consisting of a word, a digit, and another word, such as `shark6cider' or
-`jump3joy'; that will make the search space too large for a dictionary
-cracker. Don't use these examples, though -- crackers might expect that
-after reading this document and put them in their dictionaries.
-
-
-
-Now let's look at a third case:
-
-
-snark:~$ ls -ld ~
-drwxr-xr-x 89 esr users 9216 Jun 27 11:29 /home2/esr
-snark:~$
-
-This file is a directory (note the `d' in the first permissions
-slot). We see that it can be written only by esr, but read and executed by
-anybody else.
-
-
-
-Read permission gives you the ability to list the directory -- that
-is, to see the names of files and directories it contains. Write permission
-gives you the ability to create and delete files in the directory. If you
-remember that the directory includes a list of the names of the files and
-subdirectories it contains, these rules will make sense.
-
-
-
-Execute permission on a directory means you can get through the
-directory to open the files and directories below it. In effect, it gives
-you permission to access the i-nodes in the directory. A directory with
-execute completely turned off would be useless.
-
-
-
-Occasionally you'll see a directory that is world-executable but not
-world-readable; this means a random user can get to files and directories
-beneath it, but only by knowing their exact names (the directory cannot be
-listed).
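Here is a small sketch of that situation, done in a scratch directory (run it as an ordinary user; root bypasses permission checks entirely):

```shell
# A directory with execute ('search') permission but no read permission:
# files inside can be opened by exact name, but the names can't be listed.
cd "$(mktemp -d)"
mkdir secret
echo hi > secret/known-name
chmod 111 secret             # --x--x--x: search only, for everyone
cat secret/known-name        # works: execute lets us reach the i-node
ls secret                    # fails for ordinary users: no read bit
chmod 755 secret             # restore permissions so cleanup can remove it
```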
-
-
-
-It's important to remember that read, write, or execute permission on a
-directory is independent of the permissions on the files and directories
-beneath. In particular, write access on a directory means you can
-create new files or delete existing files there, but does not
-automatically give you write access to existing files.
-
-
-
-Finally, let's look at the permissions of the login program itself.
-
-
-snark:~$ ls -l /bin/login
--rwsr-xr-x 1 root bin 20164 Apr 17 12:57 /bin/login
-
-This has the permissions we'd expect for a system command -- except
-for that 's' where the owner-execute bit ought to be. This is the visible
-manifestation of a special permission called the `set-user-id' or
-''setuid bit''.
-
-
-
-The setuid bit is normally attached to programs that need to give
-ordinary users the privileges of root, but in a controlled way. When it is
-set on an executable program, you get the privileges of the owner of that
-program file while the program is running on your behalf, whether or not
-they match your own.
-
-
-
-Like the root account itself, setuid programs are useful but
-dangerous. Anyone who can subvert or modify a setuid program owned by root
-can use it to spawn a shell with root privileges. For this reason, opening
-a file to write it automatically turns off its setuid bit on most Unixes.
-Many attacks on Unix security try to exploit bugs in setuid programs in
-order to subvert them. Security-conscious system administrators are
-therefore extra-careful about these programs and reluctant to install new
-ones.
-
-
-
-There are a couple of important details we glossed over when
-discussing permissions above; namely, how the owning group and permissions
-are assigned when a file or directory is first created. The group is an
-issue because users can be members of multiple groups, but one of them
-(specified in the user's /etc/passwd entry) is the
-user's ''default group'' and will normally own files created by the
-user.
-
-
-
-The story with initial permission bits is a little more complicated.
-A program that creates a file will normally specify the permissions it is
-to start with. But these will be modified by a variable in the user's
-environment called the
-''umask''.
-The umask specifies which permission bits to ''turn off''
-when creating a file; the most common value, and the default on most
-systems, is -------w- or 002, which turns off the world-write bit. See the
-documentation of the umask command on your shell's manual page for
-details.
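You can watch the umask at work in a scratch directory; touch(1) asks for mode 666, and the mask knocks bits off that:

```shell
# Bits set in the umask get turned off on newly created files.
cd "$(mktemp -d)"
umask 002                    # turn off only the world-write bit
touch shared
ls -l shared | cut -c1-10    # prints -rw-rw-r--
umask 022                    # turn off group-write too
touch private
ls -l private | cut -c1-10   # prints -rw-r--r--
```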
-
-
-
-Initial directory group is also a bit complicated. On some Unixes a new
-directory gets the default group of the creating user (this is the System V
-convention); on others, it gets the owning group of the parent directory
-in which it's created (this is the BSD convention). On some modern Unixes,
-including Linux, the latter behavior can be selected by setting the
-set-group-ID on the directory (chmod g+s).
-
-----
-!!10.6. How things can go wrong
-
-Earlier it was hinted that file systems can be fragile things.
-Now we know that to get to a file you have to hopscotch through what may be
-an arbitrarily long chain of directory and i-node references. Now suppose
-your hard disk develops a bad spot?
-
-
-
-If you're lucky, it will only trash some file data. If you're
-unlucky, it could corrupt a directory structure or i-node number and leave
-an entire subtree of your system hanging in limbo -- or, worse, result in a
-corrupted structure that points multiple ways at the same disk block or
-i-node. Such corruption can be spread by normal file operations, trashing
-data that was not in the original bad spot.
-
-
-
-Fortunately, this kind of contingency has become quite uncommon as disk
-hardware has become more reliable. Still, it means that your Unix will
-want to integrity-check the file system periodically to make sure nothing
-is amiss. Modern Unixes do a fast integrity check on each partition at
-boot time, just before mounting it. Every few reboots they'll do a much
-more thorough check that takes a few minutes longer.
-
-
-
-If all of this sounds like Unix is terribly complex and
-failure-prone, it may be reassuring to know that these boot-time checks
-typically catch and correct normal problems ''before''
-they become really disastrous. Other operating systems don't have these
-facilities, which speeds up booting a bit but can leave you much more
-seriously screwed when attempting to recover by hand (and that's assuming
-you have a copy of Norton Utilities or whatever in the first
-place...).
-
-
-
-One of the trends in current Unix designs is ''journalling
-file systems''. These arrange traffic to the disk so that
-it's guaranteed to be in a consistent state that can be recovered when the
-system comes back up. This will speed up the boot-time integrity check a
-lot.
-
-----
-!!!11. How do computer languages work?
-
-We've already discussed how programs
-are run. Every program ultimately has to execute as a stream of
-bytes that are instructions in your computer's ''machine
-language''. But human beings don't deal with machine
-language very well; doing so has become a rare, black art even among
-hackers.
-
-
-
-Almost all Unix code except a small amount of direct
-hardware-interface support in the kernel itself is nowadays written in a
-''high-level language''. (The
-`high-level' in this term is a historical relic meant to distinguish these
-from `low-level' ''assembler
-languages'', which are basically thin wrappers around
-machine code.)
-
-
-
-There are several different kinds of high-level languages. In order
-to talk about these, you'll find it useful to bear in mind that the
-''source code'' of a program (the
-human-created, editable version) has to go through some kind of translation
-into machine code that the machine can actually run.
-
-----
-!!11.1. Compiled languages
-
-The most conventional kind of language is a ''compiled
-language''. Compiled languages get translated into
-runnable files of binary machine code by a special program called
-(logically enough) a
-''compiler''.
-Once the binary has been generated, you can run it directly without looking
-at the source code again. (Most software is delivered as compiled binaries
-made from code you don't see.)
-
-
-
-Compiled languages tend to give excellent performance and have the most
-complete access to the OS, but also to be difficult to program in.
-
-
-
-C, the language in which Unix itself is written, is by far the most
-important of these (with its variant C++). FORTRAN is another compiled
-language still used among engineers and scientists but years older and much
-more primitive. In the Unix world no other compiled languages are in
-mainstream use. Outside it, COBOL is very widely used for financial and
-business software.
-
-
-
-There used to be many other compiler languages, but most of them have
-either gone extinct or are strictly research tools. If you are a new
-Unix developer using a compiled language, it is overwhelmingly likely
-to be C or C++.
-
-----
-!!11.2. Interpreted languages
-
-An ''interpreted
-language'' depends on an interpreter program that reads
-the source code and translates it on the fly into computations and system
-calls. The source has to be re-interpreted (and the interpreter present)
-each time the code is executed.
-
-
-
-Interpreted languages tend to be slower than compiled languages, and
-often have limited access to the underlying operating system and hardware.
-On the other hand, they tend to be easier to program and more forgiving of
-coding errors than compiled languages.
-
-
-
-Many Unix utilities, including the shell and bc(1) and sed(1) and awk(1),
-are effectively small interpreted languages. BASICs are usually
-interpreted. So is Tcl. Historically, the most important interpretive
-language has been LISP (a major improvement over most of its successors).
-Today, Unix shells and the Lisp that lives inside the Emacs editor are
-probably the most important pure interpreted languages.
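Here's one of those little interpreted languages in action; the program text is handed to the awk interpreter and translated on the fly, every single run:

```shell
# The string in quotes is an entire awk program, interpreted afresh
# each time this command is executed.
echo "3 4" | awk '{ print $1 + $2 }'    # prints 7
```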
-
-----
-!!11.3. P-code languages
-
-Since 1990 a kind of hybrid language that uses both compilation and
-interpretation has become increasingly important. P-code languages are
-like compiled languages in that the source is translated to a compact
-binary form which is what you actually execute, but that form is not
-machine code. Instead it's
-''pseudocode''
-(or
-''p-code''),
-which is usually a lot simpler but more powerful than a real machine
-language. When you run the program, you interpret the p-code.
-
-
-
-P-code can run nearly as fast as a compiled binary (p-code interpreters
-can be made quite simple, small and speedy). But p-code languages can keep
-the flexibility and power of a good interpreter.
-
-
-
-Important p-code languages include Python, Perl, and Java.
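You can see the two stages separately with Python, if it's installed; normally it does both steps invisibly when you run a program, but it will also do just the compile-to-p-code step on request:

```shell
# Step 1: compile source to bytecode (the p-code); step 2: interpret it.
cd "$(mktemp -d)"
echo 'print(2 ** 10)' > prog.py
python3 -m py_compile prog.py    # bytecode lands in __pycache__/
ls __pycache__                   # the .pyc file there is the p-code
python3 prog.py                  # prints 1024
```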
-
-----
-!!!12. How does the Internet work?
-
-To help you understand how the Internet works, we'll look at the things
-that happen when you do a typical Internet operation -- pointing a browser
-at the front page of this document at its home on the Web at the Linux
-Documentation Project. This document is
-
-
-http://www.linuxdoc.org/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html
-
-which means it lives in the file
-LDP/HOWTO/Unix-and-Internet-Fundamentals-HOWTO/index.html under the World Wide Web
-export directory of the host www.linuxdoc.org.
-
-----
-!!12.1. Names and locations
-
-The first thing your browser has to do is to establish a network
-connection to the machine where the document lives. To do that, it first
-has to find the network location of the
-''host''
-www.linuxdoc.org (`host' is short for `host machine' or `network host';
-www.linuxdoc.org is a typical
-''hostname'').
-The corresponding location is actually a number called an ''IP
-address''
-(we'll explain the `IP' part of this term later).
-
-
-
-To do this, your browser queries a program called a
-''name server''. The name server
-may live on your machine, but it's more likely to run on a service machine
-that yours talks to. When you sign up with an ISP, part of your setup
-procedure will almost certainly involve telling your Internet software the
-IP address of a nameserver on the ISP's network.
-
-
-
-The name servers on different machines talk to each other, exchanging
-and keeping up to date all the information needed to resolve hostnames (map
-them to IP addresses). Your nameserver may query three or four different
-sites across the network in the process of resolving www.linuxdoc.org, but
-this usually happens very quickly (as in less than a second). We'll look
-at how nameservers work in detail in the next section.
-
-
-
-The nameserver will tell your browser that www.linuxdoc.org's IP
-address is 152.19.254.81; knowing this, your machine will be able to
-exchange bits with www.linuxdoc.org directly.
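You can do the same kind of name-to-address translation from a shell prompt (this sketch uses `getent`, found on Linux systems; `localhost' resolves on any machine without touching the network, while a real hostname would send your nameserver off to work as described above):

```shell
# Map a hostname to its IP address, just as the browser must.
getent hosts localhost    # prints 127.0.0.1 (or ::1) and the name
```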
-
-----
-!!12.2. The Domain Name System
-
-The whole network of programs and databases that cooperates to
-translate hostnames to IP addresses is called `DNS' (Domain Name System).
-When you see references to a `DNS server', that means what we just called
-a nameserver. Now I'll explain how the overall system works.
-
-
-
-Internet hostnames are composed of parts separated by dots. A
-''domain'' is a collection of machines that share a common name suffix.
-Domains can live inside other domains. For example, the machine
-www.linuxdoc.org lives in the .linuxdoc.org subdomain of the .org
-domain.
-
-
-
-Each domain is defined by an ''authoritative name
-server'' that knows the IP addresses of the other machines in the
-domain. The authoritative (or `primary') name server may have backups in
-case it goes down; if you see references to a ''secondary name
-server'' (or `secondary DNS'), it's talking about one of those. These
-secondaries typically refresh their information from their primaries every
-few hours, so a change made to the hostname-to-IP mapping on the primary
-will automatically be propagated.
-
-
-
-Now here's the important part. The nameservers for a domain do
-''not'' have to know the locations of all the machines in
-other domains (including their own subdomains); they only have to know the
-location of the nameservers. In our example, the authoritative name server
-for the .org domain knows the IP address of the nameserver for .linuxdoc.org,
-but ''not'' the address of all the other machines in
-linuxdoc.org.
-
-
-
-The domains in the DNS system are arranged like a big inverted tree.
-At the top are the root servers. Everybody knows the IP addresses of the
-root servers; they're wired into your DNS software.
-The root servers know the IP addresses of the nameservers for the
-top-level domains like .com and .org, but not the addresses of machines
-inside those domains. Each top-level domain server knows where the
-nameservers for the domains directly beneath it are, and so forth.
-
-
-
-DNS is carefully designed so that each machine can get away with the
-minimum amount of knowledge it needs to have about the shape of the tree,
-and local changes to subtrees can be made simply by changing one
-authoritative server's database of name-to-IP-address mappings.
-
-
-
-When you query for the IP address of www.linuxdoc.org, what actually
-happens is this: First, your nameserver asks a root server to tell it
-where it can find a nameserver for .org. Once it knows that, it then asks
-the .org server to tell it the IP address of a .linuxdoc.org nameserver.
-Once it has that, it asks the .linuxdoc.org nameserver to tell it the
-address of the host www.linuxdoc.org.
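-
-The chain of queries above can be sketched as a short function that,
-given a hostname, lists the zones a resolver asks about in order (a
-simplified illustration in Python; real resolvers also handle caching,
-timeouts, and many other cases):

```python
def query_chain(hostname):
    """List the zones a resolver asks about, in order, to find hostname.

    A simplified sketch of the walk described above: start at the root,
    then ask about each successively longer suffix of the name.
    """
    labels = hostname.split(".")
    chain = ["(root)"]
    # Build suffixes right to left: org, then linuxdoc.org, ...
    for i in range(len(labels) - 1, 0, -1):
        chain.append(".".join(labels[i:]))
    chain.append(hostname)   # finally, the host itself
    return chain

print(query_chain("www.linuxdoc.org"))
```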
-
-
-
-Most of the time, your nameserver doesn't actually have to work that
-hard. Nameservers do a lot of caching; when yours resolves a hostname, it
-keeps the association with the resulting IP address around in memory for a
-while. This is why, when you surf to a new website, you'll usually only
-see a message from your browser about "Looking up" the host for the first
-page you fetch. Eventually the name-to-address mapping expires and your
-DNS has to re-query - this is important so you don't have invalid
-information hanging around forever when a hostname changes addresses. Your
-cached IP address for a site is also thrown out if the host is
-unreachable.
-
-----
-!!12.3. Packets and routers
-
-What the browser wants to do is send a command to the Web server on
-www.linuxdoc.org that looks like this:
-
-
-GET /LDP/HOWTO/Fundamentals.html HTTP/1.0
-
-Here's how that happens. The command is made into a
-''packet'',
-a block of bits like a telegram that is wrapped with three important
-things: the ''source address'' (the IP address of your machine), the
-''destination address'' (152.19.254.81), and a ''service
-number''
-or ''port number'' (80, in this case) that indicates that it's a
-World Wide Web request.
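-
-The three pieces of wrapping can be pictured as fields on a record (the
-field names and the 192.168.1.101 source address are illustrative; real
-IP and TCP headers carry many more fields):

```python
from dataclasses import dataclass

@dataclass
class Packet:
    """A toy model of the addressing wrapped around one packet."""
    source: str        # IP address of your machine
    destination: str   # IP address of the server
    port: int          # service number, e.g. 80 for the Web
    payload: bytes     # the command itself

request = Packet(
    source="192.168.1.101",               # assumed local address
    destination="152.19.254.81",
    port=80,
    payload=b"GET /LDP/HOWTO/Fundamentals.html HTTP/1.0\r\n\r\n",
)
print(request.destination, request.port)
```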
-
-
-
-Your machine then ships the packet down the wire (your connection to
-your ISP, or local network) until it gets to a specialized machine called a
-''router''.
-The router has a map of the Internet in its memory -- not always a complete
-one, but one that completely describes your network neighborhood and knows
-how to get to the routers for other neighborhoods on the Internet.
-
-
-
-Your packet may pass through several routers on the way to its
-destination. Routers are smart. They watch how long it takes for other
-routers to acknowledge having received a packet. They use that
-information to direct traffic over fast links, and to notice when
-another router (or a cable) has dropped off the network, compensating
-if possible by finding another route.
-
-
-
-There's an urban legend that the Internet was designed to survive
-nuclear war. This is not true, but the Internet's design is extremely good
-at getting reliable performance out of flaky hardware in an uncertain
-world. This is directly due to the fact that its intelligence is
-distributed through thousands of routers rather than concentrated in a few
-massive and vulnerable switches (like the phone network). This means that
-failures tend to be well localized and the network can route around
-them.
-
-
-
-Once your packet gets to its destination machine, that machine uses the
-service number to feed the packet to the web server. The web server can
-tell where to reply by looking at the command packet's source IP
-address. When the web server returns this document, it will be broken up
-into a number of packets. The size of the packets will vary according to
-the transmission media in the network and the type of service.
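-
-Splitting a document into packets is just chunking a stream of bytes
-(the 1460-byte payload size here is a common figure for Ethernet links,
-but as the text says, real sizes vary with the medium and service):

```python
def packetize(data, max_payload):
    """Split data into payload-sized chunks, as a TCP stack does when a
    document is larger than one packet will hold."""
    return [data[i:i + max_payload]
            for i in range(0, len(data), max_payload)]

# A 2982-byte page (the Content-Length in the example response below)
# fits in three packets of at most 1460 bytes each.
chunks = packetize(b"x" * 2982, 1460)
print(len(chunks), [len(c) for c in chunks])
```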
-
-----
-!!12.4. TCP and IP
-
-To understand how multiple-packet transmissions are handled, you need to
-know that the Internet actually uses two protocols, stacked one on top
-of the other.
-
-
-
-The lower level,
-''IP''
-(Internet Protocol), is responsible for labeling
-individual packets with the source address and destination address of two
-computers exchanging information over a network.
-For example, when you access http://www.linuxdoc.org, the packets you send
-will have your computer's IP address, such as 192.168.1.101, and the IP
-address of the www.linuxdoc.org computer, 152.19.254.81. These addresses
-work in much the same way that your home address works when someone sends
-you a letter. The post office can read the address and determine where
-you are and how best to route the letter to you, much like a router does
-for Internet traffic.
-
-
-
-The upper level,
-''TCP''
-(Transmission Control Protocol), gives you reliability. When two machines
-negotiate a TCP connection (which they do using IP), the receiver knows to
-send acknowledgements of the packets it sees back to the sender. If the
-sender doesn't see an acknowledgement for a packet within some timeout
-period, it resends that packet. Furthermore, the sender gives each TCP
-packet a sequence number, which the receiver can use to reassemble packets
-in case they show up out of order. (This can easily happen if network
-links go up or down during a connection.)
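-
-Reassembly from sequence numbers can be sketched in a few lines (a toy
-version: real TCP numbers byte offsets rather than whole packets, and
-adds windows and retransmission timers):

```python
def reassemble(packets):
    """Reorder (sequence_number, payload) pairs that may have arrived
    out of order, and join the payloads back into one stream."""
    return b"".join(payload
                    for _, payload in sorted(packets, key=lambda p: p[0]))

# Packets that show up out of order are still reassembled correctly:
out_of_order = [(2, b" world"), (1, b"hello"), (3, b"!")]
print(reassemble(out_of_order))
```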
-
-
-
-TCP/IP packets also contain a checksum to enable detection of data
-corrupted by bad links. (The checksum is computed from the rest of the
-packet in such a way that if either the rest of the packet or the
-checksum is corrupted, redoing the computation and comparing is very likely
-to indicate an error.) So, from the point of view of anyone using TCP/IP
-and nameservers, it looks like a reliable way to pass streams of bytes
-between hostname/service-number pairs. People who write network protocols
-almost never have to think about all the packetizing, packet reassembly,
-error checking, checksumming, and retransmission that goes on below that
-level.
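-
-The checksum used by IP and TCP headers is the 16-bit ones'-complement
-sum defined in RFC 1071; here is a sketch of it, showing that changing
-even one byte almost certainly changes the result:

```python
def internet_checksum(data):
    """RFC 1071 Internet checksum: ones'-complement sum of 16-bit
    words, with carries folded back in, then complemented."""
    if len(data) % 2:
        data += b"\x00"                  # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carry back in
    return (~total) & 0xFFFF

packet = b"GET /LDP/HOWTO/Fundamentals.html HTTP/1.0\r\n"
good = internet_checksum(packet)
corrupted = bytearray(packet)
corrupted[0] ^= 0x01                     # flip one bit
print(good != internet_checksum(bytes(corrupted)))
```

The defining property is that summing the data together with its own
checksum yields zero, which is how a receiver verifies a packet.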
-
-----
-!!12.5. HTTP, an application protocol
-
-Now let's get back to our example. Web browsers and servers speak an
-''application protocol'' that runs on top of TCP/IP, using it simply
-as a way to pass strings of bytes back and forth. This protocol is called
-''HTTP''
-(Hyper-Text Transfer Protocol) and we've already seen one command in it --
-the GET shown above.
-
-
-
-When the GET command goes to www.linuxdoc.org's webserver with service
-number 80, it will be dispatched to a ''server
-daemon'' listening on port 80. Most Internet services
-are implemented by server daemons that do nothing but wait on ports,
-watching for and executing incoming commands.
-
-
-
-If the design of the Internet has one overall rule, it's that all the
-parts should be as simple and human-accessible as possible. HTTP, and its
-relatives (like the Simple Mail Transfer Protocol,
-''SMTP'',
-that is used to move electronic mail between hosts) tend to use simple
-printable-text commands that end with a carriage-return/line feed.
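-
-Composing such a command is plain string handling; this sketch builds
-the request from our example (the Host header is an assumption for
-completeness, since modern servers expect it even though the bare GET
-shown earlier is valid HTTP/1.0):

```python
def build_request(host, path):
    """Compose the printable-text HTTP request a browser sends.

    Each line ends in a carriage-return/line feed, and a blank line
    marks the end of the request, as described above.
    """
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",     # which site we want (assumed header)
        "",                  # blank line: end of the request
    ]
    return "\r\n".join(lines) + "\r\n"

print(build_request("www.linuxdoc.org", "/LDP/HOWTO/Fundamentals.html"))
```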
-
-
-
-This is marginally inefficient; in some circumstances you could get more
-speed by using a tightly-coded binary protocol. But experience has shown
-that the benefits of having commands be easy for human beings to describe
-and understand outweigh any marginal gain in efficiency that you might get
-at the cost of making things tricky and opaque.
-
-
-
-Therefore, what the server daemon ships back to you via TCP/IP is also
-text. The beginning of the response will look something like this (a few
-headers have been suppressed):
-
-
-HTTP/1.1 200 OK
-Date: Sat, 10 Oct 1998 18:43:35 GMT
-Server: Apache/1.2.6 Red Hat
-Last-Modified: Thu, 27 Aug 1998 17:55:15 GMT
-Content-Length: 2982
-Content-Type: text/html
-
-These headers will be followed by a blank line and the text of the
-web page (after which the connection is dropped). Your browser just
-displays that page. The headers tell it how (in particular, the
-Content-Type header tells it the returned data is really HTML).
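-
-Because the response is plain text with a blank line separating headers
-from body, a browser's first parsing step can be sketched simply (a
-simplified version: real responses can have folded headers, repeated
-fields, and odd line endings):

```python
def parse_response_head(raw):
    """Split an HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")   # blank line ends headers
    lines = head.split("\r\n")
    status_code = int(lines[0].split()[1])      # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(": ")
        headers[name] = value
    return status_code, headers, body

raw = ("HTTP/1.1 200 OK\r\n"
       "Content-Length: 2982\r\n"
       "Content-Type: text/html\r\n"
       "\r\n"
       "<html>...</html>")
code, headers, body = parse_response_head(raw)
print(code, headers["Content-Type"])
```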
-
-----
-!!!13. To Learn More
-
-There is a Reading List
-HOWTO that lists books you can read to learn more about the
-topics we have touched on here. You might also want to read the
-How To Become A
-Hacker document.
+Describe [HowToUnixandInternetFundamentalsHOWTO]
here.