Blame: NetworkTroubleshooting
Annotated edit history of NetworkTroubleshooting version 4, including all changes.
Rev Author # Line
1 JimSansing 1 !!Key points
2
3 * The main point to remember about networks is that they are serial transmission media. That means that data flows one bit at a time from point A to point B. So, to troubleshoot a network problem, start at point A and check that everything is working on the way to point B.
4
5 * Modern networks have functions to help people use them, such as domain name servers, which translate between network addresses and human readable host names. But when troubleshooting network problems, these servers may be inaccessible. So use the most basic method of communication for testing access and collecting status information. The commands to find the network addresses that will be needed are in the procedures below.
6
7 * Networks may be slow, or services unavailable. There is usually a timeout set for network commands, which may have to run to completion before a program can continue. So, patience is required for troubleshooting network problems.
8
4 IanMcDonald 9 * Network protocols expect things to go wrong, so retries are built in most of the time. But during troubleshooting, this is not always a good thing. If possible, override the retry value with a count parameter. Otherwise, it may be necessary to interrupt a program that is not responding.
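Where a command has no count parameter of its own, a time limit can be imposed from outside. The following is a minimal sketch, assuming the GNU coreutils ''timeout'' utility is installed; the function name run_bounded is made up for illustration.

```shell
#!/bin/sh
# run_bounded SECONDS COMMAND [ARGS...]
# Runs COMMAND but gives up after SECONDS. Relies on the GNU coreutils
# "timeout" utility, which exits with status 124 when the limit is hit.
# (run_bounded is a hypothetical helper name, not a standard command.)
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# Typical use (not run here): three echo requests, at most 10 seconds in total.
#   run_bounded 10 ping -c 3 192.168.1.1
```

An exit status of 124 means the time limit was reached, which distinguishes a hung command from one that failed on its own.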
1 JimSansing 10
11 !!Parts list
12
13 * Application software: The local host is point A. However, there are many points of failure before data can leave the local host.
14 * Files:
15 - /home/user_id/application_path/application_config_files
16
17 * Email client: Email clients are usually configured to automatically poll the email server for new mail. If the connection is lost, this causes error messages, which might need to be responded to before the program can continue. If it happens while a message is being transferred, then the client might get hung and have to be forced to end.
18 * Web browser: Web browsers are most susceptible to domain name server problems. Each computer is assigned a local domain name server, but it only keeps track of a few host names. When a web site name is requested that is not in its list, it must query other servers to find it. This may take long enough for the web browser to time out. Or the domain name server that tracks the requested name may be down. So, web browsers are very poor indicators of network problems.
19 * Games: Online games are similar to email in that a single server is contacted. However, there is a higher volume and more constant flow of data. So, network problems become apparent more quickly. However, because game servers are often further away from the local host than email servers, network congestion is more likely along the client/server path and may degrade game performance.
20 * Other applications: Audio and video streaming use a protocol that is called "unreliable". This is not as bad as it sounds, but it does mean that occasionally some data can be lost. This is why network radio sometimes sounds so poor.
21
22 * Network stack
23 * Files:
24 - /etc/HOSTNAME
25 - /etc/services
26
27 * API layer: The connection between a client application and the network software is made through a "socket". On the client side, the Application Program Interface to the network software is usually not a problem. However, it can happen that as many connections as are allowed by a single program are in use, so no more connections can be made. This can be checked using the netstat command described below.
4 IanMcDonald 28 * TCP/UDP layer: The Transmission Control Protocol is what the network software uses to make sure that all data sent is received. The User Datagram Protocol is similar, but it is called "unreliable" because the receiving host does not confirm that data was received. Both use network ports to identify a "session" between two hosts. When a session is active, that pair of ports cannot be used between the same two hosts for another session. Active sessions can be displayed using the netstat command described below.
29 * IP layer: The Internet Protocol is what the network software uses to deliver data between two hosts. It uses network addresses to identify the hosts and a route table to know where to send or "route" the data. The network address of the local host is usually in the /etc/hosts table if a fixed address is used. The route table can be displayed using the netstat command described below.
1 JimSansing 30
31 * Interface and Local Network
32 * Files:
33 - /etc/sysconfig/network
34 - /etc/init.d/network
35
36 * Hardware/Driver: The cable that connects the local host to the network is attached to a piece of hardware that can get data from one end of the cable to the other. There are many such connections between the local host and the server host, but usually problems with those connections must be handled by others. The driver is simply the software that is used to control the hardware.%%% The configuration of the hardware and its driver is found in the file /etc/sysconfig/network. The driver is started and stopped using the /etc/init.d/network script. The status of the hardware and its driver can be displayed using the ifconfig command described below.
37 - Serial port/PPP: This hardware is the slowest network connection. The cable is connected to the serial port on the local host and to a modem at the other end. The modem is connected to a telephone jack. Often the modem is built into the local host, which is then connected directly to the telephone jack. The Point-to-Point Protocol is used to send data between the local host and the Internet Service Provider's (ISP) modem. The ISP then has a faster connection to the network.
38 - NIC/Ethernet driver: A Network Interface Card (NIC) and its driver are used for faster networks. Most NICs use the Ethernet protocol, although there are several others available. The Ethernet protocol is used to send data between the local host and a router, which is connected to the network. On home networks, the cable from the NIC is usually connected to a cable modem or a Digital Subscriber Line (DSL) modem, which is connected to a router at the cable television or telephone company office.
39 - Wifi card/Wifi driver: There is no cable connected to a Wifi card. However, the Wifi card communicates with an "access point", which usually has an Ethernet NIC that is connected in the same way as a local host's Ethernet NIC.
4 IanMcDonald 40 - Home networks usually have a router between the local hosts and the cable or DSL modem. It is possible to plug more than one local host into the router, but most networks use a hub or a switch so that several hosts can share a single router connector or "port". There may be an application available to display the status of the router. The same may be true for switches. However, hubs usually only display a little bit of status information on the front panel.%%% These devices usually have very little control capability, and are reset by powering them off and then back on. It is advisable to keep them physically close together. If they are all plugged into a single power strip, then resetting them simultaneously is as simple as turning the main power strip switch off and on.
1 JimSansing 41
42 * Network
43 * Files:
44 - /etc/hosts
45 - /etc/resolv.conf
46
47 * Connection to network: The ISP provides the hardware and software to get access to any host connected to the Internet, if everything is working correctly. To simplify making connections, domain name servers translate between network addresses and host names. For example, the ''host'' command displays the address for a host name. The addresses of the ISP's domain name servers are in the file /etc/resolv.conf. Any hosts defined in the file /etc/hosts will be used without requesting them to be translated by the domain name servers. So, it is a good idea to use this table for hosts on the local network, and even a few of the ISP hosts. The names in the /etc/hosts file do not have to match the actual name of the host defined. So instead of using the real domain name server name, ns02.someplace.st.us02.comcast.net, the name dns1 could be used.
4 IanMcDonald 48 * Network path: Under the TCP/IP protocol, data is broken into "packets" of various sizes. The protocol is designed to choose any path that is available for each individual packet. This means that while some packets may be lost, there is usually a way for the entire message to reach its destination. However, sometimes there is a point of failure that it is impossible to bypass. There are several tools, described below, to determine if the path to the server is available.
1 JimSansing 49
50 * Destination host
51 * Files:
52 - /etc/ssh/sshd_config
53 - /etc/hosts.deny
54 - /etc/hosts.allow
55
56 * Hardware: The hardware for servers and their networks is essentially the same as that at the local host, just more of it. This means complexity, which means more chances for errors.
57 * Network stack: The server's network software must answer requests from hundreds or thousands of client hosts. If the clients happen to make their requests at nearly the same time, the server can become so busy that the network software rejects some of the connection requests.
58 * Server application: The server application can also become overloaded and respond so slowly that the client times out.%%% Another application problem can be that the configuration file is not set correctly for the functionality being attempted. The sshd configuration is famous for its gotchas. The defaults work 90% of the time, but when unusual uses are required, it is often difficult to get it right. Using ''ssh -v'' while attempting to connect can help.
59 * Permissions: Sometimes the permissions set for files and directories are insufficient for the user id that the application runs under. This usually happens when a new application is being brought up. But it can also happen when the application allows different access privileges, and a user's access is changed. Applications often consist of multiple programs, and error messages passed between them can become meaningless, making this problem especially difficult to track down.%%% Also, the /etc/hosts.deny and /etc/hosts.allow files specify which client hosts are even allowed to establish sessions, and with which application. If this is not configured correctly, the client will usually receive a message indicating "access denied".
60
61 !!Tools
62
63 * Status commands: To determine the status of the local host, use the following commands:
64
65 * hostname: This is used to set or display the hostname. Sometimes applications need the fully qualified domain to be defined for the hostname, which is how /etc/HOSTNAME should be defined. Using ''hostname -d'' will display the domain name to determine if it is defined properly.
66 * ifconfig: This is used to configure the interface and display status. It should be used cautiously for configuration. Some Linux distributions keep configuration meta-data in special files, and if they are not updated properly, then the driver will not work correctly.%%% Using ''ifconfig -a'' will display the current status of interfaces, including their IP address. The status of UP simply means that the driver is able to contact the NIC. The number of received and transmitted packets is a better indicator of network activity. However, even that can be misleading, because there are packets that are used for control information that are counted, and the counts may not include any data packets.
67 * netstat: This is one of the best indicators of network status. It has arguments to display many different network functions. The most common are ''netstat -a'' and ''netstat -r''.%%% Using ''netstat -p'' displays the programs using sockets. This can sometimes indicate that a client application needs to be shut down and restarted.%%% Using ''netstat -a'' displays all sockets in use. This includes Unix domain sockets (listed as streams), which are a way for programs on the local host to exchange information. There are a lot of these, so using ''netstat -a | grep tcp'' will display just the TCP ports. Some of these are simply waiting for connections, and their status is LISTEN. Others are connected sessions and their status is ESTABLISHED.%%% Another status is TIME_WAIT. The TCP/IP protocol is designed to receive packets whenever they arrive. So even though all data of a session has been sent and received, it is not ended immediately, in case there are stray packets that will arrive late. These entries clear on their own after a few minutes. A related status is CLOSE_WAIT, which means that the remote host has closed the session but the local program has not yet closed its socket. Sessions can get hung in this state. If there are a lot of these that do not go away after a few minutes, they may cause problems. Restarting the program that owns them (shown by ''netstat -p'') usually clears them, and restarting the interface with the /etc/init.d/network script is a last resort.%%% Using ''netstat -r'' displays the route table. For local hosts, this is very short. But incorrect route tables on routers can create havoc on a network.
68
69 * Logs
70
71 * /var/log/messages: Most network status is discovered using status commands. However, sometimes the message logs can provide information on what led up to a problem, which can help diagnose it.
72 * Application logs: These may or may not exist, and the information that is written to them is often useful only to developers. However, it is always good to know about all sources of information for troubleshooting.
73
74 * Monitors
75
76 * tcpdump: This is a host-based network "sniffer". It will capture network data that is seen by the NIC in a file for later playback, or display it in real time. There are many arguments to control its display format and filters to control what is captured. Use Ctrl-C to break out of it. For quick troubleshooting, it is usually sufficient to simply run it as ''tcpdump -nX -s1500 -i eth0''
77 * ethereal: This is a program similar to tcpdump (it was later renamed Wireshark) and can read data captured by tcpdump. The advantage of ethereal is that it has many protocol definition modules that break packets apart and display each field with a text description.%%% As a side note, ethereal is especially useful if ports or data are unrecognized. Ethereal will often recognize the protocol, which makes it a learning tool. But if it does not, then it is worth investigating the data to determine if it is a worm or virus.
78
79 * Testing commands
80
81 * ping: This is a way of testing whether a host and its NIC are active. Using ''ping -c <number>'' will limit the number of attempts. Otherwise use Ctrl-C to break out of it. Because ping has been misused and abused, many routers and firewalls are configured to discard it. So, just because there is no response, it does not mean that there is a network problem. But if there is a response, then the problem is probably a configuration or application issue.
4 IanMcDonald 82 * traceroute: This sets the distance that packets will travel through the network (the time-to-live, or TTL) to one hop, and then gradually increases it. Each router or host that receives packets returns an error message that the traceroute program can use to determine that device's identity. This way, the list of routers along the path to the destination can be seen. Typically, if there is no response, three asterisks are displayed. If this continues, then it is probably where the problem is. Unfortunately, this is another utility that has been misused and abused, so many devices do not respond to the traceroute packets. Sometimes, the list will continue after one or more lines of asterisks. But usually, the packets are not being allowed to continue along the network path. It can be useful to use ''traceroute -n'', which does not try to resolve names.
1 JimSansing 83 * telnet/ssh/ftp: An attempt to make a telnet, ssh, or ftp connection will usually at least result in a message indicating access denied if the host is accessible through the network. So this method can be used where ping and traceroute are disallowed.
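The asterisk lines described for traceroute can be picked out of a long listing automatically. The following is a small sketch, assuming the usual traceroute layout of a hop number followed by three probe results; the function name silent_hops is invented for this example.

```shell
# silent_hops: read traceroute output on stdin and print the number of
# every hop where all three probes went unanswered ("* * *").
# (silent_hops is a hypothetical helper name.)
silent_hops() {
    awk '$2 == "*" && $3 == "*" && $4 == "*" { print $1 }'
}

# Typical use (not run here):
#   traceroute -n something.com | silent_hops
```

A single silent hop followed by further answers, as in the procedure example further down, usually just means that one router does not reply. A run of silent hops to the end of the listing is more likely to mark the failure point.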
84
85 !!Procedure
86
87 * Start from closest point
88
89 * Check processes: Sometimes multiple instances of the same process can create resource conflicts.%%% Example: Someone clicked on the mozilla startup icon twice
90 <pre>
91 > ps -ef | head -1; ps -ef | grep moz
92 UID PID PPID C STIME TTY TIME CMD
93 an_id 4775 3924 0 Jun22 ? 00:00:00 /bin/sh /usr/bin/mozilla
94 an_id 4786 4775 0 Jun22 ? 00:11:41 /opt/mozilla/lib/mozilla-bin
95 an_id 7976 3924 0 Jun22 ? 00:00:00 /bin/sh /usr/bin/mozilla
96 an_id 8103 7976 0 Jun22 ? 00:11:41 /opt/mozilla/lib/mozilla-bin
97 </pre>
98 * Check TCP/IP stack: If it is up, there should be some services active.%%%
99 Example:
100 <pre>
101 > netstat -a | grep tcp
102 tcp 0 0 localhost:smtp *:* LISTEN
103 tcp 1 0 localhost:34020 localhost:ipp CLOSE_WAIT
104 tcp 0 0 localhost:32803 *:* LISTEN
105 tcp 0 0 *:ssh *:* LISTEN
106 tcp 0 0 localhost:smtp *:* LISTEN
107 </pre>
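When the listing is long, a per-state summary makes a pile-up of hung sessions easier to spot. This is a minimal sketch, assuming the six-column netstat layout shown in the example above (state in the sixth column); tcp_state_counts is a made-up name.

```shell
# tcp_state_counts: read "netstat -a" or "netstat -an" output on stdin
# and print one "STATE COUNT" line per TCP connection state.
# Assumes the state is the sixth column, as in the listing above.
# (tcp_state_counts is a hypothetical helper name.)
tcp_state_counts() {
    awk '$1 ~ /^tcp/ && NF >= 6 { count[$6]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Typical use (not run here):
#   netstat -an | tcp_state_counts
```

Running it a few minutes apart shows whether a state such as CLOSE_WAIT is growing or draining.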
108 * Check interface: You may need to execute this as root.%%% Example:
109 <pre>
110 > ifconfig -a
111 eth0 Link encap:Ethernet HWaddr 00:E0:81:29:8E:58
112 inet addr:192.168.1.5 Bcast:192.168.1.255 Mask:255.255.255.0
113 inet6 addr: fe80::2e0:81ff:fe29:8e58/64 Scope:Link
114 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
115 RX packets:480368 errors:0 dropped:0 overruns:0 frame:0
116 TX packets:24715 errors:0 dropped:0 overruns:0 carrier:0
117 collisions:2141 txqueuelen:1000
118 RX bytes:250359054 (238.7 Mb) TX bytes:2946463 (2.8 Mb)
119 Interrupt:11
120
121 lo Link encap:Local Loopback
122 inet addr:127.0.0.1 Mask:255.0.0.0
123 inet6 addr: ::1/128 Scope:Host
124 UP LOOPBACK RUNNING MTU:16436 Metric:1
125 RX packets:403 errors:0 dropped:0 overruns:0 frame:0
126 TX packets:403 errors:0 dropped:0 overruns:0 carrier:0
127 collisions:0 txqueuelen:0
128 RX bytes:27106 (26.4 Kb) TX bytes:27106 (26.4 Kb)
129 </pre>
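The error and dropped counters are the quickest part of this output to check. The sketch below pulls them out per interface. It assumes the older net-tools layout shown above ("RX packets:N errors:N dropped:N"), and newer ifconfig versions print these fields differently. The name iface_errors is invented for illustration.

```shell
# iface_errors: read "ifconfig -a" output on stdin and print the error
# and dropped counters for each interface and direction.
# Assumes the "RX packets:N errors:N dropped:N ..." layout shown above.
# (iface_errors is a hypothetical helper name.)
iface_errors() {
    awk '
        /^[^ ]/ { iface = $1 }          # an unindented line starts an interface block
        /(RX|TX) packets:/ {
            dir = $1; err = ""; drop = ""
            for (i = 2; i <= NF; i++) {
                split($i, kv, ":")      # each field looks like name:value
                if (kv[1] == "errors")  err = kv[2]
                if (kv[1] == "dropped") drop = kv[2]
            }
            print iface, dir, "errors=" err, "dropped=" drop
        }
    '
}

# Typical use (not run here):
#   ifconfig -a | iface_errors
```

Counters that keep climbing between two runs point at the cable, NIC, or driver rather than at anything further along the path.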
130 * Check the route table.%%% Example:
131 <pre>
132 > netstat -rn
133 Kernel IP routing table
134 Destination Gateway Genmask Flags MSS Window irtt Iface
135 192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
136 10.11.12.13 0.0.0.0 255.255.0.0 U 0 0 0 eth0
137 127.0.0.0 0.0.0.0 255.0.0.0 U 0 0 0 lo
138 0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 eth0
139 </pre>
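The line that matters most for reaching outside hosts is the default route, which is the entry with destination 0.0.0.0 and the G flag. The following is a small sketch for extracting its gateway, assuming numeric output from ''netstat -rn'' as shown above; default_gateway is a made-up name.

```shell
# default_gateway: read "netstat -rn" output on stdin and print the
# gateway address of the default route (destination 0.0.0.0, flag G).
# (default_gateway is a hypothetical helper name.)
default_gateway() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2 }'
}

# Typical use (not run here):
#   gw=$(netstat -rn | default_gateway)
#   ping -c 2 "$gw"
```

If nothing is printed, there is no default route, and only hosts on the directly attached networks can be reached.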
140 * Check the local network.%%% Example:
141 <pre>
142 > ping 192.168.1.1
143 PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
144 64 bytes from 192.168.1.1: icmp_seq=1 ttl=150 time=0.701 ms
145 64 bytes from 192.168.1.1: icmp_seq=2 ttl=150 time=0.674 ms
146 </pre>
147 * Check the domain name server.%%% Example:
148 <pre>
149 > cat /etc/resolv.conf
150 nameserver 10.11.12.10
151 nameserver 10.11.12.9
152 > ping 10.11.12.10
153 PING 10.11.12.10 (10.11.12.10) 56(84) bytes of data.
154 64 bytes from 10.11.12.10: icmp_seq=1 ttl=242 time=14.8 ms
155 64 bytes from 10.11.12.10: icmp_seq=2 ttl=242 time=13.0 ms
156 </pre>
157 <pre>
158 > host linux.org
159 linux.org has address 198.182.196.48
160 > host redhat.com
161 redhat.com has address 209.132.177.50
162 > host suse.com
163 suse.com has address 195.135.220.3
164 </pre>
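With more than one name server configured, each can be checked in turn rather than by hand. This is a minimal sketch, assuming the standard "nameserver ADDRESS" lines in /etc/resolv.conf; the function name list_nameservers is invented here.

```shell
# list_nameservers [FILE]: print the address on each "nameserver" line.
# FILE defaults to /etc/resolv.conf.
# (list_nameservers is a hypothetical helper name.)
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Typical use (not run here): two echo requests to every name server.
#   for ns in $(list_nameservers); do
#       ping -c 2 "$ns"
#   done
```

As noted for ping above, a name server that does not answer ping may still answer queries, so the ''host'' test remains the deciding check.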
165
166 * Check the network path.%%% Example 1:
167 <pre>
168 > traceroute something.com
169 traceroute to something.com (10.11.12.13), 30 hops max, 40 byte packets
170 1 local_router (192.168.1.1) 2.688 ms 4.170 ms 5.757 ms
171 2 next_router (10.1.1.1) 3.626 ms 5.143 ms 6.912 ms
172 3 * * *
173 4 something.com (10.11.12.13) 6.743 ms 7.954 ms 9.051 ms
174 </pre>
175 Example 2: Try to telnet to redhat.com on port 80. Use Ctrl-] to break out of telnet.
176 <pre>
177 > telnet redhat.com 80
178 Trying 209.132.177.50...
179 Connected to redhat.com.
180 Escape character is '^]'.
181 (press Ctrl-])
182 telnet> quit
183 Connection closed.
184 </pre>
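The telnet test in Example 2 is interactive. Where a scripted check is wanted, something like the following sketch may serve. It assumes that bash was built with /dev/tcp support and that the GNU coreutils ''timeout'' utility is available. The function name check_port is made up.

```shell
# check_port HOST PORT [SECONDS]
# Succeeds (exit status 0) if a TCP connection to HOST:PORT can be opened
# within SECONDS (default 5). Assumes bash with /dev/tcp support and the
# GNU coreutils "timeout" utility.
# (check_port is a hypothetical helper name.)
check_port() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Typical use (not run here):
#   if check_port redhat.com 80; then
#       echo "port 80 reachable"
#   else
#       echo "no connection"
#   fi
```

Like the telnet test, this only shows that the path and the listening service are there. It says nothing about whether the application behind the port is answering correctly.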
185
186 !!RFCs:
187
188 The Requests for Comment that define the TCP/IP protocol are found at the following site. They are well worth a read after a little bit of experience working with network issues has been gained.
189
190 * IP: ftp://ftp.rfc-editor.org/in-notes/rfc791.txt
191 * TCP: ftp://ftp.rfc-editor.org/in-notes/rfc793.txt
2 JimSansing 192
3 JimSansing 193 ----
2 JimSansing 194 CategoryNetworking
The following authors of this page have not agreed to the WlugWikiLicense. As such copyright to all content on this page is retained by the original authors.
  • JimSansing
The following authors of this page have agreed to the WlugWikiLicense.