
Key points

  • The main point to remember about networks is that they are serial transmission media. That means that data flows one bit at a time from point A to point B. So, to troubleshoot a network problem, start at point A and check that everything is working on the way to point B.
  • Modern networks have functions to help people use them, such as domain name servers, which translate between network addresses and human readable host names. But when troubleshooting network problems, these servers may be inaccessible. So use the most basic method of communication for testing access and collecting status information. The commands to find the network addresses that will be needed are in the procedures below.
  • Networks may be slow, or services unavailable. There is usually a timeout set for network commands, which may have to run to completion before a program can continue. So, patience is required for troubleshooting network problems.
  • Network protocols expect things to go wrong, so retries are built in most of the time. But during troubleshooting, this is not always a good thing. If possible, override the retry value with a count parameter. Otherwise, it may be necessary to interrupt a program that is not responding.
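The last two points can be combined in a small wrapper that puts an upper limit on how long any test command may run. This is only a minimal sketch: it assumes the GNU coreutils timeout command is installed, and the function name run_bounded is made up for this illustration.

```shell
# run_bounded SECONDS COMMAND [ARGS...]
# Runs COMMAND but gives up after SECONDS. Assumes GNU coreutils "timeout",
# which exits with status 124 when the time limit is reached.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# Example (not run here): at most three pings, and never longer than 10 seconds.
#   run_bounded 10 ping -c 3 192.168.1.1
```

The count parameter limits the retries, while the wrapper covers the case where the command itself stops responding.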

Parts list

  • Application software: The local host is point A. However, there are many points of failure before data can leave the local host.

    • Files:

      • /home/user_id/application_path/application_config_files
    • Email client: Email clients are usually configured to automatically poll the email server for new mail. If the connection is lost, this causes error messages, which might need to be responded to before the program can continue. If it happens while a message is being transferred, then the client might get hung and have to be forced to end.
    • Web browser: Web browsers are most susceptible to domain name server problems. Each computer is assigned a local domain name server, but it only keeps track of a few host names. When a web site name is requested that is not in its list, it must query other servers to find it. This may take long enough for the web browser to time out. Or the domain name server that tracks the requested name may be down. So, web browsers are very poor indicators of network problems.
    • Games: Online games are similar to email in that a single server is contacted. However, there is a higher volume and more constant flow of data, so network problems become apparent more quickly. Also, because game servers are often farther away from the local host than email servers, network congestion is more likely along the client/server path and may affect game performance.
    • Other applications: Audio and video streaming use a protocol that is called "unreliable". This is not as bad as it sounds, but it does mean that occasionally some data can be lost. This is why network radio sometimes sounds so poor.
  • Network stack

    • Files:

      • /etc/HOSTNAME
      • /etc/services
    • API layer: The connection between a client application and the network software is made through a "socket". On the client side, the Application Program Interface to the network software is usually not a problem. However, it can happen that as many connections as are allowed by a single program are in use, so no more connections can be made. This can be checked using the netstat command described below.
    • TCP/UDP layer: The Transmission Control Protocol is what the network software uses to make sure that all data sent is received. The User Datagram Protocol is similar, but it is called "unreliable" because the receiving host does not confirm that data was received. Both use network ports to identify a "session" between two hosts. When a session is active, that pair of ports cannot be used between the same two hosts for another session. Active sessions can be displayed using the netstat command described below.
    • IP layer: The Internet Protocol is what the network software uses to determine how to establish a session between two hosts. It uses network addresses to identify the hosts and a route table to know where to send or "route" the data. The network address of the local host is usually in the /etc/hosts table if a fixed address is used. The route table can be displayed using the netstat command described below.
  • Interface and Local Network

    • Files:

      • /etc/sysconfig/network
      • /etc/init.d/network
    • Hardware/Driver: The cable that connects the local host to the network is attached to a piece of hardware that can get data from one end of the cable to the other. There are many such connections between the local host and the server host, but usually problems with those connections must be handled by others. The driver is simply the software that is used to control the hardware.
      The configuration of the hardware and its driver is found in the file /etc/sysconfig/network. The driver is started and stopped using the /etc/init.d/network script. The status of the hardware and its driver can be displayed using the ifconfig command described below.

      • Serial port/PPP: This hardware is the slowest network connection. The cable is connected to the serial port on the local host and to a modem at the other end. The modem is connected to a telephone jack. Often modems are built into the local host, and it is connected directly to the telephone jack. The Point-to-Point Protocol is used to send data between the local host and the Internet Service Provider's (ISP) modem. The ISP then has a faster connection to the network.
      • NIC card/Ethernet driver: A Network Interface Card (NIC) and its driver are used for faster networks. Most NICs use the Ethernet protocol, although there are several others available. The Ethernet protocol is used to send data between the local host and a router, which is connected to the network. On home networks, the cable from the NIC is usually connected to a cable modem or a Digital Subscriber Line (DSL) modem, which is connected to a router at the cable television or telephone company office.
      • Wifi card/Wifi driver: There is no cable connected to a Wifi card. However, the Wifi card communicates with an "access point" which usually has an Ethernet NIC that is connected the same as a local host Ethernet NIC.
    • Home networks usually have a router between the local hosts and the cable or DSL modem. It is possible to plug more than one local host into the router, but most networks use a hub or a switch to allow a single router connector or "port" to be used. There may be an application available to display the status of the router. The same may be true for switches. However, hubs usually only display a little bit of status information on the front panel.
      These devices usually have very little control capability, and are reset by powering them off and then back on. It is advisable to keep them physically close together, and if they are all plugged into a single power strip, then resetting them simultaneously is as simple as turning off and on the main power strip switch.
  • Network

    • Files:

      • /etc/hosts
      • /etc/resolv.conf
    • Connection to network: The ISP provides the hardware and software to get access to any host connected to the Internet, if everything is working correctly. To simplify making connections, domain name servers translate between network addresses and host names. For example, the host command displays the addresses that belong to a host name. The addresses of the ISP's domain name servers are in the file /etc/resolv.conf. Any hosts defined in the file /etc/hosts will be used without asking the domain name servers to translate them. So, it is a good idea to use this table for hosts on the local network, and even a few of the ISP hosts. The names in the /etc/hosts file do not have to match the actual name of the host defined. So instead of using the real name of the domain name server, ns02.someplace.st.us02.comcast.net, the name dns1 could be used.
    • Network path: Under the TCP/IP protocol, data is broken into "packets" of various sizes. The protocol is designed to choose any path that is available for each individual packet. This means that while some packets may be lost, there is usually a way for the entire message to reach its destination. However, sometimes there is a point of failure that it is impossible to bypass. There are several tools, described below, to determine if the path to the server is available.
  • Destination host

    • Files:

      • /etc/ssh/sshd_config
      • /etc/hosts.deny
      • /etc/hosts.allow
    • Hardware: The hardware for servers and their networks is essentially the same as that at the local host, just more of it. This means complexity, which means more chances for errors.
    • Network stack: The server's network software must answer requests from hundreds or thousands of client hosts. If the clients happen to make their requests at nearly the same time, the server can become so busy that the network software rejects some of the connection requests.
    • Server application: The server application can also become overloaded and respond so slowly that the client times out.
      Another application problem can be that the configuration file is not set correctly for the functionality being attempted. The sshd configuration is famous for its gotchas. The defaults work 90% of the time, but when unusual uses are required, it is often difficult to get it right. Using the ssh -v argument while attempting to connect can help.
    • Permissions: Sometimes the permissions set for files and directories are insufficient for the user id that the application runs under. This usually happens when a new application is being brought up. But it can also happen when the application allows different access privileges, and a user's access is changed. Applications often consist of multiple programs, and error messages passed between them can become meaningless, making this problem especially difficult to track down.
      Also, the /etc/hosts.deny and /etc/hosts.allow files specify which client hosts are even allowed to establish sessions, and with which application. If this is not configured correctly, the client will usually receive a message indicating "access denied".
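The "Connection to network" item in the list above notes that names defined in /etc/hosts, including short aliases such as dns1, are used without consulting the domain name servers. The sketch below shows one way to check which address a hosts-format file assigns to a name. The function name hosts_lookup and the address in the sample entry are hypothetical and only serve the illustration.

```shell
# hosts_lookup NAME [FILE]
# Prints the address that a hosts-format FILE (default /etc/hosts) assigns to
# NAME, matching either the first name or any alias on the line.
hosts_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                       # drop trailing comments
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}

# Hypothetical entry that gives the ISP name server a short alias.
# 192.0.2.53 is a documentation-only address, not a real server:
#   192.0.2.53   ns02.someplace.st.us02.comcast.net   dns1
#
# Example (not run here):
#   hosts_lookup dns1
```

An empty result means the name is not in the file, so any lookup for it has to go to the domain name servers.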

Tools

  • Status commands: To determine the status of the local host, use the following commands:

    • hostname: This is used to set or display the hostname. Sometimes applications need the fully qualified domain name to be defined for the hostname, which is how /etc/HOSTNAME should be defined. Using hostname -d will display the domain name to determine if it is defined properly.
    • ifconfig: This is used to configure the interface and display status. It should be used cautiously for configuration. Some Linux distributions keep configuration meta-data in special files, and if they are not updated properly, then the driver will not work correctly.
      Using ifconfig -a will display the current status of interfaces, including their IP address. The status of UP simply means that the driver is able to contact the NIC. The number of received and transmitted packets is a better indicator of network activity. However, even that can be misleading, because there are packets that are used for control information that are counted, and the counts may not include any data packets.
    • netstat: This is one of the best indicators of network status. It has arguments to display many different network functions. The most common are netstat -a and netstat -r.
      Using netstat -p displays the programs using sockets. This can sometimes indicate that a client application needs to be shut down and restarted.
      Using netstat -a displays all sockets in use. This includes streams, which is a way for the local host to exchange information between programs. There are a lot of these, so using netstat -a | grep tcp will just display tcp ports. Some of these are simply waiting for connections, and their status is LISTEN. Others are connected sessions and their status is ESTABLISHED.
      Another status is CLOSE_WAIT. It means that the remote host has closed its end of the session, but the local program has not yet closed its socket. (A related status, TIME_WAIT, appears after a session has ended normally: TCP keeps the session around for a short time in case stray packets arrive late, and it clears on its own.) Sometimes sessions can get hung in CLOSE_WAIT. If there are a lot of these that do not go away after a few minutes, they may cause problems and require restarting the program that owns them, or restarting the interface with the /etc/init.d/network script.
      Using netstat -r displays the route table. For local hosts, this is very short. But incorrect route tables on routers can create havoc on a network.
  • Logs

    • /var/log/messages: Most network status is discovered using status commands. However, sometimes the messages log can provide information on what led up to a problem, which can help diagnose it.
    • Application logs: These may or may not exist, and the information that is written to them is often useful only to developers. However, it is always good to know about all sources of information for troubleshooting.
  • Monitors

    • tcpdump: This is a host based network "sniffer". It will capture network data that is seen by the NIC in a file for later playback, or display it in realtime. There are many arguments to control its display format and filters to control what is captured. Use Ctrl-C to break out of it. For quick troubleshooting, it is usually sufficient to simply run it as tcpdump -nX -s1500 -i eth0
    • ethereal: This is a program similar to tcpdump and can read data captured by tcpdump. (The project has since been renamed Wireshark.) The advantage of ethereal is that it has many protocol definition modules that break packets apart and display each field with a text description.
      As a side note, ethereal is especially useful if ports and/or data are unrecognized. Ethereal will often recognize the protocol, which makes it a learning tool. But if it does not, then it is worth investigating the data to determine if it is a worm or virus.
  • Testing commands

    • ping: This is a way of testing whether a host and its NIC are active. Using ping -c <number> will limit the number of attempts. Otherwise use Ctrl-C to break out of it. Because ping has been misused and abused, many routers and firewalls are configured to discard it. So, just because there is no response, it does not mean that there is a network problem. But if there is a response, then the problem is probably a configuration or application issue.
    • traceroute: This sets the distance that packets will travel through the network to one hop, and then gradually increases it. Each router or host that receives packets returns an error message that the traceroute program can use to determine that device's identity. This way, the list of routers along the path to the destination can be seen. Typically, if there is no response, three asterisks are displayed. If this continues, then it is probably where the problem is. Unfortunately, this is another utility that has been misused and abused, so many devices do not respond to the traceroute packets. Sometimes, the list will continue after one or more lines of asterisks. But usually, the packets are not being allowed to continue along the network path. It can be useful to use traceroute -n as then it does not try to resolve names.
    • telnet/ssh/ftp: An attempt to make a telnet, ssh, or ftp connection will usually at least result in a message indicating access denied if the host is accessible through the network. So this method can be used where ping and traceroute are disallowed.

Procedure

  • Start from closest point

    • Check processes: Sometimes multiple instances of the same process can create resource conflicts.
      Example: Someone clicked on the mozilla startup icon twice
        > ps -ef | head -1; ps -ef | grep moz
        UID    PID   PPID  C STIME TTY  TIME     CMD
        an_id  4775  3924  0 Jun22 ?    00:00:00 /bin/sh /usr/bin/mozilla
        an_id  4786  4775  0 Jun22 ?    00:11:41 /opt/mozilla/lib/mozilla-bin
        an_id  7976  3924  0 Jun22 ?    00:00:00 /bin/sh /usr/bin/mozilla
        an_id  8103  7976  0 Jun22 ?    00:11:41 /opt/mozilla/lib/mozilla-bin
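The same check can be made for every program at once instead of grepping for one name. This is a rough sketch; the function name dup_commands is made up here, and the result needs interpreting, because many programs (shells, for example) legitimately run as several processes.

```shell
# dup_commands
# Reads one command name per line on standard input and prints each name
# that occurs more than once.
dup_commands() {
    sort | uniq -d
}

# Example (not run here): feed it the command name of every running process.
#   ps -eo comm= | dup_commands
```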
  • Check TCP/IP stack: If it is up, there should be some services active.

    Example:
        > netstat -a | grep tcp
        tcp  0  0 localhost:smtp   *:*            LISTEN
        tcp  1  0 localhost:34020  localhost:ipp  CLOSE_WAIT
        tcp  0  0 localhost:32803  *:*            LISTEN
        tcp  0  0 *:ssh            *:*            LISTEN
        tcp  0  0 localhost:smtp   *:*            LISTEN
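When there are many sockets, a count per state is easier to read than the full listing, and it makes a pile-up of CLOSE_WAIT sessions stand out. The following is a minimal sketch that assumes the state is the sixth column, as in the listing above; the function name tcp_state_summary is made up for this illustration.

```shell
# tcp_state_summary
# Reads "netstat -a" style output on standard input and prints one line per
# TCP state, followed by the number of sockets in that state.
tcp_state_summary() {
    awk '$1 ~ /^tcp/ && NF >= 6 { n[$6]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Example (not run here):
#   netstat -an | tcp_state_summary
```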
  • Check interface: You may need to execute this as root.
    Example:
        > ifconfig -a
        eth0  Link encap:Ethernet  HWaddr 00:E0:81:29:8E:58
        inet addr:192.168.1.5  Bcast:192.168.1.255 Mask:255.255.255.0
        inet6 addr: fe80::2e0:81ff:fe29:8e58/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500 Metric:1
        RX packets:480368 errors:0 dropped:0 overruns:0 frame:0
        TX packets:24715 errors:0 dropped:0 overruns:0 carrier:0
        collisions:2141 txqueuelen:1000
        RX bytes:250359054 (238.7 Mb) TX bytes:2946463 (2.8 Mb)
        Interrupt:11

        lo    Link encap:Local Loopback
        inet addr:127.0.0.1 Mask:255.0.0.0
        inet6 addr: ::1/128 Scope:Host
        UP LOOPBACK RUNNING MTU:16436 Metric:1
        RX packets:403 errors:0 dropped:0 overruns:0 frame:0
        TX packets:403 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:27106 (26.4 Kb) TX bytes:27106 (26.4 Kb)
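Scanning the counters by eye gets tedious on hosts with several interfaces. The sketch below prints only the interfaces that report receive or transmit errors. It assumes the older ifconfig layout shown above ("RX packets:... errors:..."); newer versions of ifconfig print these counters differently, so treat it as an illustration. The function name iface_errors is made up.

```shell
# iface_errors
# Reads "ifconfig -a" output in the older net-tools layout on standard input
# and prints "INTERFACE DIRECTION COUNT" for every non-zero error counter.
iface_errors() {
    awk '
        /^[^ \t]/ { iface = $1 }                 # unindented line starts a new interface
        /(RX|TX) packets:/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^errors:/) {
                    split($i, kv, ":")
                    if (kv[2] + 0 > 0) print iface, $1, kv[2]
                }
        }
    '
}

# Example (not run here):
#   ifconfig -a | iface_errors
```

No output means that no interface has counted any errors.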
  • Check the route table.
    Example:
        > netstat -rn
        Kernel IP routing table
        Destination  Gateway      Genmask        Flags  MSS Window irtt Iface
        192.168.1.0  0.0.0.0      255.255.255.0  U        0 0         0 eth0
        10.11.12.13  0.0.0.0      255.255.0.0    U        0 0         0 eth0
        127.0.0.0    0.0.0.0      255.0.0.0      U        0 0         0 lo
        0.0.0.0      192.168.1.1  0.0.0.0        UG       0 0         0 eth0
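The row with destination 0.0.0.0 is the default route, and its gateway is the router to test in the next step. A minimal sketch for pulling that address out of the table follows; it relies on the numeric output of netstat -rn, and the function name default_gateway is made up for this illustration.

```shell
# default_gateway
# Reads "netstat -rn" output on standard input and prints the gateway of the
# default route, that is, the row whose destination is 0.0.0.0.
default_gateway() {
    awk '$1 == "0.0.0.0" { print $2; exit }'
}

# Example (not run here): ping the router found in the route table three times.
#   ping -c 3 "$(netstat -rn | default_gateway)"
```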
  • Check the local network.
    Example:
        > ping 192.168.1.1
        PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
        64 bytes from 192.168.1.1: icmp_seq=1 ttl=150 time=0.701 ms
        64 bytes from 192.168.1.1: icmp_seq=2 ttl=150 time=0.674 ms
  • Check the domain name server.
    Example:
        > cat /etc/resolv.conf
        nameserver 10.11.12.10
        nameserver 10.11.12.9
        > ping 10.11.12.10
        PING 10.11.12.10 (10.11.12.10) 56(84) bytes of data.
        64 bytes from 10.11.12.10: icmp_seq=1 ttl=242 time=14.8 ms
        64 bytes from 10.11.12.10: icmp_seq=2 ttl=242 time=13.0 ms
        > host linux.org
        linux.org has address 198.182.196.48
        > host redhat.com
        redhat.com has address 209.132.177.50
        > host suse.com
        suse.com has address 195.135.220.3
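To test every configured name server rather than only the first, the addresses can be read straight from the file. This is a small sketch; the function name list_nameservers is made up for this illustration.

```shell
# list_nameservers [FILE]
# Prints the address on each "nameserver" line of a resolv.conf-format FILE
# (default /etc/resolv.conf), one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example (not run here): ping each name server twice.
#   for ns in $(list_nameservers); do ping -c 2 "$ns"; done
```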
  • Check the network path.
    Example 1:
        > traceroute something.com
        traceroute to something.com (10.11.12.13), 30 hops max, 40 byte packets
        1 local_router (192.168.1.1)  2.688 ms   4.170 ms   5.757 ms
        2 next_router (10.1.1.1)  3.626 ms   5.143 ms   6.912 ms
        3 * * *
        4 something.com (10.11.12.13)  6.743 ms   7.954 ms   9.051 ms
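On a long path it can help to list only the hops where all three probes went unanswered. This is a minimal sketch with a made-up function name, silent_hops. As in the example above, a silent hop followed by later replies only means that the device does not answer traceroute packets.

```shell
# silent_hops
# Reads traceroute output on standard input and prints the hop number of
# every line on which all three probes show an asterisk.
silent_hops() {
    awk '$2 == "*" && $3 == "*" && $4 == "*" { print $1 }'
}

# Example (not run here):
#   traceroute -n something.com | silent_hops
```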

    Example 2: Try to telnet to redhat.com on port 80. Use Ctrl-] to break out of telnet.
        > telnet redhat.com 80
        Trying 209.132.177.50...
        Connected to redhat.com.
        Escape character is '^]'.
        (press Ctrl-])
        telnet> quit
        Connection closed.
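The same connection test can be made without an interactive session. The sketch below assumes that bash is installed and was built with its /dev/tcp redirection feature, and that the GNU coreutils timeout command is available. The function name port_open is made up for this illustration.

```shell
# port_open HOST PORT [SECONDS]
# Returns success if a TCP connection to HOST:PORT can be opened within
# SECONDS (default 5). The connection is closed again immediately.
port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (not run here):
#   if port_open redhat.com 80; then echo "port 80 answers"; fi
```

A success only shows that something accepted the connection on that port. It does not show that the application behind it is working.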

RFCs:

The Requests for Comment that define the TCP/IP protocol are found at the following site. They are well worth a read after a little bit of experience working with network issues has been gained.


CategoryNetworking

The following authors of this page have not agreed to the WlugWikiLicense. As such copyright to all content on this page is retained by the original authors.
  • JimSansing