When network services fail, administrators need to identify the root cause quickly. But how do they know what tools and techniques to use in a given scenario? Although a network might be complex, that doesn't mean its problems are complex. Network administrators use different approaches depending on the service and the problem symptoms.
Begin troubleshooting with the basics. Is the client properly configured and connected to the network? In addition, confirm the client can access essential services -- namely, routing and name resolution. If the client's settings are correct, move on to the remote server. Is it online? Is it accepting connections? Next, review the service's settings to confirm it is running and allows access for the specified user.
This logical approach, from basic client connectivity through service configuration, lets administrators troubleshoot network problems efficiently. Approach troubleshooting with the layers of the OSI model and TCP/IP stack in mind, beginning at the bottom and working upward.
To illustrate this idea, I'll walk through several troubleshooting examples showing how to debug network issues progressively further up the layers. I'll also reference specific Linux and Windows commands along the way.
1. Start with the basics
It's essential to begin with the basics. Consider the following steps:
- Is it plugged in?
- Is it on?
- Have you restarted it?
- Can you ping it? Can you ping it by name?
Is it plugged in?
It sounds simplistic, but this step calls for investigation in several places. Do you have physical network connectivity between the host and the destination device? For a connection on the internal network, check that every device along the path is connected and functioning correctly.
Is it on?
This might also seem elementary, but check that all devices along the path are powered on. And don't just check devices but also whether the necessary services are running. These might be firewalls, proxies, filters, load balancers or other network services.
Have you restarted it?
If the device is on, have you restarted it? Restarting devices often clears issues, especially on Windows systems.
Can you ping it?
Ping is a particularly useful tool. One trick is to ping by IP address to confirm basic connectivity and then ping by hostname to check both connectivity and name resolution.
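The IP-then-hostname trick can be scripted. The following Python sketch is an illustration, not a standard tool, and its function names are hypothetical. It builds the platform's ping command (Linux takes the packet count with -c, Windows with -n) and turns the two results into a likely next step. It relies on ping's exit status. On Linux that status is 0 only if a reply arrived, but Windows ping can exit 0 even on a "Destination host unreachable" reply, so treat the result as a hint.

```python
import platform
import subprocess
from typing import Optional


def ping_command(host: str, count: int = 2, system: Optional[str] = None) -> list[str]:
    """Build a ping command line: Windows takes the packet count with -n, Linux with -c."""
    system = system or platform.system()
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def ping(host: str, count: int = 2) -> bool:
    """Return True if ping exits with status 0; its output is discarded."""
    result = subprocess.run(
        ping_command(host, count),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def interpret(ip_ok: bool, name_ok: bool) -> str:
    """Turn the ping-by-IP and ping-by-name results into a likely next step."""
    if ip_ok and name_ok:
        return "connectivity and name resolution OK"
    if ip_ok:
        return "connectivity OK; suspect name resolution"
    if name_ok:
        return "name works but IP does not; check which address the name resolves to"
    return "no connectivity; check cabling, addressing and routing"
```

For example, interpret(ping("192.168.1.20"), ping("fileserver01")) compares both tests for a hypothetical server address and hostname.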
2. Check connections to the server
Let's start with a simple scenario: A user submits a ticket indicating they cannot reach a specific server. Assume they should have access. What steps might a network administrator take to investigate the problem?
1. Check the local IP address and subnet mask. On Linux, use ip addr; on Windows, use ipconfig.
2. Confirm that the default gateway and DNS server values are accurate.
3. Ping the router and the DNS server.
Linux and Windows command: ping followed by the router's or DNS server's IP address.
4. Ping the destination server by IP address and hostname to confirm connectivity and name resolution.
Linux and Windows command: ping followed by the server's IP address, then ping followed by its hostname.
5. Clear the host's name resolution cache.
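Steps 3 and 4 are ordered on purpose: if the gateway doesn't answer, every later check fails for the same reason. The Python sketch below runs the checks bottom-up and reports only the first failure. The addresses, hostname and function name are hypothetical. The probe argument is any function that returns True on success, such as a wrapper around the system ping command, which also makes the ordering logic easy to try with a stand-in probe.

```python
from typing import Callable, Optional

# Hypothetical values: replace with the client's real gateway, DNS server and target.
STEPS = [
    ("ping default gateway", "192.168.1.1"),
    ("ping DNS server", "192.168.1.53"),
    ("ping server by IP address", "192.168.1.20"),
    ("ping server by hostname", "fileserver01"),
]


def first_failure(steps: list[tuple[str, str]], probe: Callable[[str], bool]) -> Optional[str]:
    """Run the checks bottom-up and return the first failing step, or None if all pass."""
    for description, target in steps:
        if not probe(target):
            # Later checks would fail for the same reason, so stop here.
            return description
    return None
```

A result of "ping server by hostname" points at name resolution rather than the network path, which is where clearing the cache in step 5 comes in.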
First, confirm the user's local workstation is properly configured. Client systems usually lease their IP address configurations dynamically from a Dynamic Host Configuration Protocol server, so typos are rare. If the system has a static configuration, check the IP address, subnet mask, default gateway and DNS server values carefully for mistakes.
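For static configurations, some typos can be caught mechanically. This Python sketch uses the standard ipaddress module to flag a gateway outside the host's subnet, a host address that is its subnet's network or broadcast address, and malformed DNS server entries. The function name and the choice of checks are illustrative assumptions. The sketch cannot tell whether an address belongs to the right device, and it doesn't flag a DNS server on another subnet because that is a legitimate, routed setup.

```python
import ipaddress


def check_static_config(address: str, netmask: str, gateway: str, dns_servers: list[str]) -> list[str]:
    """Return likely mistakes in a static IPv4 configuration (empty list if none found).

    Raises ValueError if the address or netmask itself is malformed.
    """
    problems = []
    iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
    net = iface.network
    # On /31 and /32 networks every address is usable, so skip this check there.
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {net}")
    try:
        gw = ipaddress.IPv4Address(gateway)
    except ValueError:
        problems.append(f"gateway {gateway!r} is not a valid IPv4 address")
    else:
        if gw not in net:
            problems.append(f"gateway {gw} is outside {net}")
        elif gw == iface.ip:
            problems.append("gateway is the host's own address")
    # DNS servers may legitimately sit on another subnet, so only check the syntax.
    for server in dns_servers:
        try:
            ipaddress.IPv4Address(server)
        except ValueError:
            problems.append(f"DNS server {server!r} is not a valid IPv4 address")
    return problems
```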
Use ping to verify connectivity. A ping by name tests both connectivity and name resolution.
Windows clients cache recently resolved hostnames and IP addresses. Use ipconfig /displaydns to view this cache. Clear the cache with ipconfig /flushdns.
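To separate name resolution from connectivity, the resolver can be queried on its own. This Python sketch calls socket.getaddrinfo, which goes through the operating system's resolver, the same path a ping by name uses, so hosts-file entries and cached answers can influence the result. nslookup, by contrast, queries a DNS server directly. The function name is hypothetical.

```python
import socket
from typing import Optional


def resolve(name: str) -> Optional[list[str]]:
    """Return the addresses the OS resolver gives for name, or None if the lookup fails."""
    try:
        results = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return None
    addresses: list[str] = []
    # getaddrinfo returns one entry per socket type; keep each address once, in order.
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

If resolve("fileserver01") (a hypothetical hostname) returns None or a stale address while a ping by IP succeeds, flush the cache and check the DNS server.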
3. Check connections to the network folder
Consider a different issue: A user can connect to the server but cannot access a shared directory residing on that server.
- Confirm the folder is shared out correctly. On Linux, check the /etc/exports file for NFS exports. On Windows, right-click the folder, open Properties and go to the Sharing tab.
- Display the network shares on the system. On Linux, use showmount -e to list a server's NFS exports. On Windows, use net view \\servername.
- Check the share permissions. Is the user explicitly denied access? Are they a member of a group granted permission? On Linux, review the client options in /etc/exports. On Windows, right-click the folder, open Properties, go to the Sharing tab and select Advanced Sharing, then Permissions.
- Confirm the user's group memberships. Administrators usually apply access permissions to groups of users with similar security requirements, so users will be denied if they aren't members of a group with access. On Linux, use groups <username>. On Windows, open the user account's Properties in Active Directory Users and Computers and select the Member Of tab.
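/etc/exports lists one exported path per line, followed by clients with their options in parentheses. The simplified Python parser below is an illustrative sketch that ignores quoted paths and line continuations. It makes the structure explicit and exposes a classic mistake: a space between a client and its options, as in /srv/public backup01 (rw), gives backup01 the default options and applies (rw) to every host.

```python
import re

# A client token is "host(options)", a bare "host", or "(options)" with no host.
CLIENT_RE = re.compile(r"^([^()]*)(?:\(([^)]*)\))?$")


def parse_exports(text: str) -> dict[str, list[tuple[str, list[str]]]]:
    """Simplified /etc/exports parser returning {path: [(client, [options])]}.

    Ignores quoted paths and line continuations.
    """
    exports: dict[str, list[tuple[str, list[str]]]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        path, *clients = line.split()
        entries = []
        for token in clients:
            match = CLIENT_RE.match(token)
            if match is None:
                continue  # malformed token; a real tool should report it
            # No host before the parentheses means the options apply to every host.
            host = match.group(1) or "*"
            options = match.group(2).split(",") if match.group(2) else []
            entries.append((host, options))
        exports[path] = entries
    return exports
```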
If the user can reach the server itself, unlike in the previous scenario, check the shared directory's configuration. Is it actually shared? What permissions are in place, and do they grant access to the user's identity? If a user has access, it's usually through a group membership.
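The permission questions above reduce to a small decision rule. The following Python model is hypothetical and simplified. It assumes that an explicit deny on the user or any of their groups overrides an allow, as Windows share permissions behave, and it treats the absence of any matching entry as no access. On Windows, network access is also limited by NTFS permissions, and the more restrictive of the share and NTFS permissions applies; this sketch models only one permission list.

```python
from dataclasses import dataclass, field


@dataclass
class ShareAcl:
    """Hypothetical, simplified share permission list: principal -> "allow" or "deny"."""

    entries: dict[str, str] = field(default_factory=dict)

    def user_has_access(self, user: str, groups: set[str]) -> bool:
        # Collect every decision that applies to the user directly or through a group.
        principals = {user} | groups
        decisions = {self.entries[p] for p in principals if p in self.entries}
        if "deny" in decisions:
            return False  # an explicit deny overrides any allow
        return "allow" in decisions  # no matching entry means no access
```

The groups argument is what groups <username> reports on Linux or what the Member Of tab shows in Active Directory.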
4. Verify connections to a website
Sometimes users need to access an internal website, but the page won't display. If connectivity exists -- based on ping results -- check settings, such as router filtering, name resolution, service status and certificates.
- Do any routers between the client and web server filter HTTP and HTTPS traffic?
- Does a ping by hostname succeed, indicating name resolution works?
- Confirm a DNS resource record for the site exists on the DNS server. On both Linux and Windows, use nslookup.
- Verify the HTTP service is running on the web server. On Linux, check it with systemctl status httpd and, if needed, restart it with systemctl restart httpd; on Debian and Ubuntu, the Apache service is named apache2. On Windows, open Administrative Tools and go to the Services console.
- Check for certificate errors that appear in the user's browser window. These might include expired certificates or a broken certificate chain.
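Certificate expiry can also be checked without a browser. This Python sketch uses the standard ssl module. fetch_not_after opens a verified TLS connection and reads the certificate's notAfter field, and days_until_expiry converts that field with ssl.cert_time_to_seconds. Because the default context verifies the chain, an expired certificate or broken chain makes the handshake raise ssl.SSLCertVerificationError before any date is returned, which is itself the diagnosis. Both function names are illustrative.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the server certificate's notAfter string over a verified TLS connection."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days until a notAfter time such as "Jan  5 09:34:43 2018 GMT" (negative if past)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400
```

For a hypothetical internal site, days_until_expiry(fetch_not_after("intranet.example.com")) shows how much time remains before users start seeing certificate warnings.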
Website access problems are likely to be on the server side. If the server is up and running, check the web service. In addition, check that the client can resolve the site's name to an IP address. Linux and Windows both use the nslookup tool, and Linux also supports host and dig to test name resolution.
Another useful utility for troubleshooting services is Telnet. Many network administrators frown upon Telnet for remote administration these days and prefer Secure Shell (SSH), but it's a helpful tool for testing port connectivity. For example, to check HTTP port 80 on a web server, type telnet www.website.com 80.
If Telnet connects, port 80 is reachable and a service is listening on it. The Linux client prints a "Connected to" message, the Windows client typically shows a blank screen, and typing an HTTP request returns HTML or web server headers. If you receive a connection failure message, the source computer cannot reach port 80, perhaps due to filtering or a service that isn't running.
However, modern OSes rarely install a Telnet client by default, so you'll likely have to add one; on Windows, enable the Telnet Client optional feature.
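Where no Telnet client is installed, the same port test takes only a few lines of Python. This sketch uses only the standard socket module, and the function name is an illustrative choice. Like Telnet, it shows only that something accepts TCP connections on the port, not that the web application behaves correctly.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like telnet host port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False
```

For example, port_open("www.website.com", 80) mirrors the Telnet test above.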
5. Confirm SSH configurations
Linux network administrators rely on SSH for remote administration. If that connectivity fails, it can result in additional administrative overhead. To identify the problem, look into the following settings:
- Confirm the sshd service is running on the destination Linux server. Check it with systemctl status sshd and, if needed, restart it with systemctl restart sshd.
- Confirm the /etc/ssh/sshd_config file exists and is properly configured. Running sshd -t checks the file for syntax errors.
- Some administrators block root access to SSH servers, requiring users to connect with standard user accounts and then elevate privileges as needed. The PermitRootLogin no directive in sshd_config enforces this.
- Modern SSH servers often require key-based authentication instead of passwords. The sshd_config file might allow either type of authentication or enforce only one of them. The PasswordAuthentication no directive disables password authentication, so users typically must authenticate with keys.
SSH is the go-to tool for remote Linux system administration. Many security configurations can prevent connections, however. If the server is up and the SSH service is running, check the sshd_config file for settings that block the root user (PermitRootLogin), restrict logins to an explicit list of users (AllowUsers) or require key-based authentication (PasswordAuthentication no). Any of these settings could prevent an administrator from connecting if they don't match the requirements.
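To see which of these settings applies to a given login, the relevant directives can be pulled out of sshd_config. This Python sketch reads the file under stated assumptions. Keywords are case-insensitive and the first value for a keyword wins, as sshd_config documents. AllowUsers lines are merged, and Match blocks and Include directives are ignored. When a setting is absent, the sketch assumes the recent OpenSSH defaults of prohibit-password for PermitRootLogin and yes for PasswordAuthentication. On the server itself, sshd -T prints the effective configuration and is the more reliable check.

```python
import fnmatch
import re


def read_sshd_settings(text: str) -> dict[str, str]:
    """Simplified sshd_config reader: keyword (lowercased) -> first value seen.

    Stops at the first Match block and does not follow Include directives.
    """
    settings: dict[str, str] = {}
    allow_users: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Keywords and arguments may be separated by whitespace or "=".
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        key = parts[0].lower()
        value = parts[1] if len(parts) > 1 else ""
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        if key == "allowusers":
            allow_users.extend(value.split())  # assumption: merge all AllowUsers lines
        else:
            settings.setdefault(key, value)  # the first value for a keyword wins
    if allow_users:
        settings["allowusers"] = " ".join(allow_users)
    return settings


def login_blockers(settings: dict[str, str], user: str, method: str) -> list[str]:
    """List settings that would block user logging in with method ("password" or "publickey")."""
    reasons = []
    # Assumed defaults when a keyword is absent (recent OpenSSH releases).
    root_policy = settings.get("permitrootlogin", "prohibit-password").lower()
    if user == "root":
        if root_policy == "no":
            reasons.append("PermitRootLogin no")
        elif method == "password" and root_policy in (
            "prohibit-password", "without-password", "forced-commands-only"
        ):
            reasons.append(f"PermitRootLogin {root_policy} forbids root password logins")
    if method == "password" and settings.get("passwordauthentication", "yes").lower() == "no":
        reasons.append("PasswordAuthentication no")
    allowed = settings.get("allowusers")
    # USER@HOST patterns are not handled; plain * and ? wildcards are.
    if allowed and not any(fnmatch.fnmatchcase(user, p) for p in allowed.split()):
        reasons.append(f"{user} does not match AllowUsers {allowed}")
    return reasons
```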
Best practices for debugging network issues
When the inevitable network connection problem arises, it's critical to know how to identify and remedy the issue as quickly as possible. Begin by checking for basic functionality, such as power, physical connections and ping tests.
Next, move from the client's general settings to the server's functionality and configuration. Finally, verify the service is running, shared resources are available and permissions are correct. Check the user's group memberships to make sure permissions allow the user to use the service or resource.
The above scenarios moved upward, from basic settings through server connectivity to service functionality, allowing for methodical, logical and efficient troubleshooting. Tie a network debugging approach to the OSI model or TCP/IP stack, where the bottom is the most fundamental and the top the most complex.