Troubleshooting wireless networks: A systematic approach

Troubleshooting can consume a large share of network operations staff time in any wireless LAN (WLAN) environment. Wireless guru Lisa Phifer discusses the tools and processes involved and breaks down how to approach and resolve wireless problems with a step-by-step, systematic method.

Going wireless may avoid the expense of cabling and wired LAN drops, but wireless networks still require effective technical support and troubleshooting. According to Momenta Research, ongoing support represents 45% of a wireless LAN's total cost of operation. Efficient wireless troubleshooting contains that cost while avoiding downtime that can sap productivity benefits.

Smooth operation starts with a well-designed wireless LAN (WLAN). Site surveys, RF modeling, and automated RF management systems are all investments that pay dividends by reducing help desk calls. But no WLAN can escape the need for troubleshooting. Environmental conditions and network usage change. Equipment failures and software glitches occur. Eventually, wireless users require assistance. Good troubleshooting tools and a systematic approach can help you isolate and resolve those problems faster.

Processes and tools

When it comes to WLANs, conventional network management systems and diagnostic utilities are crucial, but insufficient. Many "wireless" help desk calls are caused by upstream network or application failures. But determining that -- and isolating true WLAN failures -- requires new tools and techniques.

Many WLAN adapters are accompanied by wireless client utilities such as Cisco ADU and Intel PROSet. These vendor utilities can provide connection status, data rate, signal strength, and other basic information for triage during that first help desk call.

Help desk staff can also assess the situation remotely using a wireless intrusion detection system (WIDS). A WIDS -- or a WLAN switch with remote monitoring capabilities -- lets you query the user's current status, recent activity, and related alerts representing deviations from security policy or expected performance.

Problems not easily resolved must be assigned to a technician for further investigation. That technician may put a WIDS sensor near the user into filtered capture mode, analyzing wireless traffic in hopes of isolating the problem without an on-site visit.

Of course, some issues simply cannot be diagnosed remotely. Any technician dispatched to the user's location should be armed with a portable WLAN traffic analyzer and a wireless spectrum analyzer to enable passive observation, multi-layer traffic analysis, and active diagnostic testing.

Trouble isolation

Problems experienced by wireless users run the gamut, from radio interference and client misconfiguration to loose access point LAN cables and cranky applications. As with wired troubleshooting, a systematic approach is needed to track down the problem without overlooking common causes or running in circles. And following the connection from client to server, verifying operation of every component in between, is still a sensible approach for wireless. Doing this simply requires an understanding of wireless devices, protocols and your own WLAN's architecture.

  1. Can the user "see" your WLAN?
    If the user's client utility does not list your WLAN's service set identifier (SSID), use a WIDS or WLAN manager to remotely identify and verify operation of the closest access point. If access points are operational, use portable test tools to listen for the SSID at the user's location.

    If the tester sees the SSID but the user cannot, check for client hardware or software problems (e.g., disabled adapter, old/corrupted driver). Verify that the client uses a compatible standard and domain (i.e., rule out channel and modulation mismatches). Assess signal strength -- if the tester's signal-to-noise ratio is weak, the access point may be too distant for a less-powerful client to see. Note that the client may have trouble seeing your SSID if your access points do not broadcast SSID or send multiple SSIDs in the same beacon.
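
As a rough illustration of the signal-strength check above, signal-to-noise ratio in dB is the received signal level minus the noise floor, both measured in dBm. The Python sketch below computes it. The 20 dB threshold and the sample readings are assumptions chosen for illustration, not values taken from any vendor or standard; substitute the figures your own site survey supports.

```python
def snr_db(signal_dbm: float, noise_dbm: float) -> float:
    """Signal-to-noise ratio in dB from signal and noise levels in dBm."""
    return signal_dbm - noise_dbm


# Hypothetical threshold: treat anything below this as a marginal link.
MIN_USABLE_SNR_DB = 20.0


def is_marginal(signal_dbm: float, noise_dbm: float,
                threshold_db: float = MIN_USABLE_SNR_DB) -> bool:
    """True when the measured SNR falls below the chosen threshold."""
    return snr_db(signal_dbm, noise_dbm) < threshold_db


if __name__ == "__main__":
    # Made-up readings taken at the user's location.
    print(snr_db(-82.0, -95.0))
    print(is_marginal(-82.0, -95.0))
```

If the tester's own reading is already marginal, a client with a weaker radio or antenna is likely to fare worse at the same spot.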

  2. Can the user associate with your WLAN?
    If the user's client utility does not show a persistent association, use a WIDS or WLAN manager to remotely investigate association attempts. If necessary, use portable test tools to watch the user try to associate.

    Alerts and traffic analysis can help you learn why a client cannot associate. First, rule out access point reset, then evaluate the client's configuration. Check for capability mismatch -- like a client that cannot support the access point's minimum data rate -- and security mismatch -- like a client that cannot support the access point's required cipher suite. If the access point rejects the client's requests, check the access point's log for overloading or MAC access control list failure. If attempts are repeatedly disrupted by deauthenticate messages, consider a denial of service attack -- perhaps by a WIDS that mistakenly believes the client is unauthorized.
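
The capability and security mismatch checks above can be sketched as simple set comparisons. This Python example is a minimal sketch under stated assumptions: the `ApProfile` and `ClientProfile` types, their field names, and the sample values are all hypothetical, and real values would come from your access point configuration and a capture of the client's association request.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ApProfile:
    """Hypothetical summary of what the access point demands."""
    required_rates_mbps: FrozenSet[float]  # rates every client must support
    required_ciphers: FrozenSet[str]       # client must offer at least one


@dataclass(frozen=True)
class ClientProfile:
    """Hypothetical summary of what the client offers."""
    supported_rates_mbps: FrozenSet[float]
    supported_ciphers: FrozenSet[str]


def association_mismatches(client: ClientProfile, ap: ApProfile) -> List[str]:
    """List the reasons this client could fail to associate with this AP."""
    problems = []
    missing = ap.required_rates_mbps - client.supported_rates_mbps
    if missing:
        problems.append(f"capability: client lacks rates {sorted(missing)} Mbps")
    if not (ap.required_ciphers & client.supported_ciphers):
        problems.append("security: no cipher suite in common")
    return problems


if __name__ == "__main__":
    ap = ApProfile(frozenset({12.0, 24.0}), frozenset({"CCMP"}))
    old_client = ClientProfile(frozenset({1.0, 2.0, 5.5, 11.0}),
                               frozenset({"TKIP"}))
    for problem in association_mismatches(old_client, ap):
        print(problem)
```

An empty result only rules out these two mismatches; access point overload, MAC access control lists, and deauthenticate floods still need to be checked in logs and captures.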

  3. Can the user authenticate with your WLAN?
    In WLANs that require 802.1X, associations that are established but break shortly thereafter indicate authentication failure. Diagnosing this involves examining client, access point and authentication server logs, and analyzing traffic among these three components.

    On the client side, verify driver, operating system, and client utility (802.1X supplicant) support for the Extensible Authentication Protocol (EAP) types required by your WLAN. Check the client's configuration carefully, including any stored user credentials and configured server certificates. Verify that the access point and authentication server are communicating. Potential problems here include physical disconnection, virtual LAN or routing issues, and bad RADIUS secrets. If the server receives but rejects the client's requests, use the server's log (and perhaps diagnostics) to learn why. In some cases, the problem lies between the authentication server and the user data store (e.g., Active Directory, RSA/ACE Server).

  4. Can the user get an IP address?
    A client that associates but cannot obtain an IP address (or falls back to an automatic private IP address 169.254.x.x) is having trouble reaching a Dynamic Host Configuration Protocol (DHCP) server.

    First, make sure that the DHCP server is operational and reachable from the access point's LAN, and that the IP address pool has not been exhausted. In WLANs that use Wi-Fi Protected Access (WPA or WPA2) Personal, look for a mismatch between the pre-shared keys configured on the client and the access point; with mismatched keys the client can associate but cannot complete the key handshake, so no data traffic, including DHCP, gets through. In WLANs that dynamically assign virtual LAN tags by SSID or 802.1X results, check access point and/or switch virtual LAN mappings to verify that client broadcasts reach your DHCP server. Watch DHCP responses to spot problems in the return path. Much of this process will be familiar to wired LAN technicians.
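
A help desk script can flag the automatic private address case quickly. The Python sketch below uses the standard `ipaddress` module to classify an address reported by the client; the function name `dhcp_lease_status` and its return strings are made up for illustration.

```python
import ipaddress


def dhcp_lease_status(address: str) -> str:
    """Classify a client's reported IP address for DHCP triage."""
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:
        # 0.0.0.0 (or ::) means the interface has no address at all.
        return "no address"
    if ip.is_link_local:
        # 169.254.0.0/16 is the IPv4 link-local range used for
        # automatic private addressing when no DHCP offer arrives.
        return "self-assigned (DHCP likely failed)"
    return "address assigned"


if __name__ == "__main__":
    for addr in ("169.254.10.20", "0.0.0.0", "10.1.2.3"):
        print(addr, "->", dhcp_lease_status(addr))
```

An "address assigned" result does not prove the lease is correct for the user's virtual LAN, so compare it against the subnet you expect for that SSID.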

  5. Can the user log into your WLAN portal?
    In WLANs that require captive portal login, users may associate and authenticate at Layer 2 but be unable to send application traffic. Diagnosing login failure starts with examining wired or wireless traffic between the client and portal.

    On the client side, look for common network issues such as DNS (client cannot resolve portal name), routing (client cannot ping portal), and blocking (host firewall or VPN client blocking HTTP). If the client reaches the portal but cannot establish an SSL session, look for version mismatch or server certificate problems. If the portal rejects the client's request, check user credentials and verify communication between the portal and any external authentication server. This process will be familiar to those who already use Web portals for secure remote access.
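
The client-side portal checks above (name resolution, reachability, then the SSL session) can be approximated from a test laptop with Python's standard `socket` and `ssl` modules. This is a sketch, not a complete diagnostic: the function names and message strings are hypothetical, the portal host is whatever your WLAN uses, and a generic SSL error is only a hint of a version mismatch, not proof.

```python
import socket
import ssl


def classify_portal_error(exc: BaseException) -> str:
    """Map a connection exception to the likely failure class.

    Order matters: the certificate error subclasses ssl.SSLError,
    and all of these subclass OSError.
    """
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate: server certificate failed verification"
    if isinstance(exc, ssl.SSLError):
        return "ssl: handshake failed (possible protocol version mismatch)"
    if isinstance(exc, socket.gaierror):
        return "dns: client cannot resolve portal name"
    if isinstance(exc, OSError):
        return "reachability: cannot connect (routing, firewall or VPN blocking)"
    return "unknown"


def check_portal(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try to resolve, connect to, and open a TLS session with the portal."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok: TLS session established"
    except OSError as exc:
        return classify_portal_error(exc)
```

Calling `check_portal` with your portal's host name from the user's location, and again from the wired LAN, helps separate client-side problems from portal-side ones.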

  6. Can the user reach the target application?
    Users who are new to wireless often suspect RF and access point problems when the true culprit is just a good old network or application reachability problem. Verify wireless client connectivity to the wired LAN -- for example, ping through the access point and login portal to the next hop router. Then use conventional tools and techniques to investigate the usual assortment of DNS, LAN, WAN, firewall, and application server troubles.

  7. Does the user frequently lose wireless connectivity?
    Intermittent failures are frustrating for users and support staff. Application session failures may be caused by WAN or application server problems, but associations that break repeatedly warrant further wireless investigation.

    802.11 clients react to environmental change by roaming to access points that offer better service. This can occur for many reasons: An access point fails, a door is closed, the user carries his laptop down the hall, or his hand moves in a way that impedes transmission. In WLANs with little or no security, access point roaming may occur without noticeable impact. In WLANs that use 802.1X, roaming may require re-authentication or disrupt latency-sensitive applications such as VoIP. In large WLANs, roaming may result in an IP address change, disconnecting application sessions.

    Resolving troubles caused by roaming -- and related RF failures or performance issues -- requires a good understanding of WLAN operation, portable analysis tools, and on-site visits to monitor user behavior and nearby radio transmissions. A WIDS or WLAN manager can also be helpful to determine how performance at this location compares with others, and to spot traffic patterns that trigger intermittent failures.

    Measuring end-to-end performance can help you determine when there is insufficient throughput or excessive latency for a given application. Look for problems that impede WLAN performance, including non-802.11 interference, 802.11g access points operating without b/g protection, channels overloaded with multiple access points, access points overloaded with clients, and excessive collisions or errors. RF troubleshooting requires experience and perhaps specialized training -- see the Certified Wireless Analysis Professional program.
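
The first six questions above form an ordered walk from client to server, and the value of the method is stopping at the first one that fails. The Python sketch below captures that ordering; the probe functions are stubs returning made-up results, and in practice each would wrap a WIDS query, a log lookup, or a test from a portable analyzer. The seventh question, intermittent loss, is left out because it calls for observation over time and not a single pass or fail check.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A check pairs the question with a probe that returns True on success.
Check = Tuple[str, Callable[[], bool]]


def first_failed_step(checks: Sequence[Check]) -> Optional[str]:
    """Run checks in order and return the first question that fails.

    Later probes are not run once a failure is found, mirroring the
    client-to-server walk described in the article.
    """
    for question, probe in checks:
        if not probe():
            return question
    return None


def example_checks() -> List[Check]:
    """Stubbed results for illustration only (all values are made up)."""
    return [
        ("Can the user see the WLAN?", lambda: True),
        ("Can the user associate?", lambda: True),
        ("Can the user authenticate?", lambda: True),
        ("Can the user get an IP address?", lambda: False),
        ("Can the user log into the portal?", lambda: True),
        ("Can the user reach the target application?", lambda: True),
    ]


if __name__ == "__main__":
    print(first_failed_step(example_checks()))
```

A result of `None` means every check passed, which points the investigation toward intermittent or performance problems.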

About the author

Lisa Phifer is vice president of Core Competence Inc., a consulting firm specializing in network security and management technology. Phifer has been involved in the design, implementation, and evaluation of data communications, internetworking, security, and network management products for nearly 20 years. She teaches about wireless LANs and virtual private networking at industry conferences and has written extensively about network infrastructure and security technologies for numerous publications. She is also a site expert to SearchMobileComputing.com and SearchNetworking.com.
