Previously I discussed using Wireshark I/O Graphs for troubleshooting. In this post I would like to share some thoughts on using the “WLAN Traffic” option for analysis and troubleshooting. Keep in mind that these tools will not always tell you exactly what the issue is. They are part of your troubleshooting kit, though, and can get you to the root cause, or at least close enough to make an educated decision about what is going on.
Issue Description
Clients were getting kicked off the wireless network; some showed as connected but were not passing any data. Rebooting the access point fixed the issue until it happened again.
I had to troubleshoot this very recently, and in addition to Wireshark I/O Graphs I decided to use the Wireshark – WLAN Traffic feature, which I had not previously paid much attention to.
Once the “WLAN Traffic” screen opens, it may show a lot of data depending on your capture. The screenshot below shows just a small snippet of it for starters. Let’s take a look at the different columns.
WLAN Traffic Columns
- BSSID: Shows all the BSSIDs
- Channel: Shows all the channels detected
- SSID: Shows all the SSIDs detected
- Percent Packets: Shows each network's share of all captured packets/frames, as a percentage
- Percent Retry: Shows the retry percentage
- Retry: Shows the number of retries
- Beacons: Shows the number of Beacons
- Data Packets: Shows the number of Data Packets
- Probe Requests: Shows the number of Probe Requests
- Probe Responses: Shows the number of Probe Responses
- Auth: Shows the number of Auth frames
- Deauth: Shows the number of Deauth frames
- Other: Shows the number of other misc frames
- Protection: Shows any protection frames
Each of these columns is important and gives us additional, easy-to-read statistics for analyzing what is going on with the WLAN. This view can also be saved in text, CSV, XML, and YAML formats.
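Because the view can be saved, the numbers can also be processed outside Wireshark. Below is a minimal Python sketch, assuming a CSV export whose header row uses names such as `BSSID`, `Channel`, and `Retry`. These column names and the sample rows are illustrative assumptions, not taken from a real export, so adjust them to match your own file.

```python
import csv
import io

# Hypothetical sample standing in for a CSV export of the WLAN Traffic view.
# The header names and all values are assumptions for illustration only.
SAMPLE_EXPORT = """BSSID,Channel,SSID,Retry,Probe Requests,Probe Responses,Deauth
aa:aa:aa:aa:aa:01,36,corp,12,3,5,0
aa:aa:aa:aa:aa:02,6,corp,240,410,388,57
aa:aa:aa:aa:aa:03,1,guest,150,120,131,2
"""

COUNT_COLUMNS = ("Retry", "Probe Requests", "Probe Responses", "Deauth")


def load_rows(text):
    """Parse the export and convert the numeric columns to integers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Channel"] = int(row["Channel"])
        for name in COUNT_COLUMNS:
            row[name] = int(row[name])
        rows.append(row)
    return rows


def sort_by_channel(rows):
    """Return the rows ordered by ascending channel number."""
    return sorted(rows, key=lambda row: row["Channel"])


rows = sort_by_channel(load_rows(SAMPLE_EXPORT))
for row in rows:
    print(row["Channel"], row["BSSID"], row["Retry"], row["Deauth"])
```

Sorting by channel here mirrors what clicking the Channel column does in the dialog, which makes it easy to compare neighbouring channels side by side.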
Sorting information
Clicking a column header sorts the information by that column. I personally like to sort by channel in ascending order, as shown below:
Navigating through data
While looking at the data I noticed something interesting after sorting the channels.
- The number of retries and the retry percentage were much higher in the 2.4 GHz band than in 5 GHz.
- The number of Probe Requests and Probe Responses was really high in 2.4 GHz.
- There were lots of Deauths on channel 6.
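The same per-band comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are made-up numbers, the helper names are hypothetical, and the channel-to-band mapping assumes the capture contains only 2.4 GHz and 5 GHz networks (6 GHz channel numbers would overlap).

```python
from collections import defaultdict


def band_for_channel(channel):
    """Map a channel number to its band, assuming no 6 GHz networks."""
    if 1 <= channel <= 14:
        return "2.4 GHz"
    if 36 <= channel <= 177:
        return "5 GHz"
    return "unknown"


# Hypothetical per-BSSID rows: (channel, total frames, retries,
# probe requests, probe responses, deauths). Values are invented.
ROWS = [
    (1, 1000, 300, 120, 130, 2),
    (6, 2000, 700, 410, 390, 57),
    (36, 3000, 90, 3, 5, 0),
    (149, 1000, 30, 2, 4, 0),
]


def summarize_by_band(rows):
    """Sum the counters per band and derive a retry percentage."""
    totals = defaultdict(lambda: {
        "frames": 0, "retries": 0,
        "probe_req": 0, "probe_resp": 0, "deauth": 0,
    })
    for channel, frames, retries, probe_req, probe_resp, deauth in rows:
        band = totals[band_for_channel(channel)]
        band["frames"] += frames
        band["retries"] += retries
        band["probe_req"] += probe_req
        band["probe_resp"] += probe_resp
        band["deauth"] += deauth
    for band in totals.values():
        # Retry percentage = retried frames as a share of all frames.
        band["retry_pct"] = (
            100.0 * band["retries"] / band["frames"] if band["frames"] else 0.0
        )
    return dict(totals)


summary = summarize_by_band(ROWS)
for name, stats in summary.items():
    print(name, stats)
```

Summing per band rather than per BSSID makes a lopsided split between 2.4 GHz and 5 GHz stand out immediately.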
Now the important question: how did all this help me troubleshoot the issue? Let us look further.
Clicking on a MAC address brings up the screen shown in the screenshot above, which allowed me to apply filters directly from here. I expanded the address, applied my filter, and narrowed it down to “Probe Requests” only.
wlan.addr==xx:xx:xx:xx:aa:aa && wlan.fc.type_subtype == 4
Here I noticed a device that was constantly sending out “Probe Requests”. The majority of them carried a really high Duration value, as shown in the screenshot below. They were all in the 2.4 GHz band.
Next I changed the filter to show “Probe Responses”:
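For context on what a "high Duration" means: the Duration/ID field in the 802.11 MAC header is 16 bits wide, and when its top bit is clear the value is a time in microseconds that other stations use to treat the medium as reserved. The sketch below shows one way to interpret a raw value. The function name is hypothetical and the code reflects my reading of the field layout, so treat it as an aid to understanding rather than a reference decoder.

```python
def decode_duration_id(raw):
    """Interpret the 16-bit Duration/ID field of an 802.11 MAC header."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("Duration/ID is a 16-bit field")
    if raw & 0x8000 == 0:
        # Bit 15 clear: the value is a duration in microseconds (0-32767).
        return ("duration_us", raw)
    if raw & 0xC000 == 0xC000 and 1 <= (raw & 0x3FFF) <= 2007:
        # Bits 15 and 14 set: an association ID, as carried in PS-Poll frames.
        return ("aid", raw & 0x3FFF)
    # Anything else is a fixed or reserved value, not a usable duration.
    return ("reserved_or_fixed", raw)


print(decode_duration_id(314))
print(decode_duration_id(0xC001))
```

The larger the duration value, the longer every station that hears the frame holds off transmitting, which is why a flood of frames with large values hurts a busy channel.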
wlan.addr==xx:xx:xx:xx:xx:aa && wlan.fc.type_subtype == 5
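If you find yourself repeating these two filters for several devices, they are easy to generate. The following is a small Python sketch: the helper names, the example MAC address, and the capture file name are all hypothetical. It only builds strings; the `tshark` argument list uses `-r` to read a capture file and `-Y` to apply a display filter.

```python
import re

# Management frame subtypes as matched by wlan.fc.type_subtype.
PROBE_REQUEST = 4
PROBE_RESPONSE = 5

MAC_PATTERN = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def build_filter(mac, subtype):
    """Build a display filter for one MAC address and one frame subtype."""
    if not MAC_PATTERN.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return f"wlan.addr == {mac} && wlan.fc.type_subtype == {subtype}"


def tshark_command(capture_path, display_filter):
    """Argument list for reading a capture with a display filter applied."""
    return ["tshark", "-r", capture_path, "-Y", display_filter]


# Hypothetical address and file name, for illustration only.
probe_req_filter = build_filter("00:11:22:33:44:55", PROBE_REQUEST)
probe_resp_filter = build_filter("00:11:22:33:44:55", PROBE_RESPONSE)
print(probe_req_filter)
print(tshark_command("capture.pcapng", probe_resp_filter))
```

The generated strings can be pasted into the Wireshark filter bar as they are.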
Under Probe Responses I noticed that there were hundreds of responses from the access point(s) on different channels; in 2.4 GHz specifically there were too many Probe Responses. After expanding the “Probe Response” frames, I discovered that each response was followed by hundreds of retransmits.
Anyone who has read my previous post about Wireshark I/O Graphs may remember that they provide some good information, so I decided to pull up a quick graph. I had to cut off some of the graph, but this gives an idea. Compared with “All Packets”, the number of “Probe Requests” and “Probe Responses” in the graph, as well as the metrics under WLAN Traffic, were higher than usual, mostly in the 2.4 GHz band.
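Conceptually, an I/O graph is just a count of matching frames per time interval. The sketch below illustrates that idea in Python, assuming the frames have already been extracted into (timestamp in seconds, subtype) pairs. The data and the function name are hypothetical, and this is not how Wireshark itself implements the graph.

```python
from collections import Counter


def frames_per_interval(events, subtype, interval=1.0):
    """Count frames of one subtype in each time bucket of `interval` seconds."""
    counts = Counter()
    for timestamp, frame_subtype in events:
        if frame_subtype == subtype:
            counts[int(timestamp // interval)] += 1
    return dict(sorted(counts.items()))


# Hypothetical (timestamp, subtype) pairs: 4 = Probe Request, 5 = Probe Response.
EVENTS = [(0.10, 4), (0.40, 5), (0.90, 4), (1.20, 4), (2.70, 5), (2.95, 4)]

probe_request_counts = frames_per_interval(EVENTS, subtype=4)
print(probe_request_counts)
```

Plotting one such series per subtype next to the total frame count gives the same kind of comparison the graph provides.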
WLAN Traffic – Conclusion
- Too many Probe Requests from multiple devices.
- High Duration times due to co-channel interference (CCI). Rasika Nayanajith explains Duration times in great detail on his website; I recommend reading that.
- Access points had to retransmit Probe Responses because they were not reaching the clients.
- The radios were doing excessive work, which drove up CPU usage. Combined with a memory leak in the code, this caused the access points to freeze, and the watchdog would eventually reboot them to free up CPU and memory.
- The quick resolution was to disable the 2.4 GHz radios completely.
- The permanent fix was a code update on the access points, but I decided to keep all crucial clients on the 5 GHz band regardless.
In my opinion, troubleshooting is an art in every profession. You may not have all the answers in front of you, but you may have the tools and experience to collect data and make an educated decision or guess about the root cause of the issue. These troubleshooting steps not only helped me learn and refresh my memory but also pointed me in the right direction, which allowed me to find the root cause and resolve the issue.
Hopefully this write-up is helpful to other WLAN engineers. Please feel free to provide feedback if I have missed anything or if something is not correct. Thank you.