In my line of work as a sysadmin at one of the largest sites in Norway, it happens once in a while that I have to inspect HTTP traffic for some more or less urgent reason.
One of the tools I really love working with is tshark.
tshark is the console version of Wireshark and lets you sniff and dissect just about any protocol in real time.
One problem I had recently was identifying web traffic originating from our own web server.
Over the years the code has accumulated server-initiated fetches: stuff like file_get_contents("http://somesite/someurl")
in the presentation code. This is bad since it creates external dependencies for delivering a page and keeps apache/nginx/lighttpd threads/processes busy. This is the command I used to hunt them down:
tshark -i eth0 -n -aduration:60 -zhttp,tree -zhttp_srv,tree -T fields -e http.host -e http.request.uri -e http.request.method -R http -tad 'src host 10.0.0.144 and (dst port 80 or dst port 443)'
This roughly says:
- Listen on the eth0 interface for 60 seconds.
- Write out two different sets of statistics about the traffic.
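- Write out the "Host:" header, the URL and the request method (GET/POST).
- Only show packets that dissect as HTTP. (Newer tshark releases want -Y http for this instead of -R http.)
- Write timestamps in a readable format. (This has no effect here, since only the selected fields are printed.)
- Only look at traffic from my IP (10.0.0.144) to port 80 (HTTP) and port 443 (HTTPS).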
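On a busy server the raw field output scrolls by quickly, and any packet without a Host header prints as an empty line. Here is a minimal sketch of how the output could be boiled down to a per-host count. The helper name count_hosts is my own invention, and it assumes the three tab-separated columns produced by the -e options above.

```shell
# count_hosts: hypothetical helper, not part of tshark.
# Reads tab-separated "host<TAB>uri<TAB>method" lines on stdin
# and prints "count host", busiest host first.
count_hosts() {
  awk -F'\t' '$1 != "" { seen[$1]++ }
              END { for (h in seen) print seen[h], h }' | sort -rn
}

# Possible usage (statistics options dropped so only the fields are printed):
# tshark -i eth0 -n -aduration:60 -T fields -e http.host \
#   -e http.request.uri -e http.request.method -R http \
#   'src host 10.0.0.144 and (dst port 80 or dst port 443)' | count_hosts
```

The hosts at the top of the list are the external dependencies worth looking at first.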
This little trick helped me identify loads of external dependencies and pinpointed some ugly code that needed some care.
And while I was at it, I figured out I could do something similar with MySQL queries. Instead of turning on full query logging in MySQL (which probably means restarting a running production MySQL), I could just sniff the queries off the wire:
tshark -i eth0 -aduration:60 -d tcp.port==3306,mysql -T fields -e mysql.query 'port 3306'
Which roughly says:
- Listen on eth0 for 60 seconds.
- Interpret port 3306 as MySQL.
- Write out the queries.
- Only look at traffic on port 3306.
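Sixty seconds of production queries is a lot to read, so it probably makes sense to rank them by how often they occur. A rough sketch, assuming one query per output line. top_queries is a made-up helper name, and queries that differ only in their literal values are counted separately.

```shell
# top_queries: hypothetical helper, not part of tshark.
# Reads one query per line on stdin, drops blank lines and prints
# the N most frequent queries (default 10) with their counts.
top_queries() {
  grep -v '^[[:space:]]*$' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# Possible usage:
# tshark -i eth0 -aduration:60 -d tcp.port==3306,mysql \
#   -T fields -e mysql.query 'port 3306' | top_queries 20
```

The blank lines come from MySQL packets that carry no query text, which is why they are filtered out before counting.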
Have fun.
Other useful fields to pass to -T fields -e:
- http.response.code
- http.server
- http.content_type
- ip.src
- ip.dst
- tcp.port
- http.user_agent
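As an illustration of what these extra fields are good for, here is a sketch that counts failed responses per remote server. It assumes two tab-separated columns (ip.src and http.response.code), and failed_fetches is a hypothetical helper name. Note that the capture filter has to let the replies through as well, so 'src host ...' from the first example will not do.

```shell
# failed_fetches: hypothetical helper, not part of tshark.
# Reads tab-separated "ip.src<TAB>http.response.code" lines on stdin
# and counts 4xx/5xx responses per remote server, most frequent first.
failed_fetches() {
  awk -F'\t' '$2 ~ /^[45][0-9][0-9]$/ { n[$1 " " $2]++ }
              END { for (k in n) print n[k], k }' | sort -rn
}

# Possible usage (capture filter matches both directions on port 80):
# tshark -i eth0 -n -aduration:60 -T fields -e ip.src \
#   -e http.response.code -R http.response \
#   'host 10.0.0.144 and port 80' | failed_fetches
```

Each output line reads as count, server IP, status code.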