Some days it feels like reading and analysing logs is the only thing I do at work.
And when I analyse logs it's usually something like
$ zcat -f *log* | grep -v something | awk '/something else/ { print $3 }' | sed 's/something/else/g' | sort | uniq -c | sort -gr > output
And when this does not fully comply to what i need to do i fall back to perl and write a ugly oneliner of the type
$ tail -f foo | perl -lane 's/foo(.*)/bar$1bar/g'
So when my wife and daughter got a visit from Mr Noro I had some spare time and wrote tailstat. It's a small perlprogram that does what I manualy do every day
It has a few different modes
Plain
[ay@wildbill logs]# for i in foo foo bar bar foo foobar;do echo $i;done | pipestat
foo : (50.0%) (107546.3 hits/s) (3 hits / 6)
bar : (33.3%) (71697.5 hits/s) (2 hits / 6)
foobar : (16.7%) (35848.8 hits/s) (1 hits / 6)
Parsed 6 lines in 0.00 secs (46863.7 lines/s)
a single regexp
[ay@wildbill logs]# for i in foo foo bar bar foo foobar;do echo $i;done | pipestat foo
foo : (66.7%) (100462.4 hits/s) (4 hits / 6)
: (33.3%) (50231.2 hits/s) (2 hits / 6)
Parsed 6 lines in 0.00 secs (25317.7 lines/s)
More real examples,
[ay@wildbill logs]# cat www-access_log | pipestat 'css' 'jpe?g' 'gif' 'png'
jpe?g : (30.3%) (24590.5 hits/s) (122851 hits / 405830)
css : (11.7%) (9494.6 hits/s) (47434 hits / 405830)
gif : (6.6%) (5355.4 hits/s) (26755 hits / 405830)
png : (0.4%) (341.3 hits/s) (1705 hits / 405830)
<rest> : (51.0%) (41451.2 hits/s) (207085 hits / 405830)
Parsed 405830 lines in 5.00 secs (81230.8 lines/s)
Or if you just want field number 5…
[root@wildbill logs]# cat www-access_log | pipestat --field=9 --limit=5
200 : (88.2%) (77506.3 hits/s) (350721 hits / 397549)
HTTP/1.1" : (3.3%) (2858.5 hits/s) (12935 hits / 397549)
404 : (2.9%) (2530.8 hits/s) (11452 hits / 397549)
301 : (2.4%) (2092.8 hits/s) (9470 hits / 397549)
403 : (1.5%) (1321.3 hits/s) (5979 hits / 397549)
<limited> : (1.8%) (1545.2 hits/s) (6992(1311) hits(uniq / 397549)
Parsed 397549 lines in 4.53 secs (87795.5 lines/s)
This takes a apache accesslogg and print out the 9th whitespace seperated field and writes out the 5 most common hits, the rest of the hits are sumed up in
So then we go into matchmodus.
[root@wildbill logs]# cat www-access_log | pipestat -m 'HTTP/1.1" (\d+)'
200 : (92.3%) (218082.1 hits/s) (375934 hits / 407426)
404 : (3.1%) (7229.9 hits/s) (12463 hits / 407426)
301 : (2.5%) (5910.1 hits/s) (10188 hits / 407426)
403 : (1.4%) (3339.1 hits/s) (5756 hits / 407426)
302 : (0.4%) (873.1 hits/s) (1505 hits / 407426)
405 : (0.0%) (37.1 hits/s) (64 hits / 407426)
206 : (0.0%) (19.1 hits/s) (33 hits / 407426)
304 : (0.0%) (4.6 hits/s) (8 hits / 407426)
<rest> : (0.4%) (855.7 hits/s) (1475 hits / 407426)
Parsed 407426 lines in 1.72 secs (236328.4 lines/s)
Or we could run it on the live log for say 60 seconds and what article is the most popular
[ay@joanie ~]$ varnishlog -i RxURL -I artid | pipestat --runtime=60 --limit 5 -m 'artid=(d+)'
542689 : (7.9%) (1.5 hits/s) (91 hits / 1154)
542739 : (7.5%) (1.4 hits/s) (86 hits / 1154)
542568 : (4.4%) (0.8 hits/s) (51 hits / 1154)
542723 : (4.1%) (0.8 hits/s) (47 hits / 1154)
542718 : (3.7%) (0.7 hits/s) (43 hits / 1154)
<limited> : (72.4%) (13.5 hits/s) (835(244) hits(uniq) / 1154)
<rest> : (0.1%) (0.0 hits/s) (1 hits / 1154)
Parsed 1154 lines in 61.80 secs (18.7 lines/s)
Have fun. The code is here