Friday, December 4, 2009

Geoiplookup script

I found a syntax error in one of our JavaScript files that inserts Google Analytics. This of course meant that the results had flatlined.


On December 4th, however, we officially 'launched'. There are now two problems:
  1. How much traffic did we get on Dec 2 - 4?
  2. Where did this traffic come from?
The first question is relatively easy to answer. Since we use nginx as a proxy for Apache, I went over to /var/log/nginx and gunzip'd a few log files... nothing fancy.

How much traffic did we get on Dec 2 - 4?
To find out how many unique visitors (really, unique IP addresses) there were over the span of four log files, we do the following, in pseudo-code:

print logs | show only the ip address for each line | find the unique ones | give me the count

This becomes

$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | awk ' { print $1 } ' | sort | uniq | wc -l
315
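
The same count can be written a little more compactly, as a sketch: awk can read the files itself, and `sort -u` stands in for `sort | uniq`. The function name `count_unique_ips` is made up for illustration, and it assumes the client IP is the first field on every log line.

```shell
#!/bin/sh
# count_unique_ips FILE...
# Print how many distinct values appear in the first field (assumed to be
# the client IP) across all of the given log files.
count_unique_ips() {
    # tr strips the padding some wc implementations put around the number
    awk '{ print $1 }' "$@" | sort -u | wc -l | tr -d ' '
}
```

Called as `count_unique_ips www.izuu.com.access.log www.izuu.com.access.log.[1-3]`, it should give the same number as the pipeline above.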

Not bad. To break it down by date, we just add a grep to the pipeline:

December 2nd:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 02\/Dec | awk ' { print $1 } ' | sort | uniq | wc -l
52


December 3rd:

$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 03\/Dec | awk ' { print $1 } ' | sort | uniq | wc -l
59


December 4th:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 04\/Dec | awk ' { print $1 } ' | sort | uniq | wc -l
204
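
Instead of one grep per day, the three counts could also come out of a single pass. This is only a sketch: `unique_ips_per_day` is a hypothetical name, and it assumes the timestamp is the fourth whitespace-separated field in the form `[04/Dec/2009:...`, which is what the grep patterns above suggest but which depends on the nginx log format in use.

```shell
#!/bin/sh
# unique_ips_per_day FILE...
# Print "DD/Mon/YYYY COUNT" for each day, where COUNT is the number of
# distinct first-field values (client IPs) seen on that day.
unique_ips_per_day() {
    awk '{
        day = substr($4, 2, 11)            # "[04/Dec/2009:..." -> "04/Dec/2009"
        if (!seen[day, $1]++) count[day]++ # count each IP once per day
    }
    END { for (d in count) print d, count[d] }' "$@" | sort
}
```

The final `sort` is a plain text sort, so the days only come out in calendar order while the logs stay within one month.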


Nice healthy jump there. But wait, there's more.

Where did this traffic come from?

To do this I found a tool called geoiplookup, which is available in the apt repositories.

$ apt-cache search geo | grep IP
libgeoip-dev - Development files for the GeoIP library
libgeoip1 - A non-DNS IP-to-country resolver library
python-geoip - python bindings for the GeoIP IP-to-country resolver library
python-geoip-dbg - python bindings for the GeoIP IP-to-country resolver library (debug extension)
geoip-bin - IP lookup command line tools that use the GeoIP library
kipi-plugins - image manipulation/handling plugins for KIPI aware programs
libapache2-mod-geoip - GeoIP support for apache2
libgeo-ip-perl - Perl bindings for GeoIP library
php5-geoip - GeoIP module for php5
tclgeoip - Tcl extension implementing GeoIP lookup functions
tor-geoipdb - geoIP database for Tor

$ sudo apt-get install geoip-bin

The geoiplookup tool only comes with a database that narrows the IP address to a specific country, which is not very interesting. However, using the magic oracle, I discovered a much more specific city-based database at http://geolite.maxmind.com/download/geoip/database/

So then I did a nice:
$ cd
$ wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz

Thought a few seconds... then gave up:

$ man geoiplookup

OPTIONS
-f Specify a custom path to a single GeoIP datafile.

-d Specify a custom directory containing GeoIP datafile(s). By default geoiplookup looks in /usr/share/GeoIP

Wow, that's damn confusing. Is this right?:

$ mkdir geo
$ mv GeoLiteCity.dat.gz geo
$ cd geo
$ gunzip GeoLiteCity.dat.gz
$ geoiplookup -f ~/geo/ 4.2.2.4
Error Traversing Database for ipnum = 67240452 - Perhaps database is corrupt?
Segmentation fault
$

Lovely. So the next step:

$ strace !! |& less
...
brk(0) = 0x603000
brk(0x624000) = 0x624000
open("/home/chris/geo/", O_RDONLY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f866a21f000
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
...

It's rather pathetic when strace gives you better documentation than the man page. The open() call shows it trying to read the directory itself as the database, so -f clearly wants the full path to the .dat file.

$ geoiplookup -f ~/geo/GeoLiteCity.dat 4.2.2.4
GeoIP City Edition, Rev 1: US, (null), (null), (null), 38.000000, -97.000000, 0, 0
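
Since geoiplookup segfaults rather than complains when -f is handed a directory, a small guard could catch the mistake first. This is a sketch of my own, not anything that ships with geoip-bin; the name `geoip_db_ok` is made up.

```shell
#!/bin/sh
# geoip_db_ok PATH
# Succeed only if PATH is a readable regular file. If it is a directory,
# print a hint on stderr, since that is the mistake made above.
geoip_db_ok() {
    if [ -d "$1" ]; then
        echo "$1 is a directory; -f wants the .dat file itself" >&2
        return 1
    fi
    [ -f "$1" ] && [ -r "$1" ]
}
```

It would be used as `geoip_db_ok ~/geo/GeoLiteCity.dat && geoiplookup -f ~/geo/GeoLiteCity.dat 4.2.2.4`.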

Final Packaging
OK, that's the ticket. Now we just do a little bit of xargs:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 04\/Dec | awk ' { print $1 } ' | sort | uniq | xargs -n 1 geoiplookup -f ~/geo/GeoLiteCity.dat

That does the trick, but now we have this huge ass "GeoIP City Edition, Rev 1:" in front of everything. That's ok, I know sed ... let's make it ordered by country too:

$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 04\/Dec | awk ' { print $1 } ' | sort | uniq | xargs -n 1 geoiplookup -f ~/geo/GeoLiteCity.dat | sort | sed 's/^.*: //'
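
If a per-country tally is more useful than the full list, the lookup output can be boiled down further. A sketch, assuming every line looks like the sample above (a prefix ending in ": ", then the country code as the first comma-separated field); `country_counts` is a made-up name.

```shell
#!/bin/sh
# country_counts
# Read geoiplookup output on stdin and print "COUNT COUNTRY" lines,
# most frequent country first.
country_counts() {
    sed 's/^[^:]*: //' |   # drop the "GeoIP City Edition, Rev 1: " prefix
    cut -d, -f1 |          # keep only the country code
    sort | uniq -c |       # tally identical codes
    sort -rn |             # biggest count first
    sed 's/^ *//'          # strip the padding uniq -c adds
}
```

Piping the xargs output into `country_counts` instead of `sort | sed` would give one line per country.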


Ah, almost complete. Now we just need to mail it off so I can forward it to the boss:

$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 04\/Dec | awk ' { print $1 } ' | sort | uniq | xargs -n 1 geoiplookup -f ~/geo/GeoLiteCity.dat | sort | sed 's/^.*: //' | mail cmckenzie


And there. Now I have a nice list of stuff to give to the boss that, albeit not graphical, is still digestible and better than a line at 0... sw33t.


