On December 4th, however, we officially 'launched'. There are now two questions:
- How much traffic did we get on Dec 2 - 4?
- Where did this traffic come from?
How much traffic did we get on Dec 2 - 4?
To find out how many unique visitors (well, unique IP addresses) we had across the four log files, we do the following, in pseudo-code:
print logs | show only the ip address for each line | find the unique ones | give me the count
This becomes:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | awk ' { print $1 } ' | sort | uniq | wc -l
315
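If you'd rather skip the sort, awk can do the de-duplication itself by remembering each IP in an associative array. A minimal sketch (the function name is just for illustration) that should give the same count as the pipeline above:

```shell
# Hypothetical helper: count distinct client IPs (field 1) in one awk pass.
# Same result as: awk '{ print $1 }' | sort | uniq | wc -l
count_unique_ips() {
  # !seen[$1]++ is true only the first time a given IP shows up
  awk '!seen[$1]++ { n++ } END { print n + 0 }' "$@"
}
```

Usage: `count_unique_ips www.izuu.com.access.log www.izuu.com.access.log.[1-3]`. With no file arguments it reads stdin, so it also works at the end of a pipe.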
Not bad. To break it down by date, we just add a grep to the pipeline:
December 2nd:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 02\/Dec | awk ' { print $1 } ' | sort | uniq | wc -l
52
December 3rd:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 03\/Dec | awk ' { print $1 } ' | sort | uniq | wc -l
59
December 4th:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 04\/Dec | awk ' { print $1 } ' | sort | uniq | wc -l
204
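The three commands above re-read all four logs once per day. A single awk pass can produce every day's count at once. This is a sketch that assumes the standard Apache Common/Combined Log Format, where the fourth field begins like `[02/Dec/2008:10:15:32` (the year is only an example); the function name is hypothetical:

```shell
# Hypothetical helper: unique client IPs per day, in one pass over the logs.
# Assumes field 4 is the timestamp, e.g. "[02/Dec/2008:10:15:32".
unique_per_day() {
  awk '{
    day = substr($4, 2, 6)                # "[02/Dec/..." -> "02/Dec"
    if (!seen[day, $1]++) count[day]++    # first hit for this IP on this day
  }
  END { for (d in count) print d, count[d] }' "$@" | sort
}
```

Usage: `unique_per_day www.izuu.com.access.log www.izuu.com.access.log.[1-3]`. The final sort is plain text, which is fine within one month but would put 30/Nov after 02/Dec.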
Nice healthy jump there. But wait, there's more.
Where did this traffic come from?
To answer this, I found a tool called geoiplookup, which is available in the apt repositories:
$ apt-cache search geo | grep IP
libgeoip-dev - Development files for the GeoIP library
libgeoip1 - A non-DNS IP-to-country resolver library
python-geoip - python bindings for the GeoIP IP-to-country resolver library
python-geoip-dbg - python bindings for the GeoIP IP-to-country resolver library (debug extension)
geoip-bin - IP lookup command line tools that use the GeoIP library
kipi-plugins - image manipulation/handling plugins for KIPI aware programs
libapache2-mod-geoip - GeoIP support for apache2
libgeo-ip-perl - Perl bindings for GeoIP library
php5-geoip - GeoIP module for php5
tclgeoip - Tcl extension implementing GeoIP lookup functions
tor-geoipdb - geoIP database for Tor
$ sudo apt-get install geoip-bin
The geoiplookup tool only comes with a database that narrows an IP address down to a country, which is not very interesting. However, using the magic oracle, I discovered a much more specific city-level database at http://geolite.maxmind.com/download/geoip/database/
So then I did a nice:
$ cd
$ wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
Thought about it for a few seconds... then gave up:
$ man geoiplookup
OPTIONS
-f Specify a custom path to a single GeoIP datafile.
-d Specify a custom directory containing GeoIP datafile(s). By default geoiplookup looks in /usr/share/GeoIP
Wow, that's damn confusing. Is this right?
$ mkdir geo
$ mv GeoLiteCity.dat.gz geo
$ cd geo
$ gunzip GeoLiteCity.dat.gz
$ geoiplookup -f ~/geo/ 4.2.2.4
Error Traversing Database for ipnum = 67240452 - Perhaps database is corrupt?
Segmentation fault
$
Lovely. So, the next step:
$ strace !! |& less
...
brk(0) = 0x603000
brk(0x624000) = 0x624000
open("/home/chris/geo/", O_RDONLY) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f866a21f000
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
...
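It's rather pathetic when strace gives you better documentation than the man page: geoiplookup opened ~/geo/, found a directory (S_IFDIR) instead of a database, and fell over. -f clearly wants the full path to the .dat file itself.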
$ geoiplookup -f ~/geo/GeoLiteCity.dat 4.2.2.4
GeoIP City Edition, Rev 1: US, (null), (null), (null), 38.000000, -97.000000, 0, 0
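Given how loudly geoiplookup fails when -f points at a directory, a tiny wrapper that checks for a regular file first might save the next person a trip through strace. The function name and the GEO_DB variable are my own inventions, not part of geoip-bin:

```shell
# Hypothetical wrapper; GEO_DB defaults to the file downloaded above.
GEO_DB="${GEO_DB:-$HOME/geo/GeoLiteCity.dat}"

geo_city() {
  # -f needs the .dat file itself; pointing it at a directory segfaulted
  if [ ! -f "$GEO_DB" ]; then
    echo "geo_city: $GEO_DB is not a regular file" >&2
    return 1
  fi
  geoiplookup -f "$GEO_DB" "$1"
}
```

Usage: `geo_city 4.2.2.4`, which should print the same "GeoIP City Edition, Rev 1: US, ..." line as above.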
Final Packaging
OK, that's the ticket. Now we just add a little bit of xargs:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 04\/Dec | awk ' { print $1 } ' | sort | uniq | xargs -n 1 geoiplookup -f ~/geo/GeoLiteCity.dat
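Every command so far repeats the same cat | grep | awk | sort | uniq prefix, so wrapping it in a function keeps the xargs line readable. This is only a sketch with a hypothetical name; it matches on "[04/Dec/" (bracket included) so that a request path containing 04/Dec can't sneak in, which is slightly stricter than the plain grep above:

```shell
# Hypothetical helper: print each distinct client IP seen on one day.
# $1 is the day as it appears in the log timestamp, e.g. 04/Dec;
# the remaining args are log files (stdin if none are given).
ips_for_day() {
  local day=$1
  shift
  # cat first, so grep never prefixes lines with file names
  cat "$@" | grep -F "[$day/" | awk '{ print $1 }' | sort -u
}
```

Usage: `ips_for_day 04/Dec www.izuu.com.access.log www.izuu.com.access.log.[1-3] | xargs -n 1 geoiplookup -f ~/geo/GeoLiteCity.dat`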
That does the trick, but now there's this huge-ass "GeoIP City Edition, Rev 1:" label in front of everything. That's OK, I know sed... and let's sort it by country too:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 04\/Dec | awk ' { print $1 } ' | sort | uniq | xargs -n 1 geoiplookup -f ~/geo/GeoLiteCity.dat | sort | sed 's/^.*: //'
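For the boss, a per-country tally might be even more digestible than one line per visitor. A sketch, assuming every line looks like the City Edition output shown earlier (label, colon, then a comma-separated country code first); both function names are hypothetical. This sed stops at the first colon rather than the last, and any line without a country code just gets grouped under whatever text follows the label:

```shell
# Hypothetical: drop the "GeoIP City Edition, Rev 1: " label from each line.
strip_geoip_label() {
  sed 's/^[^:]*: //'
}

# Hypothetical: tally lines per country code (first ", "-separated field),
# most frequent first, ties broken alphabetically.
country_counts() {
  strip_geoip_label |
    awk -F', ' '{ n[$1]++ } END { for (c in n) print n[c], c }' |
    sort -k1,1nr -k2,2
}
```

Usage: replace the trailing `| sort | sed ...` in the command above with `| country_counts`.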
Ah, almost complete. Now we just need to mail it off so I can forward it to the boss:
$ cat www.izuu.com.access.log www.izuu.com.access.log.[1-3] | grep 04\/Dec | awk ' { print $1 } ' | sort | uniq | xargs -n 1 geoiplookup -f ~/geo/GeoLiteCity.dat | sort | sed 's/^.*: //' | mail cmckenzie
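Piped straight into mail like that, the message may go out without a subject, which doesn't help when it's being forwarded. The -s flag (accepted by both bsd-mailx and GNU mailutils) sets one. A hypothetical wrapper, just to keep the recipient and subject together:

```shell
# Hypothetical wrapper: mail stdin to a recipient with a subject line.
send_report() {
  local subject=$1 recipient=$2
  mail -s "$subject" "$recipient"
}
```

Usage: `... | sed 's/^.*: //' | send_report "Dec 4 visitors by location" cmckenzie`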
And there it is: a nice list for the boss that, albeit not graphical, is still digestible and better than a line at 0... sw33t.