How to install Awstats Apache weblog statistics on Debian Squeeze GNU Linux

Monday, October 8th, 2012

I like using Webalizer to keep an eye on my web server's access.log. However, since the information it shows is a bit chaotic and far less detailed than what Awstats provides, I decided to install awstats. I hadn’t installed awstats for a long time and had no exact memory of how I did it previously, so I ran a quick search to see whether there is any information specific to Debian Squeeze. I did not find a dedicated article and therefore decided to write this short one to document how awstats is installed on Debian Squeeze Linux.

1. Installing awstats deb package

There is already a deb package so no need to hunt for specific perl CPAN modules and manually fulfill dependencies.

Installation is as straightforward as for any other deb package:


debian:~# apt-get install --yes awstats
...

2. Change basic awstats.conf settings to make it work properly

First thing to do immediately after install is to set the primary SiteDomain= for which Awstats will process site statistics.

To do so, add the following directive at the beginning (first line) of /etc/awstats/awstats.conf:


SiteDomain="www.your-domain-name.com"

Substitute www.your-domain-name.com with whatever your primary domain is.

Next, change the value of DNSLookup in /etc/awstats/awstats.conf. By default DNSLookup is 1, which means Awstats attempts to resolve every IP address found in /var/log/apache2/access.log with a separate DNS request. Most IP addresses that have queried the Apache webserver, however, lack a proper PTR DNS record, so the attempts to resolve them fail after 10 to 20 seconds.
As a result, processing /var/log/apache2/access.log takes hours when the file is larger than about 100MB; the slowness is caused by the failed DNS requests. It also sends hundreds of useless queries to DNS servers, which take up bandwidth for nothing.
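To get a feel for how many reverse lookups are at stake, one can count the distinct client addresses in the log. This is only a rough sketch: the function name count_unique_clients is made up for illustration, and it assumes the client address is the first whitespace-separated field of every line, as in the common and combined log formats.

```shell
# Hypothetical helper (not part of awstats): print the number of distinct
# client addresses in an Apache access log.
# Assumption: the client address is the first field of each line.
count_unique_clients() {
    awk 'NF > 0 { seen[$1] = 1 }
         END    { n = 0; for (ip in seen) n++; print n }' "$1"
}

# Example, using the log path from this article:
# count_unique_clients /var/log/apache2/access.log
```

The number printed is roughly the count of addresses Awstats may try to resolve while DNSLookup is enabled.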

To prevent this, I disabled DNSLookup right after the install by substituting


DNSLookup=1

with:


DNSLookup=0
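The same substitution can be scripted instead of done in an editor. Below is a minimal sketch, assuming a plain KEY=VALUE layout of awstats.conf; set_awstats_directive is a hypothetical helper name, not an awstats tool. It rewrites active (uncommented) lines for the given key and appends the directive when no such line exists.

```shell
# Hypothetical helper: set KEY=VALUE in an awstats config file.
# Usage: set_awstats_directive FILE KEY VALUE
# Limitation: VALUE must not contain "|", "&" or "\" (they are special to sed).
set_awstats_directive() {
    conf=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$conf"; then
        # Rewrite every active line that already sets this key;
        # commented-out lines starting with "#" are left untouched.
        sed "s|^[[:space:]]*$key[[:space:]]*=.*|$key=$value|" "$conf" > "$tmp"
    else
        # Key is not set yet: keep the file as is and append the directive.
        cat "$conf" > "$tmp"
        printf '%s=%s\n' "$key" "$value" >> "$tmp"
    fi
    # Copy the content back so ownership and permissions of FILE are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Example, run as root:
# set_awstats_directive /etc/awstats/awstats.conf DNSLookup 0
```

Keeping a backup copy of awstats.conf before running something like this is a sensible precaution.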

Another thing: by default Awstats is set to use LogFormat=4. As you can read in the comments section of awstats.conf, 4 stands for:


# 4 - Apache or Squid native common log format (NCSA common/CLF log format)

However, the default Apache2 configuration on Debian Linux keeps logged requests in combined format (not in common log format).

Here are the LogFormat directives extracted from /etc/apache2/apache2.conf:


LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

That said, to match Apache's combined log format, change LogFormat in awstats.conf to 1:


LogFormat=1
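If it is unclear which format an existing access.log is actually written in, a quick heuristic is to count the double-quoted fields on a log line: a combined line carries three (request, referer, user agent), a common line only one (request). The sketch below relies on that assumption; guess_log_format is a hypothetical name, and the check can be fooled by escaped quotes inside a field.

```shell
# Hypothetical helper: guess the format of an Apache access log from its
# first line by splitting that line on double quotes.
guess_log_format() {
    head -n 1 "$1" | awk -F'"' '
        NF >= 7 { print "combined"; next }   # request, referer, user agent
        NF == 3 { print "common";   next }   # request only
                { print "unknown" }'
}

# Example, using the log path from this article:
# guess_log_format /var/log/apache2/access.log
```

A result of "combined" goes with LogFormat=1, while "common" goes with the default LogFormat=4.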

3. Generate AWStats access.log statistics manually for the first time

You will have to run the following command as superuser:


debian:~# /usr/lib/cgi-bin/awstats.pl -config=/etc/awstats/awstats.conf
Create/Update database for config "/etc/awstats/awstats.conf" by AWStats version 6.95 (build 1.943)
From data in log file "/var/log/apache2/access.log"...
Phase 1 : First bypass old records, searching new record...
Searching new records from beginning of log file...
Phase 2 : Now process new records (Flush history on disk after 20000 hosts)...
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Flush history file on disk (unique url reach flush limit of 5000)
Jumped lines in file: 0
Parsed lines in file: 602983
Found 8 dropped records,
Found 5 corrupted records,
Found 0 old records,
Found 602970 new qualified records.
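In the output above the record counts add up: 8 dropped + 5 corrupted + 0 old + 602970 new = 602983 parsed lines. The sketch below automates that sanity check for later runs. It is only an illustration: check_awstats_summary is a hypothetical name, and it assumes the summary lines keep exactly the wording shown above and that the counts always add up this way.

```shell
# Hypothetical helper: read the awstats update output on stdin and check that
# dropped + corrupted + old + new records equals the number of parsed lines.
check_awstats_summary() {
    awk '
        BEGIN { parsed = 0; sum = 0 }
        /^Parsed lines in file:/   { parsed = $NF }
        /^Found [0-9]+ dropped/    { sum += $2 }
        /^Found [0-9]+ corrupted/  { sum += $2 }
        /^Found [0-9]+ old/        { sum += $2 }
        /^Found [0-9]+ new/        { sum += $2 }
        END {
            if (parsed + 0 == sum + 0) {
                print "ok: " sum " of " parsed " lines accounted for"
            } else {
                print "mismatch: " sum " accounted for, " parsed " parsed"
                exit 1
            }
        }'
}

# Example, run as root:
# /usr/lib/cgi-bin/awstats.pl -config=/etc/awstats/awstats.conf | check_awstats_summary
```

A large share of dropped or corrupted records would usually point to a LogFormat value that does not match the log.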

4. Access awstats statistics in Web Browser

Once the command completes, open the following URL in Epiphany, Firefox or whatever browser you like:


http://www.your-domain-name.com/cgi-bin/awstats.pl?config

If all is okay, you should see some numbers for Unique Visitors, as in the browser screenshot below:

Screenshot Awstats example Statistics for www.pc-freak.net in Epiphany

5. Set ScriptAlias for easier awstats access path and set directory permissions

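In /etc/apache2/apache2.conf or in a VirtualHost file, let's say /etc/apache2/sites-enabled/your-domain-name.com, place the following configuration (the Options, AllowOverride, Order and Allow lines must sit inside a <Directory> block for the served directory; Apache does not accept AllowOverride anywhere else):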


Alias /awstats-icon/ /usr/share/awstats/icon/
ScriptAlias /awstats/ /usr/lib/cgi-bin/

Options None
AllowOverride None
Order allow,deny
Allow from all

For the new configuration to take effect, Apache has to be restarted as usual:


debian:~# /etc/init.d/apache2 restart
....

From now on, Awstats can be accessed via a URL that is much easier to remember:


http://your-domain-name.com/awstats/awstats.pl

6. Protecting Awstats statistics with Apache HTACCESS password

Protecting the awstats statistics with a password via .htaccess and htpasswd is a must.

a.) Use htpasswd to generate user/pass:


linux:~# htpasswd -c /etc/apache2/awstats.passwd admin
New password:
Re-type new password:
Adding password for user admin

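b.) Create /usr/lib/cgi-bin/.htaccess with the following content. Apache only reads this file if AllowOverride for /usr/lib/cgi-bin permits AuthConfig; with AllowOverride None the file is ignored: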


linux:~# vim /usr/lib/cgi-bin/.htaccess

AuthType Basic
AuthUserFile /etc/apache2/awstats.passwd
AuthGroupFile /dev/null
AuthName "Please Enter Password to access AWstat"
Require valid-user
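Since htpasswd -c creates the password file from scratch and overwrites an existing one, it can be useful to check first whether a user is already present. An htpasswd file holds one user:hash entry per line, so a small check is enough. The sketch below uses a hypothetical helper name, has_htpasswd_user.

```shell
# Hypothetical helper: succeed (exit status 0) if USER has an entry in FILE.
# Usage: has_htpasswd_user FILE USER
has_htpasswd_user() {
    awk -F: -v user="$2" '$1 == user { found = 1 } END { exit !found }' "$1"
}

# Example, using the file and user name from this article:
# if has_htpasswd_user /etc/apache2/awstats.passwd admin; then
#     echo "admin already exists; run htpasswd without -c to keep the file"
# fi
```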

7. Set awstats to generate statistics via a cron job:

The awstats binary deb package automatically installs a cron job in /etc/cron.d/awstats:


linux:~# cat /etc/cron.d/awstats
*/10 * * * * www-data [ -x /usr/share/awstats/tools/update.sh ] && /usr/share/awstats/tools/update.sh
# Generate static reports:
10 03 * * * www-data [ -x /usr/share/awstats/tools/buildstatic.sh ] && /usr/share/awstats/tools/buildstatic.sh

I prefer not to use it and to run a custom root cron job instead. To stop /etc/cron.d/awstats from executing, I move it to /root:


mv /etc/cron.d/awstats /root

Then I set up a new cron job for the root user to process the Apache access.log. The reason I use root’s crontab instead of that of Apache’s www-data user is that /var/log/apache2/access.log is not readable by www-data.


linux:~# crontab -u root -e
8,18,27,38,48,59 * * * * [ -x /usr/lib/cgi-bin/awstats.pl -a -f /etc/awstats/awstats.conf -a -r /var/log/apache2/access.log ] && /usr/lib/cgi-bin/awstats.pl -config=awstats -update >/dev/null

This cron job refreshes the awstats web statistics six times an hour, at minutes 8, 18, 27, 38, 48 and 59.
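The guard in the cron one-liner above is easier to read when written out as a small shell function. The following is a sketch of the same three checks (script executable, config present, log readable); run_awstats_update is a hypothetical name, and the default paths are simply the ones used in this article.

```shell
# Hypothetical wrapper around the cron one-liner: run the awstats update only
# when the script is executable, the config exists and the log is readable.
# Usage: run_awstats_update [SCRIPT] [CONFIG_FILE] [LOG_FILE]
run_awstats_update() {
    script=${1:-/usr/lib/cgi-bin/awstats.pl}
    conf=${2:-/etc/awstats/awstats.conf}
    log=${3:-/var/log/apache2/access.log}

    [ -x "$script" ] || { echo "not executable: $script" >&2; return 1; }
    [ -f "$conf" ]   || { echo "missing config: $conf" >&2; return 1; }
    [ -r "$log" ]    || { echo "unreadable log: $log" >&2; return 1; }

    # Same invocation as in the crontab entry; normal output is discarded.
    "$script" -config=awstats -update >/dev/null
}
```

Unlike the one-liner, this version reports on stderr which check failed, so cron can mail the reason to root.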

That’s it. Enjoy 🙂