Linux: Generating Web statistics from Old Apache logs with Webalizer

Thursday, 25th July 2013

Webalizer generate and visualize in web page statistics of old websites howto webalizer static html google analytics like statistics on linux logo

Often it happens, that some old hosted websites were created in a way so no Web Statistics are available. Almost all modern created websites nowadays are already set to use Google AnalyticsAnyhow every now and then I stumble on hosting clients whose websites creator didn't thought on how to track how many hits or unique visitors site gets in a month / year etc.
 Thanksfully this is solvable by good "uncle" admin with help with of Webalizer (with custom configuration) and a little bit of shell scripting.

The idea is simple, we take the old website logs located in lets say 
/var/log/apache2/www.website-access.log*,
move files to some custom created new directory lets say /root/www.website-access-logs/ and then configure webalizer to read and generate statistics based on log in there.

For the purpose, we have to have webalizer installed on Linux system. In my case this is Debian GNU / Linux.

For those who hear of Webalizer for first time here is short package description:

debian:~# apt-cache show webalizer|grep -i description -A 2

Description-en: web server log analysis program
The Webalizer was designed to scan web server log files in various formats
and produce usage statistics in HTML format for viewing through a browser.

 If webalizer is not installed still install it with:

debian:~# apt-get install --yes webalizer
...
.....

Then make backup copy of original / default webalizer.conf (very important step especially if server is already processing Apache log files with some custom webalizer configuration:

debian:~# cp -rpf /etc/webalizer/webalizer.conf /etc/webalizer/webalizer.conf.orig

Next step is to copy webalizer.conf with a name reminding of website of which logs will be processed, e.g.:

debian:~# cp -rpf /etc/webalizer/webalizer.conf /etc/webalizer/www.website-webalizer.conf

In www.website-webalizer.conf config file its necessary to edit at least 4 variables:

LogFile /var/log/apache2/access.log
OutputDir /var/www
#Incremental no
ReportTitle Usage statistics for

 Make sure after modifying 3 vars read something like:  
LogFile /root/www.website/access_log_merged_1.log
OutputDir /var/www/www.website
Incremental yes
ReportTitle Usage statistics for Your-Website-Host-Name.com

Next create /root/www.website and /var/www/www.website, then copy all files you need to process from /var/log/apache2/www.website* to /root/www.website:

debian:~# mkdir -p /root/www.website
debian:~# cp -rpf /var/log/apache2/www.website* /root/www.website

On Debian Apache uses logrotate to archive old log files, so all logs except www.website-access.log and wwww.website-access.log.1 are gzipped:

debian:~#  cd /root/www.website
debian:~# ls 
www.website-access.log.10.gz
www.website-access.log.11.gz
www.website-access.log.12.gz
www.website-access.log.13.gz
www.website-access.log.14.gz
www.website-access.log.15.gz
www.website-access.log.16.gz
www.website-access.log.17.gz
www.website-access.log.18.gz
www.website-access.log.19.gz
www.website-access.log.20.gz
...
 

Then we have to un-gzip zipped logs and create one merged file from all of them ready to be red later by Webalizer. To do so I use a tiny shell script like so:

for n in {52..1}; do gzip -d www.dobrudzhatour.net-access.log.$n.gz; done
for n in {52..1}; do cat www.dobrudzhatour.net-access.log.$n >> access_log_merged_1.log;
done

First look de-gzips and second one does create a merged file from all with name access_merged_1.log The range of log files in my case is from www.website-access.log.1 to www.website-access.log.52, thus I have in loop back number counting from 52 to 1.

Once access_log_merged_1.log is ready we can run webalizer to process file (Incremental) and generate all time statistics for www.website:

debian:~# webalizer -c /etc/webalizer/webalizer-www.website-webalizer.conf

Webalizer V2.01-10 (Linux 2.6.32-27-server) locale: en_US.UTF-8
Using logfile /root/www.website/access_log_merged_1.log (clf)
Using default GeoIP database Creating output in /var/www/webalizer-www.website
Hostname for reports is 'debian'
Reading history file… webalizer.hist
Reading previous run data.. webalizer.current
333474 records (333474 ignored) in 37.50 seconds, 8892/sec

To check out just generated statistics open in browser:

http://yourserverhost/webalizer-www.website/

or

http://IP_Address/webalizer-www.website

 You should see statistics pop-up, below is screenshot with my currently generated stats:

Webalizer website access statistics screenshot Debian GNU Linux

Share this on:

Download PDFDownload PDF

Tags: , , , , , , , , , , , , ,

2 Responses to “Linux: Generating Web statistics from Old Apache logs with Webalizer”

  1. Gigs says:
    Google Chrome 35.0.1916.153 Google Chrome 35.0.1916.153 GNU/Linux x64 GNU/Linux x64
    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36

    You can just zcat -f /var/log/apache2/access.log* instead of looping

    View CommentView Comment
  2. oszust says:
    Opera 8.50 8.50 Opera 8.50 8.50 Windows XP Windows XP
    Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50

    Thank you, I’ve just been searching for information about this subject for ages and yours is the greatest I’ve came upon till now. But, what in regards to the bottom line? Are you certain about the source?

    View CommentView Comment

Leave a Reply

CommentLuv badge