Posts Tagged ‘inside’

Apache disable requests to not log to access.log Logfile through SetEnvIf and dontlog httpd variables

Monday, October 11th, 2021

apache-disable-certain-strings-from-logging-to-access-log-logo

Logging to Apache access.log is mostly useful as this is a great way to keep log on who visited your website and generate periodic statistics with tools such as Webalizer or Astats to keep track on your visitors and generate various statistics as well as see the number of new visitors as well most visited web pages (the pages which mostly are attracting your web visitors), once the log analysis tool generates its statistics, it can help you understand better which Web spiders visit your website the most (as spiders has a predefined) IP addresses, which can give you insight on various web spider site indexation statistics on Google, Yahoo, Bing etc. . Sometimes however either due to bugs in web spiders algorithms or inconsistencies in your website structure, some of the web pages gets double visited records inside the logs, this could happen for example if your website uses to include iframes.

Having web pages accessed once but logged to be accessed twice hence is erroneous and unwanted, and though that usually have to be fixed by the website programmers, if such approach is not easily doable in the moment and the website is running on critical production system, the double logging of request can be omitted thanks to a small Apache log hack with SetEnvIf Apache config directive. Even if there is no double logging inside Apache log happening it could be that some cron job or automated monitoring scripts or tool such as monit is making periodic requests to Apache and this is garbling your Log Statistics results.

In this short article hence I'll explain how to do remove certain strings to not get logged inside /var/log/httpd/access.log.

1. Check SetEnvIf is Loaded on the Webserver
 

On CentOS / RHEL Linux:

# /sbin/apachectl -M |grep -i setenvif
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain. Set the 'ServerName' directive globally to suppress this message
 setenvif_module (shared)


On Debian / Ubuntu Linux:

/usr/sbin/apache2ctl -M |grep -i setenvif
AH00548: NameVirtualHost has no effect and will be removed in the next release /etc/apache2/sites-enabled/000-default.conf:1
 setenvif_module (shared)


2. Using SetEnvIf to omit certain string to get logged inside apache access.log


SetEnvIf could be used either in some certain domain VirtualHost configuration (if website is configured so), or it can be set as a global Apache rule from the /etc/httpd/conf/httpd.conf 

To use SetEnvIf  you have to place it inside a <Directory …></Directory> configuration block, if it has to be enabled only for a Certain Apache configured directory, otherwise you have to place it in the global apache config section.

To be able to use SetEnvIf, only in a certain directories and subdirectories via .htaccess, you will have defined in <Directory>

AllowOverride FileInfo


The general syntax to omit a certain Apache repeating string from keep logging with SetEnvIf is as follows:
 

SetEnvIf Request_URI "^/WebSiteStructureDirectory/ACCESS_LOG_STRING_TO_REMOVE$" dontlog


General syntax for SetEnvIf is as follows:

SetEnvIf attribute regex env-variable

SetEnvIf attribute regex [!]env-variable[=value] [[!]env-variable[=value]] …

Below is the overall possible attributes to pass as described in mod_setenvif official documentation.
 

  • Host
  • User-Agent
  • Referer
  • Accept-Language
  • Remote_Host: the hostname (if available) of the client making the request.
  • Remote_Addr: the IP address of the client making the request.
  • Server_Addr: the IP address of the server on which the request was received (only with versions later than 2.0.43).
  • Request_Method: the name of the method being used (GET, POST, etc.).
  • Request_Protocol: the name and version of the protocol with which the request was made (e.g., "HTTP/0.9", "HTTP/1.1", etc.).
  • Request_URI: the resource requested on the HTTP request line – generally the portion of the URL following the scheme and host portion without the query string.

Next locate inside the configuration the line:

CustomLog /var/log/apache2/access.log combined


To enable filtering of included strings, you'll have to append env=!dontlog to the end of line.

 

CustomLog /var/log/apache2/access.log combined env=!dontlog

 

You might be using something as cronolog for log rotation to prevent your WebServer logs to become too big in size and hard to manage, you can append env=!dontlog to it in same way.

If you haven't used cronolog is it is perhaps best to show you the package description.

server:~# apt-cache show cronolog|grep -i description -A10 -B5
Version: 1.6.2+rpk-2
Installed-Size: 63
Maintainer: Debian QA Group <packages@qa.debian.org>
Architecture: amd64
Depends: perl:any, libc6 (>= 2.4)
Description-en: Logfile rotator for web servers
 A simple program that reads log messages from its input and writes
 them to a set of output files, the names of which are constructed
 using template and the current date and time.  The template uses the
 same format specifiers as the Unix date command (which are the same
 as the standard C strftime library function).
 .
 It intended to be used in conjunction with a Web server, such as
 Apache, to split the access log into daily or monthly logs:
 .
   TransferLog "|/usr/bin/cronolog /var/log/apache/%Y/access.%Y.%m.%d.log"
 .
 A cronosplit script is also included, to convert existing
 traditionally-rotated logs into this rotation format.

Description-md5: 4d5734e5e38bc768dcbffccd2547922f
Homepage: http://www.cronolog.org/
Tag: admin::logging, devel::lang:perl, devel::library, implemented-in::c,
 implemented-in::perl, interface::commandline, role::devel-lib,
 role::program, scope::utility, suite::apache, use::organizing,
 works-with::logfile
Section: web
Priority: optional
Filename: pool/main/c/cronolog/cronolog_1.6.2+rpk-2_amd64.deb
Size: 27912
MD5sum: 215a86766cc8d4434cd52432fd4f8fe7

If you're using cronolog to daily rotate the access.log and you need to filter out the strings out of the logs, you might use something like in httpd.conf:

 

CustomLog "|/usr/bin/cronolog –symlink=/var/log/httpd/access.log /var/log/httpd/access.log_%Y_%m_%d" combined env=!dontlog


 

3. Disable Apache logging access.log from certain USERAGENT browser
 

You can do much more with SetEnvIf for example you might want to omit logging requests from a UserAgent (browser) to end up in /dev/null (nowhere), e.g. prevent any Website requests originating from Internet Explorer (MSIE) to not be logged.

SetEnvIf User_Agent "(MSIE)" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


4. Disable Apache logging from requests coming from certain FQDN (Fully Qualified Domain Name) localhost 127.0.0.1 or concrete IP / IPv6 address

SetEnvIf Remote_Host "dns.server.com$" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


Of course for this to work, your website should have a functioning DNS servers and Apache should be configured to be able to resolve remote IPs to back resolve to their respective DNS defined Hostnames.

SetEnvIf recognized also perl PCRE Regular Expressions, if you want to filter out of Apache access log requests incoming from multiple subdomains starting with a certain domain hostname.

 

SetEnvIf Remote_Host "^example" dontlog

– To not log anything coming from localhost.localdomain address ( 127.0.0.1 ) as well as from some concrete IP address :

SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog

SetEnvIf Remote_Addr "192\.168\.1\.180" dontlog

– To disable IPv6 requests that be coming at the log even though you don't happen to use IPv6 at all

SetEnvIf Request_Addr "::1" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


– Note here it is obligatory to escape the dots '.'


5. Disable robots.txt Web Crawlers requests from being logged in access.log

SetEnvIf Request_URI "^/robots\.txt$" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog

Using SetEnvIfNoCase to read incoming useragent / Host / file requests case insensitve

The SetEnvIfNoCase is to be used if you want to threat incoming originators strings as case insensitive, this is useful to omit extraordinary regular expression SetEnvIf rules for lower upper case symbols.

SetEnvIFNoCase User-Agent "Slurp/cat" dontlog
SetEnvIFNoCase User-Agent "Ask Jeeves/Teoma" dontlog
SetEnvIFNoCase User-Agent "Googlebot" dontlog
SetEnvIFNoCase User-Agent "bingbot" dontlog
SetEnvIFNoCase Remote_Host "fastsearch.net$" dontlog

Omit from access.log logging some standard web files .css , .js .ico, .gif , .png and Referrals from own domain

Sometimes your own site scripts do refer to stuff on your own domain that just generates junks in the access.log to keep it off.

SetEnvIfNoCase Request_URI "\.(gif)|(jpg)|(png)|(css)|(js)|(ico)|(eot)$" dontlog

 

SetEnvIfNoCase Referer "www\.myowndomain\.com" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog

 

6. Disable Apache requests in access.log and error.log completely


Sometimes at rare cases the produced Apache logs and error log is really big and you already have the requests logged in another F5 Load Balancer or Haproxy in front of Apache WebServer or alternatively the logging is not interesting at all as the Web Application served written in ( Perl / Python / Ruby ) does handle the logging itself. 
I've earlier described how this is done in a good amount of details in previous article Disable Apache access.log and error.log logging on Debian Linux and FreeBSD

To disable it you will have to comment out CustomLog or set it to together with ErrorLog to /dev/null in apache2.conf / httpd.conf (depending on the distro)
 

CustomLog /dev/null
ErrorLog /dev/null


7. Restart Apache WebServer to load settings
 

An important to mention is in case you have Webserver with multiple complex configurations and there is a specific log patterns to omit from logs it might be a very good idea to:

a. Create /etc/httpd/conf/dontlog.conf / etc/apache2/dontlog.conf
add inside all your custom dontlog configurations
b. Include dontlog.conf from /etc/httpd/conf/httpd.conf / /etc/apache2/apache2.conf

Finally to make the changes take affect, of course you will need to restart Apache webserver depending on the distro and if it is with systemd or System V:

For systemd RPM based distro:

systemctl restart httpd

or for Deb based Debian etc.

systemctl apache2 restart

On old System V scripts systems:

On RedHat / CentOS etc. restart Apache with:
 

/etc/init.d/httpd restart


On Deb based SystemV:
 

/etc/init.d/apache2 restart


What we learned ?
 

We have learned about SetEnvIf how it can be used to prevent certain requests strings getting logged into access.log through dontlog, how to completely stop certain browser based on a useragent from logging to the access.log as well as how to omit from logging certain requests incoming from certain IP addresses / IPv6 or FQDNs and how to stop robots.txt from being logged to httpd log.


Finally we have learned how to completely disable Apache logging if logging is handled by other external application.
 

Convert .doc to .pdf on Linux and BSD using console / Convertion of PDF to DOC inside scripts

Tuesday, November 13th, 2012

how to Convert .doc to .PDF using console / terminal on Linux and FreeBSD

On Linux, there are plenty of ways nowadays to convert Microsoft Word or OpenOffice .DOC documents to Adobe's PDF (Postscript). However most of the ways require a graphical environment. As I'm interested in how convertion is done mainly from console to suit shell scripts and php which has to routinely convert a bunch of .DOC files to .PDF. I've checked today how PDF to DOC is possible on Debian, Ubuntu, Arch Linux  and FreeBSD..

There are few tools one can use from console, that doesn't requiere you to have running Xorg on the convertion host. The quality of the produced converted document, may vary and with some Microsoft Office doc files, there might be some garbage. But generally for simplistic and well written "macros" free documents the quality of PDF is satisfactory with few of the tools.

Here I will list the few tools, one can use for convertion:

  • abiword – you probably know abiword GUI program which is a good substitute for people who doesn't want the huge openoffice on the host. interestingly abiword supports converts with no need for GUI
     
  • wvPDF (you have to have install wv package and usually this converter works well only with very old .DOC (MS Office 97) – I was not impressed with those convert results
     
  • oowriter / swriter (whether LibreOffice installed) or writer (on LibreOffice), on some Ubuntus and derivatives the equivalent cmd is lowriter
     
  • unoconv – this tool produces really good DOC to  PDF converts, it is a python script using openoffice / libreoffice as backend convertion engine so produced PDFs will be identical like the ones produced with oowriter, the pros of the tool is its syntax is very user friendly and along with PDF to DOC it supports easy syntax converting to  bunch of other file formats. Actually unoconv supports same convertions which supported by OpenOffice.org, the advantage is however you can use it within console and even schedule convertion to be processed by a remote host.

 

1. Convertion of DOC to PDF with abiword

abiword --to=pdf doc_file_to_convert.doc

2. Convert DOC to PDF with wvPDF

apt-get install --yes wv texlive-base texlive-latex-base ghostscript

wvPDF doc-file-to-convert-to-pdf.doc converted-to-pdf.pdf

wvPDF doc-file-to-convert-to-pdf.doc convert-to-pdf.pdf

Current directory: /home/hipo/Desktop
"doc-file-to-convert-to-pdf.eps" exists - skipping...
Some problem running latex.
Check for Errors in steinway.log
Continuing... 

 

The produced .pdf was not useful most of the text inside was completely missing as well as some weird probably PostScript convertion characters were in the .PDF. Seeing its output I would as of time of writing wvPDF Debian's verion 1.2.4 is crap.


3. Convert DOC to PDF with oowriter / swriter / lowrite

a) convert with oowriter and swriter

I saw posts online claiming DOC to PDF convertion is possible directly with oowriter or swriters with commands:

oowriter -convert-to pdf:writer_pdf_Export input-doc-file-to-convert.doc

or

swriter -convert-to pdf:writer_pdf_Export steinway.doc - as named on some Linux-es
 

As long as I tested it on my Debian Squeeze, neither of the two works
.I saw some suggestions that PDF can be generated by installing and using cups-pdf debian package:

apt-get install cups-pdf oowriter -pt pdf your_word_file.doc b) convert DOC to PDF with lowriter I've seen in Ubuntu documentation and in Ubuntu forums, users saying they had some good results using lowriter, which is a sort of front-end program to ImageMagick's convert. I never tested that but I doubt of any satisfactory results, as I tried converting to PDF earlier using convert and often converts failed. Anyways you try it with:

lowriter --convert-to pdf *.doc

 

4. Converting PDF to DOC  with unoconv

As of time of writing it seems unoconv is best Linux console tool for converting .doc to .pdf

It produces good readable text, as well as pictures and elements looks exactly as in OpenOffice.

To install it I run:

# apt-get install --yes unoconv
....

To use it:

$ unoconv -fpdf any-file-to-convert.doc

If you don't get errors or it doesn't crash a .doc file with same name any-file-to-convert.doc is created.

What unoconv, does is precisely the same as if using OpenOffice GUI's  to convert to PDF:

 

  • Open -> Open Office (3.2 in my case)
  • Open Document to export
  • File->Export as PDF
  • Click: Export
  • Choose file namefor output PDF


An interesting feature of unoconv is its possibility to run and convert as a port listening server. I never used this but noticed it mentioned in manual EXAMPLE section:

 

EXAMPLES
       You can use unoconv in standalone mode, this means that in absence of an OpenOffice listener, it will starts its own:

       unoconv -f pdf some-document.odt
       One can use unoconv as a listener (by default localhost:2002) to let other unoconv instances connect to it:

       unoconv --listener &
       unoconv -f pdf some-document.odt
       unoconv -f doc other-document.odt
       unoconv -f jpg some-image.png
       unoconv -f xsl some-spreadsheet.csv
       kill -15 %-
       This also works on a remote host:

       unoconv --listener --server 1.2.3.4 --port 4567
       and then connect another system to convert documents:

       unoconv --server 1.2.3.4 --port 4567

unoconv does not recognize wildcards like ' * ' , so in order to convert multiple DOC to PDF files one has to use the usual shell loop:

for i in *.doc; do unoconv -fpdf $i; done

From all my tests, I think unoconv is preferred tool for Linux and BSD users (good time to mention unoconv is available on FreeBSD too. BSD users can install it via port  /usr/ports/textproc/unoconv)