Posts Tagged ‘request’

Apache disable requests to not log to access.log Logfile through SetEnvIf and dontlog httpd variables

Monday, October 11th, 2021

apache-disable-certain-strings-from-logging-to-access-log-logo

Logging to Apache access.log is mostly useful as this is a great way to keep log on who visited your website and generate periodic statistics with tools such as Webalizer or Astats to keep track on your visitors and generate various statistics as well as see the number of new visitors as well most visited web pages (the pages which mostly are attracting your web visitors), once the log analysis tool generates its statistics, it can help you understand better which Web spiders visit your website the most (as spiders has a predefined) IP addresses, which can give you insight on various web spider site indexation statistics on Google, Yahoo, Bing etc. . Sometimes however either due to bugs in web spiders algorithms or inconsistencies in your website structure, some of the web pages gets double visited records inside the logs, this could happen for example if your website uses to include iframes.

Having web pages accessed once but logged to be accessed twice hence is erroneous and unwanted, and though that usually have to be fixed by the website programmers, if such approach is not easily doable in the moment and the website is running on critical production system, the double logging of request can be omitted thanks to a small Apache log hack with SetEnvIf Apache config directive. Even if there is no double logging inside Apache log happening it could be that some cron job or automated monitoring scripts or tool such as monit is making periodic requests to Apache and this is garbling your Log Statistics results.

In this short article hence I'll explain how to do remove certain strings to not get logged inside /var/log/httpd/access.log.

1. Check SetEnvIf is Loaded on the Webserver
 

On CentOS / RHEL Linux:

# /sbin/apachectl -M |grep -i setenvif
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain. Set the 'ServerName' directive globally to suppress this message
 setenvif_module (shared)


On Debian / Ubuntu Linux:

/usr/sbin/apache2ctl -M |grep -i setenvif
AH00548: NameVirtualHost has no effect and will be removed in the next release /etc/apache2/sites-enabled/000-default.conf:1
 setenvif_module (shared)


2. Using SetEnvIf to omit certain string to get logged inside apache access.log


SetEnvIf could be used either in some certain domain VirtualHost configuration (if website is configured so), or it can be set as a global Apache rule from the /etc/httpd/conf/httpd.conf 

To use SetEnvIf  you have to place it inside a <Directory …></Directory> configuration block, if it has to be enabled only for a Certain Apache configured directory, otherwise you have to place it in the global apache config section.

To be able to use SetEnvIf, only in a certain directories and subdirectories via .htaccess, you will have defined in <Directory>

AllowOverride FileInfo


The general syntax to omit a certain Apache repeating string from keep logging with SetEnvIf is as follows:
 

SetEnvIf Request_URI "^/WebSiteStructureDirectory/ACCESS_LOG_STRING_TO_REMOVE$" dontlog


General syntax for SetEnvIf is as follows:

SetEnvIf attribute regex env-variable

SetEnvIf attribute regex [!]env-variable[=value] [[!]env-variable[=value]] …

Below is the overall possible attributes to pass as described in mod_setenvif official documentation.
 

  • Host
  • User-Agent
  • Referer
  • Accept-Language
  • Remote_Host: the hostname (if available) of the client making the request.
  • Remote_Addr: the IP address of the client making the request.
  • Server_Addr: the IP address of the server on which the request was received (only with versions later than 2.0.43).
  • Request_Method: the name of the method being used (GET, POST, etc.).
  • Request_Protocol: the name and version of the protocol with which the request was made (e.g., "HTTP/0.9", "HTTP/1.1", etc.).
  • Request_URI: the resource requested on the HTTP request line – generally the portion of the URL following the scheme and host portion without the query string.

Next locate inside the configuration the line:

CustomLog /var/log/apache2/access.log combined


To enable filtering of included strings, you'll have to append env=!dontlog to the end of line.

 

CustomLog /var/log/apache2/access.log combined env=!dontlog

 

You might be using something as cronolog for log rotation to prevent your WebServer logs to become too big in size and hard to manage, you can append env=!dontlog to it in same way.

If you haven't used cronolog is it is perhaps best to show you the package description.

server:~# apt-cache show cronolog|grep -i description -A10 -B5
Version: 1.6.2+rpk-2
Installed-Size: 63
Maintainer: Debian QA Group <packages@qa.debian.org>
Architecture: amd64
Depends: perl:any, libc6 (>= 2.4)
Description-en: Logfile rotator for web servers
 A simple program that reads log messages from its input and writes
 them to a set of output files, the names of which are constructed
 using template and the current date and time.  The template uses the
 same format specifiers as the Unix date command (which are the same
 as the standard C strftime library function).
 .
 It intended to be used in conjunction with a Web server, such as
 Apache, to split the access log into daily or monthly logs:
 .
   TransferLog "|/usr/bin/cronolog /var/log/apache/%Y/access.%Y.%m.%d.log"
 .
 A cronosplit script is also included, to convert existing
 traditionally-rotated logs into this rotation format.

Description-md5: 4d5734e5e38bc768dcbffccd2547922f
Homepage: http://www.cronolog.org/
Tag: admin::logging, devel::lang:perl, devel::library, implemented-in::c,
 implemented-in::perl, interface::commandline, role::devel-lib,
 role::program, scope::utility, suite::apache, use::organizing,
 works-with::logfile
Section: web
Priority: optional
Filename: pool/main/c/cronolog/cronolog_1.6.2+rpk-2_amd64.deb
Size: 27912
MD5sum: 215a86766cc8d4434cd52432fd4f8fe7

If you're using cronolog to daily rotate the access.log and you need to filter out the strings out of the logs, you might use something like in httpd.conf:

 

CustomLog "|/usr/bin/cronolog –symlink=/var/log/httpd/access.log /var/log/httpd/access.log_%Y_%m_%d" combined env=!dontlog


 

3. Disable Apache logging access.log from certain USERAGENT browser
 

You can do much more with SetEnvIf for example you might want to omit logging requests from a UserAgent (browser) to end up in /dev/null (nowhere), e.g. prevent any Website requests originating from Internet Explorer (MSIE) to not be logged.

SetEnvIf User_Agent "(MSIE)" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


4. Disable Apache logging from requests coming from certain FQDN (Fully Qualified Domain Name) localhost 127.0.0.1 or concrete IP / IPv6 address

SetEnvIf Remote_Host "dns.server.com$" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


Of course for this to work, your website should have a functioning DNS servers and Apache should be configured to be able to resolve remote IPs to back resolve to their respective DNS defined Hostnames.

SetEnvIf recognized also perl PCRE Regular Expressions, if you want to filter out of Apache access log requests incoming from multiple subdomains starting with a certain domain hostname.

 

SetEnvIf Remote_Host "^example" dontlog

– To not log anything coming from localhost.localdomain address ( 127.0.0.1 ) as well as from some concrete IP address :

SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog

SetEnvIf Remote_Addr "192\.168\.1\.180" dontlog

– To disable IPv6 requests that be coming at the log even though you don't happen to use IPv6 at all

SetEnvIf Request_Addr "::1" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


– Note here it is obligatory to escape the dots '.'


5. Disable robots.txt Web Crawlers requests from being logged in access.log

SetEnvIf Request_URI "^/robots\.txt$" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog

Using SetEnvIfNoCase to read incoming useragent / Host / file requests case insensitve

The SetEnvIfNoCase is to be used if you want to threat incoming originators strings as case insensitive, this is useful to omit extraordinary regular expression SetEnvIf rules for lower upper case symbols.

SetEnvIFNoCase User-Agent "Slurp/cat" dontlog
SetEnvIFNoCase User-Agent "Ask Jeeves/Teoma" dontlog
SetEnvIFNoCase User-Agent "Googlebot" dontlog
SetEnvIFNoCase User-Agent "bingbot" dontlog
SetEnvIFNoCase Remote_Host "fastsearch.net$" dontlog

Omit from access.log logging some standard web files .css , .js .ico, .gif , .png and Referrals from own domain

Sometimes your own site scripts do refer to stuff on your own domain that just generates junks in the access.log to keep it off.

SetEnvIfNoCase Request_URI "\.(gif)|(jpg)|(png)|(css)|(js)|(ico)|(eot)$" dontlog

 

SetEnvIfNoCase Referer "www\.myowndomain\.com" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog

 

6. Disable Apache requests in access.log and error.log completely


Sometimes at rare cases the produced Apache logs and error log is really big and you already have the requests logged in another F5 Load Balancer or Haproxy in front of Apache WebServer or alternatively the logging is not interesting at all as the Web Application served written in ( Perl / Python / Ruby ) does handle the logging itself. 
I've earlier described how this is done in a good amount of details in previous article Disable Apache access.log and error.log logging on Debian Linux and FreeBSD

To disable it you will have to comment out CustomLog or set it to together with ErrorLog to /dev/null in apache2.conf / httpd.conf (depending on the distro)
 

CustomLog /dev/null
ErrorLog /dev/null


7. Restart Apache WebServer to load settings
 

An important to mention is in case you have Webserver with multiple complex configurations and there is a specific log patterns to omit from logs it might be a very good idea to:

a. Create /etc/httpd/conf/dontlog.conf / etc/apache2/dontlog.conf
add inside all your custom dontlog configurations
b. Include dontlog.conf from /etc/httpd/conf/httpd.conf / /etc/apache2/apache2.conf

Finally to make the changes take affect, of course you will need to restart Apache webserver depending on the distro and if it is with systemd or System V:

For systemd RPM based distro:

systemctl restart httpd

or for Deb based Debian etc.

systemctl apache2 restart

On old System V scripts systems:

On RedHat / CentOS etc. restart Apache with:
 

/etc/init.d/httpd restart


On Deb based SystemV:
 

/etc/init.d/apache2 restart


What we learned ?
 

We have learned about SetEnvIf how it can be used to prevent certain requests strings getting logged into access.log through dontlog, how to completely stop certain browser based on a useragent from logging to the access.log as well as how to omit from logging certain requests incoming from certain IP addresses / IPv6 or FQDNs and how to stop robots.txt from being logged to httpd log.


Finally we have learned how to completely disable Apache logging if logging is handled by other external application.
 

Briefly unavailable for scheduled maintenance. Fix WordPress after interrupted upgrade

Thursday, March 2nd, 2017

briefly-unavaiable-for-a-scheduled-maintenance-wordpress-website-fix-howto_1

I've recenty tried Update my WordPress blog sites and being unattentive I've selected all the plugins possitble for Upgrade by checking the "Select All" check box on the Update dialogs and almost automatically int he hurry pressing Update button however out of a sudden I've realized I could screw up my websites brutally as some of the plugins to upgraded might be lacking 100% compitability with their prior versions.

I've made a messes out of my blog many times during upgrades because of choosing to upgrade the wrong not 100% compatible plugins and I know well how painful and hard to track it could be a misbehaving incompatible plugin or how ot could cause a severe sluggishness to blog which automatically reflects on how well the website search engine ranked in Google / Yahoo / Bing indexed etc.

Thus as an almost unconcsious reaction to prevent myself the future troubles I've tried to cancel the update request in Firefox browser and trying to reload the Update page with a hope that I might be quick enough for the Apache / WP / MySQLbackend Update Update queries request to be delaying for processing but I was too slow and bang! I ended up with the following unpleasent message in my browser:

briefly-unavaiable-for-a-scheduled-maintenance-wordpress-website-fix-howto
Briefly unavailable for scheduled maintenance.
 

As you could guess that message caused me quite a lot of worries at hand especially since I've already break up my sites many times by doing quick unmindful reactions and the fact that there is Google Adsense ads appearing which does give me some Return on Investment cents every now and then …

It took me few minutes of research online to find what really happened and how to fix / resolve the WebSites normal operations.
 


So what causes the Briefly unavailable for scheduled maintenance. appears ?

When WordPress does some of its integrated maintenance jobs a plugin enable / disable or any task that has to modify crucial configurations inside the database WordPress does disable access to all end clients to itself in order to protect its sensitive data to appear to browser requestors as showing some unexpected information to end client browser could be later used by crackers / hackers or a possibly open a security hole for an attacker.

The message is wordpress generated notice and it is pretty normal for the end user to see it during the WP site installation update depending on how many plugins are installed and loaded to the site and how long it will take for the backend Linux / Windows server to fetch the archived .zips of plugins and substitute with the new ones and update the files extracting them to wp-content/plugins and updating the respective required SQL database / tables it could be showing for end users from few secs to few minutes.

However under some circumstances on Browser request timeout to remote wordpress site due to a network connectivity issue or just a bad configuration of Apache for requests timeout (or a slow remote server Apache responce time due to server Hardware / Mem overload) or a stupid browser "Stop" / cancel request like in my case you end up with the Briefly unavailable for scheduled maintenance and you can can longer access the your https://siteurl.com/wp-admin Admin Panel.

The message is triggered by a WP craeted file .maintenance inside /var/www/blog-site/ e.g. WordPress PHP scripts does check for /var/www/blog-site/.maintenance
existence and if it is matched the WP scripts does generate the Briefly unavailble … message.


How to resolve the "Briefly unavailable for scheduled maintenance. Check back in a minute" WordPress error ?

As you might guess removing the maintenance "coming soon" like message in most of the cases comes to just deleting the .maintenance file, to do so:

1. Login to remote server via FTP or SFTP
2. Locate your WP website root folder that should be something like /var/www/blog-site/.maintenance and issue:
issue something like:
 

$ rm -rf /var/www/bog-site/.maintenance


Assuming that some plugin Update .zip extraction or SQL update query did not ended being half installed / executed that should solve the error.
To check whether all is back to normal just refresh your browser pointing to the "broken" site. If it appears well you can thank God for that 🙂
If not check the apache error logs and php error logs and see which of the php scripts is failing and then try to manually fetch and unzip the WP .zip package to wp-content/plugins folder and give it another try and if God bless so it will work as before 🙂

How to prevent your WP based business in future from such nasty errors using A Staging site (test) version of your blog ?

Just run a duplicate of your website under a separate folder on your hosting and do enable the same plugins as on the primary website and copy over the MySQL / PostgreSQL Database from your Live site to the Staging, then once it is enabled before doing any crucial WordPress version updates or Plugins Update always do try the Upgrade first on your Test Staging site. If it does execute fine there in most of the cases the result should be the same on the Production host and that could solve you effors and nerves of debugging a hard to get failure errors or faulty plugins without affecting what your End users see.

If you're not hosting the WordPress install under your own hosting like me you can always use some of the public available hostings like  BlueHost or WPEngine

 

Preserve Session IDs of Tomcat cluster behind Apache reverse proxy / Sticky sessions with mod_proxy and Tomcat

Wednesday, February 26th, 2014

apache_and_tomcat_merged_logo_prevent_sticky_sessions
Having a combination of Apache webservice Reverse Proxy to redirect invisibly traffic to a number of Tomcat server positioned in a DMZ is a classic task in big companies Corporate world.
Hence if you work for company like IBM or HP sooner or later you will need to configure Apache Webserver cluster with few running Jakarta Tomcat Application servers behind. Scenario with necessity to access a java based application via Tomcat which requires logging (authentication) relaying on establishing and keeping a session ID is probably one of the most common ones and if you do it for first time you will probably end up with Session ID issues.  Session ID issues are hard to capture at first as on first glimpse application will seem to be working but users will have to re-login all the time even though the programmers might have coded for a session to expiry in 30 minutes or so.

… I mean not having configured Session ID prevention to Tomcats will cause random authentication session expiries and users using the Tomcat app will be unable to normally access below application with authenticated credentials. The solution to these is known under term "Sticky sessions"
To configure Sticky sessions you need to already have configured Apache/s with following minimum configuration:

  • enabled mod_proxy, proxy_balancer_module, proxy_http_module and or mod_proxy_ajp (in Apache config)

  LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_balancer_module modules/mod_proxy_balancer.so
LoadModule proxy_http_module modules/mod_proxy_http.so

  • And configured and tested Tomcats running an Application reachable via AJP protocol

Below example assumes there is Reverse Proxy Load Balancer Apache which has to forward all traffic to 2 tomcats. The config can easily be extended for as many as necessary by adding more BalancerMembers.

In Apache webserver (apache2.conf / httpd.conf) you need to have JSESSIONID configured. These JSESSIONID is going to be appended to each client request from Reverse Proxy to each of Tomcat servers with value opened once on authentication to first Tomcat node to each of the other ones.

<Proxy balancer://mycluster>
BalancerMember ajp://10.16.166.53:11010/ route=delivery1
BalancerMember ajp://10.16.166.66:11010/ route=delivery2
</Proxy>

ProxyRequests Off
ProxyPass / balancer://mycluster/ stickysession=JSESSIONID
ProxyPassReverse / balancer://mycluster/

The two variables route=delivery1 and route=delivery2 are routed to hosts identificators that also has to be present in Tomcat server configurations
In Tomcat App server First Node (server.xml)

<Engine name="Catalina" defaultHost="localhost" jvmRoute="delivery1">

In Tomcat App server Second Node (server.xml)

<Engine name="Catalina" defaultHost="localhost" jvmRoute="delivery2">

Once Sticky Sessions are configured it is useful to be able to track they work fine this is possible through logging each of established JESSSIONIDs, to do so add in httpd.conf

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"\"%{JSESSIONID}C\"" combined

After modifications restart Apache and Tomcat to load new configs. In Apache access.log the proof should be the proof that sessions are preserved via JSESSIONID, there should be logs like:
 

127.0.0.1 - - [18/Sep/2013:10:02:02 +0800] "POST /examples/servlets/servlet/RequestParamExample HTTP/1.1" 200 662 "http://localhost/examples/servlets/servlet/RequestParamExample" "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130807 Firefox/17.0""B80557A1D9B48EC1D73CF8C7482B7D46.server2"

127.0.0.1 - - [18/Sep/2013:10:02:06 +0800] "GET /examples/servlets/servlet/RequestInfoExample HTTP/1.1" 200 693 "http://localhost/examples/servlets/" "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130807 Firefox/17.0""B80557A1D9B48EC1D73CF8C7482B7D46.server2"

That should solve problems with mysterious session expiries 🙂

How to resolve (fix) WordPress wp-cron.php errors like “POST /wp-cron.php?doing_wp_cron HTTP/1.0″ 404” / What is wp-cron.php and what it does

Monday, March 12th, 2012

fix wordpress wp-cron.php 404 HTTP error, what is wp-cron.php schedule logo

One of the WordPress websites hosted on our dedicated server produces all the time a wp-cron.php 404 error messages like:

xxx.xxx.xxx.xxx - - [15/Apr/2010:06:32:12 -0600] "POST /wp-cron.php?doing_wp_cron HTTP/1.0

I did not know until recently, whatwp-cron.php does, so I checked in google and red a bit. Many of the places, I've red are aa bit unclear and doesn't give good exlanation on what exactly wp-cron.php does. I wrote this post in hope it will shed some more light on wp-config.php and how this major 404 issue is solved..
So

what is wp-cron.php doing?

 

  • wp-cron.php is acting like a cron scheduler for WordPress.
  • wp-cron.php is a wp file that controls routine actions for particular WordPress install.
  • Updates the data in SQL database on every, request, every day or every hour etc. – (depending on how it's set up.).
  • wp-cron.php executes automatically by default after EVERY PAGE LOAD!
  • Checks all pending comments for spam with Akismet (if akismet or anti-spam plugin alike is installed)
  • Sends all scheduled emails (e.g. sent a commentor email when someone comments on his comment functionality, sent newsletter subscribed persons emails etc.)
  • Post online scheduled articles for a day and time of particular day

Suppose you're writting a new post and you want to take advantage of WordPress functionality to schedule a post to appear Online at specific time:

What is wordpress wp-cron.php, Scheduling wordpress post screenshot

The Publish Immediately, field execution is being issued on the scheduled time thanks to the wp-cron.php periodic invocation.

Another example for wp-cron.php operation is in handling flushing of WP old HTML Caches generated by some wordpress caching plugin like W3 Total Cache
wp-cron.php takes care for dozens of other stuff silently in the background. That's why many wordpress plugins are depending heavily on wp-cron.php proper periodic execution. Therefore if something is wrong with wp-config.php, this makes wordpress based blog or website partially working or not working at all.
 

Our company wp-cron.php errors case

In our case the:
212.235.185.131 – – [15/Apr/2010:06:32:12 -0600] "POST /wp-cron.php?doing_wp_cron HTTP/1.0" 404
is occuring in Apache access.log (after each unique vistor request to wordpress!.), this is cause wp-cron.php is invoked on each new site visitor site request.
This puts a "vain load" on the Apache Server, attempting constatly to invoke the script … always returning not found 404 err.

As a consequence, the WP website experiences "weird" problems all the time. An illustration of a problem caused by the impoper wp-cron.php execution is when we are adding new plugins to WP.

Lets say a new wordpress extension is download, installed and enabled in order to add new useful functioanlity to the site.

Most of the time this new plugin would be malfunctioning if for example it is prepared to add some kind of new html form or change something on some or all the wordpress HTML generated pages.
This troubles are result of wp-config.php's inability to update settings in wp SQL database, after each new user request to our site.
So the newly added plugin website functionality is not showing up at all, until WP cache directory is manually deleted with rm -rf /var/www/blog/wp-content/cache/

I don't know how thi whole wp-config.php mess occured, however my guess is whoever installed this wordpress has messed something in the install procedure.

Anyways, as I researched thoroughfully, I red many people complaining of having experienced same wp-config.php 404 errs. As I red, most of the people troubles were caused by their shared hosting prohibiting the wp-cron.php execution.
It appears many shared hostings providers choose, to disable the wordpress default wp-cron.php execution. The reason is probably the script puts heavy load on shared hosting servers and makes troubles with server overloads.

Anyhow, since our company server is adedicated server I can tell for sure in our case wordpress had no restrictions for how and when wp-cron.php is invoked.
I've seen also some posts online claiming, the wp-cron.php issues are caused of improper localhost records in /etc/hosts, after a thorough examination I did not found any hosts problems:

hipo@debian:~$ grep -i 127.0.0.1 /etc/hosts
127.0.0.1 localhost.localdomain localhost

You see from below paste, our server, /etc/hosts has perfectly correct 127.0.0.1 records.

Changing default way wp-cron.php is executed

As I've learned it is generally a good idea for WordPress based websites which contain tens of thousands of visitors, to alter the default way wp-cron.php is handled. Doing so will achieve some efficiency and improve server hardware utilization.
Invoking the script, after each visitor request can put a heavy "useless" burden on the server CPU. In most wordpress based websites, the script did not need to make frequent changes in the DB, as new comments in posts did not happen often. In most wordpress installs out there, big changes in the wordpress are not common.

Therefore, a good frequency to exec wp-cron.php, for wordpress blogs getting only a couple of user comments per hour is, half an hour cron routine.

To disable automatic invocation of wp-cron.php, after each visitor request open /var/www/blog/wp-config.php and nearby the line 30 or 40, put:

define('DISABLE_WP_CRON', true);

An important note to make here is that it makes sense the position in wp-config.php, where define('DISABLE_WP_CRON', true); is placed. If for instance you put it at the end of file or near the end of the file, this setting will not take affect.
With that said be sure to put the variable define, somewhere along the file initial defines or it will not work.

Next, with Apache non-root privileged user lets say www-data, httpd, www depending on the Linux distribution or BSD Unix type add a php CLI line to invoke wp-cron.php every half an hour:

linux:~# crontab -u www-data -e

0,30 * * * * cd /var/www/blog; /usr/bin/php /var/www/blog/wp-cron.php 2>&1 >/dev/null

To assure, the php CLI (Command Language Interface) interpreter is capable of properly interpreting the wp-cron.php, check wp-cron.php for syntax errors with cmd:

linux:~# php -l /var/www/blog/wp-cron.php
No syntax errors detected in /var/www/blog/wp-cron.php

That's all, 404 wp-cron.php error messages will not appear anymore in access.log! 🙂

Just for those who can find the root of the /wp-cron.php?doing_wp_cron HTTP/1.0" 404 and fix the issue in some other way (I'll be glad to know how?), there is also another external way to invoke wp-cron.php with a request directly to the webserver with short cron invocation via wget or lynx text browser.

– Here is how to call wp-cron.php every half an hour with lynxPut inside any non-privileged user, something like:
01,30 * * * * /usr/bin/lynx -dump "http://www.your-domain-url.com/wp-cron.php?doing_wp_cron" 2>&1 >/dev/null

– Call wp-cron.php every 30 mins with wget:

01,30 * * * * /usr/bin/wget -q "http://www.your-domain-url.com/wp-cron.php?doing_wp_cron"

Invoke the wp-cron.php less frequently, saves the server from processing the wp-cron.php thousands of useless times.

Altering the way wp-cron.php works should be seen immediately as the reduced server load should drop a bit.
Consider you might need to play with the script exec frequency until you get, best fit cron timing. For my company case there are only up to 3 new article posted a week, hence too high frequence of wp-cron.php invocations is useless.

With blog where new posts occur once a day a script schedule frequency of 6 up to 12 hours should be ok.

 

Possible way to Improve wordpress performance with wp-config.php 4 config variables

Tuesday, March 6th, 2012

Wordpress improve performance wp-config.php logo chromium effect GIMP

Nowdays WordPress is ran by million of blogs and websites all around the net. I myself run wordpress for this blog in general wordpress behaves quite well in terms of performance. However as with time the visitors tend to increase, on frequently updated websites or blogs. As a consequence, the blog / website performance slowly starts to decrease as result of the MySQL server read / write operations creating I/O and CPU load overheads. Buying a new hardware and migrating the wordpress database is a possible solution, however for many small or middle size wordpress blogs en sites like mine this is not easy task. Getting a dedicated server or simply upgrading your home server hardware is expensive and time consuming process… In my efforts to maximize my hardware utilization and increase my blog decaying performance I've stumbled on the article Optimize WordPress performance with wp-config.php

According to the article there are 4 simple wp-config.php config directvies useful in decreasing a lot of queries to the MySQL server issued with each blog visitor.

define('WP_HOME','http://www.yourblog-or-siteurl.com');
define('WP_SITEURL','http://www.yourblog-or-siteurl.com');
define('TEMPLATEPATH', '/var/www/blog/wp-content/themes/default');
define('STYLESHEETPATH', '/var/www/blog/wp-content/themes/default');

1. WP_HOME and WP_SITEURL wp-config.php directvies

The WP_HOME and WP_SITEURL variables are used to hard-code the address of the wordpress blog or site url, so wordpress doesn't have to check everytime in the database on every user request to know it is own URL address.

2. TEMPLATEPATH and TEMPLATEPATH wp variables

This variables will surely improve performance to Wodpress blogs which doesn't implement caching. On wp install with enabled caching plugins like WordPress Super Cache, Hyper Cache or WordPress Db Cache is used, I don't know if this variables will have performance impact …

So far I have tested the vars on a couple of wordpress based installs with caching enabled and even on them it seems the pages load faster than before, but I cannot say this for sure as I did not check the site loading time in advance before hardcoding the vars.

Anyways even if the suggested variables couldn't make positive impact on performance, having the four variables in wp-config.php is a good practice for blogs or websites which are looking for extra clarity.
For multiple wordpress installations living on the same server, having defined the 4 vars in different wordpress seems like a good idea too.

How to disable nginx static requests access.log logging

Monday, March 5th, 2012

NGINX logo Static Content Serving Stop logging

One of the companies, where I'm employed runs nginx as a CDN (Content Delivery Network) server.
Actually nginx, today has become like a standard for delivering tremendous amounts of static content to clients.
The nginx, server load has recently increased with the number of requests, we have much more site visitors now.
Just recently I've noticed the log files are growing to enormous sizes and in reality this log files are not used at all.
As I've used disabling of web server logging as a way to improve Apache server performance in past time, I thought of implying the same little "trick" to improve the hardware utilization on the nginx server as well.

To disable logging, I proceeded and edit the /usr/local/nginx/conf/nginx.conf file, commenting inside every occurance of:

access_log /usr/local/nginx/logs/access.log main;

to

#access_log /usr/local/nginx/logs/access.log main;

Next, to load the new nginx.conf settings I did a restart:

nginx:~# killall -9 nginx; sleep 1; /etc/init.d/nginx start

I expected, this should be enough to disable completely access.log, browser request logins. Unfortunately /usr/local/nginx/logs/access.log was still displaying growing with:

nginx:~# tail -f /usr/local/nginx/logs/access.log

After a bit thorough reading of nginx.conf config rules, I've noticed there is a config directive:

access_log off;

Therefore to succesfully disable logging I had to edit config occurance of:

access_log /usr/local/nginx/logs/access.log main

to

After a bit thorough reading of nginx.conf config rules, I've noticed there is a config directive:

access_log off;

Therefore to succesfully disable logging I had to edit config occurance of:

access_log /usr/local/nginx/logs/access.log main

to

access_log /usr/local/nginx/logs/access.log main
access_log off;

Finally to load the new settings, which thanksfully this time worked, I did nginx restart:

nginx:~# killall -9 nginx; sleep 1; /etc/init.d/nginx start

And hooray! Thanks God, now nginx logging is disabled!

As a result, as expected the load avarage on the server reduced a bit 🙂

How to make a mirror of website on GNU / Linux with wget / Few tips on wget site mirroring

Wednesday, February 22nd, 2012

how-to-make-mirror-of-website-on-linux-wget

Everyone who used Linux is probably familiar with wget or has used this handy download console tools at least thousand of times. Not so many Desktop GNU / Linux users like Ubuntu and Fedora Linux users had tried using wget to do something more than single files download.
Actually wget is not so popular as it used to be in earlier linux days. I've noticed the tendency for newer Linux users to prefer using curl (I don't know why).

With all said I'm sure there is plenty of Linux users curious on how a website mirror can be made through wget.
This article will briefly suggest few ways to do website mirroring on linux / bsd as wget is both available on those two free operating systems.

1. Most Simple exact mirror copy of website

The most basic use of wget's mirror capabilities is by using wget's -mirror argument:

# wget -m http://website-to-mirror.com/sub-directory/

Creating a mirror like this is not a very good practice, as the links of the mirrored pages will still link to external URLs. In other words link URL will not pointing to your local copy and therefore if you're not connected to the internet and try to browse random links of the webpage you will end up with many links which are not opening because you don't have internet connection.

2. Mirroring with rewritting links to point to localhost and in between download page delay

Making mirror with wget can put an heavy load on the remote server as it fetches the files as quick as the bandwidth allows it. On heavy servers rapid downloads with wget can significantly reduce the download server responce time. Even on a some high-loaded servers it can cause the server to hang completely.
Hence mirroring pages with wget without explicity setting delay in between each page download, could be considered by remote server as a kind of DoS – (denial of service) attack. Even some site administrators have already set firewall rules or web server modules configured like Apache mod_security which filter requests to IPs which are doing too frequent HTTP GET /POST requests to the web server.
To make wget delay with a 10 seconds download between mirrored pages use:

# wget -mk -w 10 -np --random-wait http://website-to-mirror.com/sub-directory/

The -mk stands for -m/-mirror and -k / shortcut argument for –convert-links (make links point locally), –random-wait tells wget to make random waits between o and 10 seconds between each page download request.

3. Mirror / retrieve website sub directory ignoring robots.txt "mirror restrictions"

Some websites has a robots.txt which restricts content download with clients like wget, curl or even prohibits, crawlers to download their website pages completely.

/robots.txt restrictions are not a problem as wget has an option to disable robots.txt checking when downloading.
Getting around the robots.txt restrictions with wget is possible through -e robots=off option.
For instance if you want to make a local mirror copy of the whole sub-directory with all links and do it with a delay of 10 seconds between each consequential page request without reading at all the robots.txt allow/forbid rules:

# wget -mk -w 10 -np -e robots=off --random-wait http://website-to-mirror.com/sub-directory/

4. Mirror website which is prohibiting Download managers like flashget, getright, go!zilla etc.

Sometimes when try to use wget to make a mirror copy of an entire site domain subdirectory or the root site domain, you get an error similar to:

Sorry, but the download manager you are using to view this site is not supported.
We do not support use of such download managers as flashget, go!zilla, or getright

This message is produced by the site dynamic generation language PHP / ASP / JSP etc. used, as the website code is written to check on the browser UserAgent sent.
wget's default sent UserAgent to the remote webserver is:
Wget/1.11.4

As this is not a common desktop browser useragent many webmasters configure their websites to only accept well known established desktop browser useragents sent by client browsers.
Here are few typical user agents which identify a desktop browser:
 

  • Mozilla/5.0 (Windows NT 6.1; rv:6.0) Gecko/20110814 Firefox/6.0
  • Mozilla/5.0 (X11; Linux i686; rv:6.0) Gecko/20100101 Firefox/6.0
  • Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4
  • Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.2a1pre) Gecko/20110324 Firefox/4.2a1pre

etc. etc.

If you're trying to mirror a website which has implied some kind of useragent restriction based on some "valid" useragent, wget has the -U option enabling you to fake the useragent.

If you get the Sorry but the download manager you are using to view this site is not supported , fake / change wget's UserAgent with cmd:

# wget -mk -w 10 -np -e robots=off \
--random-wait
--referer="http://www.google.com" \--user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6" \--header="Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5" \--header="Accept-Language: en-us,en;q=0.5" \--header="Accept-Encoding: gzip,deflate" \--header="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7" \--header="Keep-Alive: 300"

For the sake of some wget anonimity – to make wget permanently hide its user agent and pretend like a Mozilla Firefox running on MS Windows XP use .wgetrc like this in home directory.

5. Make a complete mirror of a website under a domain name

To retrieve complete working copy of a site with wget a good way is like so:

# wget -rkpNl5 -w 10 --random-wait www.website-to-mirror.com

Where the arguments meaning is:
-r – Retrieve recursively
-k – Convert the links in documents to make them suitable for local viewing
-p – Download everything (inline images, sounds and referenced stylesheets etc.)
-N – Turn on time-stamping
-l5 – Specify recursion maximum depth level of 5

6. Make a dynamic pages static site mirror, by converting CGI, ASP, PHP etc. to HTML for offline browsing

It is often websites pages are ending in a .php / .asp / .cgi … extensions. An example of what I mean is for instance the URL http://php.net/manual/en/tutorial.php. You see the url page is tutorial.php once mirrored with wget the local copy will also end up in .php and therefore will not be suitable for local browsing as .php extension is not understood how to interpret by the local browser.
Therefore to copy website with a non-html extension and make it offline browsable in HTML there is the –html-extension option e.g.:

# wget -mk -w 10 -np -e robots=off \
--random-wait \
--convert-links http://www.website-to-mirror.com

A good practice in mirror making is to set a download limit rate. Setting such rate is both good for UP and DOWN side (the local host where downloading and remote server). download-limit is also useful when mirroring websites consisting of many enormous files (documental movies, some music etc.).
To set a download limit to add –limit-rate= option. Passing by to wget –limit-rate=200K would limit download speed to 200KB.

Other useful thing to assure wget has made an accurate mirror is wget logging. To use it pass -o ./my_mirror.log to wget.
 

How to scan for DHCP available servers in a Network range on Linux and FreeBSD

Thursday, December 8th, 2011

GNU / Linux and FreeBSD had a nifty little program (tool) called dhcping . dhcpingsend a DHCP request to DHCP server to see if it’s up and running. dhcping is also able to send a request to DHCP servers on a whole network range and therefore it can e asily be used as a scanner to find any available DHCP servers in a network.
This makes dhcping a nmap like scanner capable to determine if dhcp servers are in a network 😉
To scan an an entire network range with dhclient and find any existing DHCP servers:

noah:~# dhcping -s 255.255.255.255 -r -v
Got answer from: 192.168.2.1
received from 192.168.2.1, expected from 255.255.255.255
no answer

In above’s output actually my Dlink wireless router returns answer to the broadcast DHCP LEASE UDP network requests of dhcping .
On a networks where there is no DHCP server available, the requests dhcping -s 255.255.255.255 -r -v returns:

noah:~# dhcping -s 255.255.255.255 -r -v
no answer

This article was inspired by a post, I’ve red by a friend (Amridikon), so thx goes to him.

 

How to renew IP address, Add Routing and flush DNS cache on Windows XP / Vista / 7

Friday, November 25th, 2011

There are two handy Windows commands which can be used to renew IP address or flush prior cached DNS records which often create problems with resolving hosts.

1. To renew the IP address (fetch address from DHCP server) C:> ipconfig /release
C:> ipconfig /renew

In above cmd ipconfig /release will de-assign the IP address configured on all Windows LAN and Wireless interfaces, whether ipconfig /renew will send request for IP address to the DNS server.

To unassign and assign again IP address from DHCP server only for a particular LAN or WLAN card:

C:> ipconfig /release LAN
C:> ipconfig /renew LAN
C:> ipconfig /release WLAN
C:> ipconfig /renew WLAN

2. Adding specific routing to Windows

Windows has a Route command similar by syntax to Linux’s route command.
To add routing via a specific predefined IP addresses on Windows the commands should be something like:

C:> Route add 192.168.40.0 mask 255.255.255.0 192.168.41.253
C:> Route add 0.0.0.0 mask 0.0.0.0 192.168.41.254
The first command adds IP 192.168.40.0 in the network of 255 hosts to be routed via 192.168.41.253
The second one adds 192.168.41.254 as a default gateway for all outbound traffic from the Windows host.
To make permanent routing -p switch is used.
3. To clear Windows DNS cache (flush DNS cached records) C:> ipconfig /flushdns
This will clear all IP records corresponding to hostnames previously cached on the Windows host. Using ipconfig /flushdns is especially handy when IP address for a specific DNS host is changed. Flushing the Windows DNS cache can save us a lot of waiting before the domain example.com starts resolving to the new IP address let’s say 1.2.3.4 instead of the old one 2.2.2.2