Posts Tagged ‘log’

Why du and df reporting different on a filesystem / How to fix inconsistency between used space on FS and disk showing full strangeness

Wednesday, July 24th, 2019

linux-why-du-and-df-shows-different-result-inconsincy-explained-filesystem-full-oddity

If you're a sysadmin on a large server environment such as a couple of hundred of Virtual Machines running Linux OS on either physical host or OpenXen / VmWare hosted guest Virtual Machine, you might end up sometimes at an odd case where some mounted partition mount point reports its file use different when checked with
df
cmd than when checked with du command, like for example:
 

root@sqlserver:~# df -hT /var/lib/mysql
Filesystem   Type  Size Used Avail Use% Mounted On
/dev/sdb5      ext4    19G  3,4G    14G  20% /var/lib/mysql

Here the '-T' argument is used to show us the filesystem.

root@sqlserver:~# du -hsc /var/lib/mysql
0K    /var/lib/mysql/
0K    total

 

1. Simple debug on what might be the root cause for df / du inconsistency reporting

 

Of course the basic thing to do when in that weird situation is to be totally shocked how this is possible and to investigate a bit what is the biggest first level sub-directories that eat up the space on the mounted location, with du:

 

# du -hkx –max-depth=1 /var/lib/mysql/|uniq|sort -n
4       /var/lib/mysql/test
8       /var/lib/mysql/ezmlm
8       /var/lib/mysql/micropcfreak
8       /var/lib/mysql/performance_schema
12      /var/lib/mysql/mysqltmp
24      /var/lib/mysql/speedtest
64      /var/lib/mysql/yourls
144     /var/lib/mysql/narf
320     /var/lib/mysql/webchat_plus
424     /var/lib/mysql/goodfaithair
528     /var/lib/mysql/moonman
648     /var/lib/mysql/daniel
852     /var/lib/mysql/lessn
1292    /var/lib/mysql/gallery

The given output is in Kilobytes so it is a little bit hard to read, if you're used to Mbytes instead, do

 

 # du -hmx –max-depth=1 /var/lib/mysql/|uniq|sort -n|less

 

I've also investigated on the complete /var directory contents sorted by size with:

 

 # du -akx ./ | sort -n
5152564    ./cache/rsnapshot/hourly.2/localhost
5255788    ./cache/rsnapshot/hourly.2
5287912    ./cache/rsnapshot
7192152    ./cache


Even after finding out the bottleneck dirs and trying to clear up a bit, continued facing that inconsistently shown in two commands and if you're likely to be stunned like me and try … to move some files to a different filesystem to free up space or assigned inodes with a hope that shown inconsitency output will be fixed as it might be caused  due to some kernel / FS caching ?? and this will eventually make the mounted FS to refresh …

But unfortunately, if you try it you'll figure out clearing up a couple of Megas or Gigas will make no difference in cmd output.

In my exact case /var/lib/mysql is a separate mounted ext4 filesystem, however same issue was present also on a Network Filesystem (NFS) and thus, my first thought that this is caused by a network failure problem or NFS bug turned to be wrong.

After further short investigation on the inodes on the Filesystem, it was clear enough inodes are available:
 

# df -i /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5      1221600  2562 1219038   1% /var/lib/mysql

 

So the filled inodes count assumed issue also has been rejected.
P.S. (if you're not well familiar with them read manual, i.e. – man 7 inode).
 

– Remounting the mounted filesystem

To make sure the filesystem shown inconsistency between du and df is not due to some hanging network mount or bug, first logical thing I did is to remount the filesytem showing different in size, in my case this was done with:
 

# mount -o remount,rw -t ext4 /var/lib/mysql

For machines with NFS remote mounted storage locations, used:

# mount -o remount,rw -t nfs /var/www


FS remount did not solved it so I continued to ponder what oddity and of course I thought of a workaround (in case if this issues are caused by kernel bug or OS lib issue) reboot might be the solution, however unfortunately restarting the VMs was not a wanted easy to do solution, thus I continued investigating what is wrong …

Next check of course was to check, what kind of network connections are opened to the affected hosts with:
 

# netstat -tupanl


Did not found anything that might point me to the reported different Megabytes issue, so next step was to check what is the situation with currently opened files by running processes on the weird df / du reported systems with lsof, and boom there I observed oddity such as multiple files

 

# lsof -nP | grep '(deleted)'

COMMAND   PID   USER   FD   TYPE DEVICE    SIZE NLINK  NODE NAME
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /var/lib/mysql/tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /var/lib/mysql/tmp/ibOrELhG (deleted)
mysqld   2588  mysql    6u   REG 253,17       777884290     0  1497 /var/lib/mysql/tmp/ibmDFAW8 (deleted)
mysqld   2588  mysql    7u   REG 253,17       123667875     0 11387 /var/lib/mysql/tmp/ib2CSACB (deleted)
mysqld   2588  mysql   11u   REG 253,17       123852406     0 11388 /var/lib/mysql/tmp/ibQpoZ94 (deleted)

 

Notice that There were plenty of '(deleted)' STATE files shown in memory an overall of 438:

 

# lsof -nP | grep '(deleted)' |wc -l
438


As I've learned a bit online about the problem, I found it is also possible to find deleted unlinked files only without any greps (to list all deleted files in memory files with lsof args only):

 

# lsof +L1|less


The SIZE field (fourth column)  shows a number of files that are really hard in size and that are kept in open on filesystem and in memory, totally messing up with the filesystem. In my case this is temp files created by MYSQLD daemon but depending on the server provided service this might be apache's www-data, some custom perl / bash script executed via a cron job, stalled rsync jobs etc.
 

2. Check all the list open files with the mysql / root user as part of the the server filesystem inconsistency debugging with:

 

– Grep opened files on server by user

# lsof |grep mysql
mysqld    1312                       mysql  cwd       DIR               8,21       4096          2 /var/lib/mysql
mysqld    1312                       mysql  rtd       DIR                8,1       4096          2 /
mysqld    1312                       mysql  txt       REG                8,1   20336792   23805048 /usr/sbin/mysqld
mysqld    1312                       mysql  mem       REG               8,21      24576         20 /var/lib/mysql/tc.log
mysqld    1312                       mysql  DEL       REG               0,16                 29467 /[aio]
mysqld    1312                       mysql  mem       REG                8,1      55792   14886933 /lib/x86_64-linux-gnu/libnss_files-2.28.so

 

# lsof | grep root
COMMAND    PID   TID TASKCMD          USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
systemd      1                        root  cwd       DIR                8,1       4096          2 /
systemd      1                        root  rtd       DIR                8,1       4096          2 /
systemd      1                        root  txt       REG                8,1    1489208   14928891 /lib/systemd/systemd
systemd      1                        root  mem       REG                8,1    1579448   14886924 /lib/x86_64-linux-gnu/libm-2.28.so

Other command that helped to track the discrepancy between df and du different file usage on FS is:
 

# du -hxa  / | egrep '^[[:digit:]]{1,1}G[[:space:]]*'
 

 

3. Fixing large files kept in memory filesystem problem


What is the real reason for ending up with this file handlers opened by running backgrounded programs on the Linux OS?
It could be multiple  but most likely it is due to exceeded server / client interactions or breaking up RAM or HDD drive with writing plenty of logs on the FS without ending keeping space occupied or Programming library bugs used by hanged service leaving the FH opened on storage.

What is the solution to file system files left in memory problem?

The best solution is to first fix custom script or hanged service and then if possible to simply restart the server to make the kernel / services reload or if this is not possible just restart the problem creation processes.

Once the process is identified like in my case this was MySQL on systemd enabled newer OS distros, just do:

 

 

# systemctl restart mysqld.service


or on older init.d system V ones:

# /etc/init.d/service restart


For custom hanged scripts being listed in ps axuwef you can grep the pid and do a kill -HUP (if the script is written in a good way to recognize -HUP and restart the sub-running process properly – BE EXTRA CAREFUL IF YOU'RE RESTARTING BROKEN SCRIPTS as this might cause your running service disruptions …).

# pgrep -l script.sh
7977 script.sh


# kill -HUP PID

 

Now finally this should either mitigate or at best case completely solve the reported disagreement between df and du, after which the calculated / reported disk space should be back to normal and show up approximately the same (note that size changes a bit as mysql service is writting data) constantly extending the size between the two checks.

 

# df -hk /var/lib/mysql; du -hskc /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5        19097172 3472744 14631296  20% /var/lib/mysql
3427772    /var/lib/mysql
3427772    total

 

What we learned?

What I've explained in this article is why and how it comes that 'zoombie' files reside on a filesystem
appearing to be eating disk space on a mounted local or network partition, giving strange inconsistent
reports, leading to system service disruptions and impossibility to have correctly shown information on used
disk space on mounted drive.

I went through with some standard logic on debugging service / filesystem / inode issues up explainat, that led me to the finding about deleted files being kept in filesystem and producing the filesystem strange sized / showing not correct / filled even after it was extended with tune2fs and was supposed to have extra 50GBs.

Finally it was explained shortly how to HUP / restart hanging script / service to fix it.

Some few good readings that helped to fix the issue:

What to do when du and df report different usage is here
df in linux not showing correct free space after file removal is here
Why do “df” and “du” commands show different disk usage?
 

Check linux install date / How do I find out how long a Linux server OS was installed?

Wednesday, March 30th, 2016

linux-check-install-date-howto-commands-on-debian-and-fedora-tux_the_linux_penguin_by_hello

To find out the Linux install date, there is no one single solution according to the Linux distribution type and version, there are some common ways to get the Linux OS install age.
Perhaps the most popular way to get the OS installation date and time is to check out when the root filesystem ( / ) was created, this can be done with tune2fs command

 

server:~# tune2fs -l /dev/sda1 | grep 'Filesystem created:'
Filesystem created:       Thu Sep  6 21:44:22 2012

 

server:~# ls -alct /|tail -1|awk '{print $6, $7, $8}'
sep 6 2012

 

root home directory is created at install time
 

 

server:~# ls -alct /root

 

root@server:~# ls -lAhF /etc/hostname
-rw-r–r– 1 root root 8 sep  6  2012 /etc/hostname

 

For Debian / Ubuntu and other deb based distributions the /var/log/installer directory is being created during OS install, so on Debian the best way to check the Linux OS creation date is with:
 

root@server:~# ls -ld /var/log/installer
drwxr-xr-x 3 root root 4096 sep  6  2012 /var/log/installer/
root@server:~# ls -ld /lost+found
drwx—— 2 root root 16384 sep  6  2012 /lost+found/

 

On Red Hat / Fedora / CentOS, redhat based Linuces , you can use:

 

rpm -qi basesystem | grep "Install Date"

 

basesystem is the package containing basic Linux binaries many of which should not change, however in some cases if there are some security updates package might change so it is also good to check the root filesystem creation time and compare whether these two match.

Howto Fix “sysstat Cannot open /var/log/sysstat/sa no such file or directory” on Debian / Ubuntu Linux

Monday, February 15th, 2016

sysstast-no-such-file-or-directory-fix-Debian-Ubuntu-Linux-howto
I really love sysstat and as a console maniac I tend to install it on every server however by default there is some <b>sysstat</b> tuning once installed to make it work, for those unfamiliar with <i>sysstat</i> I warmly recommend to check, it here is in short the package description:<br /><br />
 

server:~# apt-cache show sysstat|grep -i desc -A 15
Description: system performance tools for Linux
 The sysstat package contains the following system performance tools:
  – sar: collects and reports system activity information;
  – iostat: reports CPU utilization and disk I/O statistics;
  – mpstat: reports global and per-processor statistics;
  – pidstat: reports statistics for Linux tasks (processes);
  – sadf: displays data collected by sar in various formats;
  – nfsiostat: reports I/O statistics for network filesystems;
  – cifsiostat: reports I/O statistics for CIFS filesystems.
 .
 The statistics reported by sar deal with I/O transfer rates,
 paging activity, process-related activities, interrupts,
 network activity, memory and swap space utilization, CPU
 utilization, kernel activities and TTY statistics, among
 others. Both UP and SMP machines are fully supported.
Homepage: http://pagesperso-orange.fr/sebastien.godard/

 

If you happen to install sysstat on a Debian / Ubuntu server with:

server:~# apt-get install –yes sysstat


, and you try to get some statistics with sar command but you get some ugly error output from:

 

server:~# sar Cannot open /var/log/sysstat/sa20: No such file or directory


And you wonder how to resolve it and to be able to have the server log in text databases periodically the nice sar stats load avarages – %idle, %iowait, %system, %nice, %user, then to FIX that Cannot open /var/log/sysstat/sa20: No such file or directory

You need to:

server:~# vim /etc/default/sysstat


By Default value you will find out sysstat stats it is disabled, e.g.:

ENABLED="false"

Switch the value to "true"

ENABLED="true"


Then restart sysstat init script with:

server:~# /etc/init.d/sysstat restart

However for those who prefer to do things from menu Ncurses interfaces and are not familiar with Vi Improved, the easiest way is to run dpkg reconfigure of the sysstat:

server:~# dpkg –reconfigure


sysstat-reconfigure-on-gnu-linux

 

root@server:/# sar
Linux 2.6.32-5-amd64 (pcfreak) 15.02.2016 _x86_64_ (2 CPU)

0,00,01 CPU %user %nice %system %iowait %steal %idle
0,15,01 all 24,32 0,54 3,10 0,62 0,00 71,42
1,15,01 all 18,69 0,53 2,10 0,48 0,00 78,20
10,05,01 all 22,13 0,54 2,81 0,51 0,00 74,01
10,15,01 all 17,14 0,53 2,44 0,40 0,00 79,49
10,25,01 all 24,03 0,63 2,93 0,45 0,00 71,97
10,35,01 all 18,88 0,54 2,44 1,08 0,00 77,07
10,45,01 all 25,60 0,54 3,33 0,74 0,00 69,79
10,55,01 all 36,78 0,78 4,44 0,89 0,00 57,10
16,05,01 all 27,10 0,54 3,43 1,14 0,00 67,79


Well that's it now sysstat error resolved, text reporting stats data works again, Hooray! 🙂

No space left on device with free disk space / Why no space left on device while there is plenty of disk space on drive – Running out of Inodes

Tuesday, November 17th, 2015

no_space_left-on-device-while-there-is-disk-space-running-out-of-file-inodes-unix_linux_file_system_diagram.gif

 

On one of the servers, I'm administrating the websites started showing some Mysql database table corrup errors like:
 

 

Table './database_name/site_news_list_com' is marked as crashed and last (automatic?) repair failed

The server is using Oracle MySQL server community stable edition on Debian GNU / Linux 6.0, so I first thought during work the server crashed either due to some bug issue in MySQL or it crashed due to some PHP cron job that did something messy. Thus to solve the crashed tables, tried using mysqlcheck tool which helped pretty fine, at many times whether there were database / table corruptions. I've run the following set of mysqlcheck commands with root (superuser) in a bash shell after logging in through SSH:

:

server:~# /usr/bin/mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–check –all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
server:~# /usr/bin/mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf –analyze –all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
server:~# /usr/bin/mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–auto-repair –optimize –all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
server:~# /usr/bin/mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–optimize –all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log


In order for above commands to work, I've created the /root/.my.cnf containing my root (mysql CLI) mysql username and password, e.g. file has content like below:

 

[client]
user=root
password=MySecretPassword8821238

 

Btw a good note here is its generally a good idea (if you want to have consistent mysql databases) to automatically execute via a cron job 2 times a month, I've in root cronjob the following:

 

crontab -u root -l |grep -i mysqlcheck
04 06 5,10,15,20,25,1 * * /usr/bin/mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–check –all-databases –silent -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log 07 06 5,10,15,20,25,1 * * /usr/bin/mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf –analyze –all-databases –silent -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log 12 06 5,10,15,20,25,1 * * /usr/bin/mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–auto-repair –optimize –all-databases –silent -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log 17 06 5,10,15,20,25,1 * * /usr/bin/mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–optimize –all-databases –silent -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log


Strangely I got a lot of errors that some .MYI / .MYD .frm temp files, necessery for the mysql tables recovery can't be written inside /home/mysql/database_name

That was pretty weird and I thought there might be some issues with permissions, causing the inability to write, due to some bug or something so I went straight and checked /home/mysql/database_name permissions, e.g.::

 

server:/home/mysql/database_name# ls -ld soccerfame
drwx—— 2 mysql mysql 36864 Nov 17 12:00 soccerfame
server:/home/mysql/database_name# ls -al1|head -n 10
total 1979012
drwx—— 2 mysql mysql 36864 Nov 17 12:00 .
drwx—— 36 mysql mysql 4096 Nov 17 11:12 ..
-rw-rw—- 1 mysql mysql 8712 Nov 17 10:26 1_campaigns_diez.frm
-rw-rw—- 1 mysql mysql 14672 Jul 8 18:57 1_campaigns_diez.MYD
-rw-rw—- 1 mysql mysql 1024 Nov 17 11:38 1_campaigns_diez.MYI
-rw-rw—- 1 mysql mysql 8938 Nov 17 10:26 1_campaigns.frm
-rw-rw—- 1 mysql mysql 8738 Nov 17 10:26 1_campaigns_logs.frm
-rw-rw—- 1 mysql mysql 883404 Nov 16 22:01 1_campaigns_logs.MYD
-rw-rw—- 1 mysql mysql 330752 Nov 17 11:38 1_campaigns_logs.MYI


As seen from above output, all was perfect with permissions, so it should have been something else, so I decided to try to create a random file with touch command inside /home/mysql/database_name directory:

 

touch /home/mysql/database_name/somefile-to-test-writtability.txt touch: cannot touch ‘/scr1/data/somefile-to-test-writtability.txt‘: No space left on device


Then logically I thought the /home/mysql/ mounted ext4 partition got filled, because of crashed SQL database or a bug thus, checked with disk free command df whether there is enough space on server:

server:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md1 20G 7.6G 11G 42% /
udev 10M 0 10M 0% /dev
tmpfs 13G 1.3G 12G 10% /run
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/md2 256G 134G 110G 55% /home

Well that's weird? Obviously only 55% of available disk space is used and available 134G which was more than enough so I got totally puzzled why, files can't be written.

Then very logically, I thought it might be that /home directory has remounted as read only, because the SSD memory disk on server is failing and checked for errors in dmesg, i.e.:

 

server:~# dmesg|grep -i error


Also checked how exactly was partition mounted, to check whether it is (RO) read-only:

 

server:~# mount -l|grep -i /home
/dev/md2 on /home type ext4 (rw,relatime,discard,data=ordered)


Now everything become even more weirder, as obviously the disk continued to be claiming no space left on device, while in reality there was plenty of disk space.

Then after running a quick research on the internet for the no space left on device with free disk space, I've come across this great superuser.com thread which let me realize the partition run out of inodes and that's why no new file inodes could be assigned and therefore, the linux kernel is refusing to write the file on ext4 partition.

For those who haven't heard of Linux Partition Inodes here is link to Wikipedia and a quick quote:

 

In a Unix-style file system, the inode is a data structure used to represent a filesystem object, which can be one of various things including a file or a directory. Each inode stores the attributes and disk block location(s) of the filesystem object's data.[1] Filesystem object attributes may include manipulation metadata (e.g. change,[2] access, modify time), as well as owner and permission data (e.g. group-id, user-id, permissions).[3]
Directories are lists of names assigned to inodes. The directory contains an entry for itself, its parent, and each of its children.


Once I understood it is the inodes, I checked how many of them are occupied with cmd:

 

server:~# df -i /home
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/md2 17006592 17006592 0 100% /home


You see, there were 0 (zero) free file inodes on server and that was the reason for no space left on device while there was actually free disk space

To clean up (free) some inodes on partition, first thing I did is to delete all old logs which were inside /home and files I positively know not to be necessery, then to find which directories allocating most innodes used:

 

server:~# find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n


If you're on a regular old fashined IDE Hard Drive and not SSD or you have too much files inside this command will take really long …:

Therefore a better solution might be to frist:

a) Try to find root folders with large inodes count:

for i in /home/*; do echo $i; find $i |wc -l; done
Try to find specific folders:


You should get output like:

 

/home/new_website
606692
/home/common
73
/home/pcfreak
5661
/home/hipo
33
/home/blog
13570
/home/log
123
/home/lost+found
1

b) Then once you know the directory allocating most inodes, run the command again to see the sub-directories with most files (eating) partition innodes:

 

for i in /home/webservice/*; do echo $i; find $i |wc -l; done

 

One usual large folder which could free you some nodes is the linux source headers, but in my case it was simply a lot of tiny old logs being logged on the system for few years in the past without cleaning:

After deleting the log dirs and cache folder in my case /home/new_website/{log,cache}:

server:~# rm -rf /home/new_website/log/*
server:~# rm -rf /home/new_website/cache/*

 

 

a) Then, stopping Apache webserver to check prevent Apache to use MySQl databases while running database repair and restaring MySQL:
 

server:~# /etc/init.d/apache2 stop Restarting MySQL server
..
server:~# /etc/init.d/mysql restart
..


b) And re-issuing MySQL Check / Repair / Optimize database commands:
 

 

mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–check –all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log

mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf –analyze –all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log

mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–auto-repair –optimize –all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log

mysqlcheck –defaults-extra-file=/etc/mysql/debian.cnf \–optimize –all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log

c) And finally starting the Apache Webserver again:
 

server:~# /etc/init.d/apache2 start


Some innodse got freed up:
 

server:~# df -i /home Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/md2 17006592 16797196 209396 99% /home


And hooray by God's Grace and with help of prayers of The most Holy Theotokos (Virgin) Mary, websites started again !

Fix “Secure Connection Failed” – An error occured SSL received a record that exceeded the maximum permissible length howto

Monday, September 14th, 2015

secure-connection-failed-an-error-occured-during-connection-ssl-received-a-record-that-exceeds-the-maximum-permissible-length-fix-howto
When I was trying to establish a new Internal Business SSL certificate on one of the 6 months planned SPLIT projects (e.g. duplicate a range systems environment to another one), I've stumbled a very odd SSL issue. Once I've setup all the virtualhost SSL configurations properly (identical SSL configuration directives and Apache Webserver version to another host and testing in a browser I was getting the following error:
 

Secure Connection Failed

An error occurred during a connection to 10.253.39.93.

SSL received a record that exceeded the maximum permissible length.

(Error code: ssl_error_rx_record_too_long)


Below is a screenshot:

http://www.pc-freak.net/images/secure-connection-failed-an-error-occured-during-connection-ssl-received-a-record-that-exceeds-the-maximum-permissible-length.png

The page you are trying to view can not be shown because the authenticity of the received data could not be verified. Please contact the web site owners to inform them of this problem. Alternatively, use the command found in the help menu to report this broken site.

The first logical thing to do was to check the error.log but there was no any errors there that point me to anything meaningful, besides that the queries I was making to the Domain doesn't show off as requests neither in Apache access.log nor in error.log so this was puzzling.
I thought I might have messed up something during Key file / CSR generation time so I revoked old certificate and reissued it.

 

$ openssl x509 -text -in test-pegasusgas-eon.intranet.eon-vertrieb.com.crt |less ertificate: Data: Version: 3 (0x2) Serial Number:

Shows that all is fine with certificate Then when trying to test remote certificate with SSL command:

 

openssl s_client -CApath test-pegasusgas-eon.intranet.eon-vertrieb.com.crt -connect test-pegasusgas-eon.intranet.eon-vertrieb.com:443


: There was an error After plenty of research in Google I come to conclusion something is either wrong with Listen httpd.conf directive or NameVirtualHost is binded to port 80 or some other port different from 443, however surprisingly I did not used the NameVirtualHost at all in my apache config. After a lot of pondering I finally spot it. The whole certificate isseus were caused by:

< – Less than sign

which I missaw and forget to clean up from template during IP paste (obtained from /sbin/ifconfig |grep -i xx.xx.xx.xx). So finally in order to fix the SSL error I had to just delete <, e.g.:
 

<VirtualHost <10.253.39.35:443>

had to become:

 

<Virtualhost 10.253.39.35:443>

Such a minor thing took me 3 hours of pondering to resolve and thanksfully it is finally fixed! Then of course had to restart Apache to make fixed Vhost settings working:
 

# apachectl stop; sleep 2; apachectl start

So now the SSL works again, thanks God!

Fix “tar: Error exit delayed from previous errors” and its cause and solution

Monday, August 18th, 2014

fix-solve-tar-error-delayed-exit-from-previous-errors-tarball-error

tar: Error exit delayed from previous errors

error is a very common error encountered when creating archives (or backing up server configurations / websites / sql binary data). The error is quite unexplanatory and whenever creating files verbose in order to see the files added to archve in "real time" with lets say:

tar -czvf /tmp/filename_backup_date-of-backup.tar.gz /home/websites /home/sql


its pretty hard to track on exactly which file is the backup producing the Error exit delayed from previous errors, this is especially the case whenever adding to archive directories containing millions of tiny few kilobyte sized files. Many novice on uncautious Linux admins , might simply ignore the warning if they're in a hurry / are having excessive work to be done as there will be .tar.gz backup produced and whenever uncompressed most of the files are there and the backup error would seem not of a big issue.

However as backuping files is vital stuff, especially when moving the files from a server to be decomissioned you have to be extra careful and make the backup properly, e.g. figure out the cause of the error, to do so log the full output of tar operations with tee command, like so:

tar -czvf /tmp/filename_backup_date-of-backup.tar.gz /home/websites/ /home/sql | tee /tmp/backup_tar_full_output.log

Then you will have to review the file and lookup for errors with less search string – / (slash) – look for "error" and "permission den" keywords and this should point you to what is causing the error. In cases when millions of files are to be archived, the log might grow really big and hard to process, therefore a much quicker way to understand what's happening is to only log and show in shell standard output last file error with > (shell redirect):
 

tar -czvf /tmp/filename_backup_date-of-backup.tar.gz /home/websites /home/sql > /tmp/backup_failure-cause.log

 

tar: www.ur-website.com-http/2.0.63/conf/tnsnames.ora.20080918: Cannot open: Permission denied
tar: Removing leading `/' from member names

The error indicates clearly the cause of error is lack of Permissions to read the file tnsnames.ora.20080918 so solution is to either grant permissions to non-root user with (chmod / chown) cmds, in my case grant perms to user hipo with which tar is ran, or run again the website backup with superuser, I usually just run with root user to prevent tampering with original permissions, e.g. to solve the error, either:

$ su root
# tar -czvf /tmp/filename_backup_date-of-backup.tar.gz /home/websites /home/sql

Or even better if sudo is installed and user is added to /etc/sudoers file

$ sudo tar -czvf /tmp/filename_backup_date-of-backup.tar.gz /home/websites /home/sql


Though permission errors is the most often reason for:

tar: Error exit delayed from previous errors, you should keep in mind that in some cases the error might be caused due to failing RAID membered disk drive or single hdd failure on systems that are not in some RAID array

 

Disable php notice logging / stop variable warnings in error.log on Apache / Nginx / Lighttpd

Monday, July 28th, 2014

disable_php_notice_warnings_logging_in-apache-nginx-lighttpd
At one of companies where I administrate few servers, we are in process of optimizing the server performance to stretch out the maximum out of server hardware and save money from unnecessery hardware costs and thus looking for ways to make server performance better.

On couple of web-sites hosted on few of the production servers, administrating, I've noticed dozens of PHP Notice errors, making the error.log quickly grow to Gigabytes and putting useless hard drive I/O overhead. Most of the php notice warnings are caused by unitialized php variables.

I'm aware having an unitialized values is a horrible security hole, however the websites are running fine even though the notice warnings and currently the company doesn't have the necessery programmers resource to further debug and fix all this undefined php vars, thus what happens is monthly a couple of hundreds megabytes of useless same php notice warnings are written in error.log.

That  error.log errors puts an extra hardship for awstats which is later generating server access statistics while generating the 404 errors statistics and thus awstats script has to read and analyze huge files with plenty of records which doesn't have nothing to do with 404 error

We found this PHP Notice warnings logged is one of the things we can optimize had to be disabled.

Here is how this is done:
On the servers running Debian Wheezy stable to disable php notices.

I had to change in /etc/php5/apache2/php.ini error_reporting variable.

Setting was to log everything (including PHP critical errors, warning and notices) like so:
 

vi /etc/php5/apache2/php.ini

error_reporting = E_ALL & ~E_DEPRECATED

to

error_reporting = E_COMPILE_ERROR|E_ERROR|E_CORE_ERROR


On CentOS, RHEL, SuSE based servers, edit instead /etc/php.ini.

This setting makes Apache to only log in error.log critical errors, php core dump (thread) errors and php code compilation (interpretation errors)

To make settings take affect on Debian host Apache webserver:

/etc/init.d/apache2 restart

On CentOS, RHEL Linux, had to restart Apache with:

/etc/init.d/httpd restart

For other servers running Nginx and Lighttpd webservers, after changing php.ini:

service nginx reload
service lighttpd restart

To disable php notices errors only on some websites, where .htaccess enabled, you can use also place in website DocumentRoot .htaccess:
 

php_value error_reporting 2039


Other way to disable via .htaccess is by adding to it code:
 

php_flag display_errors off


Also for hosted websites on some of the servers, where .htaccess is disabled, enabling / disabling php notices can be easily triggered by adding following php code to index.php

define('DEBUG', true);

if(DEBUG == true)
{
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);
}
else
{
    ini_set('display_errors', 'Off');
    error_reporting(0);
}

 

Linux / BSD: Check if Apache web server is listening on port 80 and 443

Tuesday, June 3rd, 2014

apache_check_if_web_server_running_port-80-and-port-443-logo-linux-and-bsd-check-apache-running
If you're configuring a new Webserver or adding a new VirtualHost to an existing Apache configuration you will need to restart Apache with or without graceful option once Apache is restarted to assure Apache is continuously running on server (depending on Linux distribution) issue:

1. On Debian Linux / Ubuntu servers

# ps axuwf|grep -i apache|grep -v grep

root 23280 0.0 0.2 388744 16812 ? Ss May29 0:13 /usr/sbin/apache2 -k start
www-data 10815 0.0 0.0 559560 3616 ? S May30 2:25 _ /usr/sbin/apache2 -k start
www-data 10829 0.0 0.0 561340 3600 ? S May30 2:31 _ /usr/sbin/apache2 -k start
www-data 10906 0.0 0.0 554256 3580 ? S May30 0:20 _ /usr/sbin/apache2 -k start
www-data 10913 0.0 0.0 562488 3612 ? S May30 2:32 _ /usr/sbin/apache2 -k start
www-data 10915 0.0 0.0 555524 3588 ? S May30 0:19 _ /usr/sbin/apache2 -k start
www-data 10935 0.0 0.0 553760 3588 ? S May30 0:29 _ /usr/sbin/apache2 -k start

 


2. On CentOS, Fedora, RHEL and SuSE Linux and FreeBSD

ps ax | grep httpd | grep -v grep

 

7661 ? Ss 0:00 /usr/sbin/httpd
7664 ? S 0:00 /usr/sbin/httpd
7665 ? S 0:00 /usr/sbin/httpd
7666 ? S 0:00 /usr/sbin/httpd
7667 ? S 0:00 /usr/sbin/httpd
7668 ? S 0:00 /usr/sbin/httpd
7669 ? S 0:00 /usr/sbin/httpd
7670 ? S 0:00 /usr/sbin/httpd
7671 ? S 0:00 /usr/sbin/httpd

 

Whether a new Apache IP Based VirtualHosts are added to already existing Apache and you have added new

Listen 1.1.1.1:80
Listen 1.1.1.1:443

directives, after Apache is restarted to check whether Apache is listening on port :80 and :443
 

netstat -ln | grep -E ':80|443'

tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:443            0.0.0.0:*               LISTEN


Meaning of 0.0.0.0 is that Apache is configured to Listen on Any Virtualhost IPs and interfaces. This output is usually returned whether in Apache config httpd.conf / apache2.conf webserver is configured with directive.

Listen *:80
 

If in netstat output there is some IP poping up for example  "192.168.1.1:http", this means that only connections to the "192.168.1.1" IP address will be accepted by Apache.

Another way to look for Apache in netstat (in case Apache is configured to listen on some non-standard port number) is with:

netstat -l |grep -E 'http|www'

tcp        0      0 *:www                   *:*                     LISTEN


As sometimes it might be possible that Apache is listening but its processes are in in defunct (Zommbie) state it is always a good idea, also to check if pages server by Apache are opening in browser (check it with elinks, lynx or curl)

To get more thorough information on Apache listened ports, protocol, user with which Apache is running nomatter of Linux distribution use lsof command:
 

/usr/bin/lsof -i|grep -E 'httpd|http|www'

httpd     6982 nobody    3u  IPv4  29388359      0t0  TCP pc-freak.net:https (LISTEN)
httpd    18071 nobody    3u  IPv4 702790659      0t0  TCP pc-freak.net:http (LISTEN)
httpd    18071 nobody    4u  IPv4 702790661      0t0  TCP pc-freak.net.net:https (LISTEN)


If Apache is not showing up even though restarted check what is going wrong in the error logs:

– on Debian standard error log is /var/log/apache2/error.log
– On RHEL, CentOS, SuSE std. error log is in /var/log/httpd/error.log
– on FeeBSD /var/log/httpd-error.log

 

How to Turn Off, Suppress PHP Notices and Warnings – PHP error handling levels via php.ini and PHP source code

Friday, April 25th, 2014

php-logo-disable-warnings-and-notices-in-php-through-htaccess-php-ini-and-php-code

PHP Notices are common to occur after PHP version upgrades or where an obsolete PHP code is moved from Old version PHP to new version. This is common error in web software using Frameworks which have been abandoned by developers.

Having PHP Notices to appear on a webpage is pretty ugly and give a lot of information which might be used by malicious crackers to try to break your site thus it is always a good idea to disable PHP Notices. There are plenty of ways to disable PHP Notices

The easiest way to disable it is globally in all Webserver PHP library via php.ini (/etc/php.ini) open it and make sure display_errors is disabled:

display_errors = 0

or

display_errors = Off

Note that that some claim in PHP 5.3 setting display_errors to Off will not work as expected. Anyways to make sure where your loaded PHP Version display_errors is ON or OFF use phpinfo();

It is also possible to disable PHP Notices and error reporting straight from PHP code you need code like:

 

<?php
// Turn off all error reporting
error_reporting(0);
?>

 

or through code:

 

ini_set('display_errors',0);


PHP has different levels of error reporting, here is complete list of possible error handling variables:

 

 

 

<?php
// Report simple running errors

error_reporting(E_ERROR | E_WARNING | E_PARSE);

// Reporting E_NOTICE can be good too (to report uninitialized
// variables or catch variable name misspellings …)

error_reporting(E_ERROR | E_WARNING | E_PARSE | E_NOTICE);

// Report all errors except E_NOTICE
// This is the default value set in php.ini

error_reporting(E_ALL ^ E_NOTICE);
// Report all PHP errors (see changelog)

error_reporting(E_ALL);
// Report all PHP errors error_reporting(-1);
// Same as error_reporting(E_ALL);

ini_set('error_reporting', E_ALL); ?>

The level of logging could be tuned on Debian Linux via /etc/php5/apache2/php.ini or if necessary to set PHP log level in PHP CLI through /etc/php5/cli/php.ini with:

error_reporting = E_ALL & ~E_NOTICE

 

If you need to remove to remove exact warning or notices from PHP without changing the way  PHPLib behaves is to set @ infront of variable or function that is causing NOTICES or WARNING:
For example:
 

@yourFunctionHere();
@var = …;


Its also possible to Disable PHP Notices and Warnings using .htaccess file (useful in shared hosting where you don't have access to global php.ini), here is how:

# PHP error handling for development servers
php_flag display_startup_errors off
php_flag display_errors off
php_flag html_errors off
php_flag log_errors on
php_flag ignore_repeated_errors off
php_flag ignore_repeated_source off
php_flag report_memleaks on
php_flag track_errors on
php_value docref_root 0
php_value docref_ext 0
php_value error_log /home/path/public_html/domain/php_errors.log
php_value error_reporting -1
php_value log_errors_max_len 0

This way though PHP Notices and Warnings will be suppressed errors will get logged into php_error.log

How to detect failing storage LUN on Linux – multipath

Wednesday, April 16th, 2014

detect-failing-lun-on-linux-failing-scsi-detection

If you login to server and after running dmesg – show kernel log command you get tons of messages like:

# dmesg

 

end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0
sd 0:0:0:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdc, sector 0
sd 0:0:0:1: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdb, sector 0
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 0

 

In /var/log/messages there are also log messages present like:

# cat /var/log/messages
...

 

Apr 13 09:45:49 sl02785 kernel: end_request: I/O error, dev sda, sector 0
Apr 13 09:45:49 sl02785 kernel: klogd 1.4.1, ———- state change ———-
Apr 13 09:45:50 sl02785 kernel: sd 0:0:0:1: SCSI error: return code = 0x00010000
Apr 13 09:45:50 sl02785 kernel: end_request: I/O error, dev sdb, sector 0
Apr 13 09:45:55 sl02785 kernel: sd 0:0:0:2: SCSI error: return code = 0x00010000
Apr 13 09:45:55 sl02785 kernel: end_request: I/O error, dev sdc, sector 0

 

 

This is a sure sign something is wrong with SCSI hard drives or SCSI controller, to further debug the situation you can use the multipath command, i.e.:


multipath -ll | grep -i failed
_ 0:0:0:2 sdc 8:32 [failed][faulty]
_ 0:0:0:1 sdb 8:16 [failed][faulty]
_ 0:0:0:0 sda 8:0 [failed][faulty]

 

As you can see all 3 drives merged (sdc, sdb and sda) to show up on 1 physical drive via the remote network connectedLUN to the server is showing as faulty. This is a clear sign something is either wrong with 3 hard drive members of LUN – (Logical Unit Number) (which is less probable)  or most likely there is problem with the LUN network  SCSI controller. It is common error that LUN SCSI controller optics cable gets dirty and needs a physical clean up to solve it.

In case you don't know what is storage LUN? – In computer storage, a logical unit number, or LUN, is a number used to identify a logical unit, which is a device addressed by the SCSI protocol or protocols which encapsulate SCSI, such as Fibre Channel or iSCSI. A LUN may be used with any device which supports read/write operations, such as a tape drive, but is most often used to refer to a logical disk as created on a SAN. Though not technically correct, the term "LUN" is often also used to refer to the logical disk itself.

What LUN's does is very similar to Linux's Software LVM (Logical Volume Manager).