Posts Tagged ‘log’

Why du and df report different usage on a filesystem / How to fix the inconsistency between used space on the FS and the disk showing up as full

Wednesday, July 24th, 2019

linux-why-du-and-df-shows-different-result-inconsincy-explained-filesystem-full-oddity

If you're a sysadmin in a large server environment with, say, a couple of hundred Virtual Machines running a Linux OS on either physical hosts or OpenXen / VMware guest Virtual Machines, you might sometimes end up in an odd situation where a mounted partition reports its space usage differently when checked with
df
than when checked with the du command, like for example:
 

root@sqlserver:~# df -hT /var/lib/mysql
Filesystem   Type  Size Used Avail Use% Mounted On
/dev/sdb5      ext4    19G  3,4G    14G  20% /var/lib/mysql

Here the '-T' argument is used to show us the filesystem.

root@sqlserver:~# du -hsc /var/lib/mysql
0K    /var/lib/mysql/
0K    total

 

1. Simple debugging of what might be the root cause for the df / du inconsistency reporting

 

Of course the basic thing to do in that weird situation, after the initial shock that this is even possible, is to investigate a bit which first level sub-directories eat up the most space on the mounted location, with du:

 

# du -hkx --max-depth=1 /var/lib/mysql/|uniq|sort -n
4       /var/lib/mysql/test
8       /var/lib/mysql/ezmlm
8       /var/lib/mysql/micropcfreak
8       /var/lib/mysql/performance_schema
12      /var/lib/mysql/mysqltmp
24      /var/lib/mysql/speedtest
64      /var/lib/mysql/yourls
144     /var/lib/mysql/narf
320     /var/lib/mysql/webchat_plus
424     /var/lib/mysql/goodfaithair
528     /var/lib/mysql/moonman
648     /var/lib/mysql/daniel
852     /var/lib/mysql/lessn
1292    /var/lib/mysql/gallery

The given output is in Kilobytes, so it is a little bit hard to read; if you prefer Megabytes instead, do

 

 # du -hmx --max-depth=1 /var/lib/mysql/|uniq|sort -n|less

 

I've also investigated the complete /var directory contents sorted by size with:

 

 # du -akx ./ | sort -n
5152564    ./cache/rsnapshot/hourly.2/localhost
5255788    ./cache/rsnapshot/hourly.2
5287912    ./cache/rsnapshot
7192152    ./cache


Even after finding out the bottleneck dirs and clearing up a bit, the two commands kept disagreeing. If you're stunned like me, you might be tempted to … move some files to a different filesystem to free up space or inodes, hoping the inconsistent output will fix itself, assuming it might be caused by some kernel / FS caching and that this will eventually make the mounted FS refresh …

But unfortunately, if you try it you'll find out that clearing up a couple of Megabytes or Gigabytes makes no difference in the command output.

In my exact case /var/lib/mysql is a separately mounted ext4 filesystem; however, the same issue was also present on a Network Filesystem (NFS), so my first thought that this was caused by a network failure or an NFS bug turned out to be wrong.

After a further short investigation of the inodes on the filesystem, it was clear enough that inodes were available:
 

# df -i /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5      1221600  2562 1219038   1% /var/lib/mysql

 

So the assumption that the inode count was exhausted was also ruled out.
P.S. If you're not well familiar with inodes, read the manual, i.e. man 7 inode.
 

– Remounting the mounted filesystem

To make sure the shown inconsistency between du and df is not due to some hanging network mount or bug, the first logical thing I did was to remount the filesystem showing the different size; in my case this was done with:
 

# mount -o remount,rw -t ext4 /var/lib/mysql

For machines with NFS remote mounted storage locations, used:

# mount -o remount,rw -t nfs /var/www


The FS remount did not solve it, so I continued to ponder the oddity. Of course I thought of a workaround: in case the issue was caused by a kernel bug or OS library issue, a reboot might be the solution. However, restarting the VMs was not a wanted, easy-to-do solution, so I continued investigating what was wrong …

The next step was of course to check what kind of network connections are opened to the affected hosts with:
 

# netstat -tupanl


This did not show anything that might point me to the differently reported Megabytes, so the next step was to check the situation with currently opened files by running processes on the weird df / du reporting systems with lsof, and boom, there I observed an oddity: multiple deleted files still held open

 

# lsof -nP | grep '(deleted)'

COMMAND   PID   USER   FD   TYPE DEVICE    SIZE NLINK  NODE NAME
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /var/lib/mysql/tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /var/lib/mysql/tmp/ibOrELhG (deleted)
mysqld   2588  mysql    6u   REG 253,17       777884290     0  1497 /var/lib/mysql/tmp/ibmDFAW8 (deleted)
mysqld   2588  mysql    7u   REG 253,17       123667875     0 11387 /var/lib/mysql/tmp/ib2CSACB (deleted)
mysqld   2588  mysql   11u   REG 253,17       123852406     0 11388 /var/lib/mysql/tmp/ibQpoZ94 (deleted)

 

Notice that there were plenty of files in '(deleted)' state still kept open in memory, an overall of 438:

 

# lsof -nP | grep '(deleted)' |wc -l
438


As I learned a bit online about the problem, I found it is also possible to list the deleted but still open files without any greps, using lsof arguments only:

 

# lsof +L1|less


The SIZE column shows some files that are really large and that are kept open on the filesystem and in memory, totally messing up the reported usage. In my case these are temp files created by the MySQL daemon (mysqld), but depending on the service the server provides this might be apache's www-data, some custom perl / bash script executed via a cron job, stalled rsync jobs etc.
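
To get a rough idea of how much space these phantom files hold in total, the SIZE column of the lsof output can be summed up. A quick sketch only (the SIZE/OFF column position can differ slightly between lsof versions, so adjust the awk field number if needed):

# lsof -nP +L1 2>/dev/null | awk 'NR>1 {sum+=$7} END {printf "%.1f MB held by deleted files\n", sum/1024/1024}'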
 

2. Check the list of open files held by the mysql / root user as part of the server filesystem inconsistency debugging:

 

– Grep opened files on server by user

# lsof |grep mysql
mysqld    1312                       mysql  cwd       DIR               8,21       4096          2 /var/lib/mysql
mysqld    1312                       mysql  rtd       DIR                8,1       4096          2 /
mysqld    1312                       mysql  txt       REG                8,1   20336792   23805048 /usr/sbin/mysqld
mysqld    1312                       mysql  mem       REG               8,21      24576         20 /var/lib/mysql/tc.log
mysqld    1312                       mysql  DEL       REG               0,16                 29467 /[aio]
mysqld    1312                       mysql  mem       REG                8,1      55792   14886933 /lib/x86_64-linux-gnu/libnss_files-2.28.so

 

# lsof | grep root
COMMAND    PID   TID TASKCMD          USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
systemd      1                        root  cwd       DIR                8,1       4096          2 /
systemd      1                        root  rtd       DIR                8,1       4096          2 /
systemd      1                        root  txt       REG                8,1    1489208   14928891 /lib/systemd/systemd
systemd      1                        root  mem       REG                8,1    1579448   14886924 /lib/x86_64-linux-gnu/libm-2.28.so

Another command that helped to track the discrepancy between the df and du reported file usage on the FS is:
 

# du -hxa  / | egrep '^[[:digit:]]{1,1}G[[:space:]]*'
 

 

3. Fixing the problem of large deleted files kept open on the filesystem


What is the real reason for ending up with these file handles kept open by programs running in the background on the Linux OS?
There could be multiple reasons, but most likely it is excessive server / client interactions, a service hammering the RAM or HDD by writing plenty of logs to the FS without ever releasing them (keeping the space occupied), or bugs in a programming library used by a hung service, leaving the file handles open on storage.

What is the solution to the problem of deleted files left open on the filesystem?

The best solution is to first fix the custom script or hung service and then, if possible, simply restart the server so the kernel and services reload, or if this is not possible, just restart the processes creating the problem.

Once the process is identified, like in my case MySQL on newer systemd enabled OS distros, just do:

 

 

# systemctl restart mysqld.service


or on older init.d system V ones:

# /etc/init.d/service restart


For custom hung scripts listed in ps axuwef you can grep the PID and do a kill -HUP (if the script is written well enough to recognize -HUP and restart its sub-processes properly – BE EXTRA CAREFUL IF YOU'RE RESTARTING BROKEN SCRIPTS, as this might cause disruptions to your running services …).

# pgrep -l script.sh
7977 script.sh


# kill -HUP PID
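
If restarting the offending service is not an option at all, the space held by a deleted-but-still-open file can usually be reclaimed by truncating it through its /proc file descriptor entry. Below is only a hedged sketch: the PID 2588 and FD 6 are taken from the lsof output above, so adapt them to your case and keep in mind the process may still be actively writing to that descriptor:

# ls -l /proc/2588/fd | grep deleted
# : > /proc/2588/fd/6

The first command lists which descriptors of the process point to deleted files, the second truncates the data behind descriptor 6 to zero bytes, freeing the blocks without touching the running process.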

 

Now finally this should either mitigate or, in the best case, completely solve the reported disagreement between df and du, after which the calculated / reported disk space should be back to normal and both commands should show approximately the same value (note that the size changes a bit between the two checks as the mysql service keeps writing data).

 

# df -hk /var/lib/mysql; du -hskc /var/lib/mysql
Filesystem      1K-blocks    Used Available Use% Mounted on
/dev/sdb5        19097172 3472744  14631296  20% /var/lib/mysql
3427772    /var/lib/mysql
3427772    total

 

What did we learn?

What I've explained in this article is why and how it comes about that 'zombie' files reside on a filesystem,
appearing to eat disk space on a mounted local or network partition, giving strange inconsistent
reports, leading to system service disruptions and making it impossible to get correct information on the used
disk space of a mounted drive.

I went through some standard logic for debugging service / filesystem / inode issues, which led me to the finding that deleted files were being kept open on the filesystem, making it report a strange, incorrect size and show up as filled even after it had been extended and was supposed to have an extra 50GB.

Finally it was explained shortly how to HUP / restart hanging script / service to fix it.

A few good readings that helped to fix the issue:

What to do when du and df report different usage is here
df in linux not showing correct free space after file removal is here
Why do “df” and “du” commands show different disk usage?
 

xorg on Toshiba Satellite L40 14B with Intel GM965 video hangs up after boot and the worst fix ever / How to reinstall Ubuntu by keeping the old personal data and programs

Wednesday, April 27th, 2011

black screen ubuntu troubles

I have updated Ubuntu version 9.04 (Jaunty) to 9.10 and followed my previous post update ubuntu from 9.04 to Latest Ubuntu

I expected that a step by step upgrade from release to release would work like a charm, and though it does on many notebooks, it doesn't on the Toshiba Satellite L40

The update itself went fine, where I used update-manager -d and followed the above pointed tutorial; however, after a system restart the PC failed to boot the X server properly, a completely blank screen with a blinking cursor appeared and that was all.

I restarted the system into the 2.6.35-28-generic recovery (rescue-mode) kernel in order to be able to get a physical console.

Logically the first thing I did was to check /var/log/messages and /var/log/Xorg.0.log, but I couldn't find anything unusual or wrong there.

I suspected something might be wrong with /etc/X11/xorg.conf so I deleted it:

ubuntu:~# rm -f /etc/X11/xorg.conf

and attempted to re-create the xorg.conf X configuration with command:

ubuntu:~# dpkg-reconfigure xserver-xorg

This command was reported to be the usual way to reconfigure the X server settings from console, but in my case (for unknown reasons) it did nothing.

Next the command which was able to re-generate the xorg.conf file was:

ubuntu:~# X -configure

The command generates a sample xorg.conf file as /root/xorg.conf.*, so I copied it to X's default location /etc/X11/xorg.conf and restarted in hope that this would fix the non-booting issue.

Very sadly, the black screen of death appeared again on the Toshiba notebook screen.
I further tried completely wiping out xorg.conf in hope that it might at least boot without the conf file, but this didn't work either.

I attempted to run the X server with an xorg.conf configured to use the vesa driver, as it's well known the vesa X driver is supposed to work on 99% of video cards (almost all of them nowadays are compatible with the VESA standard), but guess what, in my case vesa did not work either!

The only version of X I can boot in was the failsafe X screen mode which is available through the grub's boot menu recovery mode.

Further on I decided to try a few xorg.conf files which I found online and which were reported to work fine with the Intel GM965 internal video, and yes, this was also unsuccessful.

Some of my other futile attempts were to re-install the xorg server with apt-get and to reinstall the xserver-xorg-video-intel driver, e.g.:

ubuntu:~# apt-get install --reinstall xserver-xorg xserver-xorg-video-intel

As nothing worked out I was completely pissed off and decided to take an alternative approach which would take a lot of time but at least would probably succeed: I decided to completely re-install Ubuntu from a CD after backing up the /home directory and making a list of the packages installed on the system, so I could later run a tiny bash one-liner to install all the packages which previously existed on the laptop before the re-install:

Here is how I did it:

First I archived the /home directory:

ubuntu:/# tar -czvf home.tar.gz home/
....

For 12GB of data with a few thousand files the archiving took about 40 minutes.

The resulting tar archive came out at about 9GB, and I used sftp to upload it to a remote server, as I was missing a flash drive or an external HDD where I could place the just archived data.

Uploading with sftp can be achieved with a command similar to:

sftp user@yourhost.com
Password:
Connected to yourhost.com.
sftp> put home.tar.gz

As a next step, to back up the list of all currently installed packages into a file before booting the Ubuntu Maverick 10.10 CD and proceeding with the fresh install, I used the command:

for i in $(dpkg -l| awk '{ print $2 }'); do
echo $i; done >> my_current_ubuntu_packages.txt

Once again I used sftp as in the above example to upload the my_current_ubuntu_packages.txt file to my remote host.

After backing up all the necessary stuff, I restarted the system and booted from the CD-ROM with Ubuntu.
The Ubuntu installation as usual is more than a piece of cake and even if you don't have a brain you can succeed with it, so I wouldn't comment on it 😉

Right after the installation I used the sftp client once again to fetch the home.tar.gz and my_current_ubuntu_packages.txt

I placed home.tar.gz in /home/ and untarred it inside the fresh /home dir:

ubuntu:/home# tar -zxvf home.tar.gz

Eventually the old home directory was located in /home/home, so from there I used Midnight Commander (the good old mc text file explorer and manager) to restore the important user files to their respective places.

As a last step I used my_current_ubuntu_packages.txt in combination with a tiny shell loop to install all the packages listed inside the file:

ubuntu:~# for i in $(cat my_current_ubuntu_packages.txt); do
apt-get install --yes $i; sleep 1;
done

You will have to stay in front of the computer and manually answer the ncurses interface questions concerning some package configurations, and to be honest this is really annoying and time consuming.
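
For the record, a cleaner way to carry the package list over (a sketch, assuming the list was saved on the old system with dpkg --get-selections instead of the dpkg -l loop above) is to let dpkg and apt replay the selections in one go:

ubuntu:~# dpkg --get-selections > my_current_ubuntu_packages.txt    # run on the old system
ubuntu:~# dpkg --set-selections < my_current_ubuntu_packages.txt    # run on the fresh install
ubuntu:~# apt-get -y dselect-upgrade

This avoids answering per-package prompts one by one, though package names that changed between releases will still need manual attention.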

Summing up, the overall time I spent with this stupid Toshiba Satellite L40 with the shitty Intel GM965 was 4 days, where each day I tried numerous ways to fix up X and did my best to get through the blank screen xserver non-bootable issue without a complete re-install of the old Ubuntu system.
This is a lesson for me that if I stumble on such a shitty issue again I will proceed straight to the re-install option and not lose my time with non-sense fixes which would never work.

Hope the article might be helpful to somebody else who experiences problems with Linux similar to mine.

After all at least the Ubuntu Maverick 10.10 is really good looking in general from a design perspective.
What really struck me was the placement of the close, minimize and maximize window buttons; it seems in newer Ubuntus the Ubuntu guys decided to place the buttons on the left, here is a screenshot:

Left button positioning of navigation Buttons in Ubuntu 10.10

I believe the solution I explained, though very radical and slow, is a solution that will always work and hence is worthy 😉
Let me hear from you if the article was helpful.

How to disable tidy HTML corrector and validator to output error and warning messages

Sunday, March 18th, 2012

I've noticed in /var/log/apache2/error.log on one of the Debian servers I manage a lot of warnings and errors produced by tidy, the HTML syntax checker and reformatter program.

There were actually plenty of frequently appearing messages in the log like:

...
To learn more about HTML Tidy see http://tidy.sourceforge.net
Please fill bug reports and queries using the "tracker" on the Tidy web site.
Additionally, questions can be sent to html-tidy@w3.org
HTML and CSS specifications are available from http://www.w3.org/
Lobby your company to join W3C, see http://www.w3.org/Consortium
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 1 - Warning: plain text isn't allowed in <head> elements
line 1 column 1 - Info: <head> previously mentioned
line 1 column 1 - Warning: inserting implicit <body>
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML 3.2
4 warnings, 0 errors were found!
...

I did a quick investigation on where these messages get logged to error.log from, and discovered a few .php scripts in one of the websites containing the tidy string.
I used the Linux find + grep commands to find the "tidy" string in all php files, like so:

server:~# find . -iname '*.php' -exec grep -li 'tidy' '{}' \;
./new_design/modules/index.mod.php
./modules/index.mod.php
./modules/index_1.mod.php
./modules/index1.mod.php

Opening the files with vim to check how tidy is invoked revealed tidy calls like:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

As you see, the PHP programmers who wrote this website made a big tidy mess. Instead of using php5's tidy module, they hard coded the external tidy command to be invoked via php's exec();.
This is extremely bad practice, since it spawns the command via a limited apache pseudo-shell.
I've notified them about the issue, but I don't know when the external tidy calls will be rewritten.

Until the external tidy invocations are rewritten to use the php tidy module, I decided to at least remove the tidy warnings and errors output.

To remove the warning and error messages I've changed:

exec('/usr/bin/tidy -e -ashtml -utf8 '.$tmp_name,$rett);

to:

exec('/usr/bin/tidy --show-warnings no --show-errors no -q -e -ashtml -utf8 '.$tmp_name,$rett);

The extra switches meaning is like so:

-q – instructs tidy to produce quiet output
-e – show only errors and warnings (do not output the cleaned markup)
--show-warnings no and --show-errors no – completely disable the warning and error output
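
To double-check what these switches change before touching the PHP code, the two variants can be compared directly on the shell. A tiny sketch (the sample HTML snippet is made up and the exact messages depend on the installed tidy version):

server:~# echo '<p>test' | tidy -e -ashtml -utf8
server:~# echo '<p>test' | tidy --show-warnings no --show-errors no -q -e -ashtml -utf8

Comparing the stderr of the two runs shows exactly which messages would stop ending up in Apache's error.log.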

Onwards tidy no longer logs junk messages in error.log. Not logging all these useless warnings and errors has a positive effect on overall server performance, especially when the scripts running /usr/bin/tidy are called as frequently as 1000 times per second or more.

Check linux install date / How do I find out how long a Linux server OS was installed?

Wednesday, March 30th, 2016

linux-check-install-date-howto-commands-on-debian-and-fedora-tux_the_linux_penguin_by_hello

To find out the Linux install date there is no single universal solution; it depends on the Linux distribution type and version, but there are some common ways to get the Linux OS install age.
Perhaps the most popular way to get the OS installation date and time is to check when the root filesystem ( / ) was created; this can be done with the tune2fs command:

 

server:~# tune2fs -l /dev/sda1 | grep 'Filesystem created:'
Filesystem created:       Thu Sep  6 21:44:22 2012
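
If you don't remember which block device holds the root filesystem, it can be resolved on the fly. A small sketch (findmnt ships with util-linux on most modern distributions; on older systems check the output of mount instead):

server:~# tune2fs -l "$(findmnt -n -o SOURCE /)" | grep 'Filesystem created:'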

 

Another rough hint is the change time of the oldest entries in the root directory:

server:~# ls -alct /|tail -1|awk '{print $6, $7, $8}'
sep 6 2012

 

The root user's home directory is also created at install time, so its timestamps are another hint:
 

 

server:~# ls -alct /root

 

root@server:~# ls -lAhF /etc/hostname
-rw-r--r-- 1 root root 8 sep  6  2012 /etc/hostname

 

For Debian / Ubuntu and other deb based distributions, the /var/log/installer directory is created during the OS install, so on Debian the best way to check the Linux OS creation date is with:
 

root@server:~# ls -ld /var/log/installer
drwxr-xr-x 3 root root 4096 sep  6  2012 /var/log/installer/
root@server:~# ls -ld /lost+found
drwx------ 2 root root 16384 sep  6  2012 /lost+found/

 

On Red Hat / Fedora / CentOS and other RPM-based Linux distributions, you can use:

 

rpm -qi basesystem | grep "Install Date"

 

basesystem is the package containing the basic Linux binaries, many of which should not change; however, in some cases (e.g. security updates) the package might change, so it is also good to check the root filesystem creation time and compare whether the two match.

Howto Fix “sysstat Cannot open /var/log/sysstat/sa no such file or directory” on Debian / Ubuntu Linux

Monday, February 15th, 2016

sysstast-no-such-file-or-directory-fix-Debian-Ubuntu-Linux-howto
I really love sysstat and as a console maniac I tend to install it on every server however by default there is some <b>sysstat</b> tuning once installed to make it work, for those unfamiliar with <i>sysstat</i> I warmly recommend to check, it here is in short the package description:<br /><br />
 

server:~# apt-cache show sysstat|grep -i desc -A 15
Description: system performance tools for Linux
 The sysstat package contains the following system performance tools:
  - sar: collects and reports system activity information;
  - iostat: reports CPU utilization and disk I/O statistics;
  - mpstat: reports global and per-processor statistics;
  - pidstat: reports statistics for Linux tasks (processes);
  - sadf: displays data collected by sar in various formats;
  - nfsiostat: reports I/O statistics for network filesystems;
  - cifsiostat: reports I/O statistics for CIFS filesystems.
 .
 The statistics reported by sar deal with I/O transfer rates,
 paging activity, process-related activities, interrupts,
 network activity, memory and swap space utilization, CPU
 utilization, kernel activities and TTY statistics, among
 others. Both UP and SMP machines are fully supported.
Homepage: http://pagesperso-orange.fr/sebastien.godard/

 

If you happen to install sysstat on a Debian / Ubuntu server with:

server:~# apt-get install --yes sysstat


, and you try to get some statistics with the sar command but get an ugly error output like:

 

server:~# sar
Cannot open /var/log/sysstat/sa20: No such file or directory


And you wonder how to resolve it so the server can periodically log into its text databases the nice sar stats (load averages, %idle, %iowait, %system, %nice, %user). To FIX that "Cannot open /var/log/sysstat/sa20: No such file or directory"

You need to:

server:~# vim /etc/default/sysstat


By default you will find that sysstat data collection is disabled, e.g.:

ENABLED="false"

Switch the value to "true"

ENABLED="true"


Then restart sysstat init script with:

server:~# /etc/init.d/sysstat restart
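
If you prefer to script the change instead of opening an editor, the same can be done non-interactively. A sketch, assuming the stock Debian / Ubuntu layout of /etc/default/sysstat:

server:~# sed -i 's/^ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
server:~# /etc/init.d/sysstat restart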

However, for those who prefer to do things from a menu / ncurses interface and are not familiar with Vi Improved, the easiest way is to run dpkg-reconfigure for sysstat:

server:~# dpkg-reconfigure sysstat


sysstat-reconfigure-on-gnu-linux

 

root@server:/# sar
Linux 2.6.32-5-amd64 (pcfreak) 15.02.2016 _x86_64_ (2 CPU)

00:00:01 CPU %user %nice %system %iowait %steal %idle
00:15:01 all 24,32 0,54 3,10 0,62 0,00 71,42
01:15:01 all 18,69 0,53 2,10 0,48 0,00 78,20
10:05:01 all 22,13 0,54 2,81 0,51 0,00 74,01
10:15:01 all 17,14 0,53 2,44 0,40 0,00 79,49
10:25:01 all 24,03 0,63 2,93 0,45 0,00 71,97
10:35:01 all 18,88 0,54 2,44 1,08 0,00 77,07
10:45:01 all 25,60 0,54 3,33 0,74 0,00 69,79
10:55:01 all 36,78 0,78 4,44 0,89 0,00 57,10
16:05:01 all 27,10 0,54 3,43 1,14 0,00 67,79


Well that's it, the sysstat error is now resolved and text stats data reporting works again, Hooray! 🙂

No space left on device with free disk space / Why no space left on device while there is plenty of disk space on drive – Running out of Inodes

Tuesday, November 17th, 2015

no_space_left-on-device-while-there-is-disk-space-running-out-of-file-inodes-unix_linux_file_system_diagram.gif

 

On one of the servers I'm administrating, the websites started showing some MySQL database table corruption errors like:
 

 

Table './database_name/site_news_list_com' is marked as crashed and last (automatic?) repair failed

The server is using Oracle MySQL server community stable edition on Debian GNU / Linux 6.0, so I first thought the server crashed during work either due to some bug in MySQL or due to some PHP cron job that did something messy. Thus, to solve the crashed tables I tried using the mysqlcheck tool, which has helped pretty well many times when there were database / table corruptions. I ran the following set of mysqlcheck commands as root (superuser) in a bash shell after logging in through SSH:


server:~# /usr/bin/mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --check --all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
server:~# /usr/bin/mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --analyze --all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
server:~# /usr/bin/mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --auto-repair --optimize --all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
server:~# /usr/bin/mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --optimize --all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log


In order for the above commands to work, I created /root/.my.cnf containing my root mysql (CLI) username and password; the file has content like below:

 

[client]
user=root
password=MySecretPassword8821238

 

Btw, a good note here: it's generally a good idea (if you want to have consistent mysql databases) to execute these checks automatically via a cron job several times a month. I have the following in root's crontab:

 

crontab -u root -l |grep -i mysqlcheck
04 06 5,10,15,20,25,1 * * /usr/bin/mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --check --all-databases --silent -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
07 06 5,10,15,20,25,1 * * /usr/bin/mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --analyze --all-databases --silent -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
12 06 5,10,15,20,25,1 * * /usr/bin/mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --auto-repair --optimize --all-databases --silent -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log
17 06 5,10,15,20,25,1 * * /usr/bin/mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --optimize --all-databases --silent -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log


Strangely, I got a lot of errors that some .MYI / .MYD / .frm temp files, necessary for the mysql table recovery, can't be written inside /home/mysql/database_name

That was pretty weird and I thought there might be some issue with permissions causing the inability to write, due to some bug or something, so I went straight and checked the /home/mysql/database_name permissions, e.g.:

 

server:/home/mysql/database_name# ls -ld soccerfame
drwx------ 2 mysql mysql 36864 Nov 17 12:00 soccerfame
server:/home/mysql/database_name# ls -al1|head -n 10
total 1979012
drwx------ 2 mysql mysql 36864 Nov 17 12:00 .
drwx------ 36 mysql mysql 4096 Nov 17 11:12 ..
-rw-rw---- 1 mysql mysql 8712 Nov 17 10:26 1_campaigns_diez.frm
-rw-rw---- 1 mysql mysql 14672 Jul 8 18:57 1_campaigns_diez.MYD
-rw-rw---- 1 mysql mysql 1024 Nov 17 11:38 1_campaigns_diez.MYI
-rw-rw---- 1 mysql mysql 8938 Nov 17 10:26 1_campaigns.frm
-rw-rw---- 1 mysql mysql 8738 Nov 17 10:26 1_campaigns_logs.frm
-rw-rw---- 1 mysql mysql 883404 Nov 16 22:01 1_campaigns_logs.MYD
-rw-rw---- 1 mysql mysql 330752 Nov 17 11:38 1_campaigns_logs.MYI


As seen from the above output, all was perfect with the permissions, so it must have been something else. So I decided to try to create a random file with the touch command inside the /home/mysql/database_name directory:

 

touch /home/mysql/database_name/somefile-to-test-writtability.txt
touch: cannot touch ‘/home/mysql/database_name/somefile-to-test-writtability.txt‘: No space left on device


Then logically I thought the mounted ext4 partition holding /home/mysql/ got filled because of the crashed SQL database or a bug, so I checked with the disk free command df whether there is enough space on the server:

server:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md1 20G 7.6G 11G 42% /
udev 10M 0 10M 0% /dev
tmpfs 13G 1.3G 12G 10% /run
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/md2 256G 134G 110G 55% /home

Well that's weird? Obviously only 55% of the available disk space is used, with 110G still available, which was more than enough, so I got totally puzzled why files can't be written.

Then, very logically, I thought /home might have been remounted read-only because the SSD disk on the server is failing, and checked for errors in dmesg, i.e.:

 

server:~# dmesg|grep -i error


Also checked how exactly was partition mounted, to check whether it is (RO) read-only:

 

server:~# mount -l|grep -i /home
/dev/md2 on /home type ext4 (rw,relatime,discard,data=ordered)


Now everything became even weirder, as obviously the disk continued claiming no space left on device, while in reality there was plenty of disk space.

Then, after a quick research on the internet for "no space left on device" with free disk space, I came across this great superuser.com thread which made me realize the partition had run out of inodes; that's why no new file inodes could be assigned and therefore the Linux kernel was refusing to write the file on the ext4 partition.

For those who haven't heard of Linux partition inodes, here is a link to Wikipedia and a quick quote:

 

In a Unix-style file system, the inode is a data structure used to represent a filesystem object, which can be one of various things including a file or a directory. Each inode stores the attributes and disk block location(s) of the filesystem object's data.[1] Filesystem object attributes may include manipulation metadata (e.g. change,[2] access, modify time), as well as owner and permission data (e.g. group-id, user-id, permissions).[3]
Directories are lists of names assigned to inodes. The directory contains an entry for itself, its parent, and each of its children.


Once I understood it was the inodes, I checked how many of them were occupied with:

 

server:~# df -i /home
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/md2 17006592 17006592 0 100% /home


You see, there were 0 (zero) free file inodes on the server, and that was the reason for "no space left on device" while there was actually free disk space.

To clean up (free) some inodes on the partition, the first thing I did was to delete all old logs inside /home and files I positively knew were not necessary, and then to find which directories were allocating the most inodes:

 

server:~# find . -xdev -type f | cut -d "/" -f 2 | sort | uniq -c | sort -n


If you're on a regular old-fashioned IDE hard drive and not an SSD, or you have too many files inside, this command will take really long …

Therefore a better solution might be to first:

a) Try to find root folders with large inodes count:

for i in /home/*; do echo $i; find $i |wc -l; done


You should get output like:

 

/home/new_website
606692
/home/common
73
/home/pcfreak
5661
/home/hipo
33
/home/blog
13570
/home/log
123
/home/lost+found
1

b) Then, once you know the directory allocating most inodes, run the command again to see the sub-directories with most files eating up partition inodes:

 

for i in /home/webservice/*; do echo $i; find $i |wc -l; done

 

One usual large folder which could free you some inodes is the Linux kernel source headers, but in my case it was simply a lot of tiny old logs written on the system for a few years in the past without any cleaning:

After deleting the log dirs and cache folder in my case /home/new_website/{log,cache}:

server:~# rm -rf /home/new_website/log/*
server:~# rm -rf /home/new_website/cache/*
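
When a directory holds hundreds of thousands of tiny files, plain rm with a shell glob can fail with an "Argument list too long" error, so a find based deletion is often more reliable. A sketch only; the path and the 30-day age filter are examples, adjust them before running:

server:~# find /home/new_website/log -xdev -type f -mtime +30 -delete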

 

 

a) Then, stopping the Apache webserver to prevent it from using the MySQL databases while running the database repair, and restarting MySQL:
 

server:~# /etc/init.d/apache2 stop
..
Restarting MySQL server:
server:~# /etc/init.d/mysql restart
..


b) And re-issuing MySQL Check / Repair / Optimize database commands:
 

 

mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --check --all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log

mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --analyze --all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log

mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --auto-repair --optimize --all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log

mysqlcheck --defaults-extra-file=/etc/mysql/debian.cnf --optimize --all-databases -u root -p`grep -i password /root/.my.cnf |sed -e 's#password=##g'`>> /var/log/cronwork.log

c) And finally starting the Apache Webserver again:
 

server:~# /etc/init.d/apache2 start


Some inodes got freed up:
 

server:~# df -i /home
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/md2 17006592 16797196 209396 99% /home


And hooray, by God's Grace and with the help of the prayers of the Most Holy Theotokos (Virgin) Mary, the websites started working again!

Fix “Secure Connection Failed” – An error occured SSL received a record that exceeded the maximum permissible length howto

Monday, September 14th, 2015

secure-connection-failed-an-error-occured-during-connection-ssl-received-a-record-that-exceeds-the-maximum-permissible-length-fix-howto
When I was trying to set up a new internal business SSL certificate on one of the planned 6-month SPLIT projects (e.g. duplicating a range of systems' environment to another one), I stumbled upon a very odd SSL issue. Once I had set up all the virtualhost SSL configuration properly (identical SSL configuration directives and Apache webserver version to another host) and tested in a browser, I was getting the following error:
 

Secure Connection Failed

An error occurred during a connection to 10.253.39.93.

SSL received a record that exceeded the maximum permissible length.

(Error code: ssl_error_rx_record_too_long)


Below is a screenshot:

https://www.pc-freak.net/images/secure-connection-failed-an-error-occured-during-connection-ssl-received-a-record-that-exceeds-the-maximum-permissible-length.png

The page you are trying to view can not be shown because the authenticity of the received data could not be verified. Please contact the web site owners to inform them of this problem. Alternatively, use the command found in the help menu to report this broken site.

The first logical thing to do was to check error.log, but there were no errors there pointing to anything meaningful; besides that, the queries I was making to the domain didn't show up as requests in either Apache's access.log or error.log, so this was puzzling.
I thought I might have messed something up during Key file / CSR generation, so I revoked the old certificate and reissued it.

 

$ openssl x509 -text -in test-pegasusgas-eon.intranet.eon-vertrieb.com.crt | less
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:

This shows that all is fine with the certificate itself. Then I tried to test the remote certificate with the openssl s_client command:

 

openssl s_client -CAfile test-pegasusgas-eon.intranet.eon-vertrieb.com.crt -connect test-pegasusgas-eon.intranet.eon-vertrieb.com:443


There was an error. After plenty of research in Google I came to the conclusion that something is either wrong with the Listen httpd.conf directive or that NameVirtualHost is bound to port 80 or some other port different from 443; however, surprisingly, I did not use NameVirtualHost at all in my apache config. After a lot of pondering I finally spotted it. The whole certificate issue was caused by:

< – a stray less-than sign

which I overlooked and forgot to clean up from the template while pasting the IP (obtained from /sbin/ifconfig |grep -i xx.xx.xx.xx). So finally, in order to fix the SSL error I just had to delete the <, e.g.:
 

<VirtualHost <10.253.39.35:443>

had to become:

 

<Virtualhost 10.253.39.35:443>

Such a minor thing took me 3 hours of pondering to resolve and thankfully it is finally fixed! Then of course I had to restart Apache to make the fixed Vhost settings work:
 

# apachectl stop; sleep 2; apachectl start
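
A quick way to catch such typos before bouncing the webserver is Apache's built-in configuration check (a small sketch; on Debian the wrapper is called apache2ctl but behaves the same):

# apachectl configtest
# apachectl -t -D DUMP_VHOSTS

The second command dumps the parsed VirtualHost definitions, which makes it easier to spot a malformed IP:port declaration.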

So now the SSL works again, thanks God!

Resolving “nf_conntrack: table full, dropping packet.” flood message in dmesg Linux kernel log

Wednesday, March 28th, 2012

nf_conntrack_table_full_dropping_packet
On many busy servers, you might encounter in /var/log/syslog or dmesg kernel log messages like

nf_conntrack: table full, dropping packet

appearing repeatedly:

[1737157.057528] nf_conntrack: table full, dropping packet.
[1737157.160357] nf_conntrack: table full, dropping packet.
[1737157.260534] nf_conntrack: table full, dropping packet.
[1737157.361837] nf_conntrack: table full, dropping packet.
[1737157.462305] nf_conntrack: table full, dropping packet.
[1737157.564270] nf_conntrack: table full, dropping packet.
[1737157.666836] nf_conntrack: table full, dropping packet.
[1737157.767348] nf_conntrack: table full, dropping packet.
[1737157.868338] nf_conntrack: table full, dropping packet.
[1737157.969828] nf_conntrack: table full, dropping packet.
[1737157.969928] nf_conntrack: table full, dropping packet
[1737157.989828] nf_conntrack: table full, dropping packet
[1737162.214084] __ratelimit: 83 callbacks suppressed

There are two types of servers I've encountered this message on:

1. Xen OpenVZ / VPS (Virtual Private Servers)
2. ISPs – Internet Providers with heavy traffic NAT network routers
 

I. What is the meaning of nf_conntrack: table full dropping packet error message

In short, this message is received because the nf_conntrack kernel maximum number assigned value gets reached.
The common reason for that is heavy traffic passing through the server or, very often, a DoS or DDoS (Distributed Denial of Service) attack. Sometimes encountering the error is a result of bad server planning (incorrect data about the expected traffic load from a company / companies) or simply a sysadmin error…

– Checking the current maximum nf_conntrack value assigned on host:

linux:~# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
65536

– Alternative way to check the current kernel values for nf_conntrack is through:

linux:~# /sbin/sysctl -a|grep -i nf_conntrack_max
error: permission denied on key 'net.ipv4.route.flush'
net.netfilter.nf_conntrack_max = 65536
error: permission denied on key 'net.ipv6.route.flush'
net.nf_conntrack_max = 65536

– Check the current sysctl nf_conntrack active connections

To check present connection tracking opened on a system:


linux:~# /sbin/sysctl net.netfilter.nf_conntrack_count
net.netfilter.nf_conntrack_count = 12742

The shown connections are assigned dynamically on each new successful TCP / IP NAT-ted connection. Btw, on systems that work normally, without the dmesg log being flooded with the message, the output of lsmod is:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
ip_tables 9899 1 iptable_filter
x_tables 14175 1 ip_tables

On servers which are encountering nf_conntrack: table full, dropping packet error, you can see, when issuing lsmod, extra modules related to nf_conntrack are shown as loaded:

linux:~# /sbin/lsmod | egrep 'ip_tables|conntrack'
nf_conntrack_ipv4 10346 3 iptable_nat,nf_nat
nf_conntrack 60975 4 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4
nf_defrag_ipv4 1073 1 nf_conntrack_ipv4
ip_tables 9899 2 iptable_nat,iptable_filter
x_tables 14175 3 ipt_MASQUERADE,iptable_nat,ip_tables

 

II. Remove completely nf_conntrack support if it is not really necessery

It is a good practice to limit, or completely omit if possible, the use of any iptables NAT rules, to prevent your kernel log from being flooded with these messages and, respectively, to stop your system from dropping connections.

Another option is to completely remove any modules related to nf_conntrack, iptables_nat and nf_nat.
To remove nf_conntrack support from the Linux kernel, if for instance the system is not used for Network Address Translation use:

/sbin/rmmod iptable_nat
/sbin/rmmod ipt_MASQUERADE
/sbin/rmmod nf_nat
/sbin/rmmod nf_conntrack_ipv4
/sbin/rmmod nf_conntrack
/sbin/rmmod nf_defrag_ipv4

Once the modules are removed, be sure not to use any iptables -t nat .. rules. Even attempting to list the NAT related rules with iptables -t nat -L -n will force the kernel to load the nf_conntrack modules again.

Btw nf_conntrack: table full, dropping packet. message is observable across all GNU / Linux distributions, so this is not some kind of local distribution bug or Linux kernel (distro) customization.
 

III. Fixing the nf_conntrack … dropping packets error

– One temporary, fix if you need to keep your iptables NAT rules is:

linux:~# sysctl -w net.netfilter.nf_conntrack_max=131072

I say temporary, because raising nf_conntrack_max doesn't guarantee things will go smoothly from now on.
However, on many not-so-heavily loaded servers, just raising net.netfilter.nf_conntrack_max to a high enough value (e.g. 131072) will be enough to resolve the hassle.
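
To judge whether the raised limit is actually sufficient, it helps to watch how close the live counter gets to the ceiling. A rough sketch:

linux:~# watch -n 5 'echo "$(sysctl -n net.netfilter.nf_conntrack_count) / $(sysctl -n net.netfilter.nf_conntrack_max)"'

If the first number keeps creeping towards the second even after the raise, the table is still undersized for the traffic (or the box is being flooded).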

– Increasing the size of nf_conntrack hash-table

The hash table (hashsize) value, which stores the lists of conntrack entries, should be increased proportionally whenever net.netfilter.nf_conntrack_max is raised.

linux:~# echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
The rule to calculate the right value to set is:
hashsize = nf_conntrack_max / 4
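
Putting the two together, here is a minimal sketch that raises the limit and derives the matching hash table size from it (the 131072 value is just an example):

#!/bin/sh
# raise the conntrack ceiling and size the hash table to max / 4
CONNTRACK_MAX=131072
sysctl -w net.netfilter.nf_conntrack_max=$CONNTRACK_MAX
echo $((CONNTRACK_MAX / 4)) > /sys/module/nf_conntrack/parameters/hashsize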

– To permanently store the made changes:

a) put into /etc/sysctl.conf:

linux:~# echo 'net.netfilter.nf_conntrack_max = 131072' >> /etc/sysctl.conf
linux:~# /sbin/sysctl -p

b) put in /etc/rc.local (before the exit 0 line):

echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

Note: Be careful with this variable; according to my experience, raising it to a too high value (especially on XEN patched kernels) can freeze the system.
Raising the value to a too high number can also freeze a regular Linux server running on old hardware.

– For diagnosing nf_conntrack there is the /proc/sys/net/netfilter directory in the kernel's proc interface. There you can find dynamically updated values which give info on nf_conntrack operations in "real time":

linux:~# cd /proc/sys/net/netfilter
linux:/proc/sys/net/netfilter# ls -al nf_log/

total 0
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ./
dr-xr-xr-x 0 root root 0 Mar 23 23:02 ../
-rw-r--r-- 1 root root 0 Mar 23 23:02 0
-rw-r--r-- 1 root root 0 Mar 23 23:02 1
-rw-r--r-- 1 root root 0 Mar 23 23:02 10
-rw-r--r-- 1 root root 0 Mar 23 23:02 11
-rw-r--r-- 1 root root 0 Mar 23 23:02 12
-rw-r--r-- 1 root root 0 Mar 23 23:02 2
-rw-r--r-- 1 root root 0 Mar 23 23:02 3
-rw-r--r-- 1 root root 0 Mar 23 23:02 4
-rw-r--r-- 1 root root 0 Mar 23 23:02 5
-rw-r--r-- 1 root root 0 Mar 23 23:02 6
-rw-r--r-- 1 root root 0 Mar 23 23:02 7
-rw-r--r-- 1 root root 0 Mar 23 23:02 8
-rw-r--r-- 1 root root 0 Mar 23 23:02 9

 

IV. Decreasing other nf_conntrack NAT time-out values to prevent server against DoS attacks

Generally, the default values for the nf_conntrack_* time-outs are (unnecessarily) large.
Therefore, for large flows of traffic, even if you increase nf_conntrack_max you can still shortly end up with an overflown conntrack table, resulting in dropped server connections. To keep this from happening, check and decrease the other nf_conntrack connection tracking timeout values:

linux:~# sysctl -a | grep conntrack | grep timeout
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 432000
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.ipv4.netfilter.ip_conntrack_generic_timeout = 600
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent2 = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 432000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_max_retrans = 300
net.ipv4.netfilter.ip_conntrack_udp_timeout = 30
net.ipv4.netfilter.ip_conntrack_udp_timeout_stream = 180
net.ipv4.netfilter.ip_conntrack_icmp_timeout = 30

All the timeouts are in seconds. net.netfilter.nf_conntrack_generic_timeout, as you see, is quite high: 600 secs (10 minutes).
Such a value means any NAT-ted connection that stops responding can stay hanging around for 10 minutes!

The value net.netfilter.nf_conntrack_tcp_timeout_established = 432000 is quite high too (5 days!)
If these values are not lowered, the server will be an easy target for anyone who would like to flood it with excessive connections; once this happens the server will quickly reach even the raised value of net.nf_conntrack_max and the initial connection dropping will re-occur again …

With all said, to prevent the server from malicious users, situated behind the NAT plaguing you with Denial of Service attacks:

Lower net.ipv4.netfilter.ip_conntrack_generic_timeout to 60 – 120 seconds and net.ipv4.netfilter.ip_conntrack_tcp_timeout_established to something like 54000

linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_generic_timeout=120
linux:~# sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=54000

These timeouts should work fine on the router without creating interruptions for regular NAT users. After changing the values and monitoring for at least a few days, make the changes permanent by adding them to /etc/sysctl.conf

linux:~# echo 'net.ipv4.netfilter.ip_conntrack_generic_timeout = 120' >> /etc/sysctl.conf
linux:~# echo 'net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000' >> /etc/sysctl.conf

Maximal protection against SSH attacks. If your server has to stay with open SSH (Secure Shell) port open to the world

Thursday, April 7th, 2011

Brute Force Attack SSH screen, Script kiddie attacking
If you administer a remote Linux or any other Unix based OS, you have definitely faced the security threat of many failed SSH logins, or as it's better known, a brute force attack

During such attacks your /var/log/messages or /var/log/auth.log gets filled with various failed password logs, like for example:

Feb 3 20:25:50 linux sshd[32098]: Failed password for invalid user oracle from 95.154.249.193 port 51490 ssh2
Feb 3 20:28:30 linux sshd[32135]: Failed password for invalid user oracle1 from 95.154.249.193 port 42778 ssh2
Feb 3 20:28:55 linux sshd[32141]: Failed password for invalid user test1 from 95.154.249.193 port 51072 ssh2
Feb 3 20:30:15 linux sshd[32163]: Failed password for invalid user test from 95.154.249.193 port 47481 ssh2
Feb 3 20:33:20 linux sshd[32211]: Failed password for invalid user testuser from 95.154.249.193 port 51731 ssh2
Feb 3 20:35:32 linux sshd[32249]: Failed password for invalid user user from 95.154.249.193 port 38966 ssh2
Feb 3 20:35:59 linux sshd[32256]: Failed password for invalid user user1 from 95.154.249.193 port 55850 ssh2
Feb 3 20:36:25 linux sshd[32268]: Failed password for invalid user user3 from 95.154.249.193 port 36610 ssh2
Feb 3 20:36:52 linux sshd[32274]: Failed password for invalid user user4 from 95.154.249.193 port 45514 ssh2
Feb 3 20:37:19 linux sshd[32279]: Failed password for invalid user user5 from 95.154.249.193 port 54262 ssh2
Feb 3 20:37:45 linux sshd[32285]: Failed password for invalid user user2 from 95.154.249.193 port 34755 ssh2
Feb 3 20:38:11 linux sshd[32292]: Failed password for invalid user info from 95.154.249.193 port 43146 ssh2
Feb 3 20:40:50 linux sshd[32340]: Failed password for invalid user peter from 95.154.249.193 port 46411 ssh2
Feb 3 20:43:02 linux sshd[32372]: Failed password for invalid user amanda from 95.154.249.193 port 59414 ssh2
Feb 3 20:43:28 linux sshd[32378]: Failed password for invalid user postgres from 95.154.249.193 port 39228 ssh2
Feb 3 20:43:55 linux sshd[32384]: Failed password for invalid user ftpuser from 95.154.249.193 port 47118 ssh2
Feb 3 20:44:22 linux sshd[32391]: Failed password for invalid user fax from 95.154.249.193 port 54939 ssh2
Feb 3 20:44:48 linux sshd[32397]: Failed password for invalid user cyrus from 95.154.249.193 port 34567 ssh2
Feb 3 20:45:14 linux sshd[32405]: Failed password for invalid user toto from 95.154.249.193 port 42350 ssh2
Feb 3 20:45:42 linux sshd[32410]: Failed password for invalid user sophie from 95.154.249.193 port 50063 ssh2
Feb 3 20:46:08 linux sshd[32415]: Failed password for invalid user yves from 95.154.249.193 port 59818 ssh2
Feb 3 20:46:34 linux sshd[32424]: Failed password for invalid user trac from 95.154.249.193 port 39509 ssh2
Feb 3 20:47:00 linux sshd[32432]: Failed password for invalid user webmaster from 95.154.249.193 port 47424 ssh2
Feb 3 20:47:27 linux sshd[32437]: Failed password for invalid user postfix from 95.154.249.193 port 55615 ssh2
Feb 3 20:47:54 linux sshd[32442]: Failed password for www-data from 95.154.249.193 port 35554 ssh2
Feb 3 20:48:19 linux sshd[32448]: Failed password for invalid user temp from 95.154.249.193 port 43896 ssh2
Feb 3 20:48:46 linux sshd[32453]: Failed password for invalid user service from 95.154.249.193 port 52092 ssh2
Feb 3 20:49:13 linux sshd[32458]: Failed password for invalid user tomcat from 95.154.249.193 port 60261 ssh2
Feb 3 20:49:40 linux sshd[32464]: Failed password for invalid user upload from 95.154.249.193 port 40236 ssh2
Feb 3 20:50:06 linux sshd[32469]: Failed password for invalid user debian from 95.154.249.193 port 48295 ssh2
Feb 3 20:50:32 linux sshd[32479]: Failed password for invalid user apache from 95.154.249.193 port 56437 ssh2
Feb 3 20:51:00 linux sshd[32492]: Failed password for invalid user rds from 95.154.249.193 port 45540 ssh2
Feb 3 20:51:26 linux sshd[32501]: Failed password for invalid user exploit from 95.154.249.193 port 53751 ssh2
Feb 3 20:51:51 linux sshd[32506]: Failed password for invalid user exploit from 95.154.249.193 port 33543 ssh2
Feb 3 20:52:18 linux sshd[32512]: Failed password for invalid user postgres from 95.154.249.193 port 41350 ssh2
Feb 3 21:02:04 linux sshd[32652]: Failed password for invalid user shell from 95.154.249.193 port 54454 ssh2
Feb 3 21:02:30 linux sshd[32657]: Failed password for invalid user radio from 95.154.249.193 port 35462 ssh2
Feb 3 21:02:57 linux sshd[32663]: Failed password for invalid user anonymous from 95.154.249.193 port 44290 ssh2
Feb 3 21:03:23 linux sshd[32668]: Failed password for invalid user mark from 95.154.249.193 port 53285 ssh2
Feb 3 21:03:50 linux sshd[32673]: Failed password for invalid user majordomo from 95.154.249.193 port 34082 ssh2
Feb 3 21:04:43 linux sshd[32684]: Failed password for irc from 95.154.249.193 port 50918 ssh2
Feb 3 21:05:36 linux sshd[32695]: Failed password for root from 95.154.249.193 port 38577 ssh2
Feb 3 21:06:30 linux sshd[32705]: Failed password for bin from 95.154.249.193 port 53564 ssh2
Feb 3 21:06:56 linux sshd[32714]: Failed password for invalid user dev from 95.154.249.193 port 34568 ssh2
Feb 3 21:07:23 linux sshd[32720]: Failed password for root from 95.154.249.193 port 43799 ssh2
Feb 3 21:09:10 linux sshd[32755]: Failed password for invalid user bob from 95.154.249.193 port 50026 ssh2
Feb 3 21:09:36 linux sshd[32761]: Failed password for invalid user r00t from 95.154.249.193 port 58129 ssh2
Feb 3 21:11:50 linux sshd[537]: Failed password for root from 95.154.249.193 port 58358 ssh2

These brute force dictionary attacks often succeed where there is a user with a weak password, or some old forgotten test user account.
Just recently, on one of the servers I administrate, I caught a malicious attacker originating from Romania who was able to break into my system's test account, which had the weak password tset.

Thankfully the script kiddie was unable to get root access to my system, so what he did was just start another ssh brute force scanner to crawl the net and look for other vulnerable hosts.

As you can read from my recent example, being immune against SSH brute force attacks is an essential security step the administrator needs to take on a newly installed server.

The easiest way to get rid of the brute force attacks without using external brute force filtering software like fail2ban is:

1. Using an iptables filtering rule to block every IP which has failed to log in more than 5 times

To use this brute force prevention method you need to use the following iptables rules:
linux-host:~# /sbin/iptables -I INPUT -p tcp --dport 22 -i eth0 -m state --state NEW -m recent --set
linux-host:~# /sbin/iptables -I INPUT -p tcp --dport 22 -i eth0 -m state --state NEW \
-m recent --update --seconds 60 --hitcount 5 -j DROP

These iptables rules will block the SSH port (22) for every IP address that makes more than 5 new connection attempts within 60 seconds
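
Keep in mind that iptables rules added from the shell are lost on reboot, so once they are confirmed working they should be saved and restored at boot time. A sketch (on Debian / Ubuntu the iptables-persistent package automates the same):

linux-host:~# /sbin/iptables-save > /etc/iptables.up.rules

and then in /etc/rc.local (before the exit 0 line) or an equivalent boot script:

/sbin/iptables-restore < /etc/iptables.up.rules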

2. Getting rid of brute force attacks through use of hosts.deny blacklists

sshbl – The SSH blacklist, updated every few minutes, contains IP addresses of hosts which tried to brute force into any of the currently 19 hosts (all running OpenBSD, FreeBSD or some Linux) using the SSH protocol. The hosts are located in Germany, the United States, United Kingdom, France, England, Ukraine, China, Australia, Czech Republic and are set up to report and log those attempts to a central database. Very similar to all the spam blacklists out there.

To use sshbl you will have to set up in your root crontab the following line:

*/60 * * * * /usr/bin/wget -qO /etc/hosts.deny http://www.sshbl.org/lists/hosts.deny

To set it up from console issue:

linux-host:~# echo '*/60 * * * * /usr/bin/wget -qO /etc/hosts.deny http://www.sshbl.org/lists/hosts.deny' | crontab -u root -

This crontab entry will download and replace your system's default hosts.deny with the one regularly updated on sshbl.org; thus, the next time a reported brute force attacker connects, it will be filtered out as your Linux or Unix system finds its IP matching an entry in /etc/hosts.deny

The /etc/hosts.deny filtering rules are written in a way that the publicly known brute forcer IPs are only filtered for the SSH service; therefore other system services like Apache or a radio / TV streaming server will still be accessible for the brute forcer IP.

It’s a good practice actually to use both of the methods 😉
Thanks to Static (Multics) a close friend of mine for inspiring this article.