Posts Tagged ‘debugging’

Why du and df reporting different on a filesystem / How to fix inconsistency between used space on FS and disk showing full strangeness

Wednesday, July 24th, 2019

linux-why-du-and-df-shows-different-result-inconsincy-explained-filesystem-full-oddity

If you're a sysadmin on a large server environment such as a couple of hundred of Virtual Machines running Linux OS on either physical host or OpenXen / VmWare hosted guest Virtual Machine, you might end up sometimes at an odd case where some mounted partition mount point reports its file use different when checked with
df
cmd than when checked with du command, like for example:
 

root@sqlserver:~# df -hT /var/lib/mysql
Filesystem   Type  Size Used Avail Use% Mounted On
/dev/sdb5      ext4    19G  3,4G    14G  20% /var/lib/mysql

Here the '-T' argument is used to show us the filesystem.

root@sqlserver:~# du -hsc /var/lib/mysql
0K    /var/lib/mysql/
0K    total

 

1. Simple debug on what might be the root cause for df / du inconsistency reporting

 

Of course the basic thing to do when in that weird situation is to be totally shocked how this is possible and to investigate a bit what is the biggest first level sub-directories that eat up the space on the mounted location, with du:

 

# du -hkx –max-depth=1 /var/lib/mysql/|uniq|sort -n
4       /var/lib/mysql/test
8       /var/lib/mysql/ezmlm
8       /var/lib/mysql/micropcfreak
8       /var/lib/mysql/performance_schema
12      /var/lib/mysql/mysqltmp
24      /var/lib/mysql/speedtest
64      /var/lib/mysql/yourls
144     /var/lib/mysql/narf
320     /var/lib/mysql/webchat_plus
424     /var/lib/mysql/goodfaithair
528     /var/lib/mysql/moonman
648     /var/lib/mysql/daniel
852     /var/lib/mysql/lessn
1292    /var/lib/mysql/gallery

The given output is in Kilobytes so it is a little bit hard to read, if you're used to Mbytes instead, do

 

 # du -hmx –max-depth=1 /var/lib/mysql/|uniq|sort -n|less

 

I've also investigated on the complete /var directory contents sorted by size with:

 

 # du -akx ./ | sort -n
5152564    ./cache/rsnapshot/hourly.2/localhost
5255788    ./cache/rsnapshot/hourly.2
5287912    ./cache/rsnapshot
7192152    ./cache


Even after finding out the bottleneck dirs and trying to clear up a bit, continued facing that inconsistently shown in two commands and if you're likely to be stunned like me and try … to move some files to a different filesystem to free up space or assigned inodes with a hope that shown inconsitency output will be fixed as it might be caused  due to some kernel / FS caching ?? and this will eventually make the mounted FS to refresh …

But unfortunately, if you try it you'll figure out clearing up a couple of Megas or Gigas will make no difference in cmd output.

In my exact case /var/lib/mysql is a separate mounted ext4 filesystem, however same issue was present also on a Network Filesystem (NFS) and thus, my first thought that this is caused by a network failure problem or NFS bug turned to be wrong.

After further short investigation on the inodes on the Filesystem, it was clear enough inodes are available:
 

# df -i /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5      1221600  2562 1219038   1% /var/lib/mysql

 

So the filled inodes count assumed issue also has been rejected.
P.S. (if you're not well familiar with them read manual, i.e. – man 7 inode).
 

– Remounting the mounted filesystem

To make sure the filesystem shown inconsistency between du and df is not due to some hanging network mount or bug, first logical thing I did is to remount the filesytem showing different in size, in my case this was done with:
 

# mount -o remount,rw -t ext4 /var/lib/mysql

For machines with NFS remote mounted storage locations, used:

# mount -o remount,rw -t nfs /var/www


FS remount did not solved it so I continued to ponder what oddity and of course I thought of a workaround (in case if this issues are caused by kernel bug or OS lib issue) reboot might be the solution, however unfortunately restarting the VMs was not a wanted easy to do solution, thus I continued investigating what is wrong …

Next check of course was to check, what kind of network connections are opened to the affected hosts with:
 

# netstat -tupanl


Did not found anything that might point me to the reported different Megabytes issue, so next step was to check what is the situation with currently opened files by running processes on the weird df / du reported systems with lsof, and boom there I observed oddity such as multiple files

 

# lsof -nP | grep '(deleted)'

COMMAND   PID   USER   FD   TYPE DEVICE    SIZE NLINK  NODE NAME
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /var/lib/mysql/tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /var/lib/mysql/tmp/ibOrELhG (deleted)
mysqld   2588  mysql    6u   REG 253,17       777884290     0  1497 /var/lib/mysql/tmp/ibmDFAW8 (deleted)
mysqld   2588  mysql    7u   REG 253,17       123667875     0 11387 /var/lib/mysql/tmp/ib2CSACB (deleted)
mysqld   2588  mysql   11u   REG 253,17       123852406     0 11388 /var/lib/mysql/tmp/ibQpoZ94 (deleted)

 

Notice that There were plenty of '(deleted)' STATE files shown in memory an overall of 438:

 

# lsof -nP | grep '(deleted)' |wc -l
438


As I've learned a bit online about the problem, I found it is also possible to find deleted unlinked files only without any greps (to list all deleted files in memory files with lsof args only):

 

# lsof +L1|less


The SIZE field (fourth column)  shows a number of files that are really hard in size and that are kept in open on filesystem and in memory, totally messing up with the filesystem. In my case this is temp files created by MYSQLD daemon but depending on the server provided service this might be apache's www-data, some custom perl / bash script executed via a cron job, stalled rsync jobs etc.
 

2. Check all the list open files with the mysql / root user as part of the the server filesystem inconsistency debugging with:

 

– Grep opened files on server by user

# lsof |grep mysql
mysqld    1312                       mysql  cwd       DIR               8,21       4096          2 /var/lib/mysql
mysqld    1312                       mysql  rtd       DIR                8,1       4096          2 /
mysqld    1312                       mysql  txt       REG                8,1   20336792   23805048 /usr/sbin/mysqld
mysqld    1312                       mysql  mem       REG               8,21      24576         20 /var/lib/mysql/tc.log
mysqld    1312                       mysql  DEL       REG               0,16                 29467 /[aio]
mysqld    1312                       mysql  mem       REG                8,1      55792   14886933 /lib/x86_64-linux-gnu/libnss_files-2.28.so

 

# lsof | grep root
COMMAND    PID   TID TASKCMD          USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
systemd      1                        root  cwd       DIR                8,1       4096          2 /
systemd      1                        root  rtd       DIR                8,1       4096          2 /
systemd      1                        root  txt       REG                8,1    1489208   14928891 /lib/systemd/systemd
systemd      1                        root  mem       REG                8,1    1579448   14886924 /lib/x86_64-linux-gnu/libm-2.28.so

Other command that helped to track the discrepancy between df and du different file usage on FS is:
 

# du -hxa  / | egrep '^[[:digit:]]{1,1}G[[:space:]]*'
 

 

3. Fixing large files kept in memory filesystem problem


What is the real reason for ending up with this file handlers opened by running backgrounded programs on the Linux OS?
It could be multiple  but most likely it is due to exceeded server / client interactions or breaking up RAM or HDD drive with writing plenty of logs on the FS without ending keeping space occupied or Programming library bugs used by hanged service leaving the FH opened on storage.

What is the solution to file system files left in memory problem?

The best solution is to first fix custom script or hanged service and then if possible to simply restart the server to make the kernel / services reload or if this is not possible just restart the problem creation processes.

Once the process is identified like in my case this was MySQL on systemd enabled newer OS distros, just do:

 

 

# systemctl restart mysqld.service


or on older init.d system V ones:

# /etc/init.d/service restart


For custom hanged scripts being listed in ps axuwef you can grep the pid and do a kill -HUP (if the script is written in a good way to recognize -HUP and restart the sub-running process properly – BE EXTRA CAREFUL IF YOU'RE RESTARTING BROKEN SCRIPTS as this might cause your running service disruptions …).

# pgrep -l script.sh
7977 script.sh


# kill -HUP PID

 

Now finally this should either mitigate or at best case completely solve the reported disagreement between df and du, after which the calculated / reported disk space should be back to normal and show up approximately the same (note that size changes a bit as mysql service is writting data) constantly extending the size between the two checks.

 

# df -hk /var/lib/mysql; du -hskc /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5        19097172 3472744 14631296  20% /var/lib/mysql
3427772    /var/lib/mysql
3427772    total

 

What we learned?

What I've explained in this article is why and how it comes that 'zoombie' files reside on a filesystem
appearing to be eating disk space on a mounted local or network partition, giving strange inconsistent
reports, leading to system service disruptions and impossibility to have correctly shown information on used
disk space on mounted drive.

I went through with some standard logic on debugging service / filesystem / inode issues up explainat, that led me to the finding about deleted files being kept in filesystem and producing the filesystem strange sized / showing not correct / filled even after it was extended with tune2fs and was supposed to have extra 50GBs.

Finally it was explained shortly how to HUP / restart hanging script / service to fix it.

Some few good readings that helped to fix the issue:

What to do when du and df report different usage is here
df in linux not showing correct free space after file removal is here
Why do “df” and “du” commands show different disk usage?
 

Fiddler – Windows web debugging proxy for any browser – Linux web debugging applications

Thursday, May 29th, 2014

fiddler-web-proxy-debugging-http-https-traffic-in-windows-browser
Earlier I've blogged about helpful web developer or a web hosting system administrator Web Browser plugins . Among the list of useful plugins for debugging sent / received web content on your desktop (HTTPWatchm, HTTPFox, Yslow etc.), I've found another one called Fiddler.

Telerik's Fiddler is a Browser plugin  and a Windows Desktop application to monitor HTTP and HTTPS outbound web traffic and report and provide you with various information useful for:

fiddler-web-debugger-for-browser-and-desktop-for-windows-keep-trac-and-optimize-web-traffic-to-web-servers

  • Performance Testing
  • HTTP / HTTPS
  • Traffic recording
  • Security Testing
  • Web Session Manipulation
  • Encode Decode web traffic
  • Convert strings from / to Base64, Hex, DeflatedSAML etc.
  • Log all URL requests originating from all opened browsers on your Desktop
  • Decrypt / encrypt HTTPS traffic using man in the middle techniques
  • Show tuning details for accessed web pages
     

Fiddler is available to install and use as a desktop application (requires .NET 2) or install as a browser plugin. Perhaps the coolest  Fiddler feature from my perspective is its decrypt / encrypt in Base64 and Hex available from TextWizard menu. The tool is relatively easy to use for those who have experience in web debugging, for novice here is a video explaining tool's basics.

Fiddler doesn't have a Linux build yet but it is possible to run it also on Linux using Mono Framework and a few hacks.

charles-proxy-web-debugging-tool-for-linux-fiddler-alternative
A good native Linux / UNIX alternatives to Fiddler are Nettool, Charles Proxy, Paros Proxy and Web Scarab.

Tiny PHP script to dump your browser set HTTP headers (useful in debugging)

Friday, March 30th, 2012

While browsing I stumbled upon a nice blog article

Dumping HTTP headers

The arcitle, points at few ways to DUMP the HTTP headers obtained from user browser.
As I'm not proficient with Ruby, Java and AOL Server what catched my attention is a tiny php for loop, which loops through all the HTTP_* browser set variables and prints them out. Here is the PHP script code:

<?php<br />
foreach($_SERVER as $h=>$v)<br />
if(ereg('HTTP_(.+)',$h,$hp))<br />
echo "<li>$h = $v</li>\n";<br />
header('Content-type: text/html');<br />
?>

The script is pretty easy to use, just place it in a directory on a WebServer capable of executing php and save it under a name like:
show_HTTP_headers.php

If you don't want to bother copy pasting above code, you can also download the dump_HTTP_headers.php script here , rename the dump_HTTP_headers.php.txt to dump_HTTP_headers.php and you're ready to go.

Follow to the respective url to exec the script. I've installed the script on my webserver, so if you are curious of the output the script will be returning check your own browser HTTP set values by clicking here.
PHP will produce output like the one in the screenshot you see below, the shot is taken from my Opera browser:

Screenshot show HTTP headers.php script Opera Debian Linux

Another sample of the text output the script produce whilst invoked in my Epiphany GNOME browser is:

HTTP_HOST = www.pc-freak.net
HTTP_USER_AGENT = Mozilla/5.0 (X11; U; Linux x86_64; en-us) AppleWebKit/531.2+ (KHTML, like Gecko) Version/5.0 Safari/531.2+ Debian/squeeze (2.30.6-1) Epiphany/2.30.6
HTTP_ACCEPT = application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
HTTP_ACCEPT_ENCODING = gzip
HTTP_ACCEPT_LANGUAGE = en-us, en;q=0.90
HTTP_COOKIE = __qca=P0-2141911651-1294433424320;
__utma_a2a=8614995036.1305562814.1274005888.1319809825.1320152237.2021;wooMeta=MzMxJjMyOCY1NTcmODU1MDMmMTMwODQyNDA1MDUyNCYxMzI4MjcwNjk0ODc0JiYxMDAmJjImJiYm; 3ec0a0ded7adebfeauth=22770a75911b9fb92360ec8b9cf586c9;
__unam=56cea60-12ed86f16c4-3ee02a99-3019;
__utma=238407297.1677217909.1260789806.1333014220.1333023753.1606;
__utmb=238407297.1.10.1333023754; __utmc=238407297;
__utmz=238407297.1332444980.1586.413.utmcsr=pc-freak.net|utmccn=(referral)|utmcmd=referral|utmcct=/blog/

You see the script returns, plenty of useful information for debugging purposes:
HTTP_HOST – Virtual Host Webserver name
HTTP_USER_AGENT – The browser exact type useragent returnedHTTP_ACCEPT – the type of MIME applications accepted by the WebServerHTTP_ACCEPT_LANGUAGE – The language types the browser has support for
HTTP_ACCEPT_ENCODING – This PHP variable is usually set to gzip or deflate by the browser if the browser has support for webserver returned content gzipping.
If HTTP_ACCEPT_ENCODING is there, then this means remote webserver is configured to return its HTML and static files in gzipped form.
HTTP_COOKIE – Information about browser cookies, this info can be used for XSS attacks etc. 🙂
HTTP_COOKIE also contains the referrar which in the above case is:
__utmz=238407297.1332444980.1586.413.utmcsr=pc-freak.net|utmccn=(referral)
The Cookie information HTTP var also contains information of the exact link referrar:
|utmcmd=referral|utmcct=/blog/

For the sake of comparison show_HTTP_headers.php script output from elinks text browser is like so:

* HTTP_HOST = www.pc-freak.net
* HTTP_USER_AGENT = Links (2.3pre1; Linux 2.6.32-5-amd64 x86_64; 143x42)
* HTTP_ACCEPT = */*
* HTTP_ACCEPT_ENCODING = gzip,deflate * HTTP_ACCEPT_CHARSET = us-ascii, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, windows-1250, windows-1251, windows-1252, windows-1256,
windows-1257, cp437, cp737, cp850, cp852, cp866, x-cp866-u, x-mac, x-mac-ce, x-kam-cs, koi8-r, koi8-u, koi8-ru, TCVN-5712, VISCII,utf-8 * HTTP_ACCEPT_LANGUAGE = en,*;q=0.1
* HTTP_CONNECTION = keep-alive
One good reason, why it is good to give this script a run is cause it can help you reveal problems with HTTP headers impoperly set cookies, language encoding problems, security holes etc. Also the script is a good example, for starters in learning PHP programming.