Today I received a report about a server whose load average stayed at a high level of 86. The machine runs on rock-solid bare-metal hardware and keeps working fine even under such a high load, but because of the I/O overhead, reads from the SAN volume on a remote NetApp storage device had become sluggish, so the situation needed to be reviewed. I therefore logged in to the server via the hop station (jump host).
1. Short investigation of the root cause of the high server load
After a short investigation, I found an rsync job that someone had set in cron to run every 30 minutes. New runs were being started before the earlier ones had finished, so about 50 instances of the same rsync (file system synchronization) were running at once and, as expected, the storage was saddled with multiple input/output requests.
The root cron job looked like this:
server:~# crontab -u root -l |grep -i rsync
/usr/bin/rsync -ax /var/www/htdocs/directory_to_synchronize/ /srv/www/synch_back/directory_to_synchrnize
A process listing showed the high number of identical rsync processes running:
server:~# ps axuwwf | grep -i rsync | wc -l
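Note that the grep process itself shows up in the ps output, so the count above is slightly too high. A minimal sketch of a count that avoids this, assuming the procps-ng version of pgrep is installed (its -c option prints a count, -x matches the process name exactly):

```shell
# Count processes whose name is exactly "rsync"; pgrep never reports itself.
# pgrep exits non-zero when nothing matches, so "|| true" keeps scripts
# running under "set -e" from aborting here.
count=$(pgrep -c -x rsync || true)
echo "running rsync processes: ${count:-0}"
```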
2. The Fix – Run rsync from cron only if it is not already running in the background
To fix it, I first had to kill all the currently running rsync processes. (Luckily, only instances of this one rsync job were running here, but I was careful to check that no other rsync jobs were running; otherwise I would have mistakenly killed some other ongoing rsync job.)
Then I set up a new cron job: a quick one-line shell script that creates a lock file before rsync starts and deletes it after rsync completes.
if [ ! -e /tmp/repo_dba_sync.lock ]; then trap 'rm -f /tmp/repo_dba_sync.lock' EXIT; touch /tmp/repo_dba_sync.lock; /usr/bin/rsync -ax /var/www/htdocs/directory_to_synchronize/ /srv/www/synch_back/directory_to_synchrnize; fi >/dev/null 2>&1
The cron job looked like this:
*/30 * * * * if [ ! -e /tmp/repo_dba_sync.lock ]; then trap 'rm -f /tmp/repo_dba_sync.lock' EXIT; touch /tmp/repo_dba_sync.lock; /usr/bin/rsync -ax /var/www/htdocs/directory_to_synchronize/ /srv/www/synch_back/directory_to_synchrnize; fi >/dev/null 2>&1
In case you're wondering, the trap is there to ensure that the lock file is removed when the shell exits, whether rsync finished successfully or failed.
This way the lock file does not stay behind and block later runs. (A hard kill with kill -9 can still leave a stale lock file, which then has to be removed by hand.)
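The same lock-file-plus-trap idea is easier to read as a small script than as a cron one-liner. The following is only a sketch: run_locked, demo_lock and the stand-in command are hypothetical names chosen for illustration, and in the real cron job the command passed in would be the rsync call.

```shell
#!/bin/sh
# Run a command under a lock file; skip the run if the lock already exists.
run_locked() {
    lock=$1; shift
    if [ -e "$lock" ]; then
        echo "lock $lock exists, skipping this run" >&2
        return 1
    fi
    touch "$lock"
    # The subshell's EXIT trap removes the lock once the command has
    # finished, whether it succeeded or failed.
    (
        trap 'rm -f "$lock"' EXIT
        "$@"
        exit $?
    )
}

# Demo with a harmless stand-in command instead of rsync.
demo_lock="${TMPDIR:-/tmp}/run_locked_demo.$$.lock"
run_locked "$demo_lock" true && echo "run finished, lock removed"
```

The check and the touch are two separate steps, so two jobs started at the very same moment could both pass the check. For a job that starts every 30 minutes this is unlikely to matter, but it is one reason to prefer flock.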
A simpler alternative is:
pgrep rsync > /dev/null || rsync -ax /var/www/htdocs/directory_to_synchronize/ /srv/www/synch_back/directory_to_synchrnize
Or, if you don't want to use the shell's if ...; then ...; fi construct but still want a file lock, the flock command can be used like so:
flock -n lock_file -c "rsync …"
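To show how flock behaves when a second run overlaps the first, here is a small sketch. It assumes the flock utility from util-linux is installed; LOCK and run_once are hypothetical names, and the echo is a stand-in for the rsync call.

```shell
#!/bin/sh
# LOCK is a hypothetical path; any file the cron user can write to will do.
LOCK="${TMPDIR:-/tmp}/repo_dba_sync.flock"

# -n makes flock give up at once (exit status 1) when another process holds
# the lock, instead of waiting for it. The kernel drops the lock when the
# command exits, so no stale lock is left behind after a crash or a kill.
run_once() {
    flock -n "$LOCK" "$@"
}

# Stand-in command for the demo; the cron job would pass the rsync call.
run_once echo "got the lock"
```

Because the lock is tied to the running process rather than to the mere existence of a file, there is nothing to clean up with a trap.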