Posts Tagged ‘Shell’

How to delete millions of files on busy Linux servers (Work around "Argument list too long")

Tuesday, March 20th, 2012

How to delete millions or many thousands of files in the same directory on GNU / Linux and FreeBSD

If you try to delete more than 131072 files on Linux with rm -f *, where the files are all stored in the same directory, you will get an error:

/bin/rm: Argument list too long.

I've blogged earlier on deleting multiple files on Linux and FreeBSD, and this is not my first time facing this error.
Anyways, as time passed, I've found a few other ways to delete large multitudes of files from a server.

In this article, I will shortly explain a few approaches to deleting a few million obsolete files, to free some space on your server.
Here are a few methods you can use to clean up tons of junk files.

1. Using the Linux find command to wipe out millions of files

a.) Finding and deleting files using find's -exec switch:

# find . -type f -exec rm -fv {} \;

This method works fine, but it has one downside: file deletion is slow, as an external rm command is invoked for each file found.

For half a million files or more, this method will take a long time. However, from a hard disk stress point of view it is not so bad, as the deletion does not put too much strain on the server hard disk.
b.) Finding and deleting a big number of files with find's -delete argument:

Luckily, there is a better way to delete the files, by using find's built-in -delete argument:

# find . -type f -delete

c.) Deleting and printing out deleted files with find's -print arg

If you would like to see on your terminal which files find is deleting in "real time", add -print:

# find . -type f -print -delete

To prevent your server hard disk from being overstressed, and hence save yourself from "outages" in normal server operation, it is good to combine the find command with ionice, e.g.:

# ionice -c 3 find . -type f -print -delete

Just note that ionice cannot guarantee find's operations will not severely affect the hard disk I/O requests. On heavily busy servers with high amounts of disk I/O writes, applying ionice will still not prevent the server from hanging! Be sure to always keep an eye on the server while deleting the files, no matter whether ionice is used or not. If, during find's execution, the server starts lagging in serving its ordinary client requests, stop the command immediately by killing it from another ssh session or tty (if you are physically on the server).
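
One simple way to keep that eye on the disk is to watch the I/O statistics from a second ssh session while the deletion runs. Here is a small sketch, assuming the sysstat package (which ships iostat) is installed on a Debian-like system:

# apt-get install --yes sysstat
# iostat -x 2
# uptime

iostat -x 2 prints extended per-device I/O utilization every 2 seconds, and uptime gives a quick look at the load average, so you can spot immediately if the deletion starts to starve the rest of the server.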

2. Using a simple bash loop with the rm command to delete "tons" of files

An alternative way is to use a bash loop that iterates over each of the files in the directory and issues /bin/rm on each loop element (file), like so:

for i in *; do
    rm -f "$i";
done

If you'd like to print what is being deleted, add an echo to the loop:

# for i in *; do \
echo "Deleting : $i"; rm -f "$i"; \
done

The bash loop worked like a charm in my case, so I warmly recommend this method whenever you need to delete more than 500 000 files in a directory.

3. Deleting multiple files with perl

Deleting multiple files with perl is not a bad idea at all.
Here is a perl one-liner to delete all files contained within a directory:

# perl -e 'for(<*>){((stat)[9]<(unlink))}'
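
The one-liner above relies on the ((stat)[9]&lt;(unlink)) construct mostly as a compact trick to call unlink on every globbed entry. If you find it cryptic, a plainer sketch that should do the same job (delete every file the shell glob matches) is:

# perl -e 'unlink for <*>;'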

If you prefer to use a more human readable perl script to delete a multitude of files, use delete_multple_files_in_dir_perl.pl

Using the perl interpreter to delete thousands of files is quick, really, really quick.
I did not benchmark on the server exactly how quick it is, but I guess the delete rate should be similar to the find command. It's possible that in some cases the perl loop is even quicker …

4. Using a PHP script to delete multiple files

Using a short php script to delete the files one by one in a loop, similar to the above bash script, is another option.
To do the deletion with PHP, use this little PHP script:

<?php
// directory full of files to wipe out
$dir = "/path/to/dir/with/files";
$dh = opendir($dir);
$i = 0;
while (($file = readdir($dh)) !== false) {
    $file = "$dir/$file";
    // "." and ".." are directories, so is_file() skips them
    if (is_file($file)) {
        unlink($file);
        // report progress every 1000 removed files
        if (!(++$i % 1000)) {
            echo "$i files removed\n";
        }
    }
}
closedir($dh);
?>

As you can see, the script reads the directory defined in $dir and loops through it, deleting the entries file by file.
You should already know PHP is slow, so this method is only useful if you have to delete many thousands of files on a shared hosting server with no (ssh) shell access.

This php script is taken from Steve Kamerman's blog. I would also like to express my big gratitude to Steve for writing such a wonderful post. His post actually became the inspiration for this article.

You can also download the sample php "delete millions of files" script here

To use it, rename delete_millioon_of_files_in_a_dir.php.txt to delete_millioon_of_files_in_a_dir.php and run it through a browser.

Note that you might need to run it multiple times, because many shared hosting servers are configured to kill a php script which keeps running for too long.
Alternatively, the script can be run through the shell with the PHP CLI:

php -f delete_millioon_of_files_in_a_dir.php
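
A side note on the CLI run: when PHP is invoked from the command line, max_execution_time defaults to 0 (unlimited), so the "script killed for running too long" problem from the web server normally does not apply. If your hosting still enforces a limit, you can try overriding it per run, e.g.:

php -d max_execution_time=0 -f delete_millioon_of_files_in_a_dir.php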

5. So what is the "best" way to delete millions of files on Linux?

In order to find out which method is quicker in terms of execution time, I did some home brew benchmarking on my ThinkPad notebook.

a) Creating 509072 sample files

Again, I used a bash loop to create many thousands of files in order to benchmark.
I didn't want to put this load on a production server, and hence I used my own notebook to conduct the benchmarks. As my notebook is not a server, the benchmarks might be partially inaccurate; however, I believe they're still a pretty good indicator of which deletion method is better.

hipo@noah:~$ mkdir /tmp/test
hipo@noah:~$ cd /tmp/test;
hipo@noah:/tmp/test$ for i in $(seq 1 509072); do echo aaaa >> $i.txt; done

I had to wait a few minutes until the 509072 files were created. As you can see, each of the files contains the sample "aaaa" string.

b) Calculating the number of files in the directory

Once the command completed, to make sure all 509072 files really existed, I used a find + wc command to count the number of files in the directory:

hipo@noah:/tmp/test$ time find . -maxdepth 1 -type f |wc -l
509072

real 0m1.886s
user 0m0.440s
sys 0m1.332s

It's interesting that using an ls command to count the files is less efficient than using find:

hipo@noah:/tmp/test$ time ls -1 |wc -l
509072

real 0m3.355s
user 0m2.696s
sys 0m0.528s

c) Benchmarking the different file deletion methods with time

– Testing delete speed of find

hipo@noah:/tmp/test$ time find . -maxdepth 1 -type f -delete
real 15m40.853s
user 0m0.908s
sys 0m22.357s

You see, using find to delete the files is neither too slow nor lightning quick.

– How fast is the perl loop at mass file deletion?

hipo@noah:/tmp/test$ time perl -e 'for(<*>){((stat)[9]<(unlink))}'

real 6m24.669s
user 0m2.980s
sys 0m22.673s

Deleting my 509072 sample files took 6 mins and 24 secs. This is roughly 2.5 times faster than find! GO-GO perl 🙂
As you can see from the results, perl is a great and time-saving way to delete 500 000 files.

– The approximate deletion speed of the for + rm bash loop

hipo@noah:/tmp/test$ time for i in *; do rm -f $i; done

real 206m15.081s
user 2m38.954s
sys 195m38.182s

You see the execution took 206 mins and 15 secs = almost 3 HOURS and a HALF!!!! This is extremely slow! But it works like a charm, as the running deletion didn't impact my normal laptop browsing. While the loop was running I was mostly browsing through a few not so heavy (non flash) websites and doing some other stuff in gnome-terminal 🙂

As you can imagine, running a bash loop is a bit CPU intensive, but it puts less stress on the hard disk read/write operations. Therefore it's clearly good practice to use it whenever many files have to be deleted on a dedicated (production) server.

d) My production server file deletion experience

On a production server I only tested two of the listed methods to delete my files. The production server where I tested is running Debian GNU / Linux Squeeze 6.0.3. There I had the task of deleting a few million files.
The methods tried on the server were:

– The find . -type f -delete method.

– for i in *; do rm -f $i; done

The results from using the find -delete method were quite sad, as the server almost hanged under the heavy hard disk load the command produced.

With the for loop all went smoothly. The file deletion took a long, long time (a few hours), but while it was running, the server kept serving requests with no interruptions.

While the bash loop was running, the server load average kept at a steady 4.
With my experience in mind, if you're running a production server and you're still wondering which delete method to use to wipe a multitude of files, I would recommend you go the bash for loop + /bin/rm way. Yes, it is extremely slow, expect it to run for hours, but it does not put too much extra load on the server.

Using the PHP script will probably be slow and inefficient compared to both find and the bash loop. I haven't given it a try yet, but I suppose it will be either equal in time or a few times slower than bash.

If you have tried the php script and have some observations, please drop a comment to tell me how it performs.

To sum it up:

Even though there are "hacks" to clean up a messy directory full of a few million junk files, such a directory should never exist in the first place.

Frankly, keeping millions of files within the same directory is a very stupid idea.
Doing so will have a severe negative impact on the directory listing performance of your filesystem in the long term.

If you know better (more efficient) ways to delete a multitude of files in a dir, please share them in the comments.

Check and Restart Apache if it is malfunctioning (not returning HTML content) shell script

Monday, March 19th, 2012

Check and Restart Apache Webserver on Malfunction, Apache feather logo

One of the company Debian Lenny 5.0 webservers, where I'm working as sys admin, sometimes stops properly serving HTTP requests.
Whenever this oddity happens, the Apache server seems to be running okay, but it fails to return the requested content.

I can see the webserver listening on port 80 and establishing connections to remote hosts – the connections show up normally in netstat …:

apache:~# netstat -enp 80
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 xxx.xxx.xxx.xx:80 46.253.9.36:5665 SYN_RECV 0 0 -
tcp 0 0 xxx.xxx.xxx.xx:80 78.157.26.24:5933 SYN_RECV 0 0 -
...

Also, the forked apache child processes show normally in the process list:

apache:~# ps axuwwf|grep -i apache
root 46748 0.0 0.0 112300 872 pts/1 S+ 18:07 0:00 \_ grep -i apache
root 42530 0.0 0.1 217392 6636 ? Ss Mar14 0:39 /usr/sbin/apache2 -k start
www-data 42535 0.0 0.0 147876 1488 ? S Mar14 0:01 \_ /usr/sbin/apache2 -k start
root 28747 0.0 0.1 218180 4792 ? Sl Mar14 0:00 \_ /usr/sbin/apache2 -k start
www-data 31787 0.0 0.1 219156 5832 ? S Mar14 0:00 | \_ /usr/sbin/apache2 -k start

In spite of that, in any client browser, no HTML content is returned from any of the Apache (virtual host) websites…
This weird problem continues until the Apache webserver is restarted.
Once the webserver is restarted, everything is back to normal.
I use a Check Apache shell script set up on a few remote hosts to regularly check with nmap whether port 80 (www) of my server is open and responding; however, that script only checks whether the port is open and reachable, and so it was unable to detect that Apache wasn't returning HTML content.
To work around the malfunctions I wrote a tiny script – restart_apache_if_empty_content.sh

The script's idea is very simple:
A request is made to a defined host with the lynx text browser, and the number of output lines is counted; if the output returned by lynx -dump http://someurl.com has fewer lines than the number it normally returns, the script triggers an Apache init script restart.
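
The downloadable script is linked below; to illustrate the idea, here is a minimal sketch of such a check. The URL, the line threshold and the /etc/init.d/apache2 path are assumptions you would adjust for your own server:

#!/bin/bash
# minimal sketch: restart Apache if the page returns suspiciously little content
URL='http://someurl.com';
MIN_LINES=10;
LINES=$(lynx -dump "$URL" 2>/dev/null | wc -l);
if [ "$LINES" -lt "$MIN_LINES" ]; then
    echo "$(date) $URL returned only $LINES lines - restarting Apache" >> /var/log/apache_restart.log;
    /etc/init.d/apache2 restart;
fi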

I've set the script to run periodically from a cron job, every 5 minutes:
# check if apache returns empty content with lynx and if yes restart and log it
*/5 * * * * /usr/sbin/restart_apache_if_empty_content.sh >/dev/null 2>&1

This is not perfect, as sometimes there will still be a few minutes of downtime, but at least the downtime will not last for hours until I am informed, ssh to the server and restart Apache manually.

A quick way to download my script and set it up to execute from cron every 5 minutes:

apache:~# cd /usr/sbin
apache:/usr/sbin# wget -q https://www.pc-freak.net/bscscr/restart_apache_if_empty_content.sh
apache:/usr/sbin# chmod +x restart_apache_if_empty_content.sh
apache:/usr/sbin# crontab -l > /tmp/file; echo '*/5 * * * * /usr/sbin/restart_apache_if_empty_content.sh >/dev/null 2>&1' >> /tmp/file; crontab /tmp/file

 

How to add multiple email accounts in qmail’s vpopmail with vpasswd via ssh (console) / Little shell script to add multiple email addresses

Sunday, June 12th, 2011

I’ve been assigned the task of adding about 50 email addresses via the command line on one of the qmail powered servers I administer.

Each email address was required to be configured with the same mail password.
Adding the email addresses via a web interface would be a killingly time-consuming task and would probably require at least 1 hour of adding the emails with qmailwebmin, qadmin, qubit or the other vpopmail qmail web administration interfaces available nowadays.

To solve the task, I’ve used a one-liner bash shell script which reads all my 50 emails from a file and adds them with vpopmail’s command line tool vadduser on the mail server.

Here is the one-liner shell script I’ve written to solve the task:

debian:~# while read line; do vadduser $line Email_Pass_Phrase; done < email_list_file.txt

In the above code, email_list_file.txt is a text file on the server containing the list of all my 50 email addresses, with one email per line. Email_Pass_Phrase is the password I’ve set for all the new email addresses being created.
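
For clarity, email_list_file.txt is nothing more than a plain list with one address per line; the addresses below are made-up placeholders:

user1@your-domain.com
user2@your-domain.com
user3@your-domain.com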

That’s all. Now the 50 email addresses on the server are created, and I’ve saved at least one hour of boring repetitive actions in the browser 😉

How to find and kill Abusers on OpenVZ Linux hosted Virtual Machines (A few bash scripts to protect an OpenVZ CentOS server from script kiddies and ease the daily admin job)

Friday, July 22nd, 2011

OpenVZ Logo - Anti Denial Of Service shell scripts

These days, I’m managing a number of OpenVZ Virtual Machine host servers. Therefore I’m constantly facing a lot of problems with users who run shit scripts inside their Linux Virtual Machines.

Commonly, user Virtual Servers are used as a launchpad to attack hosts, do illegal hacking activities or simply DDoS a host.
The virtual machines (which by the way run on top of CentOS OpenVZ Linux) are used by their users to launch Denial of Service scripts like kaiten.pl, trinoo, shaft, tfn etc.

As a consequence of their malicious activities, often the Data Centers which colocate the servers either null route our server IPs until we suspend the abusive users, or the servers simply go down because of an overload or a kernel bug hit as a result of the heavy TCP/IP network traffic or CPU/memory overhead.

Therefore, to mitigate these abusive attacks, I’ve written a few bash shell scripts which save us a lot of manual check-ups and in most cases prevent abusers from running the common DoS and “hacking” script shit which is now in the wild.

The first script I’ve written is kill_abusers.sh. What the script does is automatically look for a number of listed process names, kill them, and log the names of the killed abusive VM processes to /var/log/abusers.log.
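
The real script is linked below; as a rough sketch of the idea (the process name list, the log path and the use of pgrep / pkill here are assumptions, not the exact contents of kill_abusers.sh):

#!/bin/bash
# kill well known DoS / flood script names and log what was killed
LOG=/var/log/abusers.log;
for proc in kaiten kaiten.pl trinoo shaft tfn; do
    if pgrep -f "$proc" > /dev/null; then
        echo "$(date) killing processes matching: $proc" >> "$LOG";
        pkill -9 -f "$proc";
    fi
done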

I’ve set this script to run 4 times an hour, and it currently saves us a lot of nerves and useless ticket communication with the Data Centers (DCs), not to mention that reboot requests (for hung servers) have been reduced significantly.
Therefore, despite the script’s simplicity, it in general makes the servers run way more stably than before.

Here is OpenVZ kill/suspend Abusers procs script kill_abusers.sh ready for download

Another script which I’ve written later on does something similar yet different: it scans the server hard disk using the locate and find commands and tries to identify users who have script kiddie programs in their Virtual Machines and therefore are most probably crackers.
The script looks for abusive network scanners, DoS scripts, the metasploit framework, ircds etc.

After scanning the server hdd, it lists only files which are pre-defined in the script as dangerous and whose execution inside a user VM should therefore not be allowed.

search_for_abusers.sh then logs its activity to a file, along with the OpenVZ virtual machine user IDs which own hack related files. Right after that, it uses the nail mailing command to send an email to a specified admin address, reporting the possible abusers whose VM accounts might need to be either deleted or suspended.

search_for_abusers.sh can be downloaded here
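
Again, the downloadable script is the authoritative version; a stripped-down sketch of the approach (the file name list, the log path, the /vz/private container layout and the admin address are all assumptions) could look like this:

#!/bin/bash
LOG=/var/log/abusers_found.log;
ADMIN_EMAIL=admin@example.com;
# look for well known script kiddie tool names inside the containers' private areas
for name in kaiten.pl trinoo shaft tfn sshscan; do
    find /vz/private -type f -name "$name" >> "$LOG" 2>/dev/null;
done
# mail the findings to the admin with nail
[ -s "$LOG" ] && nail -s "Possible abusers found" "$ADMIN_EMAIL" < "$LOG";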

Honestly, I truly like my search_for_abusers.sh script, as it turned out quite nice and I coded it quite quickly.
I’m now intending to put the search for abusers script in a cronjob on the servers, to periodically check and report the IDs of OpenVZ VM users who are attempting illegal activities on the servers.

I guess now our beloved Virtual Machine script kiddie users are in real trouble ;P
 

How to suspend Cpanel user through ssh / Cpanel shell command to suspend users

Friday, August 5th, 2011

One of the servers running Cpanel was suspended today; the Data Center decided to completely bring down our server and gave us access to it only through a rescue mode Linux livecd.

Thus I had no way to access the Cpanel web interface to suspend the “hacker”, who by the way was running a number of instances of that old Romanian script kiddie brute force ssh scanner called sshscan.

Thankfully, Cpanel is equipped with a number of handy scripts for emergency situations in the /scripts directory. These shell management scripts are awesome for situations like this one, where no web access is available.

To suspend the abuser (abusive user) I had to issue the command:

root@rescue [/]# /scripts/suspendacct abuse_user
Changing Shell to /bin/false...chsh: Unknown user context is not authorized to change the shell of abuse_user
Done
Locking Password...Locking password for user abuse_user.
passwd: Success
Done
Suspending mysql users
warn [suspendmysqlusers] abuse_user has no databases.
Notification => reports@santrex.net via EMAIL [level => 3]
Account previously suspended (password was locked).
/bin/df: `/proc/sys/fs/binfmt_misc': No such file or directory
Using Universal Quota Support (quota=0)
Suspended document root /home/abuse_user/public_html
Suspended document root /home/abuse_user/public_html/updateverificationonline.com
Using Universal Quota Support (quota=0)
Updating ftp passwords for abuse_user
Ftp password files updated.
Ftp vhost passwords synced
abuse_user's account has been suspended

That’s all. Now the user is suspended, so hopefully the DC will bring the server back online in a few minutes.
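
For the record, once the problem is resolved the account can be brought back in the same way with the companion script from the standard cPanel /scripts directory:

root@rescue [/]# /scripts/unsuspendacct abuse_user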

Poderosa a tabbed Terminal Emulator (PuTTY Windows Alternative)

Tuesday, December 6th, 2011

Even though I rarely use Windows to connect to remote servers over the SSH or Telnet protocols, in some cases I’m forced to do so (when I’m away from my Linux notebook). I do my best to keep away from logging in anywhere via SSH from Windows, as on Windows you never know what kind of spyware, malware or viruses is already on the system, not to mention Microsoft sniffs a lot, if not everything, typed on the keyboard… Anyways, usually I use PuTTY as a quick way to access a remote SSH server; however, PuTTY sadly lacks embedded functionality for tabs, so for each new connection to a server I had to run a new instance of PuTTY. This is okay if you need to access a single server, but in cases where access to multiple servers is necessary, lacking the tab functionality and starting PuTTY 10 times is really irritating, and one forgets which connection is open in which PuTTY instance.

Earlier on, I’ve blogged about the existence of PuTTY Connection Manager, an add-on program which wraps PuTTY and enables it to be used with a connection tabs feature; however, installing two programs is quite inconvenient, especially if you have to do this every few days (in case you travel a lot).

Luckily, there is another free terminal emulator program for Windows called PodeRoSA which natively supports tabbed Secure Shell connections.
If you want to get some experience with it, check out Poderosa’s website; here is also a screenshot of the program running a few ssh encrypted connections in tabs on a Windows host.

Poderosa Windows ssh / telnet tabs terminal emulator screenshot
Another good reason one might consider using Poderosa instead of PuTTY is the Apache License under which Poderosa is developed. The Apache License is compatible with the GPL free software license, which makes the program fully free software. PuTTY is under a BSD/MIT-style custom license which is arguably not 100% compatible with the GPL, and hence PuTTY can be considered less free software in terms of freedom.

How to add repository manually from command line in Ubuntu Linux

Sunday, January 8th, 2012

I'm in the middle of trying to install the Free Mega Games Pack and I'm having trouble following the instructions for adding the latest development wine version described on http://www.winehq.org/download/ubuntu
The guys from WineHQ need to update the wine install instructions, since the instructions target older versions of Ubuntu and don't match the newer Ubuntu releases that ship natively with Unity.
In order to complete the step of adding the WineHQ Ubuntu PPA development repository, my only way was to add it using the command line.
Here is how:

root@ubuntu:~# apt-add-repository ppa:ubuntu-wine/ppa
You are about to add the following PPA to your system:
Latest official WineHQ releases
Welcome to the Wine Team PPA. Here you can get the latest available Wine betas for every supported version of Ubuntu. This PPA is managed by Scott Ritchie, and is a replacement for the WineHQ budgetdedicated.com repository used for Jaunty and earlier.
More info: https://launchpad.net/~ubuntu-wine/+archive/ppa
Press [ENTER] to continue or ctrl-c to cancel adding it
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --secret-keyring /tmp/tmp.bvo21sFWKG --trustdb-name /etc/apt/trustdb.gpg --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --keyserver hkp://keyserver.ubuntu.com:80/ --recv 883E8688397576B6C509DF495A9A06AEF9CB8DB0
gpg: requesting key F9CB8DB0 from hkp server keyserver.ubuntu.com
gpg: key F9CB8DB0: public key "Launchpad PPA for Ubuntu Wine Team" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)

Similarly, adding a PPA repository on Debian is also possible by using a little shell script, add-apt-repository.sh. add-apt-repository.sh simulates what Ubuntu's apt-add-repository python script does.
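
If you prefer to do it completely by hand on Debian, adding a PPA boils down to two steps: appending the PPA's deb line to an apt sources file and importing its signing key. A rough sketch for the Wine PPA above (the "lucid" release codename and the list file name are assumptions, pick whatever matches your setup and the PPA page):

root@debian:~# echo 'deb http://ppa.launchpad.net/ubuntu-wine/ppa/ubuntu lucid main' > /etc/apt/sources.list.d/ubuntu-wine-ppa.list
root@debian:~# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F9CB8DB0
root@debian:~# apt-get update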

It is educative to mention that PPA stands for Personal Package Archive, and the difference between a normal repository and a PPA is mainly that a PPA makes the packages it distributes behave like native Ubuntu packages issued by Canonical.
Once, for example, a new version of a package is placed in the PPA deb repository, the newer package will be automatically installed on systems using it through regular upgrades.

Convert single PDF pages to multiple SVG files on Debian Linux with pdf2svg

Sunday, February 26th, 2012

In my last article, I explained how to create PNG, JPG and GIF pictures from one single PDF document.
Conversion of PDF to images is useful; however, as PNG and JPEG are raster graphics formats, the image quality gets crappy if the picture is zoomed to, let's say, 300%.
This means conversion to PNG / GIF etc. is not good practice, especially if image quality is targeted.

I myself am not a quality freak, but it was interesting to find out whether it is possible to convert the PDF pages to the SVG (Scalable Vector Graphics) format.

Converting PDF to SVG is very easy, as for GNU / Linux there is a command line tool called pdf2svg.
pdf2svg's official page is here

The traditional compile-and-install-from-source way is described on the homepage. For Debian users there is already an existing pdf2svg deb package.

To install pdf2svg on Debian use:

debian:~# apt-get install --yes pdf2svg
...

Once installed, usage of pdf2svg to convert a PDF to multiple SVG files is analogous to imagemagick's convert.
To convert the 44-page Projects.pdf to multiple SVG pages (each PDF page to a separate SVG file), issue:

debian:~/project-pdf-to-images$ for i in $(seq 1 44); do \
pdf2svg Projects.pdf Projects-$i.svg $i; \
done

This little loop stores each page of the 44-page PDF document in a separate SVG vector graphics file:

debian:~/project-pdf-to-images$ ls -1 *.svg|wc -l
44
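
If you don't want to count the PDF pages by hand, the page count can be read automatically with pdfinfo (part of the poppler-utils Debian package) and fed into the same loop; a small sketch:

debian:~/project-pdf-to-images$ pages=$(pdfinfo Projects.pdf | awk '/^Pages:/ {print $2}')
debian:~/project-pdf-to-images$ for i in $(seq 1 $pages); do pdf2svg Projects.pdf Projects-$i.svg $i; done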

For BSD users, and in particular FreeBSD ones, pdf2svg has a BSD port in:

/usr/ports/graphics/pdf2svg

Installing on BSD is possible directly via the port, and conversion of PDF to SVG on FreeBSD should work in the same manner. The only requirement is that the bash shell is used for the above little loop, as by default FreeBSD runs csh.
On FreeBSD, launch /usr/local/bin/bash before following the Linux instructions, if you're not already in bash.

Now the output SVG files are perfect for editing with Inkscape or Scribus, and the picture quality is way superior to the old rasterized (JPEG, PNG) images.

Fix 503 AUTH first (#5.5.1) mail receive errors in Qmail

Friday, September 2nd, 2011

I have one qmail rocks install based on Thibbs' Qmailrocks tutorial.

I had to make some changes to the
/etc/service/qmail-smtpd/run and /etc/service/qmail-smtpdssl/run init scripts.

After a qmail restart, qmail suddenly stopped receiving any mail messages, and my sent messages were returned with an error:

Connected to xx.xxx.xx.xx but sender was rejected.
Remote host said: 503 AUTH first (#5.5.1)

After investigating the issue, I finally found that one value I had changed in /etc/service/qmail-smtpd/run and /etc/service/qmail-smtpdssl/run was causing the whole mess:

The problematic variable was:

REQUIRE_AUTH=1

To solve the issue, I had to disable the value, which it seems I had enabled by mistake.

Below is a quote from http://qmail.jms1.net which explains what REQUIRE_AUTH shell variable does:

Setting REQUIRE_AUTH=1 will make the service not accept ANY mail unless the client has sent a valid AUTH command. This also prevents incoming mail from being accepted for your own domains, so do not use this setting if the service is accepting “normal” mail from the outside world.
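
In practice the fix was a one-variable change in both run files (a sketch; the exact position of the line inside the jms1-style run script may differ on your install), followed by a restart:

# in /etc/service/qmail-smtpd/run and /etc/service/qmail-smtpdssl/run
REQUIRE_AUTH=0
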
After restarting via qmailctl restart, qmail started receiving messages normally 😉