
How to exclude files on copy (cp) on GNU / Linux / Linux copy and exclude files and directories (cp -r) exclusion

Saturday, March 3rd, 2012

I recently had to make a copy of the /usr/local/nginx directory as /usr/local/nginx-bak, in order to keep a working copy of nginx around in case my update of nginx to a new version from source messed something up.

I did not check the size of /usr/local/nginx, so I just ran the usual:

nginx:~# cp -rpf /usr/local/nginx /usr/local/nginx-bak
...

Execution took more than 20 seconds, so I checked the size and figured out that /usr/local/nginx/logs had grown to 120 gigabytes.

I didn't want to put extra load on the production server by copying gigabytes of logs, so I asked myself whether an exclusion is possible with the normal Linux copy (cp) command. I checked the cp manual (man cp), but there is no --exclude argument or anything similar.

Even though an exclude feature is not implemented in the cp command by default, there are a couple of ways to copy a directory while excluding some of its sub-directories or files on GNU/Linux.

Here are the 3 major ones:

1. Copy directory recursively and exclude sub-directories or files with GNU tar

Maybe the quickest way to copy and exclude directories is through a little 'hack' with GNU tar:

nginx:~# mkdir /usr/local/nginx-new
nginx:~# cd /usr/local/nginx
nginx:/usr/local/nginx# tar cvf - . --exclude='./logs' \
| (cd /usr/local/nginx-new; tar xvf - )
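
The same trick can be written a bit shorter with tar's -C option, which switches to the target directory before extracting and so avoids the subshell; this is just a variant of the command above, with the same example paths:

nginx:/usr/local/nginx# tar cf - . --exclude='./logs' | tar xvf - -C /usr/local/nginx-new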

Copying that way however is slow; in my case it suited me perfectly, but for copying large chunks of data it is better not to use a pipe and instead do a regular tar operation + mv:

# cd /source_directory
# tar cvf test.tar --exclude='./dir_to_exclude' --exclude='./dir_to_exclude1' .
# mv test.tar /destination_directory
# cd /destination_directory
# tar xvf test.tar
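
Before moving the archive, it doesn't hurt to double-check that the exclusion actually worked; tar's t mode lists the archive contents without extracting, so grep-ing the listing for the excluded directory should print nothing:

# tar tvf test.tar | grep dir_to_exclude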

2. Copy folder recursively excluding some directories with rsync

People who have experience with rsync already know how invaluable this tool is. rsync can be used as a complete substitute for cp:

# rsync -av --exclude='path1/to/exclude' --exclude='path2/to/exclude' source destination

This example can also be used as a solution to my copy-nginx-but-exclude-the-logs-directory case like so:

nginx:~# rsync -av --exclude='/logs/' /usr/local/nginx/ /usr/local/nginx-new
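
Note that rsync exclude patterns are matched relative to the source directory (hence '/logs/' rather than the full path), so it is worth doing a dry run first; the -n (--dry-run) flag prints what would be transferred without actually copying anything:

nginx:~# rsync -avn --exclude='/logs/' /usr/local/nginx/ /usr/local/nginx-new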

This rsync one-liner is way more readable than the tar approach; however, it will not work on servers where rsync is not installed, and it is unusable if you have to do the operation as a regular user on such a server. For that case the GNU tar hack is surely more 'portable' across systems.
rsync also has a Windows version, so the same methodology should work on MS Windows as well and is good for batch scripting.
I've not tested it myself yet, as I've never used rsync on Windows; if someone has tried it and it works, please drop me a short message in the comments.
3. Copy directory and exclude sub-directories and files with find

find in collaboration with cp can also be used to exclude certain directories while copying. Actually this method is better than the GNU tar hack and surely more efficient. For machines where rsync is not installed it is just a perfect way to copy files from location to location while excluding some directories. Here is an example use of find and cp for the above nginx case:

nginx:~# cd /usr/local/nginx
nginx:~# mkdir /usr/local/nginx-bak
nginx:/usr/local/nginx# find . -maxdepth 1 -type d ! -name . ! -name logs -print -exec cp -rpf '{}' /usr/local/nginx-bak \;

This will find all top-level directories inside /usr/local/nginx except logs, print them on the screen, and then execute a recursive copy of each found directory into /usr/local/nginx-bak.

This example will work fine in the nginx case, because /usr/local/nginx does not contain any files, only sub-directories. In other cases, where the directory does contain some files besides the sub-directories, the files have to be copied as well, e.g.:

# for i in $(ls -l | egrep -v '^(d|total)' | awk '{ print $NF }'); do \
cp -rpf $i /destination/directory; \
done

This will copy the files from the source directory (for instance /usr/local/nginx/my_file.txt, /usr/local/nginx/my_file1.txt etc.) which don't belong to a sub-directory.

The command expression:

# ls -l | egrep -v '^(d|total)'

lists only the files, excluding all the directories (and the total line), and in the for loop each of the files is copied to /destination/directory.
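
For completeness, files and sub-directories can also be handled in a single find invocation. The sketch below assumes GNU find (for -maxdepth) and reuses the nginx example paths; it copies everything from the current directory except the logs sub-directory:

nginx:/usr/local/nginx# find . -maxdepth 1 ! -name . ! -name logs -exec cp -rpf '{}' /usr/local/nginx-bak \;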

If someone has better ideas, please share with me 🙂

How to mount directory in memory on GNU / Linux and FreeBSD / Mount directory in RAM memory to increase performance on Linux and BSD

Tuesday, October 11th, 2011

One of the websites hosted on a server I currently manage has a cache directory which doesn't take much space, but consists of tens of thousands of tiny files. Each second a dozen files are created in the cache dir. Hence using a hard disk directory puts some serious load on the server, as a consequence of the many fopen and fclose HDD I/O operations.

To get through the problem, the solution was obvious: use a directory which stores its information in memory.
There are of course other benefits of storing data in memory, as we all know access to RAM is many times faster than access to disk.

GNU/Linux has been equipped with tmpfs since kernel version 2.4.x; the primary usage of the tmpfs file system across many GNU/Linux distributions is the /tmp directory.

Some general information about tmpfs is explained in mount's manual, e.g. man mount; another good read is the tmpfs kernel documentation file.

An implementation of tmpfs is /dev/shm.

/dev/shm is a standard memory-backed mount used among Linuxes; it's actually an ordinary directory one can list with ls. /dev/shm is used as a "virtual directory" memory space. Below is an output of /dev/shm from my notebook; one can see a few files stored in memory which belong to the PulseAudio Linux sound architecture:

linux:~$ ls -al /dev/shm
total 7608
drwxrwxrwt 2 root root 160 Oct 10 18:05 .
drwxr-xr-x 16 root root 3500 Oct 10 10:57 ..
p-w------- 1 root root 0 Oct 10 10:57 acpi_fakekey
-r-------- 1 hipo hipo 67108904 Oct 10 17:20 pulse-shm-2067018443
-r-------- 1 hipo hipo 67108904 Oct 10 10:59 pulse-shm-2840042043
-r-------- 1 hipo hipo 67108904 Oct 10 10:59 pulse-shm-3215031142
-r-------- 1 hipo hipo 67108904 Oct 10 18:05 pulse-shm-4157723670
-r-------- 1 hipo hipo 67108904 Oct 10 18:06 pulse-shm-702872358

To measure the size of /dev/shm across different Linux distributions one can use the usual df command, e.g.:

[root@centos: ~]$ df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 16G 0 16G 0% /dev/shm

Above I show a df -h /dev/shm output from a CentOS server equipped with 32GB of memory; as you can see, CentOS has reserved half of the system memory (16GB) for the purpose of creating files in memory through /dev/shm. The memory is assigned dynamically, so whatever is not actually used can still be used by the running services (which by the way is very nice).

According to what I've read on Wikipedia about tmpfs, it defaults on Linux to half of the physical system memory.
However, I've noticed Debian Linux hosts usually reserve less memory for /dev/shm: on my personal notebook Debian's /dev/shm is only 1 gigabyte, while on a Debian server it was automatically set to a humble 2GB. Setting it lower, as in the Debian example, is by the way rather a good idea, since not many desktop or server applications are written to actively take advantage of the virtual /dev/shm directory.
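
If the default size does not suit you, a tmpfs mount such as /dev/shm can be resized on the fly with a remount; a minimal sketch, assuming you want to cap it at 2GB (the new limit takes effect immediately):

linux:~# mount -o remount,size=2G /dev/shm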

One can directly drop files in /dev/shm and they will immediately be stored in memory; after a system reboot the files will disappear.
Let's say you have a zip archive file, testing.zip, and you would like to store it in memory; to do so, just copy the file to /dev/shm:

linux:~$ cp -rpf testing.zip /dev/shm

You don't even need to be root to copy files into the "virtual memory directory". This is one reason many crackers (script kiddies) store their cracking tools in /dev/shm 😉

A rather funny scenario I've witnessed before is when you list /dev/shm on some Linux server and suddenly see tons of brute-forcing tools and all kinds of "hack"-related stuff belonging to some system user. Sometimes these malicious script tools even belong to the root user…

Now that I've said a few words on how Linux's tmpfs works, here is how to mount a directory whose cache content will be stored in volatile memory:

linux:~# mount -t tmpfs -o size=3G,mode=0755 tmpfs /var/www/site/cache

As you can see, the above command will dynamically assign a tmpfs directory taking up system RAM, which can expand up to 3GB.

Of course before mounting, it's necessary to create /var/www/site/cache and set proper permissions. In the above example I use /var/www/site/cache with default permissions of 755, owned by the user the Apache server runs as, e.g.:

linux:~# mkdir -p /var/www/site/cache
linux:~# chown -R www-data:www-data /var/www/site/cache
linux:~# chmod -R 755 /var/www/site/cache
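
To verify the mount succeeded and to see how much of the reserved space is actually in use, the usual tools work (nothing here is specific to my setup apart from the example path):

linux:~# mount | grep /var/www/site/cache
linux:~# df -h /var/www/site/cache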

Using a tmpfs is very handy and has many advantages; however, one should be very careful with the data stored inside a tmpfs resource: all of it will be lost in case of a sudden system restart, as it lives only in memory.
Another problem one might expect with tmpfs is the assigned virtual disk space getting completely filled with data. It has never happened to me, but I've read online stories that in the past this led to system crashes; according to the docs I've checked, today overfilling will start swapping, make the system terribly sluggish, and, after the reserve swap space is depleted, eventually start killing processes.
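
Note also that a tmpfs mounted with an explicit size= cap will simply refuse further writes once the cap is reached, rather than grow without bound. A quick throwaway experiment to observe this (the /mnt/tmpfs-test path is just an example):

linux:~# mkdir /mnt/tmpfs-test
linux:~# mount -t tmpfs -o size=64M tmpfs /mnt/tmpfs-test
linux:~# dd if=/dev/zero of=/mnt/tmpfs-test/fill bs=1M
(dd should stop with a "No space left on device" error once the 64MB cap is hit)
linux:~# umount /mnt/tmpfs-test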

Using tmpfs as a cache directory is very useful on servers running Apache+PHP/Perl/Python/Ruby etc., as it can be used for storing script-generated temporary data.

Using a tmpfs can significantly decrease the disk I/O overhead on a server.

Some other application I can think of, though I haven't tested it, would be to use a tmpfs-mounted directory to store executable script files, copied there after each restart. Executing a script read directly from the "virtual directory" could for sure have a very good impact, especially on huge websites.
One common service which takes advantage of the elegance of tmpfs, and which nowadays almost all modern GNU/Linux distributions have, is udevd – the Linux dynamic device management. By the way, man udev is a very good must-read manual, especially for Linux novices, to get a basic idea of how /dev/ management occurs via udev.

To make a directory contained in memory permanent on Linux, the /etc/fstab file should be used.

In order to permanently mount a directory as a memory device with a size of 3GB and 0755 permissions at /var/www/site/cache, as shown in the earlier example, one can use the command:

linux:~# echo 'tmpfs /var/www/site/cache/ tmpfs size=3G,mode=0755 0 0' >> /etc/fstab

This will assure the directory stored in memory is recreated on the next boot.
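
To activate the new fstab entry right away, without waiting for a reboot, mount -a can be used; it mounts all fstab entries that are not mounted yet:

linux:~# mount -a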

Nowadays the use of tmpfs is constantly growing; I've seen it used, for example, in CloudLinux OS as a way to substitute the ordinary disk-based /tmp with a tmpfs directory contained in memory.
The applications of tmpfs are pretty much up to the imagination of whoever wants to take advantage of it. For sure more uses of tmpfs will be seen in Linux GUI programs as well.

Going over to FreeBSD and the BSD world, tmpfs is also available there; however, it is still considered a bit experimental. To get use of tmpfs to gain some performance, one should first make sure the tmpfs kernel module gets loaded on boot via /boot/loader.conf:

freebsd# echo 'tmpfs_load="YES"' >> /boot/loader.conf
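
To load the module immediately, without rebooting, kldload should do the job (assuming a FreeBSD version where tmpfs ships as a module):

freebsd# kldload tmpfs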

Mounting a directory with tmpfs permanently is again doable via /etc/fstab; adding a new in-memory directory is done with:

freebsd# echo 'tmpfs /var/www/site/cache tmpfs rw 0 0' >> /etc/fstab

The native equivalent of tmpfs in FreeBSD is called mdmfs.
It is said to be slower than tmpfs, but rock solid.

To mount a 4-gigabyte mdmfs "RAM directory" on BSD from csh:

freebsd# mdmfs -s 4g md /var/www/site/cache
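
mdmfs is in fact a wrapper which creates a swap-backed md(4) memory device, runs newfs on it and mounts it, all in one go; the result can be checked the usual way:

freebsd# df -h /var/www/site/cache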


Mounting the mdmfs directory permanently is again doable via /etc/fstab, by adding:

freebsd# echo 'md /var/www/site/cache mfs rw,-s4G 2 0' >> /etc/fstab

There are some reports of users who presumably use it to speed up ports / kernel compile times, but I haven't tried that yet, so I don't know for sure.

In huge corporations like Google and Yahoo, tmpfs is certainly used a lot, as this technology can dramatically improve access times to information. I'm curious to learn some more good ways to use tmpfs to improve efficiency.
If someone has some readings or some ideas, please share with me 😉