Posts Tagged ‘Deleting’

Analyze disk space usage in Linux / BSD with du / find and filelight / qdirstat / baobab GUI disk usage analyzers to check what takes up your disk space on Unix-like OSes

Friday, April 21st, 2023

linux-how-to-find-out-what-files-and-directories-has-occupied-all-your-disk-space-partition-from-console-and-GUI_du-find-filelight-baobab-qdirstat-duff-linux-450x450

If you're a Desktop Linux or BSD UNIX user and the space on your hard disk / external SSD / flash drive etc. starts to mysteriously disappear, whether because a crashing application rapidly spews log error / warning messages and quickly fills up the disk, or some disk space simply vanishes without you knowing what kind of data filled it, or you have some big forgotten bittorrent downloads sitting in your bittorrent client, or you mirrored a complete large website and suddenly the root directory ( / ) is full or nearly full, then you will definitely want to check what has eaten up your disk space and is slowing down the OS and applications.

For the regular *nix user finding out what is filling the disk is a trivial task with find or du -hsc *, but as people have different habits of using find and du, I'll show you the most common ways I use these two command line tools to identify low disk space issues, for the sake of comparison.
Others who have better or easier ways to do it are very welcome to share them with me in the comments.
 

1. Finding large files on hard disk with find Linux command tool
 

host:~# find /home -type f -printf "%s\t%p\n" | sort -n | tail -10
2100000000    /home/hipo/Downloads/MameUIfx incl. ROMs/MameUIfx incl. ROMs-6.bin
2100000000    /home/hipo/Downloads/MameUIfx incl. ROMs/MameUIfx incl. ROMs-7.bin
2100000000    /home/hipo/Downloads/MameUIfx incl. ROMs/MameUIfx incl. ROMs-8.bin
2100000000    /home/hipo/Downloads/MameUIfx incl. ROMs/MameUIfx incl. ROMs-9.bin
2815424080    /home/hipo/.thunderbird/h3dasfii.default\
/ImapMail/imap.gmail.com/INBOX
2925584895    /home/hipo/Documents/.git/\
objects/pack/pack-8590b069cad26ac0af7560fb42b51fa9bfe41050.pack
4336918067    /home/hipo/Games/Mames_4GB-compilation-best-arcade-games-of-your-14_04_2021.tar.gz
6109003776    /home/hipo/VirtualBox VMs/CentOS/CentOS.vdi
23599251456    /home/hipo/VirtualBox VMs/Windows 7/Windows 7.vdi
33913044992    /home/hipo/VirtualBox VMs/Windows 10/Windows 10.vdi

I use find more rarely on desktops and more often when I have to do some kind of data usage analysis on servers; for my Linux home computer and any other Linux desktop machine, or for just a small quick analysis, the du cmd is much more appropriate to use.
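On servers, to narrow things down, I often combine find with du so that the search stays on the filesystem that actually filled up and only shows files over a given size; a minimal sketch (assuming GNU find and a sort that understands -h) is:

host:~# find / -xdev -type f -size +500M -exec du -h {} + 2>/dev/null | sort -h | tail -20

The -xdev switch keeps find from descending into other mounted filesystems, so the output only covers the partition you started the search on.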


2. Finding large space-occupying files sorted in Megabytes and Gigabytes with du
 

  • Check the top 10 largest items (sized in megabytes) that are hanging in a directory

pcfkreak:~# du -hsc /home/hipo/*|grep 'M\s'|sort -rn|head -n 10
956M    /home/hipo/last_dump1.sql
711M    /home/hipo/hipod
571M    /home/hipo/from-thinkpad_r61
453M    /home/hipo/ultimate-edition-themes
432M    /home/hipo/metasploit-framework
355M    /home/hipo/output-upgrade.txt
333M    /home/hipo/Плот
209M    /home/hipo/Work-New.tar.gz
98M    /home/hipo/DOOM64
90M    /home/hipo/mp3

  • Get the top 10 largest items in gigabytes that are space hungry and eating up your space

pcfkreak:~# du -hsc /home/hipo/*|grep 'G\s'|sort -rn|head -n 10
156G    total
60G    /home/hipo/VirtualBox VMs
37G    /home/hipo/Downloads
18G    /home/hipo/Desktop
11G    /home/hipo/Games
7.4G    /home/hipo/ownCloud
7.1G    /home/hipo/Документи
4.6G    /home/hipo/music
2.9G    /home/hipo/root
2.8G    /home/hipo/Documents
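If your du and sort come from GNU coreutils (so sort understands the -h human-readable flag), you can skip the separate megabyte / gigabyte greps above and get one sorted list in a single go:

pcfkreak:~# du -xh --max-depth=1 /home/hipo 2>/dev/null | sort -rh | head -n 15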


If you still want to work in the console terminal but don't want to type too much, you can use the ncdu (ncurses) text tool; install it with:

# apt install --yes ncdu
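Once installed, just point ncdu at the directory or the whole filesystem you suspect; the -x switch keeps it from crossing into other mounted filesystems:

# ncdu -x /
# ncdu /home/hipo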


https://www.pc-freak.net/images/ncdu-gnu-linux-debian-screenshot.png

For the laziest ones or complete Linux newbies who don't want to spend time typing, learning or using text commands and software, you can also check what has eaten up your full disk space with GUI tools.

There are at least 3 tools I'm aware of for checking in a Graphical Interface what has occupied your disk space on Linux / BSD:

3. Filelight GUI disk usage analysis Linux tool

For those using KDE or preferring a shiny GUI interface that will capture the eye, filelight would perhaps be the tool of choice to get a summary of your directory structures and file usage on a laptop or desktop *nix OS.

unix-desktop:~# apt-cache show filelight|grep -i description-en -A 7
Description-en: show where your diskspace is being used
 Filelight allows you to understand your disk usage by graphically
 representing your filesystem as a set of concentric, segmented rings.
 .
 It is like a pie-chart, but the segments nest, allowing you to see both
 which directories take up all your space, and which directories
 and files inside those directories are the real culprits.
Description-md5: 397ff9a469e07a772f22460c66b66875


To use it simply go ahead and install it with apt or yum / dnf or whatever package manager your distro uses:

unix-desktop:~# apt-get install --yes filelight

filelight-show-where-disk-space-is-being-used-graphically-tool-linux
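On RPM-based distros the package name should be the same, e.g. dnf install filelight. Once installed you can start it from the application menu or straight from a terminal, optionally passing the directory to scan (a small sketch, assuming filelight accepts the path as an argument as it does in the versions I've seen):

unix-desktop:~$ filelight /home/hipo &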

4. GNOME Disk Usage Analyzer Baobab GUI tool

For those using the GNOME / MATE / Budgie / Cinnamon graphical interfaces, baobab should be the program to use, as it is built on the GNOME (GTK) libraries.

unix-desktop:~# apt-cache show baobab|grep -i description-en -A10
Description-en: GNOME disk usage analyzer
 Disk Usage Analyzer is a graphical, menu-driven application to analyse
 disk usage in a GNOME environment. It can easily scan either the whole
 filesystem tree, or a specific user-requested directory branch (local or
 remote).
 .
 It also auto-detects in real-time any changes made to your home
 directory as far as any mounted/unmounted device. Disk Usage Analyzer
 also provides a full graphical treemap window for each selected folder.
Description-md5: 5f6072b89ebb1dc83433fa7658814dc6
Homepage: https://wiki.gnome.org/Apps/Baobab

 

gnome-disk-analyzer-baobab-tool-screenshot-of-hard-disk-directory-locations-sorted-by-size
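Just like filelight, baobab can be pointed at the directory to scan straight from the command line (a quick sketch):

unix-desktop:~$ baobab /home/hipo &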

5. Qdirstat graphical application to show where your disk space has gone on Linux

Qdirstat is perhaps the best known tool for tracking disk space issues on Linux desktop hosts, familiar to the hardcore KDE / LXDE / LXQt / DDE GUI environment lovers, and, being a KDirStat descendant, it uses the infamous Qt library. I personally don't like it and don't put it on the machines I use, because I never use KDE and don't want to waste disk space on additional libraries such as Qt, which historically was not entirely free in terms of licensing and even now is distributed under both free and non-free terms (GPL / LGPL and a commercial Qt license).

unix-desktop:~# apt-cache show qdirstat|grep -i description-en -A10
Description-en: Qt-based directory statistics
 QDirStat is a graphical application to show where your disk space has gone and
 to help you to clean it up.
 .
 QDirStat has a number of new features compared to KDirStat. To name a few:
  * Multi-selection in both the tree and the treemap.
  * Unlimited number of user-defined cleanup actions.
  * Properly show errors of cleanup actions (and their output, if desired).
  * File categories (MIME types) and their treemap color are now configurable.
  * Exclude rules for directories are easily configurable.
  * Desktop-agnostic; no longer relies on KDE or any other specific desktop.
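Installing and launching it follows the same pattern as the other two GUI tools; qdirstat also takes the directory to scan as an argument (a sketch, assuming a Debian-based distro):

unix-desktop:~# apt-get install --yes qdirstat
unix-desktop:~$ qdirstat /home/hipo &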


qdirstat-linux-screenshot-show-what-directory-uses-most-hard-disk-space

That shiny fused graphic is actually a representation of all the directories (the bigger the directory, the bigger its patch), and if you hover over the colorful gamma a text with the directory or file name and its size will appear. Though the graphical representation looks really c00l to me, it is a bit unreadable, thus I prefer and recommend the other two GUI tools, filelight or baobab, instead.

6. Finding duplicate files on Linux system with duff command tool

Talking about big unknown left-over files on your hard drives, it is appropriate to mention here one tool that is console-only but very useful for anyone willing to get rid of old duplicate files that are hanging around on the disk. Sometimes such copies are produced while copying large amounts of files from place to place, or simply by mistake while copying Photo / Video files from your Smart Phone to your Linux desktop, etc.

This is where the duff command line utility might be super beneficial for you.

unix-desktop:~# apt-cache show duff|grep -i description-en -A3
Description-en: Duplicate file finder
 Duff is a command-line utility for identifying duplicates in a given set of
 files.  It attempts to be usably fast and uses the SHA family of message
 digests as a part of the comparisons.

Using the duff tool is very straightforward; to see all the duplicate files hanging in a directory, let's say your home folder:

unix-desktop:~#  duff -rP /home/hipo

/home/hipo/music/var/Quake II Soundtrack – Kill Ratio.mp3
/home/hipo/mp3/Quake II Soundtrack – Kill Ratio.mp3
2 files in cluster 44 (7913472 bytes, digest 98f38be49e2ffcbf90927f9357b3e24a81d5a649)
/home/hipo/music/var/HYPODIL_01-Scakauec.mp3
/home/hipo/mp3/HYPODIL_01-Scakauec.mp3
2 files in cluster 45 (2807808 bytes, digest ce9067ce1f132fc096a5044845c7fac73e99c0ed)
/home/hipo/music/var/Quake II Suondtrack – March Of The Stoggs.mp3
/home/hipo/mp3/Quake II Suondtrack – March Of The Stoggs.mp3
2 files in cluster 46 (3506176 bytes, digest efcc401b4ebda9b0b2367aceb8e334c8ba1a357d)
/home/hipo/music/var/Quake II Suondtrack – Quad Machine.mp3
/home/hipo/mp3/Quake II Suondtrack – Quad Machine.mp3
2 files in cluster 47 (7917568 bytes, digest 0905c1d790654016c2ecf2949f78d47a870c3822)
/home/hipo/music/var/Cyberpunk Group – Futureshock!.mp3
/home/hipo/mp3/Cyberpunk Group – Futureshock!.mp3

-r (Recursively search into all specified directories.)

-P (Don't follow any symbolic links. This overrides any previous -H or -L option. This is the default. Note that this only applies to directories, as symbolic links to files are never followed.)
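Before removing anything it is a good idea to list only the "excess" copies (every file after the first one in each duplicate cluster) with duff's -e option, so you can see exactly what a later cleanup would touch:

unix-desktop:~# duff -r -e /home/hipo/Pictures/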

7. Deleting duplicate files with duff

If you're absolutely sure you know what you're doing and you have a backup in case something messes up during the duplicate deletion, then to get rid of, let's say, any duplicate Picture files found by duff, run something like:

# duff -e0 -r /home/hipo/Pictures/ | xargs -0 rm

!!! Please note that using duff this way is only for those who absolutely know what they're doing and have a recent backup of their data. Deleting the wrong data by mistake with the tool might put you back in first grade and you'll be the only one to blame  🙂 !!!
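If you're not fully confident, a softer approach (just a sketch, using a hypothetical /home/hipo/dup-trash holding directory) is to move the excess copies aside instead of unlinking them straight away, review them, and only then empty the holding directory:

# mkdir -p /home/hipo/dup-trash
# duff -e0 -r /home/hipo/Pictures/ | xargs -0 -I{} mv {} /home/hipo/dup-trash/

Keep in mind that duplicates sharing the same file name will overwrite each other inside the holding directory, so for same-named files the plain rm approach above is the cleaner option.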

Wrap it Up

Filling up the disk with unknown large files is a problem that has to be resolved quite often. For the unlazy on Linux / BSD / Mac OS and other UNIX-like OSes the easiest way is to use find or du with some one-liner command. For the lazy, Windows-addicted graphical users, the filelight, qdirstat or baobab GUI disk usage analysis tools are there.
If you have a lot of files and many of them are duplicates, you can use duff to check them out, remove all unneeded duplicates and save space.
Hope this article was helpful to someone.
That's all folks, enjoy your data prophylactics; if you know any other good and easy command line or GUI tools or hints for disk space prophylactics, please share.

Run 2 and more Skypes simultaneously on Mac OS X – Run multiple Skype accounts on the same Mac

Saturday, June 21st, 2014

run-2-and-more-skypes-simultaneously-on-mac-os-x-multiple-skype-account-login-on-mac
For people running Mac OS X, the question of how it is possible to use 2 Skype accounts in parallel on a Mac probably makes good sense.

I don't own a Mac notebook and therefore I'm a Mac newbie; however, I'm in a situation where my wife Svetlana and I went (for 3 days) to my hometown Dobrich and we have with us only her Mac OS X powered MacBook Air.

 

One user (my wife) is already logged in to Skype, expecting some relatives and friends to contact us, and at the same time I had to log in to check a few servers via ssh and discuss some server downtime issues from yesterday in Skype.
Thus we need 2 Skype instances to run separately on her MacBook Air running Mac OS X Leopard.
 

Earlier I've blogged on how to make 2 and more Skype accounts work simultaneously on one Windows PC, because I had to set it up for a company; in this short article I will explain how it is possible to run many Skype clients on Mac OS X.

 

1. Open Mac Terminal from Finder

finder-terminal-screenshot-mac-os-x-leopard-run-many-skypes-mac-os

2. In Terminal run the first Skype Instance

Type in Terminal:

open /Applications/Skype.app/Contents/MacOS/Skype

3. Run Second Skype instance

In older Skype Mac OS versions, I read, the

/secondary

Skype command option existed and could be used to run a second parallel Skype instance on a Mac; however, in newer releases this option was removed, and if you try to invoke it a warning window pops up saying an instance is already running.

mac-os-x-you-have-another-copy-of-skype-running-screenshot

To get around the issue and run the second Skype, the quickest way is to run another Skype client as a privileged user through the sudo command (this is insecure, but anyway, as Mac OS is proprietary and we don't have access to its code, and probably there are tons of spy and reporting software integrated into the OS, it doesn't really matter).

mac-os-x-skype-run-screenshot-pic


4. Script it into 2nd_skype.sh for later use

To run and use two parallel Skypes regularly, it might be useful to make a shell script out of this and place it somewhere; the 2nd_skype.sh script should be something like:


#!/bin/bash
# first Skype instance, started as the logged-in user
open /Applications/Skype.app/Contents/MacOS/Skype
# second instance, started as root via sudo (will prompt for your password)
sudo /Applications/Skype.app/Contents/MacOS/Skype &

Then make the script executable with:

chmod a+x 2nd_skype.sh

5. Run more than 2 Skypes (Run multiple Skypes on same Mac PC hack)

There is another "hack" method with deleting the Skype.pid (Process ID). Skype recognize where it is running by checking its Skype.pid on start up.

Deleting the pid after each next Skype client launch,  allow the user to run as many Skypes as you want on Mac OS X but it is not clear for how long it time it will work.

rm -f ~/"Library/Application Support/Skype/Skype.pid"

Then launch Skype again in the background from the Mac Terminal:

open -nW '/Applications/Skype.app' &

In case you wonder why the open command is used: the above could also be run directly (the Skype binary itself) and Skype would pop up, but by using the open command you instruct the program to detach itself from the Terminal it was run from, so that if the Terminal is later closed the Skype app will not terminate.

Another approach is to create a number of users, let's say 5, and use the Skype sudo run method to start each client as a separate user:

sudo -u user1 /Applications/Skype.app/Contents/MacOS/Skype
sudo -u user2 /Applications/Skype.app/Contents/MacOS/Skype
sudo -u user3 /Applications/Skype.app/Contents/MacOS/Skype
sudo -u user4 /Applications/Skype.app/Contents/MacOS/Skype
sudo -u user5 /Applications/Skype.app/Contents/MacOS/Skype
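A small loop wrapping the same idea (just a sketch; it assumes the users user1 … user5 already exist and that sudo is configured to let you run programs as them):

#!/bin/bash
# start one Skype instance per user, each detached in the background
for u in user1 user2 user3 user4 user5; do
  sudo -u "$u" /Applications/Skype.app/Contents/MacOS/Skype &
done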

I enclose these commands in a script with a custom (Skype) icon ready to be launched, and voila, on script launch multiple Skype login prompts pop up.

For the lazy ones who don't want to tamper with writing scripts or doing hacks to run Skype multiple times on a Mac, there is even a Multi Skype Launcher app for Mac.

 


 

How to delete millions of files on busy Linux servers (Work around "Argument list too long")

Tuesday, March 20th, 2012

How to delete millions or many thousands of files in the same directory on GNU / Linux and FreeBSD

If you try to delete more than 131072 files on Linux with rm -f *, where the files are all stored in the same directory, you will get the error:

/bin/rm: Argument list too long.

I've earlier blogged on deleting multiple files on Linux and FreeBSD and this is not my first time facing this error.
Anyways, as time passed, I've found a few other ways to delete large multitudes of files from a server.

In this article, I will shortly explain a few approaches to deleting a few million obsolete files, to free some space on your server.
Here are several methods to use to clean up your tons of junk files.

1. Using Linux find command to wipe out millions of files

a.) Finding and deleting files using find's -exec switch:

# find . -type f -exec rm -fv {} \;

This method works fine, but it has one downside: file deletion is slow, as an external rm command is invoked for each found file.

For half a million files or more, using this method will take "long". However, from the point of view of stressing the server hard disk it is not so bad, as this way of deleting files does not put too much strain on the disk.
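A middle ground, if your find supports the POSIX + terminator for -exec, is to batch many file names into each rm invocation instead of forking an rm process per file:

# find . -type f -exec rm -f {} +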
b.) Finding and deleting a big number of files with find's -delete argument:

Luckily, there is a better way to delete the files, by using find's built-in -delete argument:

# find . -type f -delete

c.) Deleting and printing out deleted files with find's -print arg

If you would like to output on your terminal what files find is deleting in "real time", add -print:

# find . -type f -print -delete

To prevent your server hard disk from being stressed, and hence save yourself from "outages" of the server's normal operation, it is good to combine the find command with ionice, e.g.:

# ionice -c 3 find . -type f -print -delete

Just note that ionice cannot guarantee that find's operations will not severely affect hard disk i/o requests. On heavily busy servers with high amounts of disk i/o writes, applying ionice will still not prevent the server from hanging! Be sure to always keep an eye on the server while deleting the files, no matter whether with or without ionice. If throughout find's execution the server gets lagged in serving its ordinary client requests or whatever, stop the execution of the cmd immediately by killing it from another ssh session or tty (if physically on the server).
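To actually keep that eye on the server while the deletion runs, something as simple as iostat (from the sysstat package) or vmstat in a second ssh session will show whether the disk is being saturated:

# iostat -x 2
# vmstat 2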

2. Using a simple bash loop with rm command to delete "tons" of files

An alternative way is to use a bash loop to go through each of the files in the directory and issue /bin/rm on each of the loop elements (files), like so:

for i in *; do
  rm -f "$i";
done

If you'd like to print what you will be deleting, add an echo to the loop:

# for i in *; do \
echo "Deleting : $i"; rm -f "$i"; \
done

The bash loop worked like a charm in my case, so I really warmly recommend this method whenever you need to delete more than 500 000+ files in a directory.

3. Deleting multiple files with perl

Deleting multiple files with perl is not a bad idea at all.
Here is a perl one-liner to delete all files contained within a directory:

# perl -e 'for(<*>){((stat)[9]<(unlink))}'

If you prefer to use a more human readable perl script to delete a multitude of files, use delete_multple_files_in_dir_perl.pl

Using the perl interpreter to delete thousands of files is quick, really, really quick.
I did not benchmark on the server exactly how quick it is, but I guess the deletion rate should be similar to the find command. It's possible that in some cases the perl loop is even quicker …

4. Using a PHP script to delete multiple files

Using a short PHP script to delete files one by one in a loop, similar to the above bash script, is another option.
To do the deletion with PHP, use this little PHP script:

<?php
// directory whose files will be removed one by one
$dir = "/path/to/dir/with/files";
$dh = opendir($dir);
$i = 0;
while (($file = readdir($dh)) !== false) {
    $file = "$dir/$file";
    if (is_file($file)) {
        unlink($file);
        // print progress every 1000 removed files
        if (!(++$i % 1000)) {
            echo "$i files removed\n";
        }
    }
}
closedir($dh);
?>

As you can see, the script reads the directory defined in $dir and loops through it, deleting each regular file it finds.
You should already know PHP is slow, so this method is only useful if you have to delete many thousands of files on a shared hosting server with no (ssh) shell access.

This PHP script is taken from Steve Kamerman's blog. I would also like to express my big gratitude to Steve for writing such a wonderful post. His post actually became the inspiration for this article.

You can also download the php delete million of files script sample here

To use it, rename delete_millioon_of_files_in_a_dir.php.txt to delete_millioon_of_files_in_a_dir.php and run it through a browser.

Note that you might need to run it multiple times, because many shared hosting servers are configured to kill a PHP script which keeps running for too long.
Alternatively the script can be run through the shell with the PHP CLI:

php delete_millioon_of_files_in_a_dir.php.txt

5. So what is the "best" way to delete millions of files on Linux?

In order to find out which method is quicker in terms of execution time, I did a home-brew benchmark on my ThinkPad notebook.

a) Creating 509072 sample files.

Again, I used a bash loop to create many thousands of files in order to benchmark.
I didn't want to put this load on a production server and hence I used my own notebook to conduct the benchmarks. As my notebook is not a server the benchmarks might be partially incorrect, however I believe they're still a pretty good indicator of which deletion method is better.

hipo@noah:~$ mkdir /tmp/test
hipo@noah:~$ cd /tmp/test;
hipo@noah:/tmp/test$ for i in $(seq 1 509072); do echo aaaa >> $i.txt; done

I had to wait a few minutes until I had the 509072 files created. Each of the files, as you can read, contains the sample "aaaa" string.

b) Calculating the number of files in the directory

Once the command completed, to make sure all 509072 files were there, I used a find + wc cmd to calculate the number of files contained in the directory:

hipo@noah:/tmp/test$ time find . -maxdepth 1 -type f |wc -l
509072

real 0m1.886s
user 0m0.440s
sys 0m1.332s

It's interesting that using an ls command to count the files is less efficient than using find:

hipo@noah:/tmp/test$ time ls -1 |wc -l
509072

real 0m3.355s
user 0m2.696s
sys 0m0.528s

c) Benchmarking the different file deleting methods with time

– Testing delete speed of find

hipo@noah:/tmp/test$ time find . -maxdepth 1 -type f -delete
real 15m40.853s
user 0m0.908s
sys 0m22.357s

You see, using find to delete the files is neither too slow nor lightning quick.

– How fast is the perl loop at deleting a multitude of files?

hipo@noah:/tmp/test$ time perl -e 'for(<*>){((stat)[9]<(unlink))}'
real 6m24.669s
user 0m2.980s
sys 0m22.673s

Deleting my sample of 509072 files took 6 mins and 24 secs. This is roughly 2.5 times faster than find! GO-GO perl 🙂
As you can see from the results, perl is a great and time-saving way to delete 500 000 files.

– The approximate deletion speed of the for + rm bash loop

hipo@noah:/tmp/test$ time for i in *; do rm -f $i; done

real 206m15.081s
user 2m38.954s
sys 195m38.182s

You see, the execution took 206 minutes and 15 seconds = about 3 HOURS and 26 MINUTES!!!! This is extremely slow! But it works like a charm, as running the deletion didn't impact my normal laptop browsing. While the script was running I was mostly browsing through a few not so heavy (non-flash) websites and doing some other stuff in gnome-terminal 🙂

As you can imagine, running a bash loop is a bit CPU intensive, but it puts less stress on the hard disk read/write operations. Therefore it's clear that using it is always a good practice when deletion of many files on a dedicated server is required.

d) My production server file deleting experience

On a production server I only tested two of all the listed methods to delete my files. The production server where I tested is running Debian GNU / Linux Squeeze 6.0.3. There I had the task of deleting a few million files.
The tested methods tried on the server were:

– The find . -type f -delete method.

– for i in *; do rm -f $i; done

The results from using the find -delete method were quite sad, as the server almost hanged under the heavy hard disk load the command produced.

With the for loop script all went smoothly. The files were being deleted for a long, long time (like a few hours), but while it was running the server continued with no interruptions.

While the bash loop was running, the server load average kept steady at 4.
Bearing my experience in mind, if you're running a production server and you're still wondering which delete method to use to wipe some multitude of files, I would recommend you go the bash for loop + /bin/rm way. Yes, it is extremely slow, expect it to run for hours, but it does not put too much extra load on the server.

Using the PHP script will probably be slow and inefficient compared to both find and the bash loop. I didn't give it a try yet, but I suppose it will be either equal in time or at least a few times slower than bash.

If you have tried the php script and you have some observations, please drop some comment to tell me how it performs.

To sum it up:

Even though there are "hacks" to clean up a messy directory full of a few million junk files, such a directory should never have existed in the first place.

Frankly, keeping millions of files within the same directory is a very stupid idea.
Doing so will have a severe negative impact on the directory listing performance of your filesystem in the long run.

If you know better (more efficient) ways to delete a multitude of files in a dir please share in comments.

How to block IP address with pf on FreeBSD, NetBSD and OpenBSD

Wednesday, July 27th, 2011

Pf Firewall BSD logo

I've noticed some IPs which had a kind of too aggressive behaviour towards my Apache webserver and thus decided to filter them out with the firewall.
As the server is running FreeBSD and my firewall of choice is BSD's pf, I added the following lines to my /etc/pf.conf to filter out the abuser IP:

table <blockips> persist file "/etc/pf.blocked.ip.conf"
EXT_NIC="ml0" # interface connected to internet
block drop in log (all) quick on $EXT_NIC from <blockips> to any

Then, from the shell, I appended the abuser IP to the table file:

echo '123.123.123.123' >> /etc/pf.blocked.ip.conf

As you see, I'm adding the malicious IP to /etc/pf.blocked.ip.conf; if I later decide to filter some other IPs I can add them there and they will be loaded and filtered by pf on the next pf reload.

Next I restarted my pf firewall to make the newly added rules in pf.conf load up.

freebsd# pfctl -d
freebsd# pfctl -e -f /etc/pf.conf
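Note that disabling pf with -d and re-enabling it leaves a short window with no firewall at all; if that is a concern, reloading the ruleset in place, or just refreshing the table from its file, should do the job as well:

freebsd# pfctl -f /etc/pf.conf
freebsd# pfctl -t blockips -T replace -f /etc/pf.blocked.ip.conf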

To show all the IPs which are inside the blockips filtering table, I later used:

pfctl -t blockips -T show

Later on I can also use pfctl to add new IPs to be blocked without bothering to restart the firewall:

freebsd# pfctl -t blockips -T add 111.222.333.444

Deleting an IP is analogous and can be achieved with:

freebsd# pfctl -t blockips -T delete 111.222.333.444
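Keep in mind that IPs added or removed this way only change the running table, not /etc/pf.blocked.ip.conf, so also update the file if the block should survive the next reload; the whole table can also be emptied at once:

freebsd# echo '111.222.333.444' >> /etc/pf.blocked.ip.conf
freebsd# pfctl -t blockips -T flush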

Logs about the pf IP blocking, as well as the other configured firewall rules, are stored in the /var/log/pflog file.
Hope this is helpful to somebody.