The Linux / UNIX find command helps us admins with many tasks, such as deleting empty directories to free up occupied inodes, or finding and printing only the empty files in all sub-directories of a root file system.
find has many uses, but one that few sysadmins know about is searching for duplicate files on a Linux server:
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
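If you're curious how it works: find lists the size of every non-empty file, uniq -d keeps only the sizes that occur more than once, md5sum then checksums just the files with those sizes, and uniq -w32 prints the groups of files whose first 32 characters (the MD5 hash) match, separated by blank lines. Unlike fdupes, this pipeline stops at the MD5 comparison and does no final byte-by-byte check.
The most common use of the command above is to find and get rid of old, obsolete files you forgot to delete, such as old /etc/ configurations, old SQL backups, and PHP / Java / Python source files.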
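To make the stages of the one-liner easier to follow, here is a minimal sketch that splits them into a shell function and, like fdupes, adds a final byte-by-byte check with cmp. It assumes GNU find, xargs and coreutils (for -printf, -r and --all-repeated) and file names without backslashes or newlines; the function names find_dups and verify_dups are hypothetical, not part of any tool.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list groups of files with identical MD5 sums in the
# given directories (default: current directory), groups separated by blank lines.
find_dups() {
    local dirs=("$@")
    if [ ${#dirs[@]} -eq 0 ]; then
        dirs=(.)
    fi
    # Step 1: sizes of all non-empty files; keep only sizes seen more than once.
    find "${dirs[@]}" -type f -not -empty -printf '%s\n' |
        sort -n | uniq -d |
        # Step 2: for each repeated size, collect the files of exactly that size.
        while read -r size; do
            find "${dirs[@]}" -type f -size "${size}c" -print0
        done |
        # Step 3: checksum only those candidates and group identical hashes.
        xargs -0 -r md5sum |
        sort |
        uniq -w32 --all-repeated=separate
}

# Hypothetical helper: reads "checksum  path" groups from stdin and prints only
# the files that cmp confirms are byte-for-byte identical to the first file
# of their group.
verify_dups() {
    local line path first=""
    while IFS= read -r line; do
        if [ -z "$line" ]; then
            # A blank line ends one checksum group.
            first=""
            echo
            continue
        fi
        # md5sum lines are: 32 hex chars, two separator chars, then the path.
        path=${line:34}
        if [ -z "$first" ]; then
            first=$path
            printf '%s\n' "$path"
        elif cmp -s -- "$first" "$path"; then
            printf '%s\n' "$path"
        fi
    done
}
```

For example, find_dups /etc /root | verify_dups prints each confirmed group of identical files, one path per line, with a blank line between groups. Checking sizes first keeps md5sum away from the vast majority of files, which is why the one-liner is much faster than checksumming everything.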
If you regularly have to find duplicate files on multiple Linux servers, you should probably install and use the fdupes command.
On Debian Linux, you can check the package description and install it with:
root@pcfreak:/# apt-cache show fdupes|grep -i descr -A 4
Description: identifies duplicate files within given directories
FDupes uses md5sums and then a byte by byte comparison to find
duplicate files within a set of directories. It has several useful
options including recursion.
Homepage: http://code.google.com/p/fdupes/
root@www.pc-freak.net:/# apt-get install --yes fdupes
To search for duplicate files in, let's say, /etc/, just run fdupes with the directory as its only argument:
root@pcfreak:/# fdupes /etc/
/etc/magic
/etc/magic.mime

/etc/odbc.ini
/etc/.pwd.lock
/etc/environment
/etc/odbcinst.ini

/etc/shadow-
/etc/shadow
To search recursively, i.e. to include all sub-directories, add the -r option:
root@pcfreak:/# fdupes -r /etc/
Building file list /
…
You can also search for duplicate files across multiple directories by passing all of them as arguments to fdupes:
root@pcfreak:/# fdupes -r /etc/ /usr/ /root /disk /nfs_mount /nas
…
The -r option makes fdupes search sub-directories recursively. If you also want to see the size of the duplicate files found, add the -S option:
fdupes -r -S /etc/ /usr/ /root /disk /nfs_mount /nas
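fdupes -S adds file sizes to its report. If you are using the find one-liner from the beginning of this article instead, a rough sketch like the one below totals how much space the redundant copies take: for every group of n identical files of size s, (n - 1) * s bytes could be reclaimed. It assumes GNU stat, md5sum's default "checksum  path" output and file names without backslashes or newlines; the function name wasted_bytes is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads "checksum  path" groups (blank-line separated,
# as printed by uniq --all-repeated=separate) and prints the total number of
# bytes taken by redundant copies, i.e. every file except the first per group.
wasted_bytes() {
    local line path size total=0 first=1
    while IFS= read -r line; do
        if [ -z "$line" ]; then
            # A blank line starts a new group.
            first=1
            continue
        fi
        # Skip the 32-char hash and the two separator chars to get the path.
        path=${line:34}
        if [ "$first" -eq 1 ]; then
            # The first copy in each group is the one you would keep.
            first=0
        else
            size=$(stat -c %s -- "$path")
            total=$((total + size))
        fi
    done
    echo "$total"
}
```

Pipe the output of the find one-liner into wasted_bytes to get a single byte count for the whole search.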
To delete duplicate files within, let's say, /etc/, use -d; fdupes asks which file in each set of duplicates to preserve and deletes the rest:
root@pcfreak:/# fdupes -d /etc/
…
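fdupes -d asks interactively which file of each set to keep. If you want a similar, reviewable step on top of the find one-liner, the hypothetical helper below keeps the first file of every checksum group and by default only prints what it would delete; passing --really (an option invented for this sketch) actually removes the files. It assumes file names without backslashes or newlines, since md5sum escapes those.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads "checksum  path" groups (blank-line separated)
# from stdin, keeps the first file of each group and deletes the others.
# Dry run by default; pass --really to actually remove files.
dedupe_keep_first() {
    local really=0 line path keep=""
    if [ "${1:-}" = "--really" ]; then
        really=1
    fi
    while IFS= read -r line; do
        if [ -z "$line" ]; then
            # The next non-blank line starts a new group.
            keep=""
            continue
        fi
        # Skip the 32-char hash and the two separator chars to get the path.
        path=${line:34}
        if [ -z "$keep" ]; then
            # The first file of the group is preserved.
            keep=$path
        elif [ "$really" -eq 1 ]; then
            rm -f -- "$path"
        else
            printf 'would delete: %s (duplicate of %s)\n' "$path" "$keep"
        fi
    done
}
```

Review the dry-run output before using --really, especially in /etc/, where files such as /etc/shadow and /etc/shadow- can legitimately be identical.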
fdupes is also available on RPM-based Linux distros such as Fedora / RHEL / CentOS (on RHEL / CentOS the package comes from the EPEL repository). To install it on CentOS:
[root@centos~ ]# yum -y install fdupes
There is also a FreeBSD port for those who want to run fdupes on BSD; install it from ports with:
freebsd# cd /usr/ports/sysutils/fdupes
freebsd# make install clean
If you have a GUI environment installed on the server and don't want to bother with the command line, take a look at FSlint, which searches the main file system for duplicate files and other lint (junk) files.
If you're looking for a cross-platform GUI duplicate file finder that runs on all major operating systems (Mac OS X / Windows / Linux), take a look at dupeGuru.