How to find and Delete Duplicate files in directory on Linux server with find and fdupes command

Monday, 16th March 2015

search-duplcate-files-linux-command-and-graphical-tools-how-to-find-duplicate-files-on-linux-mac-and-windows-os

Linux / UNIX find command is very helpful to do a lot of tasks to us admins such as Deleting empty directories to free up occupied inodes or finding and printing only empty files within a root file system within all sub-directories
There is too much of uses of find, however one that is probably rarely used known by sysadmins find command use is how to search for duplicate files on a Linux server:
 

find -not -empty -type f -printf “%sn” | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 –all-repeated=separate

If you're curious how does duplicate files finding works, they are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison.

Most common application of below command is when you want to search and get rid of some old obsolete files which you forgot to delete such as old /etc/ configurations, old SQL backups and PHP / Java / Python programming code files etc.

If you have to do a regular duplicate file find on multiple servers Linux servers perhaps you should install and use  fdupes command.
On Debian Linux to install it:

root@pcfreak:/# apt-cache show fdupes|grep -i descr -A 4
Description: identifies duplicate files within given directories
 FDupes uses md5sums and then a byte by byte comparison to find
 duplicate files within a set of directories. It has several useful
 options including recursion.
Homepage: http://code.google.com/p/fdupes/

 

root@www.pc-freak.net:/# apt-get install –yes fdupes

To search for duplicate files with fdupes in lets /etc/ just run fdupes without arguments:

 

root@pcfreak:/# fdupes /etc/
/etc/magic
/etc/magic.mime

/etc/odbc.ini
/etc/.pwd.lock
/etc/environment
/etc/odbcinst.ini

/etc/shadow-
/etc/shadow


If you want to look up for all duplicate files within root directory:
 

root@pcfreak:/# fdupes -r /etc/
Building file list /

 

You can also find duplicate files for multiple directories by just passing all directories as arguments to fdupes

 

root@pcfreak:/# fdupes -r /etc/ /usr/ /root /disk /nfs_mount /nas


The -r argument (makes a recursive subdirectory search for duplicates), if you want to also see what is the size of duplicate files found add -S option

 

fdupes -r -S /etc/ /usr/ /root /disk /nfs_mount /nas

 


If you want to delete all duplicate files within lets say /etc/

 

root@pcfreak:/# fdupes -d /etc/

fdupes is also available and installable also on RPM based Linux distros Fedora / RHEL / CentOS etc., install on CentOS with:
 

[root@centos~ ]# yum -y install fdupes


There is also a port available for those who want to run it on FreeBSD on BSD install it from ports:

 

freebsd# cd /usr/ports/sysutils/fdupes
freebsd# make install clean


If you have a GUI environment installed on the server and you don't want to bother with command line to search for all duplicate files under main filesystem and other lint (junk) files take a look at FSlint

FSlint-2.02-search-for-duplicate-and-lint-files-linux-gui-tool

If you're looking for a GUI cross platform duplicate file finder tool that runs on all major used Operating Systems Mac OS X / Windows / Linux take a look at dupeGuru

 

Share this on:

Download PDFDownload PDF

Tags: , , , , , , , , , , , , , ,

One Response to “How to find and Delete Duplicate files in directory on Linux server with find and fdupes command”

  1. angel ikaz says:
    Google Chrome 48.0.2564.116 Google Chrome 48.0.2564.116 Windows 7 x64 Edition Windows 7 x64 Edition
    Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36

    i would suggest you to try DuplicateFilesDeleter , it can help resolve duplicate files issue.

    View CommentView Comment

Leave a Reply

CommentLuv badge