Friday, May 21, 2010

Finally sorted my duplicate photo collection.

I recently figured out that I had accumulated 21GB of photos and movie clips from my small Pentax S4 camera and was wondering how to remove the duplicates. After searching Google for Linux photo management software I installed digiKam (from KDE); it took almost half an hour to catalog everything, and its fuzzy search (I think that's the right term) found the duplicates. It shocked me with many photos having as many as 11 copies in different folders from backups.

I took the painful route of manually cleaning them from the duplicates list in digiKam and spent almost a whole day on it. At the end I expected my original collection to be clean; it wasn't, even though digiKam showed it as clean. So I thought of deleting the album and recreating a new one (biggest mistake) and was shocked again when it found all the duplicates. That's when I realized that when you add an album, digiKam makes a full copy of the collection and edits the copy. (This is really annoying to me, but it may be there for the good reason of not losing stuff accidentally.) Having said all this, I do think it's a very good photo management application.

I was in no way ready for the same exercise again, so it was back to Google, where I found another solution, fslint, which shows duplicate files but still leaves the manual work of removing them.
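(fslint is mostly a GUI, but it also ships its checks as command-line scripts. Assuming the usual Debian/Ubuntu install path on my setup, listing the duplicates from a terminal looked something like:

/usr/share/fslint/fslint/findup photos/clean_album
)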

Further searching turned up a wonderful command on one of the forums using fdupes, which did a very good job for me. I was a bit skeptical at first, fearing I might lose some files, so I first ran it without the -d option:

fdupes -r photos/clean_album

I reviewed the output and was happy with the listing, so I went ahead and used it as the forum suggested:

yes 1 | fdupes -rd photos/clean_album

That did 95% of the duplicate removal work for me. (I wish I had used it in the first place.) The rest of the work was to clean up using digiKam and then flatten everything into one directory, where I had some name conflicts, which I handled with a low-tech approach (true to this blog's name): adding prefixes to avoid clashes, as sketched below.
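For the flattening step, here is a minimal sketch of what I mean (the photos/flat target directory and the one-level folder layout are my assumptions; mv -n skips a file rather than overwrite it if a clash still slips through):

# flatten one level of subfolders into photos/flat, prefixing each
# file with its parent folder's name to avoid name clashes
mkdir -p photos/flat
for f in photos/clean_album/*/*; do
  prefix=$(basename "$(dirname "$f")")
  mv -n "$f" "photos/flat/${prefix}_$(basename "$f")"
done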

End result: 21GB is now reduced to 8.5GB.

I think I should be more careful about duplicates and backups going forward.

EDIT: One more note: I missed the avi clips when I did the manual cleanup using digiKam, but fslint and fdupes sorted those out as well.

6 comments:

Bob said...

Thank you for this post! You have saved the next ten years of my life.

I had recovered all of my family photos from a failed hard drive, but in the process ended up with duplicate files under a random naming scheme. I had 5200 files to go through and was sick when I ran "find duplicates" in digiKam. This command was a life saver.

Mazya uchapati (my low-tech efforts) said...

@Bob I am glad it helped you. I had to play around a lot to get it right, and at one stage I even considered writing my own Java/C application to do the job.

wingnux said...

THANK YOU VERY MUCH!!! It was a pain manually removing the dupes in digiKam, and you helped me save a lot of time!

Anonymous said...

Doesn't the fdupes option -N produce the same result, instead of using "yes"?

Manpage says:
"-N --noprompt
when used together with --delete, preserve the first file in
each set of duplicates and delete the others without prompting
the user"

Anonymous said...

Thanks for the tip!
fdupes works fast and great for binary-identical files; then I had to use digiKam for its fuzzy search. Yes, it's painful, but it does the job.

Greetings from Mexico.

Angel Ikaz said...

I would suggest you try DuplicateFilesDeleter; it can help resolve duplicate file issues.