Friday, May 21, 2010

Finally sorted my duplicate photo collection.

I figured out recently that I have accumulated 21GB of photos and movie clips from my small Pentax S4 camera, and I was wondering how to remove the duplicates. After searching Google for Linux photo management software I installed digiKam (from KDE). It took almost half an hour to catalog the collection. I then used its fuzzy search (I think that is the right term) to find duplicates, and it shocked me: many photos had as many as 11 copies in different folders left over from backups.

I took the painful route of manually cleaning them up from the duplicates list in digiKam, and it took almost a whole day in all. In the end I expected my original collection to be clean. It wasn't, even though digiKam showed it as clean. So I thought of deleting the album and recreating it (biggest mistake) and was shocked again when it found all the duplicates once more. Then I realized that when you add an album, it makes a whole copy of the collection and edits the copy. (This is really annoying for me, but it may be there for the good reason of not losing stuff accidentally.) Having said all this, I do think it's a very good photo management application.

I was in no way ready to go through the same exercise again, so it was back to Google, where I found another tool, fslint. It shows duplicate files, but removing them is still manual work.

Further searching turned up a wonderful command on one of the forums using fdupes, which did a very good job for me. I was a bit skeptical at first, for fear of losing files, so I ran it without the -d option:
fdupes -r photos/clean_album
I reviewed the output and was happy with the listing, so I went ahead and used it as the forum suggested:
yes 1 | fdupes -rd photos/clean_album
Piping "yes 1" answers every prompt with 1, so the first file in each duplicate set is kept and the rest are deleted. That got 95% of my duplicate removal done. (I wish I had used it in the first place.) The rest of the work was to clean up using digiKam and then flatten everything into one directory, where I had some name conflicts. True to my blog's name, I took a low-tech approach and added prefixes to the file names to avoid clashes.
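
For anyone wanting to script the flattening step, here is a minimal sketch. It is an assumption about how the prefixing could be done, not the exact steps I ran by hand: the function name flatten_photos and the choice of the parent folder name as the prefix are made up for illustration. It copies rather than moves, and it refuses to overwrite anything.

```shell
# Hypothetical helper: copy every file under SRC into one flat DEST directory,
# prefixing each file name with its parent folder name to avoid clashes.
# Assumes DEST is outside SRC and that file names contain no newlines.
flatten_photos() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    find "$src" -type f | while IFS= read -r path; do
        parent=$(basename "$(dirname "$path")")
        name=$(basename "$path")
        target="$dest/${parent}_${name}"
        if [ -e "$target" ]; then
            # Never overwrite: report the clash and leave the file where it is.
            echo "skipped (name taken): $path" >&2
        else
            cp -p "$path" "$target"
        fi
    done
}

# Example (paths are placeholders):
#   flatten_photos photos/clean_album photos/flat
```

Because it only copies, the original tree can be deleted once the flat directory has been checked.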

End result: 21GB is now down to 8.5GB.

I think I should be more careful about duplicates and backups going forward.

EDIT: One more note: I missed the avi clips when I did the manual cleanup using digiKam, but fslint and fdupes sorted those out as well.
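
My guess at why the clips were caught is that these tools compare file contents rather than how the pictures look, so the file type does not matter. The sketch below illustrates that general idea with checksums. It is not how fdupes or fslint are actually implemented, the function name list_duplicates is made up, and it assumes sha256sum from GNU coreutils is installed.

```shell
# Hypothetical helper: print every file under DIR whose SHA-256 checksum
# matches that of at least one other file, grouped by checksum.
list_duplicates() {
    find "$1" -type f -exec sha256sum {} + | sort | awk '
        { hash = substr($0, 1, 64) }           # first 64 characters are the hex digest
        hash == prev {                         # same digest as the previous line
            if (!shown) print prevline         # print the first file of the group once
            print
            shown = 1
            next
        }
        { prev = hash; prevline = $0; shown = 0 }'
}

# Example (path is a placeholder):
#   list_duplicates photos/clean_album
```

It only lists candidates and deletes nothing, so it is safe to run as a cross-check before or after fdupes.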