I’m in the process of deleting duplicate copies of photos from my system in the interest of saving disk space (320MB just isn’t all that much when one is playing with video!). I’ve used find and md5 to build a list of all the .jpg files on my system along with their hashes, and wrote some simple Python code to find the duplicates. The problem now is choosing the best copy in each set of duplicates (“best” here meaning the one with the most useful file and directory names). But my code just writes out a pile of small files, one for each set of identical photos, and wading through them is a pain.
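For the curious, the duplicate-finding step can be sketched in a few lines of Python that keep everything in memory instead of writing one file per set. This is a minimal sketch, not the code I actually used: it assumes the hash list was produced by the BSD/macOS `md5` command, whose lines look like `MD5 (/path/to/photo.jpg) = <32 hex digits>`, and the `pick_best` rule (prefer names that don’t look like camera defaults such as `IMG_0042.jpg`, then prefer longer, more descriptive paths) is a hypothetical heuristic of my own.

```python
import os
import re
from collections import defaultdict

# Assumed input format: one line per file as printed by BSD/macOS `md5`,
# e.g. "MD5 (/Users/me/Pictures/a.jpg) = d41d8cd98f00b204e9800998ecf8427e".
MD5_LINE = re.compile(r"^MD5 \((?P<path>.*)\) = (?P<digest>[0-9a-f]{32})$")

# Hypothetical pattern for camera-assigned names like IMG_0042.jpg or DSC01234.JPG.
CAMERA_DEFAULT = re.compile(r"^(IMG|DSC|DSCN|P)_?\d+\.jpe?g$", re.IGNORECASE)


def group_duplicates(lines):
    """Return {digest: [paths]} for every digest shared by two or more files."""
    by_digest = defaultdict(list)
    for line in lines:
        match = MD5_LINE.match(line.rstrip("\n"))
        if match:  # silently skip blank or malformed lines
            by_digest[match.group("digest")].append(match.group("path"))
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}


def pick_best(paths):
    """Hypothetical heuristic: descriptive file names win, then longer paths."""
    def score(path):
        name = os.path.basename(path)
        # Tuples compare element by element, and True sorts above False.
        return (CAMERA_DEFAULT.match(name) is None, len(path))
    return max(paths, key=score)


if __name__ == "__main__":
    listing = [
        "MD5 (/photos/2005/Beach trip/sunset.jpg) = 0123456789abcdef0123456789abcdef",
        "MD5 (/tmp/IMG_0042.jpg) = 0123456789abcdef0123456789abcdef",
        "MD5 (/photos/misc/cat.jpg) = fedcba9876543210fedcba9876543210",
    ]
    for digest, paths in group_duplicates(listing).items():
        print(digest, "keep:", pick_best(paths))
```

With the sample listing, only the two copies of the sunset photo are grouped, and the descriptively named one under `Beach trip` is the one kept.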
So I decided to install MySQL so that I could crawl through the duplicates in some interesting manner (yes, I did have wine with dinner tonight. Why do you ask?). The install of MySQL itself was trivial, but installing the Python binding (MySQLdb) was a pain. The key tip came from a posting on Jeremy Dunck’s blog: I was using gcc 3.3 and needed gcc 4. His posting explained what to do; I did it, and now I’m all set.
Now I have to actually put the data into the database and do something with it, but that’s a problem for another evening….
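When that evening comes, the core of it will probably be a single table and a `GROUP BY` query. The sketch below is only an illustration under stated assumptions: the table and column names are hypothetical, and it swaps in Python’s built-in `sqlite3` module as a stand-in so it runs without a MySQL server. MySQLdb follows the same DB-API shape (`cursor.execute`, `cursor.executemany`), but its placeholders are `%s` rather than sqlite3’s `?`.

```python
import sqlite3


def find_duplicate_digests(rows):
    """Load (digest, path) pairs and return [(digest, count)] for repeated digests.

    Uses an in-memory sqlite3 database as a stand-in for MySQL; the
    hypothetical table `photos` holds one row per file.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE photos (digest CHAR(32) NOT NULL, path TEXT NOT NULL)"
        )
        # With MySQLdb the placeholders would be %s instead of ?.
        conn.executemany("INSERT INTO photos (digest, path) VALUES (?, ?)", rows)
        cur = conn.execute(
            "SELECT digest, COUNT(*) FROM photos "
            "GROUP BY digest HAVING COUNT(*) > 1 ORDER BY digest"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    rows = [
        ("0123456789abcdef0123456789abcdef", "/photos/2005/Beach trip/sunset.jpg"),
        ("0123456789abcdef0123456789abcdef", "/tmp/IMG_0042.jpg"),
        ("fedcba9876543210fedcba9876543210", "/photos/misc/cat.jpg"),
    ]
    print(find_duplicate_digests(rows))
```

From there, crawling through a given set is just a `SELECT path FROM photos WHERE digest = ?` per reported digest.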