Archive for December, 2008

Remove duplicate files

Wednesday, December 3rd, 2008

This is a slightly modified version of the script published here. It lets you scan files for duplicates based on their MD5 checksums.

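#!/bin/bash
# rd - remove duplicates
# Prints an "rm -f" command for every file whose MD5 checksum was already seen.

# find the files using the specified 'find arguments'
find "$@" -type f -print0 |

# calculate checksum for each file (-r: do nothing when no files are found)
xargs -0 -r md5sum |

# sort on the checksum (first field)
sort -k1,1 |

# show remove command for each duplicate file
awk -v q="'" '
/^\\/ { next }    # md5sum escapes names with a backslash or newline; skip those
dup[$1]++ {
    name = substr($0, 35)    # name follows the 32-char checksum and 2 separator chars
    if (name ~ /[^A-Za-z0-9_.\/-]/) {    # quote names with spaces or shell metacharacters
        n = split(name, part, q)
        name = q part[1]
        for (i = 2; i <= n; i++) name = name q "\\" q q part[i]
        name = name q
    }
    print "rm -f " name
}'

exit 0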

The script is safe to use: it cannot delete files itself. Instead, it prints the rm commands, so the risky part only happens when you decide to run them.
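To get a feeling for which copy is kept, here is a minimal sketch that runs the same stages on a throwaway directory. The directory and file names are made up for the demo, and it assumes GNU md5sum and paths without spaces.

```shell
# Hypothetical demo directory; created fresh and removed at the end.
demo=$(mktemp -d)
printf 'same\n'  > "$demo/a.txt"
printf 'same\n'  > "$demo/b.txt"   # identical content to a.txt
printf 'other\n' > "$demo/c.txt"

# Same stages as the script: checksum, sort, flag every repeated checksum.
flagged=$(find "$demo" -type f -print0 | xargs -0 md5sum | sort |
          awk 'seen[$1]++ { print $2 }')

echo "$flagged"   # only b.txt is listed; a.txt sorts first and is kept
rm -rf "$demo"
```

Because the lines are sorted before awk sees them, the first file of each group of identical files is the one that survives.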

Usage
To see which files are marked as duplicates in the current working directory:

$ rd .
rm -f ./config_backup_2008-11-06_11.30.01.tar.bz2
rm -f ./config_backup_2008-11-07_11.30.01.tar.bz2
rm -f ./config_backup_2008-11-08_11.30.01.tar.bz2

If you like the result, you can execute the generated commands. This can be done by piping the output to the shell:

$ rd . | sh

Processing the rd command might take some time when there are a lot of (big) files, so you can also copy and paste the output into the terminal instead of running rd a second time.
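Another way to avoid a second scan is to write the commands to a file, read it, and only then run it. The sketch below is one possible workflow, not part of the original script: the rd function is a compact stand-in for the script (assuming file names without spaces), and the directory and file names are invented for the demo.

```shell
# Compact stand-in for the rd script, so this example runs on its own.
rd() {
    find "$@" -type f -print0 | xargs -0 md5sum | sort |
        awk 'dup[$1]++ { print "rm -f " $2 }'
}

work=$(mktemp -d)                 # hypothetical directory with two identical files
printf 'x\n' > "$work/one"
printf 'x\n' > "$work/two"

rd "$work" > "$work.cmds"         # save the commands outside the scanned directory
cat "$work.cmds"                  # review what would be deleted
sh "$work.cmds"                   # run it only after reviewing

remaining=$(ls "$work")           # "one" is left, "two" was removed
echo "$remaining"
rm -rf "$work" "$work.cmds"
```

Keeping the command file outside the scanned directory prevents the script from checksumming its own output.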
Since the script passes all arguments to the find command, it’s also possible to fine-tune the search. For example, to remove duplicates only in the current directory, without searching subdirectories:

$ rd . -maxdepth 1
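Since the arguments go straight to find, you can preview which files would be scanned by running find yourself with the same arguments plus -type f. A small sketch on a made-up directory (all names here are hypothetical), combining -maxdepth with a -name filter:

```shell
work=$(mktemp -d)
mkdir "$work/sub"
touch "$work/top.tar.bz2" "$work/notes.txt" "$work/sub/deep.tar.bz2"

# Same file selection as: rd "$work" -maxdepth 1 -name '*.tar.bz2'
scanned=$(find "$work" -maxdepth 1 -name '*.tar.bz2' -type f)

echo "$scanned"   # only top.tar.bz2: notes.txt is filtered out, sub/ is not searched
rm -rf "$work"
```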

I’m using the script to remove duplicate backup sets.

Cool simple CMS: Pluck!

Monday, December 1st, 2008

Recently I’ve put a simple CMS-driven website online (www.ik-zie-je.nl). It’s using Pluck. Pluck is easy to install (doesn’t need a database) and easy to use. It can be extended with custom modules and themes.