rsync, basic is best

I’ve tried countless backup programs over the years in search of the best solution for my needs. Virtually all of them do the standard full, incremental or differential backups. This is fine for archival purposes, since you can recover all documents lost after a failure.

The downside is that, unless you do a full backup each time, recovering from a failure means piecing together the last state of the files from the full and incremental backups. This can require a lot of manual merging: if files were moved between incremental backups, the recovered file system ends up with two copies of them. Similarly, obsolete files that were purposely deleted will be restored, since incremental and differential backups typically do not record file deletions.

What I want is essentially a full backup or mirror each time, so that the backup always represents an exact copy of what I’m backing up. Then, after a failure, restoring is a simple file copy to a new disk, or in an emergency I can just use the backup directly, since it is identical to the original.

Backing up terabytes of data is still too slow and costly to keep a sequence of full backups. Mirrors can be kept in sync more quickly, but if you accidentally delete a file and the deletion gets mirrored, you lose it in the backup as well. What would be ideal is mirroring where any deleted or changed files are kept in a side folder. The main mirror folder always contains an exact copy of the original, but any previous revisions or deleted files can still be recovered. This is similar to what Time Machine does for OS X, but rsync can do this for Linux and other platforms, with the help of related projects.

Wading through the large number of options for rsync can be intimidating. It’s a powerful tool, and so it can be disastrous if used incorrectly. You don’t want to accidentally mix up source and destination with the --delete option, for example. Try out the options in a sandbox first until you see how they work. A good first step may be to use a graphical front end such as grsync, where all the options are clearly labeled with context help.
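One way to try this out safely is sketched below. It is only an illustration: the sandbox directories and file names are made up, and it relies on rsync’s -n (--dry-run) option, which reports what would be transferred or deleted without touching anything.

```shell
#!/bin/sh
# Build a throwaway sandbox so nothing real can be damaged.
# (Directory and file names here are hypothetical.)
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src" "$sandbox/dst"
echo "keep me" > "$sandbox/src/keep.txt"
echo "stale"   > "$sandbox/dst/stale.txt"   # exists only on the destination

# -n (--dry-run) shows what would happen without changing anything;
# -v lists the files that would be copied or deleted.
rsync -r -t -n -v --delete "$sandbox/src/" "$sandbox/dst/"

# The dry run leaves the destination as it was.
ls "$sandbox/dst"
```

Once the dry run lists only the deletions you expect, drop the -n and run it for real, still inside the sandbox.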

The mirroring command I’d use to back up a drive called nas1 would be as follows:

rsync -r -t --progress --delete -b --backup-dir=/mnt/backup/$(date +%F)/nas1 /mnt/nas1/ /mnt/backup/nas1

  • -r to recurse through the directories
  • -t to preserve the timestamps of the original files
  • --progress to display the progress
  • --delete to remove deleted source files from the destination. This makes it a true mirror, identical to the source
  • -b to back up changed and deleted files from the destination instead of discarding them
  • --backup-dir where to move the backups, in a date-specific folder. Files on the mirror which are being replaced or deleted will be moved here instead, so they can be recovered if needed.
  • /mnt/nas1 the drive being backed up. A trailing slash on the source (/mnt/nas1/) makes rsync copy the directory’s contents straight into the destination; without it, rsync creates a nas1 directory inside the destination.
  • /mnt/backup/nas1 the folder to place the mirror image into

What this results in is the drive ‘nas1’ mirrored into the folder ‘nas1’ on the drive backup. Previous versions and deleted files are moved to a date-specific folder such as 2009-04-20/nas1 on the drive backup.
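Getting a file back from one of the dated folders is then an ordinary copy. The sketch below recreates that layout by hand in a temporary directory, with an invented file docs/notes.txt, and restores it to a stand-in for the original drive. The paths are assumptions for illustration only.

```shell
#!/bin/sh
# Recreate the layout in a throwaway directory; on a real system
# these would be /mnt/nas1 and /mnt/backup. File names are hypothetical.
root=$(mktemp -d)
mkdir -p "$root/nas1" "$root/backup/nas1" "$root/backup/2009-04-20/nas1/docs"
echo "deleted last week" > "$root/backup/2009-04-20/nas1/docs/notes.txt"

# List everything that was set aside on that day.
find "$root/backup/2009-04-20/nas1" -type f

# Copy one file back to the original drive, keeping its timestamp (-p).
mkdir -p "$root/nas1/docs"
cp -p "$root/backup/2009-04-20/nas1/docs/notes.txt" "$root/nas1/docs/notes.txt"
```

Restoring to the original drive rather than to the mirror means the next mirroring run picks the file up again on its own.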