Jun 11, 2018

Why we ❤ rsync

written by Björn Korella

We provide a service for the EU parliament that requires us to keep data synchronized across Europe. The customer manages about 50 GiB of media content through a content management system, and some of that data is copied to other locations for quick local access.

This process needs to be fast, reliable and robust. In this article, we’ll introduce you to rsync, one of the most useful tools when it comes to copying files.

 

 

Why rsync?

We use rsync to keep about 50,000 files totaling 150 GiB synchronized between European locations every night, to create backups and to set up new servers. It has been a tremendous help, saving us time and giving us better results than other solutions. Some of the advantages are:

  • It’s available on all major operating systems: natively on Linux and macOS, and through different implementations on Windows (DeltaCopy, Cygwin, Windows Subsystem for Linux).
  • It can preserve owners, groups and permissions.
  • It has built-in support for remote shells like SSH.
  • Delta transfers reduce traffic.
  • Traffic can be further reduced by using compression.
  • Creating detailed log files is very easy.
  • Daemon mode provides a kind of file share that clients can access to download exactly the data they need.
  • For manual transfers, --dry-run offers a way to double-check that you are transferring the correct files and helps you estimate how long the transfer will take.

 

Usage

Despite its versatility, rsync is not difficult to use. While there are graphical tools to make things easier, the main use of rsync is through command line scripts that allow you to automate regular file copying operations. Here are some parameters and usage examples you might find useful.

Parameters

-r, --recursive recurse into directories

-t, --times preserve modification times

This one is important. If you don’t use it, the modification times of the files will not be preserved and they will all be transferred again on the next run, even if the file contents are identical.

-l, --links copy symlinks as symlinks

-p, --perms preserve permissions

 

-o, --owner                 preserve owner (super-user only)
-g, --group                 preserve group
-D                          same as --devices --specials
--devices                   preserve device files (super-user only)
--specials                  preserve special files

-a, --archive archive mode; equals -rlptgoD (no -H,-A,-X)

Shortcut for the parameters mentioned above.

--delete delete extraneous files from dest dirs

Deletes all files from the destination that are not present at the source. Make sure you have the directories right when using this.

 

--usermap=STRING            custom username mapping
--groupmap=STRING           custom groupname mapping
--chown=USER:GROUP          simple username/groupname mapping

User and group mapping for when user names or IDs don’t match on source and destination systems.

-z, --compress compress file data during the transfer

Useful when you have a lot of files that compress well (text files) and the bandwidth is low. By default, files with these suffixes will not be compressed (the exact list depends on the rsync version):

7z ace avi bz2 deb gpg gz iso jpeg jpg lz lzma lzo mov mp3 mp4 ogg png rar rpm rzip tbz tgz tlz txz xz z zip

-n, --dry-run perform a trial run with no changes made

Very useful to check if the transfer you’re about to start actually does what you want, especially when you’re using --delete.

-v, --verbose increase verbosity

Rsync is silent by default. Increasing the verbosity shows files that are transferred.

-h, --human-readable output numbers in a human-readable format

Makes file sizes easier to read.

--progress show progress during transfer

-i, --itemize-changes output a change-summary for all updates

Itemizing changes is useful if you want to figure out why the transfer is not going as planned. It lists the differences between source and destination files, most importantly file size and modification time.

Examples

The most basic command to copy files, preserving file ownership, permissions and special files:

rsync -avh SOURCE DESTINATION

Same as before, but also delete extraneous files at the destination:

rsync -avh --delete SOURCE DESTINATION

The same command via SSH:

rsync -avh --delete -e ssh SOURCE USER@IP:/target/directory

Simple incremental backup script for a Linux machine using rsync. It keeps the last seven snapshots of the system and uses hard links to store identical files only once.

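#!/bin/bash

logfile=/var/log/backup.log
ddir=/backup

# -p: do not fail if the backup directory already exists
mkdir -p "$ddir"
echo "$(date) Starting backup..." >> "$logfile"

# Rotate snapshots: drop the oldest, then shift backup.2 to backup.6 up by one.
rm -rf "$ddir/backup.7"
for i in 6 5 4 3 2; do
    [ -d "$ddir/backup.$i" ] && mv "$ddir/backup.$i" "$ddir/backup.$((i + 1))"
done

# Hard-link copy of the latest snapshot; rsync then replaces only changed files.
[ -d "$ddir/backup.1" ] && cp -al "$ddir/backup.1" "$ddir/backup.2"

# Quote the exclude patterns so the shell cannot expand them; the leading
# slash anchors each pattern at the root of the transfer.
rsync -aAXv --delete --exclude "$ddir" --exclude '/dev/*' --exclude '/proc/*' --exclude '/sys/*' --exclude '/tmp/*' --exclude '/media/*' --exclude '/mnt/*' / "$ddir/backup.1"

echo "$(date) Backup finished." >> "$logfile"; echo "" >> "$logfile"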

 

Trailing slashes

A word of warning about trailing slashes on source directories. Rsync follows BSD conventions. That means

rsync -avh source /home/example

will create a new directory "source" at the destination, i.e. "/home/example/source".

rsync -avh source/ /home/example

will copy the files in "source" to "/home/example".

Getting this wrong, especially with --delete, can be quite devastating, so remember it well.

 

Wrapping up

We trust rsync to help us stay in control of our customers’ data and we plan to use it for future installations, together with our custom-made CMS and data ontologies. Let us know if you are planning an interactive installation that spreads over several locations: we are happy to help.