We provide a service for the EU Parliament that requires us to keep data synchronized all across Europe. The customer manages about 50 GiB of media content through a content management system, and some of that data is copied to other locations for quick local access.
This process needs to be fast, reliable and robust. In this article, we’ll introduce you to rsync, one of the most useful tools when it comes to copying files.
We use rsync to keep about 50,000 files totaling 150 GiB synchronized between European locations every night, to create backups and to set up new servers. It has been a tremendous help, saving us time and giving us better results than other solutions. Some of the advantages are:
- It’s available on all major operating systems: natively on Linux and macOS, and through different implementations on Windows (DeltaCopy, Cygwin, Windows Subsystem for Linux).
- It can preserve owners, groups and permissions.
- It has built-in support for remote shells like SSH.
- Delta transfers reduce traffic.
- Traffic can be further reduced by using compression.
- Creating detailed log files is very easy.
- Daemon mode provides a kind of file share that clients can access to download exactly the data they need.
- For manual transfers, --dry-run offers a way to double-check that you are transferring the correct files. Because it shows what would be sent, it also helps you estimate how long the transfer will take.
Despite its versatility, rsync is not difficult to use. While there are graphical tools to make things easier, the main use of rsync is through command line scripts that allow you to automate regular file copying operations. Here are some parameters and usage examples you might find useful.
-r, --recursive recurse into directories
-t, --times preserve modification times
This one is important. If you don’t use it, modification times are not preserved. On the next run, rsync’s default quick check (file size plus modification time) will then treat every file as changed and transfer it again, even if the file contents are identical.
-l, --links copy symlinks as symlinks
-p, --perms preserve permissions
-o, --owner preserve owner (super-user only)
-g, --group preserve group
-D same as --devices --specials
--devices preserve device files (super-user only)
--specials preserve special files
-a, --archive archive mode; equals -rlptgoD (no -H,-A,-X)
Shortcut for the parameters mentioned above.
--delete delete extraneous files from dest dirs
Deletes all files from the destination that are not present at the source. Make sure you have the directories right when using this.
--usermap=STRING custom username mapping
--groupmap=STRING custom groupname mapping
--chown=USER:GROUP simple username/groupname mapping
User and group mapping for when user names or IDs don’t match on source and destination systems.
-z, --compress compress file data during the transfer
Useful when you have a lot of files that compress well (text files) and the bandwidth is low. By default, these formats will not be compressed:
7z ace avi bz2 deb gpg gz iso jpeg jpg lz lzma lzo mov mp3 mp4 ogg png rar rpm rzip tbz tgz tlz txz xz z zip
-n, --dry-run perform a trial run with no changes made
Very useful to check if the transfer you’re about to start actually does what you want, especially when you’re using --delete.
-v, --verbose increase verbosity
Rsync is silent by default. Increasing the verbosity shows files that are transferred.
-h, --human-readable output numbers in a human-readable format
Makes file sizes easier to read.
--progress show progress during transfer
-i, --itemize-changes output a change-summary for all updates
Itemizing changes is useful if you want to figure out why the transfer is not going as planned. It lists the differences between source and destination files, most importantly file size and modification time.
The most basic command to copy files, preserving file ownership, permissions and special files:
rsync -avh SOURCE DESTINATION
Same as before, but also delete extraneous files at the destination:
rsync -avh --delete SOURCE DESTINATION
The same command via SSH:
rsync -avh --delete -e ssh SOURCE USER@IP:/target/directory
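If the SSH server listens on a non-standard port, the remote shell command given to -e can carry extra options. The following is a sketch only: the function name remote_sync, the host 192.0.2.10, the user and the port in the example call are assumptions for illustration and not part of our setup.

```shell
#!/bin/bash
# Hypothetical helper: mirror a local directory to a remote host over SSH.
# Usage: remote_sync SOURCE USER@HOST:/target/directory [PORT]
remote_sync() {
    local src=$1 dest=$2 port=${3:-22}
    # -e sets the remote shell; here ssh with an explicit port.
    rsync -avh --delete -e "ssh -p $port" "$src" "$dest"
}

# Example call (placeholder host, user and port):
# remote_sync /srv/media/ backup@192.0.2.10:/srv/media/ 2222
```

Because --delete is involved, it is worth adding -n for a first trial run against a new target.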
Simple incremental backup script for a Linux machine using rsync. It keeps the last seven snapshots of the system and uses hard links to store identical files only once.
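#!/bin/bash
# Rotating snapshots: backup.1 is the newest, backup.7 the oldest.
logfile=/var/log/backup.log
ddir=/backup

mkdir -p "$ddir"
echo "$(date) Starting backup..." >> "$logfile"

# Drop the oldest snapshot and shift the others up by one.
# The directory checks keep the first runs from failing on missing snapshots.
rm -rf "$ddir/backup.7"
for i in 6 5 4 3 2; do
    [ -d "$ddir/backup.$i" ] && mv "$ddir/backup.$i" "$ddir/backup.$((i + 1))"
done

# Hard-link copy of the newest snapshot: unchanged files are stored only once.
[ -d "$ddir/backup.1" ] && cp -al "$ddir/backup.1" "$ddir/backup.2"

# Bring backup.1 up to date. The patterns are quoted so the shell does not
# expand them, and anchored with a leading slash so they only match at the top.
rsync -aAXv --delete \
    --exclude="$ddir" \
    --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' \
    --exclude='/tmp/*' --exclude='/media/*' --exclude='/mnt/*' \
    / "$ddir/backup.1"

echo "$(date) Backup finished." >> "$logfile"
echo "" >> "$logfile"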
A word of warning about trailing slashes on source directories. Rsync follows BSD conventions. That means
rsync -avh source /home/example
will create a new directory "source" at the destination, i.e. "/home/example/source".
rsync -avh source/ /home/example
will copy the files in "source" to "/home/example".
Getting this wrong, especially with --delete, can be quite devastating, so remember it well.
We trust rsync to help us stay in control of our customers' data and we plan to use it for future installations, together with our custom-made CMS and data ontologies. Let us know if you plan an interactive installation that spreads over several locations: we are happy to help.