Rsync Basics

2 Jun, 2007

The rsync man page describes rsync as a "faster, flexible replacement for rcp". Rsync copies files between remote hosts and the local filesystem, although it can also copy files between locations on the same machine.

Here is a quick rsync test where we set up a directory root with two files, test1 and test2, and rsync them to another directory, backup, with the command rsync -a -v root/ backup/. Here's the shell output:

james@bose:~$ mkdir root
james@bose:~$ cd root/
james@bose:~/root$ echo "Test1" > test1
james@bose:~/root$ echo "Test2" > test2
james@bose:~/root$ ls -li
total 8
33132 -rw-r--r-- 1 james james 6 2007-06-02 18:03 test1
33182 -rw-r--r-- 1 james james 6 2007-06-02 18:03 test2
james@bose:~/root$ cd ..
james@bose:~$ mkdir backup
james@bose:~$ rsync -a -v root/ backup/
building file list ... done
./
test1
test2

sent 209 bytes  received 70 bytes  558.00 bytes/sec
total size is 12  speedup is 0.04
james@bose:~$ ls -li backup/
total 8
33184 -rw-r--r-- 1 james james 6 2007-06-02 18:03 test1
33187 -rw-r--r-- 1 james james 6 2007-06-02 18:03 test2
james@bose:~$

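The -a option copies files in archive mode, which recurses into directories and preserves symbolic links, permissions, modification times, ownership and device files. The -v option stands for verbose and makes rsync print information while it is working. You can also use -vv and -vvv to get progressively more verbose messages.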

This works nicely but it is much more interesting to copy files from a remote server. Using the -e option we can specify the remote shell to use, in this case ssh. You can also specify the remote shell by setting the RSYNC_RSH environment variable instead of specifying a value with -e.

rsync -a -v -e ssh user@3aims.com: web10

This copies user's home directory to a directory called web10 on the local machine. Bear in mind rsync will only be able to copy files which the user you specified has permission to read.
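The RSYNC_RSH alternative mentioned above can be sketched as follows. This is a minimal illustration assuming a POSIX shell; the rsync line is left as a comment because it needs a reachable remote host.

```shell
# Choose the remote shell through the environment instead of passing -e.
RSYNC_RSH="ssh"
export RSYNC_RSH

# With the variable exported, this is equivalent to the -e ssh command above:
#   rsync -a -v user@3aims.com: web10
echo "remote shell for rsync: $RSYNC_RSH"
```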

To specify a different remote directory you can put a path after the : character, for example:

rsync -a -v -e ssh user@3aims.com:/home/user web10
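One detail worth checking before relying on the command above is the trailing slash on the source path: without it rsync copies the directory itself, with it rsync copies only the directory's contents. The sketch below shows the difference locally. The scratch directory and the names src, a and b are made up for the demonstration, and it assumes rsync is installed.

```shell
set -eu

work=$(mktemp -d)                 # scratch directory for the demo
mkdir "$work/src"
echo "hello" > "$work/src/file"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: the directory itself is copied, giving a/src/file.
    rsync -a "$work/src" "$work/a"
    # Trailing slash: only the contents are copied, giving b/file.
    rsync -a "$work/src/" "$work/b"
    find "$work/a" "$work/b" -type f
else
    echo "rsync not found; skipping demo" >&2
fi
```

So the command above stores the remote files under web10/user. Writing the source as /home/user/ would place them directly in web10.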

You might want to customise how the remote shell behaves. For example, this might be a better option for backups:

rsync -a -v -e "ssh -c arcfour -o Compression=no -x" user@3aims.com:/home/user web10

Here is what the options to -e mean:

ssh - use ssh as the remote shell (rsync versions before 2.6.0 default to rsh)
-c arcfour - use arcfour, the weakest but fastest cipher that ssh supports (OpenSSH 7.6 and later no longer include it)
-o Compression=no - turn off ssh's compression; rsync has its own if you want it, which is discussed below
-x - turn off ssh's X11 forwarding (if you actually have it on by default)
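Cipher availability depends on the installed OpenSSH version, so it can be worth checking before hard-coding -c. The helper below is a hedged sketch: pick_cipher is a hypothetical function name, aes128-ctr is only an example fallback, and the usage note relies on ssh -Q cipher, which older OpenSSH releases do not provide.

```shell
# pick_cipher (hypothetical helper): read one cipher name per line on stdin
# and print the first of the preferred names given as arguments that appears.
pick_cipher() {
    available=$(cat)
    for want in "$@"; do
        if printf '%s\n' "$available" | grep -qxF -- "$want"; then
            printf '%s\n' "$want"
            return 0
        fi
    done
    return 1
}

# Example with a made-up cipher list; arcfour is absent, so the fallback wins.
printf 'aes128-ctr\naes256-ctr\n' | pick_cipher arcfour aes128-ctr
```

With a real ssh client the list would come from ssh -Q cipher, for example cipher=$(ssh -Q cipher | pick_cipher arcfour aes128-ctr), and the result could then be used in -e "ssh -c $cipher -o Compression=no -x".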

If bandwidth is a problem you might want to use the -z option to have rsync compress the data it sends across the network. If you are using rsync compression it makes sense to turn off ssh's compression, as demonstrated above. Here's the command using rsync compression:

rsync -a -v -z -e "ssh -c arcfour -o Compression=no -x" user@3aims.com:/home/user web10

If you want to test how effective each of these commands is, you will need to delete the web10 directory that rsync creates; otherwise rsync will only copy files which have changed. Whilst that's normally what you want, it isn't much use for tests.
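The "only copy files which have changed" behaviour can be seen locally with a sketch like the one below. It assumes rsync is installed; the scratch directory and the -i (--itemize-changes) flag are additions for this illustration.

```shell
set -eu

work=$(mktemp -d)                 # scratch directory for the demo
mkdir "$work/root"
echo "Test1" > "$work/root/test1"
echo "Test2" > "$work/root/test2"

if command -v rsync >/dev/null 2>&1; then
    # First run: both files are new, so -i lists them as transferred.
    echo "first run:"
    rsync -a -i "$work/root/" "$work/backup/"
    # Second run: sizes and modification times match, so nothing should be listed.
    echo "second run:"
    rsync -a -i "$work/root/" "$work/backup/"
    # Removing the copy forces a full transfer again, which is what a test needs.
    rm -rf "$work/backup"
    echo "after deleting the copy:"
    rsync -a -i "$work/root/" "$work/backup/"
else
    echo "rsync not found; skipping demo" >&2
fi
```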

Finally, since rsync is very efficient it can saturate a network connection. If you still want to be able to use your network connection whilst rsync is running, you can use the --bwlimit option, which lets you specify a maximum transfer rate in kilobytes per second. Rsync sends data in blocks; if it determines that the transfer was too fast, it waits before sending the next block. The result is an average transfer rate equal to the specified limit. For example, to limit rsync to 100KB/sec you could do this:

rsync -a -v -z --bwlimit=100 -e "ssh -c arcfour -o Compression=no -x" user@3aims.com:/home/user web10
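A back-of-the-envelope calculation shows what a limit means for backup time. The function below is a hypothetical helper, not part of rsync; it assumes the limit is the bottleneck and ignores compression and any data rsync skips.

```shell
# estimate_seconds SIZE_KB LIMIT_KB_PER_SEC
# Prints the minimum number of whole seconds needed to send SIZE_KB at the limit.
estimate_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))   # integer division, rounded up
}

# 1 GB (1048576 KB) at --bwlimit=100 takes about 10486 seconds, roughly 2.9 hours.
estimate_seconds 1048576 100
```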

You might also want to use --progress so that rsync prints the percentage complete and the transfer speed while transferring large files (this isn't worth adding if you are running from a cron job). A few other options are useful for backups:

--numeric-ids - transfer numeric UIDs and GIDs as they are instead of mapping them by user and group name; this matters if you might restore the backup later, because it avoids permission problems
-H - preserve hard links between files on the server
-x - only copy files from one filesystem, not from other filesystems mounted inside that directory structure
--delete - delete files from the backup if they no longer exist on the server

With rsync versions before 3.0, --delete removes the files before the copying starts. Rsync 3.0 and later normally delete during the transfer, and --delete-before restores the old behaviour.
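Because --delete removes files from the backup, it can be worth previewing the effect first. The sketch below uses rsync's -n (--dry-run) flag on a scratch directory; the directory layout and file names are invented for the demonstration, and it assumes rsync is installed.

```shell
set -eu

work=$(mktemp -d)                     # scratch directory for the demo
mkdir "$work/server" "$work/backup"
echo "keep"  > "$work/server/keep"
echo "stale" > "$work/backup/stale"   # exists only in the backup

if command -v rsync >/dev/null 2>&1; then
    # Dry run: reports what would be copied and deleted, but changes nothing.
    rsync -a -v -n --delete "$work/server/" "$work/backup/"
    test -f "$work/backup/stale" && echo "stale is still present after the dry run"
    # Real run: keep is copied and stale is deleted from the backup.
    rsync -a -v --delete "$work/server/" "$work/backup/"
else
    echo "rsync not found; skipping demo" >&2
fi
```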

Putting this all together, the command I use to back up one of my servers looks something like this:

rsync -aHxvz --delete --progress --numeric-ids -e "ssh -c arcfour -o Compression=no -x" root@example.com:/ pauli/

Bear in mind rsync is not much good at backing up databases such as MySQL, because they frequently hold information in memory and write to their files while running. Although you may have a copy of the database files, when you restore them you might find the information they contain is corrupt. A safer approach is to dump the database to a file (for example with mysqldump) and back up the dump.

For further reading about rsync have a look at the rsync man page or Kevin Korb's rsync article.

Comments

James Gardner » Incremental Backups using Rsync

Posted: 2007-06-02 23:43

[...] you do this simply by copying all the files each time in a similar way to the one described in my my last rsync article you will quickly run out of disk space on the backup [...] :URL: http://jimmyg.org/2007/06/02/incremental-backups-using-rsync/
