Rsync for dummies

Here are my favorite rsync command snippets:

To copy a directory (here from remote to current directory) over a slow link with restart from the position where the connection broke:
rsync -zvvvhPDKHEyrtlb --backup-dir="`date +%Y%m%d-%H%M%S`-`hostname -f`-$$" --append-verify --inplace --safe-links --bwlimit=200 HOST:/directory .
Do you think this is nuts? It is. My question is, why doesn't rsync efficiently and safely transfer files by default?
  • Do a fast restart from where everything broke
  • Never destroy anything at the destination
  • Leave no artifacts after a sudden power loss
  • Why does it not support --sparse with --inplace?
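
The "fast restart from where everything broke" wish can be approximated from the outside by rerunning the same command until it succeeds. A minimal sketch, assuming a POSIX shell; retry is a hypothetical helper (not part of rsync), and the attempt count and delay in the usage line are arbitrary:

```shell
# retry MAX DELAY CMD...: rerun CMD until it exits 0 or MAX attempts are used up.
# Hypothetical helper, not part of rsync.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    status=1
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0
        status=$?
        echo "attempt $attempt/$max failed with status $status" >&2
        attempt=$((attempt + 1))
        # Only wait if another attempt will follow.
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return "$status"
}

# Usage (20 attempts, 30 seconds apart, both values are assumptions):
# retry 20 30 rsync -rtlP --append-verify --timeout=60 HOST:/directory .
```

With -P and --append-verify in the command, each new attempt should pick up the partially transferred file instead of starting over.
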
To copy a directory (here from remote to current) over a fast link from or to a slow machine:
rsync -vvvhPDKHErtl --append --inplace --safe-links --timeout=10 --contimeout=10 rsync://HOST/directory .
The difference?
  • No fuzzy
  • As the machine is slow, no verify
  • As the machine is slow, no compression
  • As the machine is slow, no ssh but a direct connection to an rsync daemon instead
  • As the machine is slow, I observed hangs(!WTF?) because of unpredictable errors on the slow machine.
Do you still think this is nuts? It is!

FYI the slow machine was an NSLU2 serving a directory via NTFS-3G. Things like "Value too large for datatype" or similar errors showed up. I really have no idea why. The problem was that rsync hung afterwards. I have no problem with terminating (or crashing), but hanging and doing nothing is the worst thing that can ever happen. It's an absolute no-go. Therefore the timeouts were added to (hopefully) tear down rsync on such hangs.
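
rsync's --timeout is an I/O timeout. As an extra safety net against hangs, the whole command can also be put under a hard wall-clock limit. A sketch, assuming GNU coreutils timeout is installed on the local side; run_bounded is a hypothetical name and the 10 second kill grace period is arbitrary:

```shell
# run_bounded SECONDS CMD...: hard wall-clock limit around CMD.
# Assumes GNU coreutils `timeout`; hypothetical helper, not part of rsync.
run_bounded() {
    limit=$1
    shift
    # Send TERM after $limit seconds, then KILL 10 seconds later if CMD is still alive.
    timeout -k 10 "$limit" "$@"
}

# Usage (one hour per attempt is an assumption):
# run_bounded 3600 rsync -rtlP --append --timeout=10 --contimeout=10 rsync://HOST/directory .
```

GNU timeout exits with status 124 when the limit was hit, so a wrapper script can tell a killed hang from a normal rsync error. The limit has to be longer than a healthy run needs, otherwise a working transfer gets cut off as well.
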

The problematic part is that rsync still needs 70% of the CPU on the slow machine, which is too high for me. What I want is a faster data transfer which can resume where everything broke.

  • TAR does not work as TAR cannot resume.
  • rsync has too high CPU overhead.
  • out of luck

Server RSYNC

See www.heinlein-support.de/web/support/wissen/rsync-backup/ (in German)

Summary

Options which take some resources to compute:
-c               check if files are really identical.
-y               fuzzy: if a destination file is missing, look for a similar file to use as the basis for the rsync algorithm
-z               compression

Common options:
-vvv             very very verbose
-h               human readable output

-a               archive: -rlptgoD
HAS -r           recursive (why isn't that default?)
HAS -l           copy symlinks as symlinks (why isn't that default?)
HAS -p           preserve permissions (why isn't that default?)
HAS -t           (Important!) Preserve modification times (how to backup this?)
HAS -g           preserve group
HAS -o           preserve owner (root only)

-D               --devices --specials (why isn't that the default?)
HAS --devices    (root only) preserve device files
HAS --specials   preserve special files (like sockets and FIFOs)

-P               --partial --progress (why isn't that the default?)
HAS --partial    keep partially transferred files
HAS --progress   show progress indicator
--delay-updates  Uses .~tmp~ in each directory as --partial-dir
--partial-dir=DIR  better to give an absolute directory here.

-H               preserve hard links (why isn't that the default?)
-E               preserve executability (why isn't that the default?)
-S               copy sparse files efficiently

-b               make backups of files which get replaced or deleted
--backup-dir=DIR  move the backed-up files into the given backup directory
-R               called "relative paths" but is the opposite:
                 Use the full source path in the destination directory.
--delete-delay   delete files on the receiving side which no longer exist on the source.
                 This is the most efficient variant of several delete options.

--bwlimit=200    limit IO to 200 KiB/s (how to support those common links below 8 kBit/s?)
--timeout=60     abort if no data is transferred for 60 seconds (silent network outage)
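
To get a feeling for what --bwlimit=200 means in practice, the transfer time can be estimated with integer arithmetic. A small sketch; eta_seconds is a hypothetical helper and the 10 GiB size is only an example:

```shell
# eta_seconds SIZE_KIB LIMIT_KIB_PER_S: whole seconds needed at a given --bwlimit.
# Hypothetical helper; integer division, so the result is rounded down.
eta_seconds() {
    echo $(( $1 / $2 ))
}

# 10 GiB = 10 * 1024 * 1024 KiB, limited to 200 KiB/s
eta_seconds $((10 * 1024 * 1024)) 200    # prints 52428
```

52428 seconds is about 14.5 hours, so at this limit 10 GiB just about fits between two forced reconnects that are 24 hours apart.
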

Special options:
-x               stay in a single filesystem
-A               copy ACLs (how is that backed up?)
-X               copy xattrs (how is that backed up?)
--link-dest=DIR  files which are unchanged compared to DIR are hard-linked to the copy in DIR instead of being transferred.
                 This way each run yields a full backup at the cost of an incremental one
--numeric-ids    don't map usernames but copy user/group ids
--append-verify  Append to files if the first part is identical.
                 Usually what you want, but it implies --inplace and therefore conflicts with --partial-dir
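
The --link-dest entry above can be turned into rotating snapshots. A minimal sketch under assumptions: snapshot and the "latest" symlink are hypothetical names, ROOT has to be an absolute path (a relative --link-dest is resolved relative to the destination directory), and ln -n requires GNU or BSD ln:

```shell
# snapshot SRC ROOT: copy SRC into ROOT/<timestamp>, hard-linking files that are
# unchanged compared to the previous snapshot. Hypothetical helper.
snapshot() {
    src=$1
    root=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    if [ -e "$root/latest" ]; then
        rsync -aH --link-dest="$root/latest" "$src" "$root/$stamp" || return
    else
        # First run: nothing to link against yet.
        rsync -aH "$src" "$root/$stamp" || return
    fi
    # Repoint "latest" (relative symlink inside ROOT) at the snapshot just made.
    ln -sfn "$stamp" "$root/latest"
}

# Usage (the target path is an assumption):
# snapshot HOST:/directory/ /srv/snapshots
```

Each timestamped directory then looks like a full copy, while unchanged files share their disk space with the previous snapshot.
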

Dangerous options:
--inplace        Use with care: updates files in place, which only works if the files are left alone by other processes.
                 Incompatible with -S
-K               --keep-dirlinks - local symlinks to directories are honored
                 This is bad on backups, as it then follows the directory eventually.
--safe-links     Should not be used together with -R, but why?  What does the documentation mean:
                 "Ignores symlinks which point outside the copied tree.  All absolute symlinks are also ignored."

Some quite common tasks follow:

Variant 1: Backup remote

  • Copy remote directory to local directory
  • identical: locally delete files which do not exist on the remote side
  • incremental: do not transfer the already partially transferred part again after a connection or power loss, for example if you have to transfer a 15 TiB file with the connection breaking after 10 GiB or so because the ISP recycles your DSL every 24 hours.
  • backup: Keep backups (outside the target directory) of deleted or overwritten files (on the local side)
  • no-debris: In case something suddenly breaks, do not leave partially transferred files in the destination directory
Sorry, have not tested yet.
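
An untested sketch of how the four requirements might map to options: --delete-delay for "identical", --partial-dir for "incremental" and "no-debris", -b with --backup-dir for "backup". backup_remote and the STATE directory are hypothetical; STATE is assumed to be an absolute path outside the destination:

```shell
# backup_remote SRC DEST STATE: "Variant 1" sketch (hypothetical helper, untested
# against a real remote). STATE holds partial files and backups outside DEST.
backup_remote() {
    src=$1
    dest=$2
    state=$3
    bak="$state/backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$state/partial" "$bak" "$dest" || return
    # --append-verify is left out on purpose: it implies --inplace,
    # which rsync does not combine with --partial-dir.
    rsync -aHv \
        --delete-delay \
        --partial-dir="$state/partial" \
        -b --backup-dir="$bak" \
        --timeout=60 \
        "$src" "$dest"
}

# Usage (paths are assumptions):
# backup_remote HOST:/directory/ "$PWD/remote" "$PWD/rsync-state"
```

An interrupted file ends up in STATE/partial and is reused as the basis on the next run. After a sudden power loss, rsync's own temporary file may still be left behind in the destination.
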

Variant 2: Merge remote

  • Copy remote directory to local directory
  • fill: do not locally delete files which no longer exist on the remote side
  • incremental: do not transfer the already partially transferred part again after a connection or power loss, for example if you have to transfer a 15 TiB file with the connection breaking after 10 GiB or so because the ISP recycles your DSL every 24 hours.
  • backup: Keep backups (outside the target directory) of deleted or overwritten files (on the local side)
  • no-debris: In case something suddenly breaks, do not leave partially transferred files in the destination directory
Sorry, have not tested yet.
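
For the merge case, an untested sketch simply leaves out every --delete option, so files missing on the remote side stay in place locally. merge_remote and the STATE directory are hypothetical names; extra arguments are passed through to rsync, which allows a preview with --dry-run first:

```shell
# merge_remote SRC DEST STATE [RSYNC_OPTIONS...]: "Variant 2" sketch (hypothetical
# helper, untested against a real remote). No --delete option is given.
merge_remote() {
    src=$1
    dest=$2
    state=$3
    shift 3
    bak="$state/backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$state/partial" "$bak" "$dest" || return
    rsync -aHv \
        --partial-dir="$state/partial" \
        -b --backup-dir="$bak" \
        --timeout=60 \
        "$@" "$src" "$dest"
}

# Preview, then run for real (paths are assumptions):
# merge_remote HOST:/directory/ "$PWD/remote" "$PWD/rsync-state" --dry-run
# merge_remote HOST:/directory/ "$PWD/remote" "$PWD/rsync-state"
```
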

Variant 3: Clone remote

This is Variant 1 without "incremental" and "no-debris":

  • Copy remote directory to local directory
  • identical: locally delete files which do not exist on the remote side
  • backup: Keep backups (outside the target directory) of deleted or overwritten files (on the local side)
First: run without -c

Then: run with -c

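bak="$PWD/backup-`date +%Y%m%d-%H%M%S`"
mkdir "$bak" remote
# --append-verify implies --inplace, which rsync refuses to combine with --partial-dir;
# drop one of the two if rsync complains.
rsync -c -yzvvvh -aDHES -P --append-verify --partial-dir="$PWD/tmp" -b --backup-dir="$bak" -R --delete-delay --bwlimit=200 --timeout=60 remote:/ remote/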