Rsync for dummies
Here are my favorite rsync command snippets:
To copy a directory (here from remote to the current directory) over a slow link, resuming from the position where the connection broke:
rsync -zvvvhPDKHEyrtlb --backup-dir="`date +%Y%m%d-%H%M%S`-`hostname -f`-$$" --append-verify --inplace --safe-links --bwlimit=200 HOST:/directory .
Do you think this is nuts? It is. My question is: why doesn't rsync transfer files efficiently and safely by default? That is, why doesn't it:
- Do a fast restart from where everything broke
- Never destroy anything at the destination
- Leave no artifacts after a sudden power loss
- Support --sparse together with --inplace
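Since the whole point of the command above is resuming after a broken connection, it can be wrapped in a retry loop. This is a minimal sketch for a POSIX shell; the `retry` function and the `RETRY_DELAY` variable are hypothetical names made up for illustration, not part of rsync.

```shell
# retry MAX CMD [ARGS...]: rerun CMD until it succeeds or MAX attempts are used up.
# Hypothetical helper, not part of rsync.
retry() {
    max=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # Wait before reconnecting; RETRY_DELAY (seconds) is an assumed knob.
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example with the placeholders used above (not run here):
# retry 10 rsync <options as above> HOST:/directory .
```

Note that a backtick expression such as the `--backup-dir` name is expanded once, before `retry` starts, so all attempts would share the same backup directory.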
To copy a directory (here from remote to current) over a fast link from or to a slow machine:
rsync -vvvhPDKHErtl --append --inplace --safe-links --timeout=10 --contimeout=10 rsync://HOST/directory .
The differences:
- No fuzzy matching (-y)
- As the machine is slow, no verification (--append instead of --append-verify)
- As the machine is slow, no compression (-z)
- As the machine is slow, no ssh but a direct connection to an rsync daemon (rsync://)
- As the machine is slow, I observed hangs (!WTF?) after unpredictable errors on the slow machine, hence the timeouts.
Do you still think this is nuts? It is!
FYI the slow machine was an NSLU2 serving a directory via NTFS-3G. Errors like "Value too large for datatype" or similar showed up. I really have no idea why. The problem was that rsync hung afterwards. I have no problem with it terminating (or crashing), but hanging and doing nothing is the worst thing that can ever happen. It's an absolute no-go. Therefore I added the timeouts to (hopefully) tear down rsync on such hangs.
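With the timeouts in place, rsync should exit with a status code instead of hanging, so a wrapper script can tell a timeout from other failures. A small sketch follows; the function name is hypothetical, and the codes are the ones I know from the EXIT VALUES section of the rsync man page (check your version's man page for the full list).

```shell
# Map an rsync exit status to a short description.
# Codes as listed under EXIT VALUES in the rsync man page;
# the function name is made up for this sketch.
rsync_exit_reason() {
    case $1 in
        0)  echo "success" ;;
        12) echo "error in rsync protocol data stream" ;;
        23) echo "partial transfer due to error" ;;
        24) echo "partial transfer due to vanished source files" ;;
        30) echo "timeout in data send/receive" ;;
        35) echo "timeout waiting for daemon connection" ;;
        *)  echo "other error ($1)" ;;
    esac
}

# Usage sketch (placeholders as in the command above, not run here):
# rsync <options as above> --timeout=10 --contimeout=10 rsync://HOST/directory .
# echo "rsync ended: $(rsync_exit_reason $?)"
```

Status 30 corresponds to --timeout and status 35 to --contimeout, so these are the two to retry on.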
The remaining problem is that rsync still needs 70% of the CPU on the slow machine, which is too high for me. What I want is a faster data transfer which can resume where everything broke.
- TAR does not work as TAR cannot resume.
- rsync has too high CPU overhead.
- out of luck
Server RSYNC
See
www.heinlein-support.de/web/support/wissen/rsync-backup/ (in German)
Summary
Options which take some resources to compute:
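-c checksum: compare files by checksum instead of by size and modification time, to check if they are really identical.
-y fuzzy: if a file is missing at the destination, look for a similar file there to use as the basis for the transfer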
-z compression
Common options:
-vvv very very verbose
-h human readable output
-a archive: -rlptgoD
HAS -r recursive (why isn't that default?)
HAS -l copy symlinks as symlinks (why isn't that default?)
HAS -p preserve permissions (why isn't that default?)
HAS -t (Important!) preserve modification times (how to back this up?)
HAS -g preserve group
HAS -o preserve owner (root only)
-D --devices --specials (why isn't that the default?)
HAS --devices (root only) preserve device files
HAS --specials preserve special files (like sockets and FIFOs)
-P --partial --progress (why isn't that the default?)
HAS --partial keep partially transferred files
HAS --progress show progress indicator
--delay-updates Uses .~tmp~ in each directory as --partial-dir
--partial-dir=DIR better to give an absolute directory here.
-H preserve hard links (why isn't that the default?)
-E preserve executability (why isn't that the default?)
-S copy sparse files efficiently
-b make backups of files which would be overwritten or deleted
--backup-dir=DIR move the backed-up files into the given backup directory
-R called "relative paths" but is the opposite:
Use the full source path in the destination directory.
--delete-delay delete files on the receiving side which no longer exist on the source, after the transfer has finished.
The deletions are computed during the transfer, which is more efficient than --delete-after.
--bwlimit=200 limit IO to 200 KiB/s (how to support those common links below 8 kBit/s?)
--timeout=60 abort if no data is transferred for 60 seconds (silent network outage)
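To get a feeling for what --bwlimit=200 means in practice, the transfer time can be estimated with shell arithmetic. A small sketch, assuming the limit is actually reached and sizes are given in KiB; `transfer_seconds` is a hypothetical helper name.

```shell
# transfer_seconds SIZE_KIB RATE_KIB_PER_S: whole seconds needed, rounded up.
# Hypothetical helper using integer shell arithmetic only.
transfer_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 10 GiB expressed in KiB, at the 200 KiB/s limit:
ten_gib_kib=$((10 * 1024 * 1024))
transfer_seconds "$ten_gib_kib" 200      # prints 52429 (about 14.6 hours)

# How many whole GiB fit into 24 hours at 200 KiB/s:
echo $((200 * 86400 / 1024 / 1024))      # prints 16
```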
Special options:
-x stay in a single filesystem
-A copy ACLs (how is that backed up?)
-X copy xattrs (how is that backed up?)
--link-dest=DIR unchanged files are hard-linked to their copies in DIR instead of being transferred again
This way incremental backups can be created which each look like a full backup
--numeric-ids don't map usernames but copy user/group ids
--append-verify Append to files which are shorter at the destination, then verify the whole file by checksum.
Usually what you want but conflicts with --partial
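The --link-dest option from the list above is typically used for timestamped snapshots. This is a minimal sketch, not a tested backup tool: the function name, the "latest" symlink and the `RSYNC` override variable (set it to `echo` to only print the command) are assumptions made for illustration.

```shell
# snapshot SRC BACKUP_ROOT: create BACKUP_ROOT/<timestamp> from SRC,
# hard-linking unchanged files against the previous snapshot (BACKUP_ROOT/latest).
# BACKUP_ROOT should be an absolute path, because a relative --link-dest
# is interpreted relative to the destination directory.
# Function name, "latest" symlink and RSYNC override are assumptions.
snapshot() {
    src=$1
    root=$2
    new="$root/$(date +%Y%m%d-%H%M%S)"
    if [ -e "$root/latest" ]; then
        "${RSYNC:-rsync}" -aH --numeric-ids --link-dest="$root/latest" "$src" "$new" || return
    else
        # First run: nothing to link against yet.
        "${RSYNC:-rsync}" -aH --numeric-ids "$src" "$new" || return
    fi
    # Point "latest" at the snapshot just made.
    ln -sfn "$new" "$root/latest"
}

# Example (not run here; HOST and the paths are placeholders):
# snapshot HOST:/directory/ /backup/HOST
```

A trailing slash on SRC copies the contents of the directory rather than the directory itself.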
Dangerous options:
--inplace Use with care: updates files in place, which only works if the files are left alone by other processes.
Incompatible with -S
-K --keep-dirlinks - symlinks to directories on the receiving side are treated as directories
This is bad for backups, as rsync then writes through the symlink into whatever directory it points to.
--safe-links Should not be used together with -R, but why? What does the documentation mean:
"Ignores symlinks which point outside the copied tree. All absolute symlinks are also ignored."
Some quite common tasks follow:
Variant 1: Backup remote
- Copy remote directory to local directory
- identical: locally delete files which do not exist on the remote side
- incremental: do not transfer the already transferred part again after a connection or power loss. For example, you have to transfer a 15 TiB file and the connection breaks after 10 GiB or so because the ISP resets your DSL every 24 hours.
- backup: Keep backups (outside the target directory) of deleted or overwritten files (on the local side)
- no-debris: In case something suddenly breaks, do not leave partially transferred files in the destination directory
Sorry, I have not tested this yet.
Variant 2: Merge remote
- Copy remote directory to local directory
- fill: do not locally delete files which no longer exist on the remote side
- incremental: do not transfer the already transferred part again after a connection or power loss. For example, you have to transfer a 15 TiB file and the connection breaks after 10 GiB or so because the ISP resets your DSL every 24 hours.
- backup: Keep backups (outside the target directory) of deleted or overwritten files (on the local side)
- no-debris: In case something suddenly breaks, do not leave partially transferred files in the destination directory
Sorry, I have not tested this yet.
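Variants 1 and 2 differ only in whether files are deleted locally, so one function can cover both. The following is an untested sketch of how the requirements might map to options: identical to --delete-delay, incremental and no-debris to a --partial-dir outside the destination, backup to -b with --backup-dir. The function name, the directory layout and the `RSYNC` override variable (set it to `echo` to only print the command) are assumptions.

```shell
# pull_remote MODE SRC DEST: MODE is "backup" (Variant 1, deletes locally)
# or "merge" (Variant 2, never deletes). Untested sketch; the function name,
# the directory layout and the RSYNC override are assumptions.
pull_remote() {
    mode=$1
    src=$2
    dest=$3
    stamp=$(date +%Y%m%d-%H%M%S)
    bak="$PWD/backup-$stamp"      # backup: kept outside the target directory
    tmp="$PWD/partial"            # incremental + no-debris: partial files live here
    case $mode in
        backup) delete=--delete-delay ;;   # identical
        merge)  delete= ;;                 # fill
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    mkdir -p "$bak" "$tmp" "$dest" || return
    # $delete is deliberately unquoted so that an empty value vanishes.
    "${RSYNC:-rsync}" -aH -b --backup-dir="$bak" --partial-dir="$tmp" \
        $delete --timeout=60 "$src" "$dest"
}

# Example (not run here; HOST and /directory are placeholders):
# pull_remote backup HOST:/directory/ remote/
```

On the next run rsync can pick up a partial file left in the partial directory as the basis for resuming, so nothing half-transferred needs to sit in the destination.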
Variant 3: Clone remote
This is Variant 1 without "incremental" and "no-debris":
- Copy remote directory to local directory
- identical: locally delete files which do not exist on the remote side
- backup: Keep backups (outside the target directory) of deleted or overwritten files (on the local side)
First run the command without -c, then run it again with -c to verify the file contents. The second run looks like this:
bak="$PWD/backup-`date +%Y%m%d-%H%M%S`"
mkdir "$bak" remote
rsync -c -yzvvvh -aDHES -P --append-verify --partial-dir="$PWD/tmp" -b --backup-dir="$bak" -R --delete-delay --bwlimit=200 --timeout=60 remote:/ remote/