15

I have a 500 GB file that I plan on backing up remotely. The file changes often. I'll be rsyncing it from a desktop to a server. Both can run rsync as client or server.

What is the proper command for this? The ones I've tried so far have been taking forever or simply acted strangely.

Example and results:

rsync -cv --partial --inplace --no-whole-file /desktop/file1 myserver.com::module/file1 

Seems to work, but only if I do it twice (?!). Also, slow.

Does the above command do the checksumming on both computers, or only on the sending one? Is it correct otherwise?

wonea
  • 1,847

3 Answers

15

It's never going to be fast, because rsync is going to have to read/checksum the entire file, and reading 500GB is going to take a long time, unless you've got it stored on SSDs or something.

Try rsync -vhz --partial --inplace <file/server stuff>.

-c means that rsync checksums the entire file BEFORE doing any transfers, rather than using the timestamp and size to see if it's changed, which means reading the whole file an extra time. If the timestamp isn't being updated when the file changes (it should be), you could just touch the file before running rsync.

If this isn't scripted, you can add --progress so you can see how it's doing as it runs.
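Applied to the paths from the question, that would look something like this (paths and module name taken from the question; adjust to your setup):

rsync -vhz --progress --partial --inplace /desktop/file1 myserver.com::module/file1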

Dentrasi
  • 11,205
  • Yeah, I know 'large file == long handling'. But I feel I am missing something here, see the comment above. If Dropbox can do it so can we! =)

    I didn't say, but I also tried without -c; still slow.

    – Johan Allgoth Jun 16 '10 at 12:14
  • 3
    also --inplace implies --partial –  May 21 '12 at 09:47
5

Though it's not rsync, depending on what you're trying to do this may work better. I was doing a similar backup task and it was definitely faster.

Use netcat to make a tar pipe from one machine to the other.

On your source machine:

tar -cpv --atime-preserve=system . | nc -q 10 -l -p 45454

You're creating a tarball that preserves permissions and timestamps, then piping it into netcat listening on port 45454.

On your backup machine:

nc -w 10 X.X.X.X 45454 | tar -xpv

X.X.X.X = the local IP address of your source machine.

For me, this worked well. It ran at 25-30 MB/s over a wired LAN, as opposed to 2-3 MB/s with rsync. The disadvantage is that it doesn't sync; it just makes a copy of what's on your source. For a backup like you're describing, though - one 500 GB file - it could work very well.

You may have to do this as root in order to avoid permissions problems, or you may get lucky.
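If the link between the machines is slower than the disks (say, over the internet rather than a wired LAN), you could also compress the stream. A sketch of the same pipe with gzip added, assuming gzip is available on both ends:

tar -cpv --atime-preserve=system . | gzip | nc -q 10 -l -p 45454

and on the backup machine:

nc -w 10 X.X.X.X 45454 | gzip -d | tar -xpv

On a fast LAN the compression itself can become the bottleneck, so this is only worth it on slower links.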

FWIW, I initially learned about this here: http://www.screenage.de/blog/2007/12/30/using-netcat-and-tar-for-network-file-transfer/

  • 1
    tar is better than rsync when you have a lot of small files to transfer. Using nc also improves the transfer rate on a fast connection, because you don't have the overhead of SSH encryption (which I don't need on a peer-to-peer connection) – jornane Jan 28 '16 at 10:31
1

To avoid the SSH encryption overhead, just use the native rsync protocol and not SSH. By default, rsync tunnels over SSH when you specify a URL like hostname:/path. Use rsync://hostname/path instead to get the faster rsync protocol. No tricks with tar/netcat are necessary this way, and the rsync delta algorithm should be much faster.
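For the file from the question, that would look something like this (note that the host::module/path form the asker already uses is equivalent shorthand for an rsync:// URL, so both forms talk to an rsync daemon rather than tunnelling over SSH):

rsync -v --partial --inplace /desktop/file1 rsync://myserver.com/module/file1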

See also https://gergap.wordpress.com/tag/rsync/ for more information.

gergap
  • 11
  • Tar/nc tricks are useful when you have many small files, because rsync/scp copy each file and its metadata in separate network packets; with tar it all flows as one big stream, with consistently large packets utilizing the full network bandwidth. – Marki555 Apr 21 '21 at 20:33