159

I would like to copy the entire file system hierarchy from one drive to another, i.e. the contents of each directory as well as the regular files, on the Linux platform. I would be grateful to know the best way to do that, ideally with Linux's built-in tools. The file systems are of the ext family.

Chenmunka
  • 3,228
Juggler
  • 1,691
    Umm... where is the love for dd? dd if=/dev/sda1 of=/dev/sdb1 bs=4096 – Joseph Mar 25 '17 at 05:08
  • @juniorRubyist +1 for dd. I always use that. But which flags to use? I use conv=notrunc,noerror,sync. – BeniBela Aug 30 '17 at 12:10
  • 5
    -1 for dd for 2 reasons: firstly it's a bad idea to perform a block-level copy of a filesystem that is mounted (which is the case for /) and secondly dd won't copy data from sources mounted within the filesystem like /boot and /home. – Eric Aug 16 '19 at 10:37
  • 6
    -1 for dd also because whatever fragmentation has occurred on the source is copied as well; and a different sized destination isn't automatically handled – duanev Oct 30 '19 at 17:53
  • I'm glad you mentioned that, @duanev . I was wondering whether this would happen, given the claim that Linux file systems do not fragment. – Sridhar Sarnobat Mar 31 '22 at 00:34
  • Also, dd may copy damaged data without you knowing, spreading it. rsync will at least tell you there is a problem (hmmmm, that's a disadvantage when backing up, but an advantage when restoring). – Sridhar Sarnobat Mar 31 '22 at 00:35
  • To Eric's point, it's just as bad an idea to cp/rsync/otherwise copy files from an in-use filesystem. If you need to make a copy of an entire system you should take it offline (i.e. boot from USB stick), at which point you can either make a block copy (dd, unmounted) or copy files (mounted). Copying files will be much faster, but a block copy will ensure identicality. – DimeCadmium Mar 12 '23 at 07:50
  • To duanev's point, doing a file-level copy could make fragmentation worse. Files which are used together are likely to be installed together, and thus likely to be grouped together on disk. Doing a file-level copy will usually undo this. Fragmentation is rarely an issue to begin with; Linux is pretty smart about where to put files. Plus with SSDs the impact of even total fragmentation is negligible. – DimeCadmium Mar 12 '23 at 07:53
  • rsync can't tell you there's a problem any more than dd can. Neither of them knows what's supposed to be in those blocks/files. Your FS might be able to tell you (depending on the FS and whether the corruption occurred in metadata or data), but that will be just as true after you've copied it over. Your disk also might be able to tell you (depending on the disk and the type of corruption), but that will be just as true for dd as for rsync. The go-to tool for recovering data from failed disks is ddrescue. Which does a block-for-block copy. Just like dd. – DimeCadmium Mar 12 '23 at 07:57

12 Answers

303

What you want is rsync.

This command can be used to synchronize a folder, and it can also resume copying if it is aborted halfway. The command to copy one disk is:

rsync -avxHAX --progress / /new-disk/

The options are:

-a  : archive mode; all files, with permissions, timestamps, etc.
-v  : verbose, mention files
-x  : stay on one file system
-H  : preserve hard links (not included with -a)
-A  : preserve ACLs/permissions (not included with -a)
-X  : preserve extended attributes (not included with -a)

To improve the copy speed, add -W (--whole-file), to avoid calculating deltas/diffs of the files. This is the default when both the source and destination are specified as local paths, since the real benefit of rsync's delta-transfer algorithm is reducing network usage.

Also consider adding --numeric-ids to avoid mapping uid/gid values by user/group name.

Raman
  • 383
156

Michael Aaron Safyan's answer doesn't account for sparse files; the -S option fixes that.

Also, this variant doesn't spam you with per-file progress output, and it doesn't do delta syncing, which kills performance in non-network cases.

Perfect for copying a filesystem from one local drive to another (substitute your own source and target mount points):

rsync -axHAWXS --numeric-ids --info=progress2 /mnt/source/ /mnt/target/
  • 1
    Amazing. This is really doing a good job – Gildas Jul 26 '18 at 08:13
  • 3
    This should be the accepted answer, works great. Example 55,431,669,792 57% 97.47MB/s 0:06:56 xfr#2888, ir-chk=5593/8534) – Drew Aug 04 '18 at 03:11
  • 1
    <3 this is perfect – Tim Strijdhorst Jan 21 '19 at 15:10
  • Also, if you want the target disk to be an exact copy of the source disk, removing any additional files on the target that are not present on the source, you can add --delete – Oleg Abrazhaev Jun 16 '23 at 13:32
  • 1
    WATCH OUT! What is not "perfect" is that this answer doesn't tell you that if you have two internal drives, subsequent reboots of a live install USB may switch "sda" and "sdb", and so this answer becomes, instead of a way to restore correct privileges from a back up, a way to instead overwrite your backup with the completely broken privileges of your 1st rysnc attempt, destroying the backup and leaving you with no good copy of the server you just spent 3 weeks setting up, like just happened to me. Then you get to spend hours reading all the answers on SE that tell you there's no way to fix it. – John Smith Nov 17 '23 at 09:56
  • 1
    Great! But don't forget to put a trailing slash on both directories – nealmcb Dec 07 '23 at 19:39
43

I often use

cp -ax / /mnt

Presuming the new disk is mounted at /mnt and there are no other mounts under /.

The -x flag keeps the copy on the one filesystem.

This of course needs to be done as root or using sudo.
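For completeness, preparing and mounting the new disk first might look like the following sketch (the device name /dev/sdb1 is an assumption; substitute your own, and double-check it with lsblk before running):

```shell
# Assumed target partition -- verify with lsblk first!
mkfs.ext4 /dev/sdb1     # create a fresh ext4 filesystem on the new disk
mount /dev/sdb1 /mnt    # mount it at /mnt
cp -ax / /mnt           # copy everything, staying on the root filesystem
```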

This link has some alternatives, including the one above

http://linuxdocs.org/HOWTOs/mini/Hard-Disk-Upgrade/copy.html

WolfmanJM
  • 940
  • While this is a dead old answer, it is still worth noting that you usually do NOT want to copy all the stuff present in /, and should exclude e.g. /dev, /sys, /proc etc. Therefore, before issuing that cp, I suggest looking into better approaches (also using rsync) – Marcin Orlowski Jun 13 '19 at 08:17
  • 11
    @MarcinOrlowski WolfJM's use of the -x flag means that the synthetic filesystems you mention will not be copied. – Jim L. Jul 31 '19 at 21:37
10

Like Michael Safyan suggests above, I've used rsync for this purpose. I suggest using some additional options to exclude directories that you probably don't want to copy.

This version is fairly specific to Gnome- and Debian/Ubuntu-based systems, since it includes subdirectories of users' home directories which are specific to Gnome, as well as the APT package cache.

The last line will exclude any directory named cache, Cache, or .cache, which may be too aggressive for some uses:

rsync -WavxHAX --delete-excluded --progress \
  --exclude='/home/*/.gvfs' \
  --exclude='/home/*/.local/share/Trash' \
  --exclude='/var/run/*' \
  --exclude='/var/lock/*' \
  --exclude='/lib/modules/*/volatile/.mounted' \
  --exclude='/var/cache/apt/archives/*' \
  --exclude='/home/*/.mozilla/firefox/*/Cache' \
  --exclude='/home/*/.cache/chromium' \
  --exclude='/home/*/.thumbnails' \
  --exclude=.cache --exclude Cache --exclude cache \
  /mnt/from/ /mnt/to/
Dan
  • 424
7

As mentioned in the comments by juniorRubyist, the preferred approach here should be to use dd. The main reason is performance: it's a block-by-block copy instead of file-by-file.

Cloning a partition

# dd if=/dev/sda1 of=/dev/sdb1 bs=64K conv=noerror,sync status=progress

Cloning an entire disk

# dd if=/dev/sdX of=/dev/sdY bs=64K conv=noerror,sync status=progress

Clone a mounted writable partition

The key to cloning a partition that is mounted read-write is to remount it read-only, do the cloning, and finally remount it read-write again.

# mount -o remount,ro /path/to/mount_point
# dd if=/dev/sda1 of=/dev/sdb1 bs=64K conv=noerror,sync status=progress
# mount -o remount,rw /path/to/mount_point

Note: doing this may have side effects on running applications. E.g. if your system has applications that require writing to this particular partition exactly when you need to clone it, those applications need to be stopped while the partition is cloned; or, if it's your own application, rewrite it to handle this scenario.
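After cloning, while the source is still read-only, you can sanity-check the copy. A sketch, with the same assumed device names as above:

```shell
# cmp stops at the first differing byte, so a clean clone prints nothing
# and exits 0; sha256sum gives matching digests for source and target.
cmp /dev/sda1 /dev/sdb1 && echo "clone verified"
sha256sum /dev/sda1 /dev/sdb1
```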

Clone a disk with one or more mounted writable partitions

The strategy and side-effects are the same as for Clone a mounted writable partition except this time the remount commands are repeated for each writable mount point.

# mount -o remount,ro /path/to/writeable_mount_point1
# mount -o remount,ro /path/to/writeable_mount_point..
# mount -o remount,ro /path/to/writeable_mount_pointN
# dd if=/dev/sdX of=/dev/sdY bs=64K conv=noerror,sync status=progress
# mount -o remount,rw /path/to/writeable_mount_point1
# mount -o remount,rw /path/to/writeable_mount_point..
# mount -o remount,rw /path/to/writeable_mount_pointN

Final notes

The preferred and recommended way of doing disk/partition cloning is to do it on an unmounted system, since that avoids any non-deterministic side effects. The same goes for systems built on the concept of read-only mounts.

References

  1. https://wiki.archlinux.org/index.php/disk_cloning
  • 2
    dd is a very bad idea for 2 reasons: firstly performing a block-level copy of a filesystem that is mounted (which is the case for /) will more than likely result in target filesystem errors, and secondly dd won't copy data from sources mounted within the filesystem like /boot and /home. Your link is valid for disk cloning, not "file hierarchy" cloning – Eric Aug 16 '19 at 10:42
  • @Eric I may have made a few assumptions here. Firstly, I assumed "copy the entire file system hierarchy from one drive to another" to be equivalent to "how do I clone my disk", since "file hierarchy cloning" only becomes useful when cloning a subset of the disk. Secondly, I assumed this was a disk used for storage, not the system you are currently running. However, simply mounting the system does not imply filesystem errors; it depends on how it's mounted, and read-only mounts are perfectly fine to clone. – Rikard Söderström Dec 16 '20 at 13:42
  • whatever float your boat – Eric Dec 17 '20 at 14:22
6

For a one-shot local copy from one drive to another, cp suffices, as described by Wolfmann above.

For bigger jobs, such as local or remote backups, rsync is best.

Of course, rsync is significantly more complex to use.

Why rsync:

  • It allows you to make a synchronized copy of all or part of drive A on drive B, with many options, such as excluding some directories from the copy (for instance /proc).

  • Another big advantage is that this native tool monitors the file transfer: e.g. for massive transfers, if the connection is interrupted, it will continue from the breakpoint.

  • And last but not least, rsync can use an SSH connection, which allows you to make secure remote synchronized "copies". Have a look at the man page as well as here for some examples.

hornetbzz
  • 211
4

rsync

"This approach is considered to be better than disk cloning with dd since it allows for a different size, partition table and filesystem to be used, and better than copying with cp -a as well, because it allows greater control over file permissions, attributes, Access Control Lists (ACLs) and extended attributes."

From:

https://wiki.archlinux.org/index.php/Full_system_backup_with_rsync

Man Page Here

2

I tried the rsync commands proposed here but eventually I got much cleaner and faster results with partclone. Unmount source and target partitions and then run the following:

partclone.ext4 -b -s /dev/sd(source) -o /dev/sd(target)
e2fsck -f /dev/sd(target)
resize2fs /dev/sd(target)

This performs the following steps:

  1. Clone (only the used parts of) the partition
  2. Make sure the filesystem is OK (resize2fs requires this step)
  3. Resize the filesystem to fill the target partition

The above works if the target partition is the same size as, or larger than, the source. If your target is smaller than the source (but fits all the data), then do the following:

e2fsck -f /dev/sd(source)
resize2fs -M /dev/sd(source)
partclone.ext4 -b -s /dev/sd(source) -o /dev/sd(target)
resize2fs /dev/sd(target)

resize2fs -M shrinks the filesystem to the minimum size before cloning the data.

Note that partclone is not installed by default on most systems. Use a live distro like Clonezilla, or install partclone from your distro's package manager (apt-get install partclone on Debian-based systems).

Edit: thanks @jbroome for pointing out an error in the second code block

Thawn
  • 296
  • 1
    Hi from 2024. Did you mean for the first two lines in the "target is smaller" code block to be run against the SOURCE, not the target? – jbroome Mar 07 '24 at 02:56
  • @jbroome yes you are correct. Thanks for spotting this :+1: – Thawn Mar 12 '24 at 14:06
2

Adding two useful bits to the thread re rsync: changing the cipher, and using --update.

As per Wolfman's post, cp -ax is elegant, and cool for local stuff.

However, rsync is awesome too. Further to Michael's answer re -W, changing the cipher can also speed things up (read up on the security implications though; note that weak ciphers such as blowfish have since been removed from modern OpenSSH releases):

rsync --progress --rsh="ssh -c blowfish" / /mnt/dest -auvx

There is some discussion (and benchmarks) around the place about a slow CPU being the actual bottleneck, but it does seem to help me when the machine is loaded up doing other concurrent things.

One of the other big reasons for using rsync in a large, recursive copy like this is because of the -u switch (or --update). If there is a problem during the copy, you can fix it up, and rsync will pick up where it left off (I don't think scp has this). Doing it locally, cp also has a -u switch.

(I'm not certain what the implications of --update and --whole-file together are, but they always seem to work sensibly for me in this type of task)

I realise this isn't a thread about rsync's features, but some of the most common I use for this are:

  • --delete-after etc (as Michael mentioned in follow-up), if you want to sync the new system back to the original place or something like that. And,
  • --exclude - for skipping directories/files, for instances like copying/creating a new system to a new place whilst skipping user home directories etc (either you are mounting homes from somewhere else, or creating new users etc).

Incidentally, if I ever have to use Windows, I use rsync from Cygwin to do large recursive copies, because of Explorer's slightly brain-dead insistence on starting from the beginning (although I find Finder in OS X even worse).

cdeszaq
  • 113
1

dd is awesome, but ddrescue (apt install gddrescue) is even better. If dd gets interrupted, there is no easy way to resume (another good reason to use rsync). When you use ddrescue with a logfile, it keeps track of which blocks have been copied.

When backing up a dual boot Windows/Linux system, I use ntfsclone for the Windows partitions and ddrescue for the Linux partition and dd for the MBR. (I haven't tried to back up a dual boot system using GPT/UEFI.)

What I'd love to see is a ddrescue tool that can create files like ntfsclone where unallocated space is marked with control characters. This makes the image not directly mountable, but allows it to be only as big as the contained data.

Someone please come up with the ntfsclone "special image format" for ddrescue...

ECJB
  • 21
0

Before you start copying large amounts of data, you may wish to remount your new drive with some more efficient parameters.

For example, I just remounted my new SSD like this so that the rsync copying my /home to the new drive would go faster:

mount -oremount,defaults,noiversion,auto_da_alloc,noatime,errors=remount-ro,inode_readahead_blks=32,discard /dev/sdc1 /run/media/turgut/ssd1t/

Note that this prevents the update of access times. If a later re-run of the rsync attempts to copy all files again, add the --checksum option to your rsync line, so that it only copies files whose contents have changed since your first attempt.

0

rsync is the perfect solution, as explained above.

I'd just add -S to "handle sparse files efficiently" in case there is a Docker devicemapper volume or something similar to be copied.