
I've been tasked with migrating a live SharePoint document library to another document library. As we want to keep downtime to a minimum, I've synchronised the two document libraries to my PC using OneDrive and I'm using robocopy to mirror between the two local copies. Pretty routine stuff for network shares.

However, I noticed that when I re-ran the mirror command, robocopy copied a lot of files again. I would expect a handful of changes, but not thousands. Also, robocopy was flagging the files as "changed", which IMO is a very unusual copy reason for robocopy: it means the timestamp is unchanged but the file size is different.

Further diagnosis revealed that only Office documents, e.g. .pptx files, were being copied again. Other file types, such as PDFs and graphics files, copied once and not again.

I finally tracked it down to this observation:

  1. Copy a file to OneDrive using (say) cmd.exe copy
  2. Look at file size immediately in OneDrive and it matches the original size
  3. Wait until OneDrive synchronises it and check file size again

The size has changed. Here is a PowerShell script that copies a file, gets the size immediately, waits 15 seconds (for sync) and gets the size again.

# Copy a test file into the OneDrive-synced library folder
$SourceFile = "S:\Temp\Helios\Library\Example.pptx"
$TargetFile = "C:\Users\rob.nicholson\Helios Medical Communications\Library - Documents (unused)\Example.pptx"
Copy-Item $SourceFile $TargetFile
# Sizes straight after the copy, before OneDrive has synced
$Length1 = (Get-Item $SourceFile).Length
$Length2 = (Get-Item $TargetFile).Length
# Give OneDrive time to upload (and re-download) the file, then measure again
Start-Sleep -Seconds 15
$Length3 = (Get-Item $TargetFile).Length
Write-Host "Source size:     $Length1"
Write-Host "After copy size: $Length2"
Write-Host "After sync size: $Length3"

This is the output:

Source size:     1996810
After copy size: 1996810
After sync size: 1997141

Can anyone explain why the file size is changing? Another observation is that OneDrive says "Uploading" and then immediately says "Downloading", and that is when the file size changes.

This makes using sync tools with OneDrive rather difficult. Needless to say, neither Google Drive nor Dropbox has the same issue.

One final note: OneDrive "On-demand" is enabled.

  • This may be the culprit. But beyond this, I've never heard of that. https://www.myce.com/news/microsoft-onedrive-for-business-modifies-files-as-it-syncs-71168/ – Dylan Aug 03 '18 at 17:55
  • Thanks for the reference. I did try a few Google searches before posting. That article describes exactly this problem and explains it in more detail: OneDrive is modifying the contents of some files. Not good! If it wants to store extra metadata with a file, it should put it in the associated cloud database entry, not touch the file itself. To be honest, this is yet another reason to worry about OneDrive – munrobasher Aug 03 '18 at 18:00
  • In this specific instance, I can include the /xc switch with robocopy to exclude changed files. Slightly risky but as changed files are very unusual, it'll work in this specific migration requirement – munrobasher Aug 03 '18 at 18:05
  • If your problem is similar to that in the article Microsoft OneDrive for Business modifies files as it syncs, then Microsoft has modified your file by adding to it an identifier with the purpose of probably identifying the user. It might be a good idea to let OneDrive do the sync rather than using other utilities. Are you using OneDrive for Business? – harrymc Aug 03 '18 at 18:56
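The /xc workaround mentioned in the comments could look something like the following. This is a minimal sketch, not the exact command used in the migration; the two folder paths are hypothetical placeholders for the OneDrive-synced copies of the old and new libraries.

```powershell
# Hypothetical paths: replace with the two OneDrive-synced library folders.
$OldLibrary = "C:\Users\<user>\<Tenant>\Old Library - Documents"
$NewLibrary = "C:\Users\<user>\<Tenant>\New Library - Documents"

# /MIR mirrors the whole tree, including deleting destination files that no
# longer exist in the source.
# /XC excludes files robocopy classes as "Changed" (same timestamp, different
# size), which is how the Office files OneDrive rewrote after upload appear
# on a second pass.
robocopy $OldLibrary $NewLibrary /MIR /XC

# Robocopy exit codes 0-7 mean success (with or without files copied);
# 8 or higher means at least one copy failed.
if ($LASTEXITCODE -ge 8) {
    Write-Error "robocopy reported failures (exit code $LASTEXITCODE)"
}
```

The trade-off is the one noted above: a file that was genuinely edited without its timestamp changing would also be skipped, so this only suits a one-off migration where such files are rare.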

1 Answer


After discussing this with Microsoft and reading the article linked above, I've learned that this is "by design": OneDrive adds additional metadata to Office documents after uploading them, which is why the file is uploaded and then immediately downloaded again. This changes the size of the file, which is a problem for anyone who relies on file size or MD5 checksums to ensure document integrity. This doesn't appear to happen with the personal version of OneDrive, and non-Office documents are not changed.
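To confirm that OneDrive has rewritten a file's contents, and not just reported a different size, you can compare MD5 hashes as well as lengths. This is a rough sketch: `Compare-FileContent` is a hypothetical helper name, and it only relies on the built-in `Get-Item` and `Get-FileHash` cmdlets (the latter needs PowerShell 4.0 or later).

```powershell
# Hypothetical helper: compares two files by length and MD5 hash.
function Compare-FileContent {
    param(
        [Parameter(Mandatory, Position = 0)] [string] $ReferencePath,
        [Parameter(Mandatory, Position = 1)] [string] $DifferencePath
    )
    # -LiteralPath avoids wildcard interpretation of characters such as [ ]
    $refLength  = (Get-Item -LiteralPath $ReferencePath).Length
    $diffLength = (Get-Item -LiteralPath $DifferencePath).Length
    $refHash    = (Get-FileHash -LiteralPath $ReferencePath -Algorithm MD5).Hash
    $diffHash   = (Get-FileHash -LiteralPath $DifferencePath -Algorithm MD5).Hash

    [pscustomobject]@{
        ReferenceLength  = $refLength
        DifferenceLength = $diffLength
        ReferenceMD5     = $refHash
        DifferenceMD5    = $diffHash
        Identical        = ($refHash -eq $diffHash)
    }
}

# Example usage with the question's variables, once OneDrive has finished syncing:
#   Compare-FileContent $SourceFile $TargetFile | Format-List
```

Run against the question's .pptx after the upload/download cycle, you would expect `Identical` to come back as False, whereas a PDF copied the same way should stay True.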

  • It sucks if your company moves OneDrive instances or you move companies. Unlike with Dropbox, if your company has migrated all your data on the server, you can't then move it locally and have OneDrive just check that it's all the same and resync it. If you do, it'll think every single file is a conflict because of the changes to the files you mention. Changing the files is a terrible design choice on Microsoft's part. If I put files somewhere, I expect them to be binary-identical in the future. As it is, the only choice is to redownload the entire 1TB from MS! – Benj May 05 '19 at 09:13
  • Quite... it's very bad design IMO – munrobasher Feb 18 '20 at 09:43