
I need to generate a file name for a ZIP file that a web app sends back to the user. I have no idea which OS the file will ultimately be saved on, so I have to find a way to sanitise the file name for multiple platforms (at the very least, it must be safe on Windows, Linux and Mac). The file name comes from a user-supplied field (stored as UTF-16) and I would like to keep the original text as much as possible. The audience is global, so I do not see a way to limit the allowed characters to a simple set.

Is there a safe strategy for such an operation? If it helps, the software is written in C# and runs on both Windows and Linux servers.

Surprisingly, I cannot seem to find a proper way to do this: all methods I found are either specific to the OS the software runs on, use a limited list of invalid chars (only the most obvious ones) or use a whitelist.

Stephane
  • Be sure that when/if you save the file on the server, you choose the filename and extension. When serving a file back as a response, you choose the extension, and you can use a simple filter for the filename: replace all non-letter/number characters. I believe C# has a Path.GetFileNameWithoutExtension method if that's helpful. You can add validation to limit the size of the submitted filename (and/or just allow numbers/letters). Not sure what you meant by "stored as UTF-16". Maybe add some context there. – browsermator Oct 17 '23 at 17:22
  • The only files ever saved to the file system are temporary streams that all have a GUID as their name: everything else is stored in blob storage with all metadata in a database. My issue here is that I'm using user-supplied data to generate the names of files that I then export to other users, which is a security risk if there is no sanitisation – Stephane Oct 18 '23 at 13:52
  • I've looked around a bit and I haven't found anything built-in. The answer below listed some things I didn't know about... LPT, COM, etc... Simplest path would be filter out any non letters or numbers. Then prefix the filename with something fixed, and of course, create the extension which is known (".zip"). The source of the user-supplied data might be a factor here as you mentioned encoding, but as long as you decode, then filter to only letters/numbers, then prefix with something safe I think that'd be fine. ex: prefix "download" + user-supplied name... then trim if it exceeds 256. – browsermator Oct 18 '23 at 16:24
  • I just did some quick tests on this to see if the browser did its own filtering, and it did. I supplied "COM1" as the filename and when downloaded it became "underscoreCOM1"... when using "./COM1" it became "underscore.underscoreCOM1". This was Chrome... so that should give you some reassurance that the end user should be protected on their end. The important part is really securing the server side of things. (I had to use "underscore" there because this site was filtering...) – browsermator Oct 18 '23 at 18:06
  • did some further testing on the back-end. In ASP.NET-Core, if you try to write a File with an illegal name, it will throw I/O error. If not caught, it will silently fail (meaning it won't write the file). This was in Windows, but I assume the same would apply to Unix. – browsermator Oct 18 '23 at 18:38
  • ...though that silent fail may have meant it was trying to write to COM1 serial port? The I/O errors would be only for certain methods. The FullName was there, but all paths were filtered out.. it became something like ".//.//.COM1" – browsermator Oct 18 '23 at 18:47
  • @pcalkins there is an additional wrinkle that I didn't add to my question (KISS): the file is included in a ZIP archive, so I don't have the luxury of hoping the browser will save me here. But it's a good comment nevertheless – Stephane Oct 19 '23 at 07:50
  • In that case you may want to watch out for unique path/filename as well as I've seen zips where there are two files with the same path/filename... when you unzip one will over-write the other. (though the user unzipping the file is usually notified of this...) – browsermator Oct 19 '23 at 19:18

1 Answer


There are a few things to watch out for:

  • Microsoft’s file systems have the most restrictive path lengths; ensure you do not exceed them (the classic MAX_PATH limit is 260 characters for a full path, and NTFS limits a single file name to 255 characters, but look it up yourself).
  • A few names have special meaning on Microsoft systems (for example NUL, or LPT and COM followed by a digit); ensure the base name of your file, the part before the extension, is not one of them. Check Microsoft's documentation for the full list (thank you @Gh0stFish for the link).
  • Files beginning with a “.” are hidden on a *NIX system, but not on a Microsoft one.
  • Only the 8.3 scheme is fully compatible with all file systems currently in use; consider whether that is a problem.
  • Only use printable characters in file names, preferably only the limited ASCII subset of UTF-8 if possible. If that is not possible, stick to the printable set and remove characters with special meaning (" ' ` ~ \ / * ? : |, as well as < and >, which Windows also rejects).
  • The only working solution is a checking routine that verifies every one of the disallowed cases. It’s too complex for one simple RegEx (or even several).
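
The checks listed above can be combined into one routine. What follows is only a minimal sketch, not a vetted implementation: the names `FileNameSanitizer` and `Sanitize`, the `"download"` fallback and the 200-byte budget are all hypothetical choices, the reserved-name list should be verified against Microsoft's current documentation, and `Rune` requires .NET Core 3.0 or later. It deliberately uses a hard-coded character list instead of `Path.GetInvalidFileNameChars()`, because that call only reports the characters invalid on the OS the server runs on.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class FileNameSanitizer
{
    // Characters Windows rejects, plus the quote-like ones from the list above.
    private static readonly HashSet<char> Forbidden =
        new HashSet<char>("<>:\"/\\|?*'`~");

    // Windows device names (assumed list; check Microsoft's documentation).
    private static readonly HashSet<string> Reserved = new HashSet<string>(
        new[] { "CON", "PRN", "AUX", "NUL" }.Concat(
            Enumerable.Range(0, 10).SelectMany(i => new[] { "COM" + i, "LPT" + i })),
        StringComparer.OrdinalIgnoreCase);

    public static string Sanitize(string input, string fallback = "download",
                                  int maxUtf8Bytes = 200)
    {
        // 1. Replace control characters, line separators and forbidden
        //    punctuation with '_'. Lone surrogates come out as U+FFFD.
        var kept = new StringBuilder();
        foreach (Rune rune in (input ?? string.Empty).EnumerateRunes())
        {
            UnicodeCategory category = Rune.GetUnicodeCategory(rune);
            bool unprintable = Rune.IsControl(rune)
                || category == UnicodeCategory.LineSeparator
                || category == UnicodeCategory.ParagraphSeparator;
            bool forbidden = rune.IsBmp && Forbidden.Contains((char)rune.Value);
            kept.Append(unprintable || forbidden ? "_" : rune.ToString());
        }

        // 2. Canonical form; then drop leading dots (hidden on *NIX) and
        //    trailing dots/spaces (silently removed by Windows).
        string name = kept.ToString().Normalize(NormalizationForm.FormC)
                          .Trim().Trim('.', ' ');

        // 3. Truncate on code-point boundaries to a UTF-8 byte budget that
        //    leaves room for an extension under the common 255-byte limit.
        var truncated = new StringBuilder();
        int used = 0;
        foreach (Rune rune in name.EnumerateRunes())
        {
            if (used + rune.Utf8SequenceLength > maxUtf8Bytes) break;
            used += rune.Utf8SequenceLength;
            truncated.Append(rune.ToString());
        }
        name = truncated.ToString().TrimEnd('.', ' ');

        if (string.IsNullOrWhiteSpace(name)) return fallback;

        // 4. A reserved device name stays reserved even with an extension.
        string stem = name.Split('.')[0].Trim();
        return Reserved.Contains(stem) ? "_" + name : name;
    }
}
```

The caller would then append the known extension itself, for example `FileNameSanitizer.Sanitize(userTitle) + ".zip"`. Replacing rather than deleting characters keeps the result close to what the user typed, and non-Latin text passes through unchanged.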

Bonus point: watch out for path traversal. A common technique to infect machines is to craft an archive whose entry names make the unpacking tool write files to an unexpected location.
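
On the unpacking side, the usual defence against this is to resolve each entry name against the target directory and refuse anything that lands outside it. A minimal sketch, assuming .NET Core 2.0 or later; `ZipEntryPaths` and `ResolveInside` are hypothetical names, not part of any library:

```csharp
using System;
using System.IO;

public static class ZipEntryPaths
{
    // Returns the full destination path for an archive entry, or throws if
    // the entry name would escape targetDirectory (e.g. "../../etc/passwd"
    // or an absolute path).
    public static string ResolveInside(string targetDirectory, string entryName)
    {
        string root = Path.GetFullPath(targetDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar))
            root += Path.DirectorySeparatorChar;

        // GetFullPath collapses "." and ".." segments, so the prefix check
        // below sees the location the file would really be written to.
        string destination = Path.GetFullPath(Path.Combine(root, entryName));
        if (!destination.StartsWith(root, StringComparison.Ordinal))
            throw new IOException($"Entry '{entryName}' escapes the target directory.");

        return destination;
    }
}
```

On the producing side, an entry name that contains no "/" or "\" at all cannot traverse anywhere, so stripping both separators from user-supplied names before adding them to the archive covers the same risk.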

LvB
  • Also need to filter out characters like : and \ - there's a list of rules on one of the Microsoft pages - but obviously that only applies to Windows and it may not be complete. – Gh0stFish Oct 18 '23 at 08:10
  • The backtick I got. But I forgot the colon yes, also thank you for the link. I edited my post with this information. – LvB Oct 18 '23 at 09:07
  • ugh, that was meant to be a backslash \ - but I guess the formatting treated that as an escape character and didn't display it. Seems if you don't have a space after it then it doesn't get displayed properly (even if you try and escape it with another backslash). – Gh0stFish Oct 18 '23 at 09:52
  • I did have it there, but like yours it didn't get rendered. Hopefully fixed now. – LvB Oct 18 '23 at 14:50