I'm writing a website which allows users to enter some text in predefined text fields. This text is passed to the server, where a LaTeX document is created (containing the text entered by the user). The server finally returns the compiled LaTeX document (a PDF) to the user. Note: the user doesn't enter the whole tex document, only parts.

My problem: how can I make sure that the entered text will not harm my server? I.e. how to detect harmful LaTeX code?

Some examples:

  • The user enters an infinite loop written in LaTeX, so the server never finishes compiling the document.

  • The user enters a shell script which is executed from the tex file when compiling, potentially crashing my server.

Is my best option to blacklist all LaTeX code? Is detecting \ followed by a non-space character enough to block any potentially harmful LaTeX code?

edo
  • I've written an answer covering what I believe to be the basics, but you should probably look into what sharelatex and overleaf permit (I have a feeling I've seen some details online but that was some time ago and I can't find it now) – Chris H Oct 20 '16 at 11:25
  • Disable all shell-escape (including the restricted ones) and run LuaTeX with the --safer option (or don't use it at all). Of course, never run TeX as root. – Henri Menke Oct 20 '16 at 11:30
  • 1. Sandbox. 2. Whitelist very few commands. Even \begin may not be safe: try \begin{input}{/etc/passwd} inside the document (Linux). – Bruno Le Floch Oct 20 '16 at 12:24
  • @BrunoLeFloch depending on the document you might need to whitelist \begin{figure} (or tabular or equation etc.) even if \begin{document} is provided by the template. Perhaps the OP should expand a little on what user input will consist of. – Chris H Oct 20 '16 at 12:53
  • Another option if you need only very basic formatting commands etc: don't accept TeX input. Accept markdown and pass through pandoc to convert to LaTeX, trimming any extraneous material (preamble). – Chris H Oct 20 '16 at 12:58
  • @BrunoLeFloch To be clear: that won't reveal any passwords. It would, however, provide a list of user names. But if the security of the system depends on users not reading a file marked world-readable, there's a problem. On a secure system, this cannot be a real threat. Users can cat /etc/passwd on multi-user systems, but that doesn't really compromise the system. In theory, knowing user names makes an attack slightly easier, but strong passwords and standard precautions mean that makes little difference. The attacker already knows there's a root user, for example. – cfr Oct 20 '16 at 14:59
  • @cfr Once you know the username (or guess it from /etc/passwd), you can read user's sensitive data, e.g.: \begin{input}{/home/guessed-username/.netrc}. This is a serious threat! – yo' Oct 20 '16 at 19:54
  • Great, now we need an Anti-LaTeX-Malware program written in LaTeX as well... – Tobias Kienzler Oct 20 '16 at 19:59
  • Are the text fields simple text fields without any LaTeX markup? Or should a subset of LaTeX be supported? – Heiko Oberdiek Oct 20 '16 at 20:20
  • @yo' If other users can read sensitive data in my home directory, the system is not secure. Any user can do ls /home to get the names of other users. There is no threat here at all unless the system is already insecure because the permissions on directories are inappropriate. For me to read your .netrc, your home directory must be world readable and world executable (or I must be in the same group as you and it must be group readable and executable) and .netrc must be world readable (or group readable). If that is true, my reading /etc/passwd is beside the point. – cfr Oct 20 '16 at 21:28
  • @cfr but this way someone can propagate your .netrc into a PDF document they are allowed to open. – yo' Oct 20 '16 at 21:30
  • @yo' No they cannot. You can't do that unless you can read it. You shouldn't be able to read it. There is no threat here. /etc/passwd is world readable. It is assumed that everyone can read it. That is simply not a threat at all. Knowing your username does not give me access to your files unless the system is fundamentally insecure. If that's the case, you have more to worry about than my reading /etc/passwd. Knowing your user name doesn't give me access to anything. There is no threat here. It is just FUD. – cfr Oct 20 '16 at 21:33
  • @yo' Unless, of course, you are compiling as root. But obviously nobody sensible would ever do that. – cfr Oct 20 '16 at 21:36
  • @cfr But there is no "you" in the problem. The user that runs the script is the user that created the script, if I understand things correctly. The person that inputs the data is someone else. Or do I miss something? – yo' Oct 20 '16 at 21:37
  • @yo' No. You would not run the compiler as a user with a home containing sensitive data. You don't run anything which is publicly available as a normal user at all. If you are doing that, then malicious LaTeX is the least of your worries. – cfr Oct 20 '16 at 21:51
  • @yo' Probably the effective user's home directory is /dev/null or /srv/http or something along those lines. I've never done this so I don't know the details, but if you are bob, say, you don't run anything on the public side as bob. – cfr Oct 20 '16 at 21:54
  • You probably want to look into the sandboxing techniques that online C compiler sites use to protect themselves. (Especially the ones that let you compile & run your code on their server.) – Peter Cordes Oct 21 '16 at 08:16
  • @cfr is right. As long as TeX is run by a user with sufficiently few read permissions the system can be kept secure. – Bruno Le Floch Oct 21 '16 at 16:06
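
The suggestions in the comments above (neutralise the input instead of blacklisting it, disable shell escape, guard against infinite loops) could be combined roughly as in the following Python sketch. It assumes the fields are meant to hold plain text only, that the template uses standard catcodes, and that pdflatex from a TeX Live style distribution is on the PATH. The names escape_tex, build_command, compile_pdf and doc.tex are hypothetical, chosen only for illustration. It is not a complete defence: the process should still run as an unprivileged user inside a sandbox.

```python
import subprocess
import tempfile
from pathlib import Path

# Each TeX special character is mapped to a printable replacement
# (assumption: the document template keeps the standard catcodes).
_TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\textasciicircum{}",
    "~": r"\textasciitilde{}",
}


def escape_tex(text):
    """Escape user text one character at a time.

    Working per character means the backslashes introduced by the
    replacements themselves are never escaped a second time.
    """
    return "".join(_TEX_ESCAPES.get(ch, ch) for ch in text)


def build_command(tex_file):
    """Return a pdflatex command line with shell escape switched off."""
    return [
        "pdflatex",
        "-no-shell-escape",             # no \write18 at all
        "-interaction=nonstopmode",     # never wait for keyboard input
        "-halt-on-error",               # stop at the first error
        tex_file,
    ]


def compile_pdf(tex_source, timeout_s=20.0):
    """Compile in a throwaway directory and return the PDF bytes.

    subprocess.TimeoutExpired is raised if the run exceeds timeout_s,
    which bounds the damage an infinite loop can do.
    """
    with tempfile.TemporaryDirectory() as workdir:
        (Path(workdir) / "doc.tex").write_text(tex_source, encoding="utf-8")
        subprocess.run(
            build_command("doc.tex"),
            cwd=workdir,
            timeout=timeout_s,
            check=True,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return (Path(workdir) / "doc.pdf").read_bytes()
```

With this approach, escape_tex(r"\input{/etc/passwd}") yields \textbackslash{}input\{/etc/passwd\}, which is typeset as literal text instead of being executed. If a subset of LaTeX markup must be supported, escaping everything no longer works and a whitelist or a Markdown-to-LaTeX conversion, as suggested above, would be needed instead.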