
Which characters may appear in filenames that are included (\include)?

Are hyphens and underscores okay? Is there a dependence on the underlying filesystem or encoding of the LaTeX source that does the including? Are only ASCII characters allowed?

1 Answer


The filename syntax is one of the few explicitly system-dependent parts of TeX. web2c-based TeX systems accept most characters that the filesystem allows. (For example, spaces are allowed if you surround the entire path with " quotes.)

Note that LaTeX, for example, does not assume that a full file name is formed by appending . and the extension to the base name. Some systems did not allow such names, and the file usually known as article.cls might be article in a directory cls, or [cls]article, or any of various other syntactic possibilities. The file textsys.cfg is read at format-making time, and by default these are determined automatically:

 This file contains the site specific definitions of the four macros\\
 |\@currdir|, |\input@path|, |\filename@parse| and |\@TeXversion|.

However, these macros may be defined locally to be anything appropriate to the system.

Note that special characters may need special handling in order to be passed to the file name reader. For example, the character ~ is allowed, but LaTeX typically gives it a definition that expands to \nobreakspace{}, which isn't the intended interpretation in a file name. \string~ works, though. See this question for instructions on how to deal with spaces in filenames.

Similar issues arise with non-ASCII characters. If you are using classic TeX (rather than XeTeX/LuaTeX), 8-bit characters will be given active definitions designed for typesetting rather than for use in filename strings. Also, the file system may be using UTF-8 or a local code page, and LaTeX probably doesn't know which (the underlying file I/O of the web2c system presumably knows, but TeX macros can't access that), so again you may have to enter the name "verbatim", using \string or some equivalent quoting method, in order to pass the bytes directly to the underlying filename reader. It all depends...
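As a rough illustration of the points above, here is a minimal sketch, assuming a web2c-based system (such as TeX Live) and pdfLaTeX. All of the file names are hypothetical, and the files would have to exist for the document to compile. The use of \detokenize (an e-TeX primitive) is just one possible "equivalent quoting method", and it only helps if the bytes in the source match what the filesystem expects.

```latex
\documentclass{article}
\begin{document}

% Hyphens and underscores need no special treatment in a file name.
% (hypothetical file: chapter-one_draft.tex)
\include{chapter-one_draft}

% ~ is normally an active character that expands to \nobreakspace{};
% \string~ passes the literal character to the file name reader.
% (hypothetical file: notes~old.tex)
\input{notes\string~old}

% On web2c-based systems, a path containing spaces can be wrapped
% entirely in " quotes. (hypothetical file: my notes/intro.tex)
\input{"my notes/intro"}

% Non-ASCII characters: \detokenize keeps the bytes from being treated
% as active typesetting characters. This assumes that the source file
% and the filesystem use the same encoding (for example, both UTF-8).
% (hypothetical file: résumé.tex)
\input{\detokenize{résumé}}

\end{document}
```

Recent LaTeX releases may already sanitize some of these names internally, in which case the extra quoting is harmless but not needed.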

David Carlisle
  • The natural implication from your answer is that the filesystem-dependent naming would also require that any permissible special characters are available in the input encoding of the file that does the including. I assume that the input encoding of the file doesn't matter, as long as it is readable by LaTeX (via \usepackage[encoding]{inputenc}). Do correct me if I'm wrong. – Lover of Structure Jun 11 '12 at 00:38
  • Yes and no; I'll put something in the answer. – David Carlisle Jun 11 '12 at 00:40
  • Many thanks, great answer! One detail: could you give an example for "active definitions designed for typesetting" (or, if tilde is such an example, clarify this)? – Lover of Structure Jun 12 '12 at 00:59
  • That is what I meant, or é getting normalised to \'e, etc. – David Carlisle Jun 12 '12 at 12:33
  • So if I understand correctly, this would mean "whatever combination of input encoding and LaTeX notation yields the required string in the OS-dependent output encoding". Your use of the expression "normalised" is a bit confusing, though, because in Unicode lingo "normalization" means converting a glyph to a standardized form, and that would certainly not be \'e (and it's doubtful that, from a modern perspective, \'e should be regarded as a normal form). Thanks for your information, of course! – Lover of Structure Jun 13 '12 at 19:01