9

The ASCII code point 9 encodes the horizontal tab. Then why does the compilation of the following manuscript

x\char9y\bye

with TeX Version 3.14159265 (TeX Live 2017) yield a dvi file whose body consists of the three letters

xΨy

(x, Greek Capital Letter Psi, y) rather than of the three letters

x y

(x, space, y)?

Evan Aad
  • 11,066
  • You could use \quad for a "tab" of space. – dexteritas Jun 22 '17 at 16:19
  • @dexteritas: Thanks, but I am interested in understanding the reason why the described manuscript fails to result in the expected output rather than in outputting a tab character. – Evan Aad Jun 22 '17 at 16:22
  • This is exactly the same as your question earlier today why } (character 125) doesn't print as a } – David Carlisle Jun 22 '17 at 18:42
  • In one case you typeset character 125 and get whatever the font has in slot 125 instead of a }, and in this case you typeset character 9 and get whatever the font has in slot 9 instead of a tab. – David Carlisle Jun 22 '17 at 18:52
  • @DavidCarlisle: I find it very difficult and confusing to distinguish between 1. a coded character set (e.g. ASCII, Unicode), 2. a character set encoding (e.g. UTF-8), 3. a font (e.g. family:cmr,series:m,shape:n,size:10), 4. a TeX character token (e.g. character code:32, catcode:10), 5. a TeX command for generating a character token (e.g. A for generating the token (65,11)), and 6. a TeX command for printing some glyph in the current font (e.g. \char9). – Evan Aad Jun 22 '17 at 19:05
  • @EvanAad I'm not entirely sure what the overall thrust of your questions is, but I wonder if you have access to The TeXbook and TeX by Topic (the latter is free as a PDF): these really are required resources for understanding TeX concepts at the lower level. – Joseph Wright Jun 23 '17 at 07:24
  • @JosephWright: Yes, I have access to both books. If you applied the same exhortation to every poster on this site, the site wouldn't have enough posts to move it past Area 51. Every question could've been answered by the asker if they had only read the TeXbook/package manual carefully enough, experimented more diligently, searched this site long enough, etc. All of which things we all really and truly ought to do before we post a question. And I do try all of the above before posting, I really do. But sometimes (too often, perhaps...) it's still not enough for my dull brain. – Evan Aad Jun 23 '17 at 08:35

3 Answers

10

This is about font encoding: the standard OT1 font encoding developed by Knuth uses all of the slots in the font, not just those you might normally think of as printable. Most famously, even slot 0 (\char0) is occupied in this encoding, which used to cause problems for some PDF viewers.

The idea that there is one 'universal' encoding has arisen with the adoption of Unicode, but this simply doesn't apply to older material. There are lots of encodings for fonts, both 'standard' and entirely non-standard (think pifonts).

What you have to bear in mind is that the input codepoint doesn't have to match the output codepoint: the key is that the output shows the correct information. That's perhaps most easily understood with something like \alpha: clearly the output (a single glyph) is different from the input, and in a 'classical' encoding such as Latin-1 there is no alpha.

Probably the easiest way to see the full font table for an (8-bit) font is to use the LaTeX package fonttable:

\documentclass{article}
\usepackage{fonttable}
\begin{document}

\fonttable{cmr10}

\end{document}
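To see that the glyph depends on the font rather than on ASCII, the same slot can be typeset from two fonts with different encodings. The following is only a minimal plain TeX sketch: the names \otone and \tone are made up for illustration, and it assumes that both cmr10.tfm and ecrm1000.tfm are installed, as they are in a full TeX Live.

```tex
% Plain TeX sketch: typeset slot 9 from two fonts with different encodings.
% \otone and \tone are hypothetical names chosen for this example.
% Assumes cmr10.tfm and ecrm1000.tfm are both available.
\font\otone=cmr10    % OT1-encoded Computer Modern Roman
\font\tone=ecrm1000  % T1-encoded EC Roman
{\otone x\char9y}\par % slot 9 of cmr10: the Psi from the question
{\tone x\char9y}\par  % slot 9 of ecrm1000: whatever the T1 table puts there
\bye
```

The first line should reproduce the output from the question; what appears between x and y in the second line is decided entirely by the font's own table.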
Joseph Wright
  • 259,911
  • For a LaTeX 'take' on the same ideas, see https://tex.stackexchange.com/questions/44694/fontenc-vs-inputenc - the fontenc package does the work here to pick the correct output slots. – Joseph Wright Jun 22 '17 at 16:23
  • How can I inspect the details of a font's encoding? In other words, given a font name, e.g. OT1, how can I tell what numbers the font uses to encode characters or whatever (i.e. the font's domain), as well as what graphical manifestation the font assigns to every number in its domain? – Evan Aad Jun 22 '17 at 16:33
  • You wrote: "Probably the easiest way to see the full font table for an (8-bit) font is to use the LaTeX package fonttable". How can I tell if a font is 8-bit? How can I see the full font table for a font that is not 8-bit? – Evan Aad Jun 23 '17 at 06:59
  • @EvanAad 'Classical' TeX fonts are either 7- or 8-bit and are in your TeX tree, so they'll be in .tfm format and findable using kpsewhich. (For example, the cmr10 example points to cmr10.tfm whilst if in LaTeX I'd selected T1, I'd have wanted to look at ecrm1000.) The fonttable package will deal happily with 7- and 8-bit fonts. If you are talking about a Unicode font then system or general font tools can be used to inspect the .otf file (e.g. FontForge, Font Book, ...). – Joseph Wright Jun 23 '17 at 07:17
9

There's nothing strange: \char<8 bit number> prints the character in slot <8 bit number> in the current font. It does not generate a character token, as you seem to believe.
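Since \char only reads a number and picks that slot, the way the number is written makes no difference. As a small plain TeX sketch (assuming the default font cmr10), the four lines below all request slot 9 using TeX's usual number notations:

```tex
% Plain TeX sketch: four spellings of the same slot number, 9.
x\char9 y      % decimal
x\char"09 y    % hexadecimal (digits A-F must be uppercase)
x\char'11 y    % octal
x\char`\^^I y  % alphabetic constant: the character code of ^^I
\bye
```

Each line asks for the same glyph, so none of them produces a tab or a space between x and y.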

If you want a tab character token, with its current category code, type ^^I or ^^09, where ^ must have category code 7.
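As a minimal sketch of that notation, assuming plain TeX's default category codes (where ^ has category code 7 and character 9 has category code 10):

```tex
% Plain TeX sketch; assumes the category codes set up by plain.tex.
x^^Iy   % ^^I is converted to character 9 as the line is read
x^^09y  % same character, written with two lowercase hexadecimal digits
\bye
```

Under those assumptions both lines should behave like x y typed with an ordinary space, in contrast to x\char9y.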

With XeTeX and LuaTeX you have available \Uchar that expandably generates a character token, so \Uchar 9 will do what you need. XeTeX also provides \Ucharcat that generates a character token with a given number and a given category code: for instance

\Ucharcat 9 10

is the same as \Uchar 9 with the standard category code. Not all category codes are available, notably 13 (active) isn't. The primitive \Ucharcat is not defined in LuaTeX, but it can be emulated (see the ucharcat.sty LaTeX package).

egreg
  • 1,121,712
  • Is there a way to use an octal or a hexadecimal number representation with \Uchar, resp. \Ucharcat, rather than a decimal one? – Evan Aad Jun 29 '17 at 15:21
4

The other (excellent) answers address what \char does and how to input a literal horizontal tab character using ^^ but they don't mention why TeX treats a tab as a space. Admittedly, this is only tangentially related to your question, but it may help explain TeX's behavior.

In the plain TeX and LaTeX formats, tab has a category code of 10 (space). One interesting aspect of category code 10 characters is that when TeX encounters them in the input while it is not ignoring spaces, it produces a space token which has category code 10 and character code 32 (ASCII space). In essence, TeX doesn't really know anything about tab characters. As far as it's concerned, since character 9 has category code 10, it's just a space.
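This conversion can be observed by storing a tab in a macro and asking for the character code of the stored token. The following is only a sketch for plain TeX; the macro names \a and \b are arbitrary, and it assumes plain TeX's category codes at the start.

```tex
% Plain TeX sketch: what token does a tab become?
\def\a{^^I}% tab has catcode 10 here, so \a holds a space token
\message{[\expandafter\number\expandafter`\a]}% character code of that token
\catcode`\^^I=12 % now treat the tab as an "other" character
\def\b{^^I}% \b holds a character token that keeps character code 9
\message{[\expandafter\number\expandafter`\b]}%
\bye
```

Under these assumptions the first message should report 32 and the second 9: with category code 10 the tab has already been replaced by an ordinary space token by the time it is stored.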

TH.
  • 62,639