8

In plain TeX,

\char<int>

produces the character with decimal code <int>. Similarly,

\char'<oct>
\char"<hex>

produce the characters with octal code <oct> and hexadecimal code <hex>, respectively.
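
For instance, the letter A has code 65 in decimal, 101 in octal and 41 in hexadecimal, so the three forms below should all typeset the same glyph (a small illustration, assuming the default Plain TeX text font cmr10):

\char65 \char'101 \char"41 % typesets AAA; the space after each number is absorbed

\bye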

Is there a similar way of typing Unicode symbols?

Thank you.

nsoum
  • 240

2 Answers

9

You can emulate Unicode input to some extent in Plain TeX as well; say you want to type ć and have it produce \'c.

\catcode"C4=\active % 0xC4 is a two-byte prefix in UTF-8
\def^^c4#1{\csname\string^^c4#1\endcsname}

\expandafter\def\csname\string^^c4^^87\endcsname{\'c}
%%% add other UTF-8 characters having 0xC4 as prefix

%%% Repeat for all other UTF-8 prefixes you need

ć

\bye

Repeat for all prefixes.
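
As a sketch of what that repetition might look like, the file below adds a second character under the 0xC4 prefix and sets up one further prefix, 0xC3. The choice of ı (mapped to \i) and é (mapped to \'e) is only illustrative, and the file is assumed to be saved as UTF-8 and processed by an 8-bit engine:

\catcode"C4=\active % prefix byte shared by ć and ı
\def^^c4#1{\csname\string^^c4#1\endcsname}
\expandafter\def\csname\string^^c4^^87\endcsname{\'c} % ć is 0xC4 0x87
\expandafter\def\csname\string^^c4^^b1\endcsname{\i}  % ı is 0xC4 0xB1

\catcode"C3=\active % prefix byte of é
\def^^c3#1{\csname\string^^c3#1\endcsname}
\expandafter\def\csname\string^^c3^^a9\endcsname{\'e} % é is 0xC3 0xA9

ć ı é

\bye

Each prefix byte needs its own \catcode line and dispatcher macro; each character under that prefix then needs only one \expandafter\def line. A byte pair with no definition expands to \relax and silently prints nothing.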

You may want to look at http://petr.olsak.net/csplain-e.html for a different strategy and a ready-made solution.

egreg
  • 1,121,712
  • doesn't csplain depend on the encTeX extension, as I already pointed out? – jarnosc Jan 29 '16 at 16:59
  • @TarsTarkas Yes, that's why I said it uses a different strategy. – egreg Jan 29 '16 at 17:00
  • In any event, activating characters or transforming input is a whole different game than using the \char primitive, isn't it? – jarnosc Feb 01 '16 at 18:37
  • With TeX you can't use \char<number> with a number beyond 255, can you? You have to switch to XeTeX or LuaTeX for this, with many problems, of course. – egreg Feb 01 '16 at 18:54
  • I know... but your proposed solution assumes the OP is using an 8bit TeX engine, and afaik, they can't activate single tokens over the 7bit range with XeTeX or LuaTeX. afaics, the OP is only assuming the use of the Plain TeX macros, so they can use Plain TeX with the XeTeX or LuaTeX engine. – jarnosc Feb 01 '16 at 19:02
  • 1
    @TarsTarkas No tag [tag:xetex] or [tag:luatex] is accompanying the question. With XeTeX you can certainly type \char"1234, but no output would result, unless you set up an OpenType/TrueType font. – egreg Feb 01 '16 at 19:04
  • the OP is asking about how to use the primitive \char to input unicode characters, so the answer should be: you can't do that with an 8bit font or engine, but you can with a unicode aware font and engine. – jarnosc Feb 01 '16 at 19:16
  • How can I find the associated number for an arbitrary character say "ı" for example and what is the meaning of double caret ^^? – Vesnog Jul 29 '17 at 22:54
  • @Vesnog For ^^ see What is the role of an unescaped circumflex or hat character?. For finding the UTF-8 representation you can use http://r12a.github.io/apps/conversion/ – egreg Jul 30 '17 at 06:28
3

You need a Unicode-aware engine to do it; macro packages like Plain cannot do it by themselves. Examples of Unicode-aware engines are XeTeX and LuaTeX, and perhaps eTeX and pdfTeX with the encTeX extension.
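
A minimal sketch for XeTeX with the Plain macros, assuming the Latin Modern OpenType file lmroman10-regular.otf is installed where XeTeX can find it. The font choice and the name \unifont are illustrative; any OpenType/TrueType font containing the glyph should do:

\font\unifont="[lmroman10-regular.otf]" at 10pt % load an OpenType font by file name
\unifont
\char"0107 % U+0107, ć

\bye

Run it with the xetex command. With LuaTeX the font has to be loaded differently (for example through luaotfload), but \char takes the same code point numbers there.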

jarnosc
  • 4,266