
How does one instruct Lua(La)TeX -- specifically, the lua code invoked via a \directlua directive -- to insert a non-ASCII unicode symbol, such as a "zero width non-joiner" symbol (code U+200C), into the text stream? I already know how to do this inside the body of a (Lua)LaTeX document -- I'd type something like

 stuff\char"200C{}morestuff

But how does one do this from inside lua code?


Addendum to (hopefully...) clarify what I'm trying to get done. If there's a string in the input stream such as xyz123, I'd like to insert a specific character (this evil invisible ZWNJ character...) between xyz and 123, so that the input stream now is

xyz<ZWNJ>123

I already have the code to (i) find all instances of xyz123 in the input stream and (ii) find the insertion point for the ZWNJ character inside the string xyz123. What I'm stuck with is trying to figure out how to insert the ZWNJ character (a "node" in luatex speak? of what type?) at the insertion location.

Mico
  • can't you just enter the character directly? – David Carlisle Sep 04 '13 at 21:26
  • @DavidCarlisle - The character ZWNJ is a "non-printing", i.e., invisible character. To maintain some debuggability of my code I'd rather not enter an invisible character directly. Instead, I want to provide the unicode "number" and have lua(tex) insert that character... – Mico Sep 04 '13 at 21:33
  • I don't know Lua, so I can comment only about the TeX side: I'd type stuff\char"200C morestuff, without the empty group. The space is ignored because it follows a required constant. Or also ^^^^200c which is even better, because it's converted to the "real character" during tokenization. Since Lua code in \directlua is tokenized on the TeX side, it's possible that ^^^^200c works. – egreg Sep 04 '13 at 21:46
  • @egreg -- thanks for this suggestion. I realize now I should have mentioned that the lua code sits in a separate file (since doing just that is a frequently encountered recommendation...) and is invoked via the statement \directlua{ require("myfile.lua") }. :-( – Mico Sep 04 '13 at 21:56
  • @Mico oops that comment just invalidated my answer I'll update if I get the external version working:-) – David Carlisle Sep 04 '13 at 22:04
  • Answer updated working with external code. – David Carlisle Sep 04 '13 at 22:31

2 Answers


LuaTeX includes a unicode library whose functions mirror those of the standard string library but operate on Unicode characters. To turn a Unicode code point into a UTF-8 string, you can use its char function:

function unicode2utf(c)
  -- As parameter pass hexadecimal unicode code point
  return unicode.utf8.char(tonumber(c,16))
end

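print(unicode2utf("038F"))

This prints Ώ (U+038F, capital omega with tonos), as an invisible space isn't the best character to test with :) Note that print writes to the terminal; use tex.sprint instead to pass the string to TeX.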
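
For the insertion described in the question's addendum, here is a minimal sketch. It assumes that plain string replacement on each input line is enough; insert_zwnj is a hypothetical helper name, and the callback registration shown in the final comment is an assumption about how one might wire it up in LuaLaTeX, not something tested here. The ZWNJ is built from its three UTF-8 bytes so that no invisible character has to appear in the source file.

```lua
-- ZWNJ (U+200C) written as its three UTF-8 bytes, so the source file
-- itself contains no invisible character.
local ZWNJ = string.char(226, 128, 140)

-- Hypothetical helper: insert a ZWNJ between "xyz" and "123"
-- wherever the literal string "xyz123" occurs in a line.
function insert_zwnj(line)
  -- gsub returns two values (new string, match count);
  -- the outer parentheses keep only the new string.
  return (line:gsub("xyz123", "xyz" .. ZWNJ .. "123"))
end

-- Assumption: inside LuaLaTeX this could be hooked into the input
-- stream with something like
--   luatexbase.add_to_callback("process_input_buffer",
--                              insert_zwnj, "insert_zwnj")
```

Because the function only uses the standard string library, it can be checked in a standalone Lua interpreter before it is used from \directlua.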

michal.h21
  • Thanks. I've provided an addendum to my posting, to be (hopefully) a bit clearer about what I'm looking to achieve. I used the term "print" a bit loosely in the original posting; I meant to say that I need to insert a certain character into the input stream. – Mico Sep 04 '13 at 22:36
  • +1 ah OK (that saves decoding the utf8 by hand) but I'll leave my answer there anyway as it has some alternatives (but @Mico should probably accept this:-) – David Carlisle Sep 04 '13 at 22:48
  • @Mico see http://tex.stackexchange.com/a/114573/2891 for sample of generating text from luatex – michal.h21 Sep 04 '13 at 23:07
  • Many thanks for this, both for the answer itself and for the pointer to the posting tex.stackexchange.com/a/114573/2891. – Mico Sep 05 '13 at 00:43
  • Mico's question was in the context of standalone Lua. Is there a LuaTeX library that can be "required" from standalone Lua? – user1771398 Oct 31 '18 at 18:29
  • @user1771398 LuaTeX uses modified version of Selene Unicode library – michal.h21 Oct 31 '18 at 19:33
  • @michal.h21 I explored the LuaTeX folders in search of a .lua file to require, in order to have unicode.utf8.char in the luaxml suite work, to no avail. Therefore I just substituted "utf8.char" for "unicode.utf8.char" and it seems to work fine. – user1771398 Oct 31 '18 at 20:20

The argument of \directlua is processed by TeX before being passed to Lua, so you can use TeX's ^^^^ notation:

\documentclass{article}

\usepackage{fontspec}
\setmainfont{Arial}
\begin{document}

\showoutput


z^^^^200cZ

\directlua{tex.sprint("a^^^^200cbc")}
\end{document}

which makes

....\EU2/Arial(0)/m/n/10 a
....\EU2/Arial(0)/m/n/10 ‌
....\EU2/Arial(0)/m/n/10 b
....\EU2/Arial(0)/m/n/10 c

showing a single invisible character between a and b

This also works if the tex.sprint call is in an external file that is accessed via

\directlua{require('\jobname.lua') }

with the external file containing

tex.sprint("x^^^^200cyz")

Here the Lua string has 11 characters (bytes); the ^^^^ notation is only interpreted by TeX as it parses the output of tex.sprint.
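
The byte count can be checked in plain Lua. This is a small sketch; the variable names caret_form and real_form are made up for illustration. The first string is the literal caret notation that TeX later converts, the second contains the actual UTF-8 encoded ZWNJ.

```lua
-- Literal caret notation: Lua sees 11 separate ASCII bytes
-- (x, four carets, 2, 0, 0, c, y, z).
caret_form = "x^^^^200cyz"

-- The real character: x, the three UTF-8 bytes of U+200C, then y and z.
real_form = "x" .. string.char(226, 128, 140) .. "yz"

print(#caret_form)  -- 11
print(#real_form)   -- 6
```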

If you need the character to be available as such in a Lua string, rather than just in the TeX output, you can construct it with string.char from its UTF-8 encoding (the bytes 226, 128, 140 for U+200C):

\jobname.lua

tex.sprint("x^^^^200cyz")
zwnj=string.char(226,128,140)
tex.sprint("v")
tex.sprint(zwnj)
tex.sprint("w")

test.tex

\documentclass{article}

\usepackage{fontspec}
\setmainfont{Arial}
\begin{document}

\showoutput


z^^^^200cZ

\directlua{tex.sprint("a^^^^200cbc")}

\directlua{require('\jobname.lua') }
\end{document}
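
The three bytes 226, 128, 140 need not be worked out by hand. The following sketch shows one way to compute the UTF-8 encoding of an arbitrary code point using only string.char and arithmetic, so it should run in Lua 5.1 as well as in newer versions; utf8_encode is a hypothetical helper name, and the function does no validation of its argument (surrogates or values above U+10FFFF are not rejected).

```lua
-- Encode a Unicode code point (given as a number) as a UTF-8 byte string.
-- Assumption: cp is a valid code point; no range checking is done.
function utf8_encode(cp)
  local floor = math.floor
  if cp < 0x80 then
    -- 1 byte: 0xxxxxxx
    return string.char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return string.char(0xC0 + floor(cp / 0x40),
                       0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return string.char(0xE0 + floor(cp / 0x1000),
                       0x80 + floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return string.char(0xF0 + floor(cp / 0x40000),
                       0x80 + floor(cp / 0x1000) % 0x40,
                       0x80 + floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end
```

With this, zwnj = utf8_encode(0x200C) gives the same string as string.char(226,128,140), and inside LuaTeX it can be passed to tex.sprint in the same way.
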
David Carlisle