Inspect the shipout box
You can override the \shipout primitive so that it runs custom Lua code which walks the vertical list and inspects the nodes. This requires a fairly recent LuaTeX, because it relies on token.scan_list. I have not yet found a sensible way to determine the starting position, so all coordinates are relative to the lower left corner of the bounding box of the first glyph. The coordinates are printed in pt; LuaTeX stores them in scaled points (sp), hence the division by 2^16 in the code.
\documentclass{standalone}
\directlua{
    % TeX strips these percent-sign comments before Lua sees the code.
    local mode      % "hmode" or "vmode", set when entering a list
    local x = 0     % current position in scaled points (sp)
    local y = 0
    local glue = { hmode = 0, vmode = 0 }   % glue and kerns seen since the last glyph
    local function glyph_boxes(head)
        for n in node.traverse(head) do
            if n.id == node.id"hlist" then
                mode = "hmode"
                y = y - n.shift   % shift is treated as a vertical offset
                glyph_boxes(n.list)
                y = y + n.shift
            elseif n.id == node.id"vlist" then
                mode = "vmode"
                glue.hmode = 0
                glyph_boxes(n.list)
            elseif n.id == node.id"glue" then
                glue[mode] = glue[mode] + n.width   % natural width only
            elseif n.id == node.id"kern" then
                glue[mode] = glue[mode] + n.kern
            elseif n.id == node.id"glyph" then
                x = x + glue.hmode
                local c = string.utfcharacter(n.char)
                local f = font.getfont(n.font)
                print(c, f.name, x / 2^16, y / 2^16)   % convert sp to pt
                x = x + n.width
                glue.hmode = 0
            end
        end
    end
    function shipout()
        local box = token.scan_list()   % grab the box that follows \shipout
        tex.setbox(255, box)
        glue.vmode = 0
        print() % just for nice formatting
        glyph_boxes(tex.box[255])
        tex.shipout(255)
    end
}
\def\shipout{\directlua{shipout()}}
\begin{document}
Test $x^2$
\end{document}
Output in the log:
T [lmroman10-regular]:+tlig; 0.0 0.11000061035156
e [lmroman10-regular]:+tlig; 6.3899993896484 0.11000061035156
s [lmroman10-regular]:+tlig; 10.830001831055 0.11000061035156
t [lmroman10-regular]:+tlig; 14.770004272461 0.11000061035156
x cmmi10 21.990005493164 0.11000061035156
2 cmr7 27.705276489258 3.7389221191406
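The values in this log are in pt because the Lua code divides LuaTeX's internal scaled points by 2^16. As a rough sketch of that arithmetic (the helper names below are made up for illustration, and the sample value 418775 sp is back-computed from the log line for "e", not taken from the node list), the conversions between sp, pt and bp could look like this in Python:

```python
# Unit conversions using TeX's definitions:
#   1 pt = 65536 sp, 1 in = 72.27 pt = 72 bp.
# The function names are hypothetical helpers, not part of any TeX tool.

SP_PER_PT = 65536      # 2^16 scaled points per TeX point
PT_PER_INCH = 72.27    # TeX points per inch
BP_PER_INCH = 72.0     # big (PostScript) points per inch


def sp_to_pt(sp):
    """Convert scaled points to TeX points."""
    return sp / SP_PER_PT


def pt_to_bp(pt):
    """Convert TeX points to big points."""
    return pt * BP_PER_INCH / PT_PER_INCH


def bp_to_pt(bp):
    """Convert big points to TeX points."""
    return bp * PT_PER_INCH / BP_PER_INCH


if __name__ == "__main__":
    # 418775 sp is assumed to be the raw x position behind the "e" log line.
    print(sp_to_pt(418775))
    print(pt_to_bp(sp_to_pt(418775)))
```

The bp helpers are only useful if you need to compare these numbers with a tool that reports big points.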
SVG export
Another option for obtaining glyph coordinates is the dvisvgm driver. This requires a bit more manual work but is less complex overall. In that case the document is simply
\documentclass{standalone}
\begin{document}
Test $x^2$
\end{document}
Typeset using
dvilualatex test.tex # or simply latex
dvisvgm --font-format=woff --no-merge test
The log might contain messages about fonts that could not be embedded. This is irrelevant because we don't want to render the SVG but only extract data from it. The --no-merge option prevents the driver from merging adjacent letters into a single XML element. I have removed the base64-encoded font data for brevity.
<?xml version='1.0' encoding='UTF-8'?>
<!-- This file was generated by dvisvgm 2.3.5 -->
<svg height='8.109622pt' version='1.1' viewBox='-72.000004 -72.000007 48.811833 8.109622' width='48.811833pt' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'>
<style type='text/css'>
<![CDATA[
@font-face{font-family:cmr7;src:url(/* base64 data */) format('woff');}
@font-face{font-family:cmmi10;src:url(/* base64 data */) format('woff');}
text.f0 {font-family:cmmi10;font-size:9.96264px}
text.f1 {font-family:cmr7;font-size:6.973848px}
text.f2 {font-family:[lmroman10-regular]:+tlig;;font-size:10px}
]]>
</style>
<g id='page1'>
<text class='f2' x='-72.000004' y='-63.890385'>T</text>
<text class='f2' x='-63.662905' y='-63.890385'>e</text>
<text class='f2' x='-54.498905' y='-63.890385'>s</text>
<text class='f2' x='-45.334905' y='-63.890385'>t</text>
<text class='f0' x='-32.853344' y='-63.890385'>x</text>
<text class='f1' x='-27.159412' y='-67.505749'>2</text>
</g>
</svg>
This is hopefully all the information you need. Each letter carries a class, which you can match to its font through the embedded CSS, and x and y coordinates on the canvas. I cannot tell you where the origin is, though. The units are probably bp.
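Matching each letter's class to its font can be scripted. The following is a minimal sketch in Python, assuming the SVG was produced with --no-merge (so every text element holds one letter with a single x and y) and that the CSS rules have exactly the shape shown above. The names glyph_positions, CSS_RULE and SAMPLE_SVG are made up for this example, and the sample is a trimmed copy of the output above without the XML declaration and font data.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Trimmed copy of the dvisvgm output (assumption: same structure as your file).
SAMPLE_SVG = """<svg height='8.109622pt' version='1.1' viewBox='-72.000004 -72.000007 48.811833 8.109622' width='48.811833pt' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'>
<style type='text/css'>
<![CDATA[
text.f0 {font-family:cmmi10;font-size:9.96264px}
text.f1 {font-family:cmr7;font-size:6.973848px}
text.f2 {font-family:[lmroman10-regular]:+tlig;;font-size:10px}
]]>
</style>
<g id='page1'>
<text class='f2' x='-72.000004' y='-63.890385'>T</text>
<text class='f2' x='-63.662905' y='-63.890385'>e</text>
<text class='f2' x='-54.498905' y='-63.890385'>s</text>
<text class='f2' x='-45.334905' y='-63.890385'>t</text>
<text class='f0' x='-32.853344' y='-63.890385'>x</text>
<text class='f1' x='-27.159412' y='-67.505749'>2</text>
</g>
</svg>"""

# Matches rules such as "text.f0 {font-family:cmmi10;font-size:9.96264px}".
# The ";+" tolerates the doubled semicolon in the lmroman rule.
CSS_RULE = re.compile(r"text\.(\w+)\s*\{font-family:(.*?);+font-size:([\d.]+)px\}")


def glyph_positions(svg_text):
    """Return a list of (char, font_family, font_size, x, y) tuples."""
    root = ET.fromstring(svg_text)
    style = root.find(f"{{{SVG_NS}}}style")
    css = style.text if style is not None and style.text else ""
    # Map class name -> (font family, font size in px).
    fonts = {m.group(1): (m.group(2), float(m.group(3)))
             for m in CSS_RULE.finditer(css)}
    result = []
    for text in root.iter(f"{{{SVG_NS}}}text"):
        family, size = fonts.get(text.get("class"), (None, None))
        result.append((text.text, family, size,
                       float(text.get("x")), float(text.get("y"))))
    return result


if __name__ == "__main__":
    for glyph in glyph_positions(SAMPLE_SVG):
        print(*glyph, sep="\t")
```

For a real file, read its contents in binary mode and pass the bytes to glyph_positions; ET.fromstring accepts bytes as well as strings. The coordinates are returned exactly as they appear in the SVG, so the open questions about origin and units still apply.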
lua-visual-debug just inserts whatsits to draw colorful boxes in the PDF. It doesn't know about its position on the canvas either. – Henri Menke Feb 06 '19 at 20:52