
I'm trying to learn LaTeX3 (expl3) by working through a concrete problem: parsing LaTeX code and then writing the results of functions to a file.

Problem. Using pdfLaTeX, generate a fragment of an XML file containing normalized LaTeX data from the current file (article title, list of authors, abstract, keywords, list of references, etc.), where "normalized" means allowed and meaningful in XML. For example, in normalized text \, disappears, \& becomes &amp;, ~ becomes &nbsp;, -- becomes &ndash;, the string $x_2$ becomes x<sub>2</sub>, \emph{ ... } becomes <em> ... </em>, and so on. Since I do not need to convert the entire article to XML, it seems sensible to just write lines with the necessary XML tags to the file using standard TeX or LaTeX3 functions. I have more or less figured out regular expressions in expl3, but I could not write the result of the \normalize function to the file. In the minimal working example below, I've focused on some of the non-XML functionality I need. I'm trying to figure out how to save to the file not the code of the functions, but their results.

That is, in this case, I would like to get in the file exactly what I see in the first line on the screen. And why doesn't passing a value to the \normalize function work?

I would appreciate a solution and an explanation. "The LaTeX3 Interfaces" is still a bit complicated for me, but I'm gradually getting into it.

\documentclass{article}
\usepackage{expl3}

\ExplSyntaxOn

\tl_new:N \l_normalize_tl

% Store the argument, apply the replacements one at a time, then typeset the result.
\cs_new:Npn \normalize #1
  {
    \tl_set:Nn \l_normalize_tl {#1}
    \regex_replace_all:nnN { \c{,} } { } \l_normalize_tl
    \regex_replace_all:nnN { \c{&} } { \c{&}amp; } \l_normalize_tl
    \regex_replace_all:nnN { \~ } { \c{&}nbsp; } \l_normalize_tl
    \regex_replace_all:nnN { -- } { \c{&}ndash; } \l_normalize_tl
    \regex_replace_all:nnN { \c{emph}\{(.*?)\} }
      { \c{textless} em\c{textgreater}\1\c{textless}/em\c{textgreater} }
      \l_normalize_tl
    \tl_use:N \l_normalize_tl
  }

\DeclareDocumentCommand\wout { m } { \iow_now:Nx \g_xml_out_iow { #1 } }

\DeclareDocumentCommand\writexml{ }
  {
    \iow_new:N \g_xml_out_iow
    \iow_open:Nn \g_xml_out_iow { \c_sys_jobname_str.xml }
    \wout{ \exp_not:V\normalize\teststring }
    \iow_close:N \g_xml_out_iow
  }

\ExplSyntaxOff

\begin{document}

\def\teststring{This is~-- a test document by \emph{A.\,A.~Smith}.}

\normalize{This is~-- a test document by \emph{A.\,A.~Smith}.}

\normalize\teststring

\writexml

\end{document}


And this is what I see in the *.xml file:

\normalize This is\protect \unhbox \voidb@x \protect \penalty \@M \ {}-- a test document by \protect \unhbox \voidb@x \bgroup \edef .{A.\,A.~Smith}\let \futurelet \@let@token \let \protect \relax \itshape A.\protect \protect \leavevmode@ifvmode \kern +.16667em\relax A.\protect \unhbox \voidb@x \protect \penalty \@M \ {}Smith\egroup .
Crosfield
  • For a few tasks, you still need to learn how TeX, the engine, works (read the TeXbook or TeX by Topic). – user202729 Jul 05 '22 at 00:54
  • For one, your \normalize function is not expandable; see section 2 of my answer at https://tex.stackexchange.com/questions/645995/why-cant-i-use-some-macro-inside-the-argument-of-some-other-macro – user202729 Jul 05 '22 at 00:56

1 Answer


You want to normalize first, then write.

\documentclass{article}

\ExplSyntaxOn

\tl_new:N \l_crosfield_normalize_tl
\iow_new:N \g_xml_out_iow
\cs_generate_variant:Nn \iow_now:Nn { NV }

% Store the argument, then apply the replacements one at a time.
\cs_new_protected:Npn \crosfield_normalize:n #1
  {
    \tl_set:Nn \l_crosfield_normalize_tl {#1}
    \regex_replace_all:nnN { \c{,} } { } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \c{&} } { &amp; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \~ } { &nbsp; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { -- } { &ndash; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \c{emph}\{(.*?)\} } { <em> \1 </em> } \l_crosfield_normalize_tl
  }

% Normalize first, then write the stored result without further expansion.
\NewDocumentCommand\writexml{ m }
  {
    \iow_open:Nn \g_xml_out_iow { \c_sys_jobname_str.xml }
    \crosfield_normalize:n { #1 }
    \iow_now:NV \g_xml_out_iow \l_crosfield_normalize_tl
    \iow_close:N \g_xml_out_iow
  }

\ExplSyntaxOff

\begin{document}

\writexml{This is~-- a test document by \emph{A.\,A.~Smith}.}

\end{document}

You don't want \iow_now:Nx, because \textless doesn't expand to <, and similarly for \& or \textgreater. Just use the desired characters.

The contents of the XML file are

This is&nbsp;&ndash; a test document by <em>A.A.&nbsp;Smith</em>.
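
The same approach should extend to the other rules listed in the question. As an illustration only, a rule for simple subscripts such as $x_2$ might be sketched as follows. It assumes that the base and the subscript are each a single token and that $ and _ have their usual category codes; \l_crosfield_sub_tl and \showsub are hypothetical names that do not appear in the code above.

```latex
\documentclass{article}

\ExplSyntaxOn

% Hypothetical scratch variable, used only in this illustration.
\tl_new:N \l_crosfield_sub_tl

% Hypothetical command: apply the subscript rule and print the result
% on the terminal (and in the log) without expanding it.
\NewDocumentCommand \showsub { m }
  {
    \tl_set:Nn \l_crosfield_sub_tl {#1}
    % \cM. matches one math-shift token ($), \cD. one subscript token (_);
    % each (.) captures exactly one token (assumption: no braced groups).
    \regex_replace_all:nnN
      { \cM. (.) \cD. (.) \cM. }
      { \1 <sub> \2 </sub> }
      \l_crosfield_sub_tl
    \exp_args:NV \iow_term:n \l_crosfield_sub_tl
  }

\ExplSyntaxOff

\begin{document}

\showsub{The variable $x_2$ is positive.}

\end{document}
```

Braced subscripts such as $x_{12}$ are deliberately not matched by this pattern; they would need a separate rule.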

If you want to be able to use “variables”:

\documentclass{article}

\ExplSyntaxOn

\tl_new:N \l_crosfield_normalize_tl
\iow_new:N \g_xml_out_iow
\cs_generate_variant:Nn \iow_now:Nn { NV }

% Store the argument, then apply the replacements one at a time.
\cs_new_protected:Npn \crosfield_normalize:n #1
  {
    \tl_set:Nn \l_crosfield_normalize_tl {#1}
    \regex_replace_all:nnN { \c{,} } { } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \c{&} } { &amp; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \~ } { &nbsp; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { -- } { &ndash; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \c{emph}\{(.*?)\} } { <em> \1 </em> } \l_crosfield_normalize_tl
  }
\cs_generate_variant:Nn \crosfield_normalize:n { e }

% Expand the argument (e.g. \test) before normalizing, then write the result.
\NewDocumentCommand\writexml{ m }
  {
    \iow_open:Nn \g_xml_out_iow { \c_sys_jobname_str.xml }
    \crosfield_normalize:e { \text_expand:n { #1 } }
    \iow_now:NV \g_xml_out_iow \l_crosfield_normalize_tl
    \iow_close:N \g_xml_out_iow
  }

\ExplSyntaxOff

\begin{document}

\newcommand\test{This is~-- a test document by \emph{A.\,A.~Smith}.}

\writexml{\test}

\end{document}

The output is the same.
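
One caveat if \writexml is called many times with different strings: as defined here, every call reopens the file, so each call starts the file afresh and only the last line survives. A minimal sketch of one way around this is to open the stream once, write repeatedly, and close at the end. The names \openxml, \closexml and \othertest are hypothetical and chosen only for this illustration; the normalization function is repeated so that the example is self-contained.

```latex
\documentclass{article}

\ExplSyntaxOn

\tl_new:N \l_crosfield_normalize_tl
\iow_new:N \g_xml_out_iow
\cs_generate_variant:Nn \iow_now:Nn { NV }

\cs_new_protected:Npn \crosfield_normalize:n #1
  {
    \tl_set:Nn \l_crosfield_normalize_tl {#1}
    \regex_replace_all:nnN { \c{,} } { } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \c{&} } { &amp; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \~ } { &nbsp; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { -- } { &ndash; } \l_crosfield_normalize_tl
    \regex_replace_all:nnN { \c{emph}\{(.*?)\} } { <em> \1 </em> } \l_crosfield_normalize_tl
  }
\cs_generate_variant:Nn \crosfield_normalize:n { e }

% Hypothetical name: open the file once, before the first \writexml.
\NewDocumentCommand \openxml { }
  { \iow_open:Nn \g_xml_out_iow { \c_sys_jobname_str.xml } }

% Normalize one string and write it as one line; the stream must already be open.
\NewDocumentCommand \writexml { m }
  {
    \crosfield_normalize:e { \text_expand:n {#1} }
    \iow_now:NV \g_xml_out_iow \l_crosfield_normalize_tl
  }

% Hypothetical name: close the file after the last \writexml.
\NewDocumentCommand \closexml { }
  { \iow_close:N \g_xml_out_iow }

\ExplSyntaxOff

\begin{document}

\newcommand\test{This is~-- a test document by \emph{A.\,A.~Smith}.}
\newcommand\othertest{A second line with \emph{emphasis}.}

\openxml
\writexml{\test}
\writexml{\othertest}
\closexml

\end{document}
```

With this layout each \writexml call adds one line to the same file, at the cost of having to remember the \openxml and \closexml calls around them.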

egreg
  • Thanks, egreg, for the detailed answer! But what if I need to call \writexml on a string defined by \def? That is, for \def\teststring{This is~-- a test document by \emph{A.\,A.~Smith}.} followed by \writexml\teststring. Writing the XML will require many calls to the \writexml macro with different strings. – Crosfield Jul 05 '22 at 09:00
  • @Crosfield Added the code. – egreg Jul 05 '22 at 09:11
  • Many thanks! It works! – Crosfield Jul 05 '22 at 11:14