I want to write a LaTeX/TeX parser so that a program can manipulate .tex files. I found the question "How does a TeX engine read and render the input stream?", which explains how TeX works under the hood.

How is this process implemented in a TeX engine (for instance pdfTeX)?

I would like to know where I can find the source code for this particular part of a TeX engine (LuaTeX or XeTeX would do as well).

[UPDATE] I also found some answers here: "Four TeX processors (as described in TeX by Topic)"

Elyo
  • See https://stackoverflow.com/questions/2731266/how-do-i-find-latex-source-code, which provides a link to a PDF file with the source code. – Charles B. Cameron Jun 25 '22 at 16:16
  • There's TeX Live on GitHub for the source code of the engines, but it's quite difficult to find the relevant part. Then there's texdoc tex, which gives the documentation of the original TeX engine's source code, but that is still difficult to read. The most accessible reference would be The TeXbook (or TeX by Topic, which covers the primitives). See also "package writing - Where do I start LaTeX programming?" on TeX - LaTeX Stack Exchange. – user202729 Jun 25 '22 at 16:35
  • @CharlesB.Cameron source2e is the source code of LaTeX (which is written in TeX), not the source code of TeX (which is also available as a PDF). – David Carlisle Jun 25 '22 at 16:36
  • As for your initial question, though: I don't think it's a good idea to do what you intend (figure out exactly how TeX parses the input). The only way to parse TeX "in general and correctly" is to run TeX, which is most of the time not what you want. So just make some assumptions and implement your own parser, or use something existing (for Python there's TexSoup etc., I think). – user202729 Jun 25 '22 at 16:37
  • Are you aware that the parsing behavior of LaTeX can be changed in many ways while LaTeX is running? How things are parsed by LaTeX also depends on how the LaTeX engine is initialized before it starts reading the .tex input file in question. Many things have changed in LaTeX in the recent past in this regard, e.g., inputenc with the utf8 option is now loaded by default with 8-bit engines, which affects catcode settings, and the control sequences of xparse/expl3 are now available by default. What about UTF-8 engines like XeTeX/LuaTeX? What about Lua extensions? – Ulrich Diez Jun 25 '22 at 20:14
  • You may want to have a look at my answer to "Macro for mass hyper-reference?/Detect phrase in tex input file", just to see in what weird ways the parsing behavior of LaTeX can be changed by directives occurring in a .tex input file while LaTeX is running. – Ulrich Diez Jun 25 '22 at 20:14
  • @UlrichDiez I am aware of this feature, which is why it is difficult to parse TeX code written with macro packages like LaTeX. But I want something as modular as possible, so that it can read almost any .tex file I give it. – Elyo Jun 26 '22 at 08:45
  • Thank you for your comments! – Elyo Jun 26 '22 at 08:45
  • "I want to write a LaTeX-TeX parser to manipulate .tex files for a program." There are several stages of processing in TeX which are intertwined: 1) Reading the .tex-file linewise, pre-processing the line and inserting tokens into the token-stream depending on what characters were obtained by pre-processing. 2) Expanding expandable tokens in some sort of regurgitating process unless expansion is suppressed. 3) Further processing of tokens, e.g., carrying out assignments, creating and further processing of boxes and glue/line-, paragraph-, page-breaking, output-routine, producing messages ... – Ulrich Diez Jun 27 '22 at 10:11
  • (La)TeX's behavior in each of these stages can be modified via directives occurring within the .tex-file itself. So probably a more adequate answer can be provided when you tell what "manipulations" to .tex-files and what workflow for doing the manipulations you have in mind. – Ulrich Diez Jun 27 '22 at 10:13
  • @UlrichDiez I want to read a .tex file (written for LuaLaTeX, XeLaTeX or LaTeX), create the list of packages used together with their options, and extract specific information (for instance the content of a specific environment). I want to extract all the information I can from a .tex file and process this data so that it can be used in other documents. – Elyo Jun 27 '22 at 15:12
  • @Elyo You can probably extract information by running TeX on a .tex file which in turn loads the package extract and then extracts information from the .tex file whose info you wish to have. || You can get a list of the packages/files loaded during a LaTeX run written to the .log file by placing \listfiles at the beginning of the .tex file. – Ulrich Diez Jun 27 '22 at 15:29
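
For the "make some assumptions and implement your own" route mentioned in the comments, a minimal Python sketch for listing packages and their options might look like the following. It is not how a TeX engine reads input. It assumes default catcodes (backslash, braces and % keep their usual meaning), that packages are loaded through a literal \usepackage or \RequirePackage in the file itself, and that option lists contain no nested brackets or braces with commas. The names strip_comments and extract_packages are hypothetical helpers made up for this illustration.

```python
import re

# Naive pattern: \usepackage or \RequirePackage, an optional [options]
# group, then a {name, name, ...} group. Assumes default catcodes.
USEPACKAGE_RE = re.compile(
    r"\\(?:usepackage|RequirePackage)\s*(?:\[([^\]]*)\])?\s*\{([^}]*)\}"
)

# A % not preceded by a backslash starts a comment up to the end of the line.
# Limitation: a line such as "\\%" (line break, then comment) is misread.
COMMENT_RE = re.compile(r"(?<!\\)%.*")


def strip_comments(source):
    """Remove TeX comments line by line (hypothetical helper)."""
    return "\n".join(COMMENT_RE.sub("", line) for line in source.splitlines())


def extract_packages(source):
    """Return a list of (package_name, [options]) pairs (hypothetical helper)."""
    packages = []
    for match in USEPACKAGE_RE.finditer(strip_comments(source)):
        raw_options, raw_names = match.group(1), match.group(2)
        options = [o.strip() for o in (raw_options or "").split(",") if o.strip()]
        # One \usepackage may load several packages sharing the same options.
        for name in raw_names.split(","):
            if name.strip():
                packages.append((name.strip(), options))
    return packages


if __name__ == "__main__":
    demo = r"""
    \documentclass{article}
    \usepackage[utf8]{inputenc}
    % \usepackage{commented}
    \usepackage{amsmath, amssymb}
    """
    for name, options in extract_packages(demo):
        print(name, options)
```

A static scan like this misses packages loaded indirectly (by the class or by other packages) and anything hidden behind macros or catcode changes. For those cases, running LaTeX with \listfiles and reading the .log file, as suggested above, is the more reliable route.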

0 Answers