How to sequentially "feed" a LaTeX document from an external data source?

Question

I have a fairly complex C program that writes raw text output to text files, which are then printed on a dot-matrix printer. These lists are now meant to get a typographical update and end up as PDFs that can be printed on a regular laser printer. I intend to use LaTeX for this. I cannot replace the C program; I can only modify its "printing/text output" functionality.

The general layout of the document to be generated is well-defined, but the number of rows of data is not.

So, in a first step I wrote a LaTeX "template" file, like this:

\documentclass{article}
\begin{document}
Dear --MRMRS-- --NAME--, we hope ...

My program then replaces the keywords --MRMRS-- and --NAME-- with the data it produces in raw form (in the end with a sed 's/KEYWORD/programoutput/' command). This works pretty well up to this point. As already mentioned, though, the length of some parts of the generated files is not known beforehand. For example, later in the document there will be tables with a well-defined structure, but these tables can differ in length each time, so I cannot simply define a fixed number of --ROWxCOLy-- keywords beforehand:

\begin{tabular}{cccc}
Col1 & Col2 & Col3 & Col4 \\
\hline
% now how to fill the content sequentially without knowing the size beforehand?
\end{tabular}

Thank you, we hope to hear from you on --DATE-- ...
\end{document}

The incoming data currently looks like this:

Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
COLSPANNED     LINE
Col1 Col2 Col3 Col4
Col1 Col2 Col3 Col4
... variable amount of lines

There can be special lines that need to span the full width of the table (all columns), but they come in a well-defined order.

The first solution that comes to mind is to hardcode LaTeX directly in the data-generating program: use one keyword for the whole table and have the program generate the table with hardcoded LaTeX code. But I'd like to avoid this as much as possible for obvious maintenance reasons (keep layout and logic separated as much as possible).

What other possible solutions are there to feed a LaTeX document with well-defined, but unknown length data?

@Alexey Currently the program produces space-separated raw text columns with a variable number of rows. I need to get this into LaTeX code without hardcoding LaTeX commands into my C program source. Updated question with a basic example. – Foo Bar Jun 26 '15 at 09:37
I see, your question is about how to wrap your output in LaTeX; I might have misunderstood it. I think there is no choice other than to process each output line individually and convert it to LaTeX, possibly wrapping it in some LaTeX commands. – Alexey Jun 26 '15 at 09:43
I mean, is your problem any different from the problem of wrapping/converting/sanitizing plain text into HTML? I think there should exist some solutions for the latter. – Alexey Jun 26 '15 at 09:44
I would have used a solution similar to the one for HTML generation: I would have used some template engine, like eRuby. – Alexey Jun 26 '15 at 09:48
There is a very fundamental difference between (La)TeX and HTML: (La)TeX itself is a programming language, while HTML is not (HTML is a plain markup language). So I can "assist" this wrapping process from within LaTeX with control structures (if-else), loops and so on. I could also "feed" these programmatic parts with keyword substitutions. – Foo Bar Jun 26 '15 at 09:49
Maybe I still do not understand your question, but (1) you may use LaTeX simply as a markup language, (2) you may generate a program with a template engine. – Alexey Jun 26 '15 at 09:50
Yes, but it is a kind of compiler, with several target output formats. Depending on the output format you can add links to, or embed, dynamic content, such as an AV stream in a PDF document. That has nothing to do with LaTeX, though, but with the viewer and the possibilities of your output format. – ikrabbe Jun 26 '15 at 09:51

ikrabbe · Accepted Answer · 2015-06-26T13:17:43.137

4

Automatic processing of data for TeX / LaTeX

The problem in your question is the term "sequentially". It is impossible to generate one TeX/LaTeX document continuously. A TeX/LaTeX document that produces valid output has a beginning and, more importantly, an end. When the document finishes without errors, the output is complete and the TeX/LaTeX job is done. You cannot "feed" data into that document once it is done.

What you can do is build a LaTeX frame document, \input some_external_data, and regenerate the output every time the external data changes.

The point is: at the time the LaTeX job runs, the data you want to typeset has to be fixed in content and length, because the output document won't update automatically when the data changes later unless the LaTeX job is run again. At least not without quite complex, customized methods that depend greatly on the viewer or output medium you use.

To improve your original approach

It might help to change the way you process the data:

Dear --MRMRS-- --NAME--, we hope ...

Here you use a self-defined template language to produce a TeX file. I think that is what you mean by "hard coded".

Just as it is good practice to separate output form (layout) from logic, it is also good practice to keep the generated data separate from the template for as long as possible, and to translate the set of information (the input, or the data) into a form that the next processor understands (the next processor is LaTeX in our case).

Here is what I mean: LaTeX does not know what to do with --MRMRS-- and similar constructs, though the TeX machine is in general able to set up such a parser. But that would make things quite complex and hard to control and debug. So stay within the LaTeX language when you define your template:

Dear \MRMRS{} \NAME{}, we hope ...

Let's keep it simple and say that is our whole template. Then the data set in text form might be

Mrs
Moneypenny

The C program might translate this input data into a form that the C language understands:

struct greeting {
    char* mrmrs;
    char* name;
};

Now the purpose of the C program (or whatever) is to translate that into the LaTeX language:

\def\MRMRS{Mrs}
\def\NAME{Moneypenny}
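
As a minimal sketch of that translation step, assuming the record is filled as above: the helper names latex_escape and format_greeting_defs are made up for illustration, and the escaping table covers only the ten characters that LaTeX treats specially in ordinary text. The resulting lines are meant to be written to a file that the template reads with \input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The data record from the example above. */
struct greeting {
    const char *mrmrs;
    const char *name;
};

/* Hypothetical helper: copy `in` to `out`, escaping LaTeX special characters.
 * Returns 0 on success, -1 if `out` is too small. */
int latex_escape(const char *in, char *out, size_t outsz)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep = single;   /* default: copy the character as-is */
        size_t len;

        switch (*in) {
        case '&':  rep = "\\&"; break;
        case '%':  rep = "\\%"; break;
        case '$':  rep = "\\$"; break;
        case '#':  rep = "\\#"; break;
        case '_':  rep = "\\_"; break;
        case '{':  rep = "\\{"; break;
        case '}':  rep = "\\}"; break;
        case '~':  rep = "\\textasciitilde{}"; break;
        case '^':  rep = "\\textasciicircum{}"; break;
        case '\\': rep = "\\textbackslash{}"; break;
        default:   break;
        }
        len = strlen(rep);
        if (len >= outsz - used)    /* keep one byte for the terminator */
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}

/* Hypothetical helper: render the record as \def lines for \input.
 * Returns 0 on success, -1 if a buffer is too small. */
int format_greeting_defs(const struct greeting *g, char *out, size_t outsz)
{
    char mrmrs[128], name[256];
    int n;

    if (latex_escape(g->mrmrs, mrmrs, sizeof mrmrs) != 0 ||
        latex_escape(g->name, name, sizeof name) != 0)
        return -1;
    n = snprintf(out, outsz, "\\def\\MRMRS{%s}\n\\def\\NAME{%s}\n", mrmrs, name);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Escaping in one place on the C side keeps stray characters such as & or % in the raw data from breaking the LaTeX run, while the template itself stays free of data.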

You can now read the processed data into the LaTeX program, which is a true LaTeX program rather than a template to be processed by some other tool, and every step from the raw data to the output document can be debugged separately from the others.
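
The same idea might be extended to the variable-length table from the question. In the sketch below, each raw data line becomes one macro call, so the C side never emits layout commands such as & or \multicolumn. The macro names \datarow and \spanrow, the helper format_row, and the rule that any non-blank line without exactly four fields is a full-width line are all assumptions made for illustration; fields are also assumed to be free of LaTeX special characters already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NCOLS 4   /* number of table columns in the question's example */

/* Hypothetical helper: turn one raw data line into one macro call.
 * Assumption: exactly NCOLS whitespace-separated fields means a normal row;
 * any other non-blank line is treated as a full-width line.
 * Returns 0 on success, -1 if a buffer is too small. */
int format_row(const char *line, char *out, size_t outsz)
{
    char copy[256];
    char *f[NCOLS + 1];
    int nfields = 0;
    int n;

    assert(line != NULL && out != NULL);
    if (outsz == 0 || strlen(line) >= sizeof copy)
        return -1;
    strcpy(copy, line);             /* strtok modifies its input */
    out[0] = '\0';

    /* Collect at most NCOLS + 1 fields; one extra detects "too many". */
    for (char *tok = strtok(copy, " \t\r\n"); tok != NULL && nfields <= NCOLS;
         tok = strtok(NULL, " \t\r\n"))
        f[nfields++] = tok;

    if (nfields == 0)
        return 0;                   /* blank line: emit nothing */

    if (nfields == NCOLS) {
        n = snprintf(out, outsz, "\\datarow{%s}{%s}{%s}{%s}\n",
                     f[0], f[1], f[2], f[3]);
    } else {
        /* Pass the whole line, minus the line ending, as one argument. */
        size_t len = strcspn(line, "\r\n");
        n = snprintf(out, outsz, "\\spanrow{%.*s}\n", (int)len, line);
    }
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

The template would then define \datarow (four arguments) and \spanrow (one argument) itself, for example in terms of & and \multicolumn, and read the generated lines between \begin{tabular} and \end{tabular}. Changing the table layout then means editing only the template, not the C source.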

edited Jun 26 '15 at 13:17

answered Jun 26 '15 at 09:46

ikrabbe

603

The LaTeX compiler will run after the "feeding" is finished in my case. I feed the .tex file sequentially, and after this is done, I run pdflatex on the complete .tex file. The question is how to repeat / feed LaTeX commands without hardcoding them outside of the .tex file. I am thinking about somehow using (La)TeX's own control mechanisms (loops, if-else); I'm just not sure how to do this efficiently. – Foo Bar Jun 26 '15 at 09:54
Outside of what TeX file? You need to keep the TeX commands somewhere, and mix them with plain text from elsewhere. IMO this is called a template. – Alexey Jun 26 '15 at 09:56
@Alexey I have a template tex file that has keywords in it. My C program does this: copy the template to a new name, substitute the keywords with real data, and finally run pdflatex. The next time the C program runs, it does the same (copy template to working copy, replace keywords with data, run pdflatex). Now, the problem is that some tables are not fixed-length, so I cannot define 10 keywords; I need to define X keywords (variable). This is obviously not possible without using control structures somewhere in the LaTeX document. – Foo Bar Jun 26 '15 at 09:59
I see, but I do not understand why you would not like to try an existing template engine instead of inventing one with keyword substitution. I suggested eRuby because it is made to generate plain text files; it is not specific to HTML, unlike some others. – Alexey Jun 26 '15 at 10:02
@FooBar please give us a concrete example, so that we can figure out the set of objects / data you want to loop over, or the flags you want to branch on. Looking at the data set in your question, you might be able to write a macro that loops over the tokens of your data and produces TeX tokens from them. But in general, TeX isn't that good at formatting arbitrary data. Many characters have a special meaning and have to be escaped or otherwise replaced, which is possible. Your idea to pre-format the data with a C program is not the worst, in my opinion. – ikrabbe Jun 26 '15 at 10:02
For processing text, I would have used some high-level language instead of C. – Alexey Jun 26 '15 at 10:04
That's a matter of taste. For simple string formatting, C isn't the worst, but many jobs can be done with a short awk script. There are many libraries for C to support you with such jobs. The template approach of golang is one of the most powerful tools for handling complex data and format relations. – ikrabbe Jun 26 '15 at 10:13
The C program in my case does quite a lot and is not simple. It originated 20 years or so ago and has been in production ever since. I can't throw it away or replace it with something else; I can only change the part that writes the output to the text file. – Foo Bar Jun 26 '15 at 10:40
@FooBar, then you may still try to apply the usual approaches of constructing a text file or stream with templates. In particular, how about partials? Instead of having a single template (with keywords), you may have one main template/layout and several partial templates. You can then use partial templates to wrap each line or cell in LaTeX and then substitute the combined result into the main template. – Alexey Jun 26 '15 at 12:39
For my latest automated project, I used my language of choice, Perl, and the HTML::Template module to make the templates. Don't let the HTML in the name distract you; it can be used for many other things. It has simple if constructions and loops. For example, a placeholder can be inserted via <TMPL_VAR NAME>, and then when filling in the template from Perl, I have to specify what NAME is. This generates the LaTeX file (or files) that I just run through pdflatex afterwards. – daleif Jun 26 '15 at 12:46
@Alexey This was an option I also thought about. And it seems that it will be this or hardcoding LaTeX in the C source (or a mixture of both). The main reason for the original question was to find out whether there is any mechanism native to LaTeX that one could use to make life easier. For example, something like \foreach \foo in KEYWORDS (where KEYWORDS gets replaced by the C program), so that I can control the LaTeX source parameters from the C code and make the "sequential" (sorry for this term) data input easier. – Foo Bar Jun 26 '15 at 12:50
Maybe LuaLaTeX could be helpful too? I know very little about it, but it seems that it should be easier to program with than usual LaTeX. – Alexey Jun 26 '15 at 13:14

score 0 · Answer 2 · edited Apr 13 '17 at 12:34

0

To "keep layout and logic separated as much as possible", a cleaner approach would be to have your program write XML instead of raw text.

This can then be transformed (probably using XSL) to LaTeX or any other output format you like, including raw text.

Of course, this takes additional time.

For a quick and dirty solution: if the column text is fixed width, you might consider using tabbing instead of tabular. That way, you just define the maximum number of columns, but you use only those you need. So there is no need to count the columns.
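
A rough sketch of how the C side could emit rows for a tabbing environment, assuming the tab stops are set once in the template and the fields contain no LaTeX special characters. The helper name format_tabbing_line is hypothetical. Fields are separated by \> and the row ends with \\, however many fields the line has.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert one whitespace-separated data line into one
 * row of a tabbing environment. Returns 0 on success, -1 if a buffer is
 * too small. A blank input line produces an empty output string. */
int format_tabbing_line(const char *line, char *out, size_t outsz)
{
    char copy[256];
    size_t used = 0;
    int n;

    assert(line != NULL && out != NULL);
    if (outsz == 0 || strlen(line) >= sizeof copy)
        return -1;
    strcpy(copy, line);             /* strtok modifies its input */
    out[0] = '\0';

    for (char *tok = strtok(copy, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        /* \> advances to the next tab stop; none before the first field */
        n = snprintf(out + used, outsz - used, "%s%s",
                     used == 0 ? "" : " \\> ", tok);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
    }
    if (used == 0)
        return 0;                   /* nothing to emit for a blank line */

    /* A double backslash ends the tabbing row. */
    n = snprintf(out + used, outsz - used, " \\\\\n");
    if (n < 0 || (size_t)n >= outsz - used)
        return -1;
    return 0;
}
```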

An advanced example: setting tabs by declaration rather than by example

edited Apr 13 '17 at 12:34

Community

1

answered Jun 26 '15 at 10:56

user24582

191

Some people, myself included, regard XML as a failure by design: http://harmful.cat-v.org/software/xml/. But in principle, pre-processing outside of TeX is a useful approach. I have edited my answer to reflect this. – ikrabbe Jun 26 '15 at 11:53
