
As described in "How can I correctly expand the result of \prop_item:Nn to work with my own \DoSomething function?", I am currently working on a topic that requires a lookup function in LaTeX. Basically I have a piece of information that can be accessed through three levels, and I want to give the user the option to use each level to access it. Until now I had three functions: one allowed direct access on level 1, one allowed access through level 2 and looked up the corresponding information on level 1, and a third did the same for level 3.

However, now I want multiple functions to access the information, so I separated the data from the functions; this is where the problem described in the other question came up. But since my prop is now 2200 items long, the lookup in the prop takes about 3 times longer than using a case-compare function, even though I am using a constant l3prop:

\prop_const_from_keyval:Nn \c_tobisbs_lookup_prop
{
    {key1} = {value1},
    {key2} = {value2},
}

So I guess the note in the documentation that this is not the proper data type for this purpose is true. But what would be a good data type, or a general solution, to store this kind of information and use it in multiple functions in LaTeX3?
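
For illustration, here is a minimal sketch of the two lookup styles being compared, using the constant prop declared above; the function names are made up, and the real case list would of course contain all 2200 entries:

\ExplSyntaxOn
% Expandable lookup in the constant prop: \prop_item:Nn searches
% the entries one by one.
\cs_new:Npn \tobisbs_lookup_prop:n #1
  { \prop_item:Nn \c_tobisbs_lookup_prop {#1} }
% The case-compare alternative: a hand-written \str_case:nnF mapping.
\cs_new:Npn \tobisbs_lookup_case:n #1
  {
    \str_case:nnF {#1}
      {
        {key1} {value1}
        {key2} {value2}
      }
      { } % fall back to empty for unknown keys
  }
\ExplSyntaxOff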

TobiBS
  • I think this is going to come down to https://tex.stackexchange.com/questions/147966/how-to-implement-low-level-arrays-in-tex: for larger tables needing high performance, there's little option but to hand-tune. – Joseph Wright Jan 12 '22 at 11:01
  • to which comment in which documentation do you refer? – Ulrike Fischer Jan 12 '22 at 11:01
  • The problem with a prop is that in the underlying implementation, everything is stored in a single macro, so every time you want to do any operation on it, you have to work on the entire thing (2200 items is huge). The other solution here would be to store each item in a \l_tobi_prop_<name>_tl variable, which would give you minimal access time (a sketch of this approach appears after these comments). – Phelype Oleinik Jan 12 '22 at 11:04
  • I was going to say the same as @PhelypeOleinik: you could use csnames, then access via c arguments to construct the variable name from the "key" in each case. This gives fast access to individual items but essentially no way to loop over all defined items unless you store the list separately, so it depends.... – David Carlisle Jan 12 '22 at 11:11
  • @UlrikeFischer I am referring to the TeXhackers note "This function iterates through every key–value pair in the ⟨property list⟩ and is therefore slower than using the non-expandable \prop_get:NnNTF", as I am using \prop_if_in_p:Nn at one spot. Would it actually be faster to use \prop_get:NnN and test for \q_no_value? – TobiBS Jan 12 '22 at 12:14
  • @TobiBS Yes. \prop_item:Nn is slower than \prop_get:NnNTF because the former has to search for the labels in the prop one by one (linear time), while the latter can use a delimited macro to go directly to the desired item (constant time), so it's much faster for longer lists. Using \prop_get:NnN or \prop_get:NnNTF has little impact on performance: it's a matter of which serves you best (see the second sketch after these comments). – Phelype Oleinik Jan 12 '22 at 12:29
  • @TobiBS but this comment doesn't say that it isn't the right datatype, only that you are using the slower access function. – Ulrike Fischer Jan 12 '22 at 13:52
  • @PhelypeOleinik Considering your comments above, how large would you say it'd become "too much" for a property list? Do you guys have any data/benchmarks on this sort of thing? (My real use case with which I'm concerned in asking is that of some lists in the 150-250 length range, on which I do a little setting and a lot of querying, should I consider moving this structure to, e.g. individual macros, or something else?) – gusbrs Jan 12 '22 at 19:22
  • @DavidCarlisle ^^^ Considering your earlier comment in the chat about this, the question extends to you. – gusbrs Jan 12 '22 at 19:23
  • @gusbrs I haven't done any benchmarks, and I think that's not the biggest issue. Firstly, if you need expandability, you can only use \prop_item:Nn and that's it. If that's not an issue, then I'd probably switch to \prop_get:NnN(TF), which has a much (much) smaller issue with large lists. I'd guess things would only start to get slow with \prop_get:NnN(TF) with hundreds of thousands of accesses, in which case I'd use a dedicated method. Otherwise I wouldn't bother changing the data type – Phelype Oleinik Jan 12 '22 at 19:31
  • @PhelypeOleinik Thanks for your comment. Well, most of my querying (almost all of it, and all the most frequent) is being done with \prop_get:NnN(TF). But I do need \prop_item:Nn occasionally. The thing is, property lists are terribly convenient, but just testing the alternative means to forgo this convenience, and if you ever "test" it, why go back? ;-) – gusbrs Jan 12 '22 at 19:39
  • @gusbrs I'm assuming you're talking about zref-clever... I'd test a document and see if it gets noticeably slower by using/not using your package. I doubt so, but it's worth a try. If the difference is below the tenths of a second, the time saved running the document won't offset the time you spent changing the datatype (and probably inserting bugs :) – Phelype Oleinik Jan 12 '22 at 19:49
  • @PhelypeOleinik Yes ;-). Well, it is not a "light" package by now... So I definitely should watch for performance, or at least not abuse on resources where there's no need to. Your suggestion is a good one to at least have a rough idea of how things are faring, thanks! – gusbrs Jan 12 '22 at 20:00
  • Thank you all for your input, I changed my code to make use of \prop_get:NnN(TF) and it is actually faster than my old implementation through a \str_case:xnF. Hence I will stick with that solution and think the l3prop with the right access function is the solution for my specific problem. So thank you @UlrikeFischer @Phelype Oleinik @gusbrs it is great to have you! – TobiBS Jan 12 '22 at 21:22
  • @PhelypeOleinik I did make the move after all. A lot of work (and probably some extra bugs to squash in the mid term), but in my case I think it was worth it. And I can observe some global performance difference in a test file as the one you suggested (I was doing a lot of querying on reasonably sized property lists). Another gain I did not anticipate is that expandability of some basic operations, even when I could do without it, does offer some opportunities for "cleaner code". If you'd like to know more details, email/ping me. And thanks again for the insight! – gusbrs Jan 19 '22 at 20:07
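
To make the per-item storage suggested by Phelype Oleinik and David Carlisle in the comments above concrete, here is a minimal sketch (all names are illustrative, not from the original code) that keeps each value in its own token list, accessed through a c-type argument, with a separate sequence recording the keys so that looping over all items stays possible:

\ExplSyntaxOn
\seq_new:N \g_tobisbs_lookup_keys_seq
% Store one item: the value of <key> lives in \g_tobisbs_lookup_<key>_tl.
% Assumes the key consists of ordinary characters and is not stored twice.
\cs_new_protected:Npn \tobisbs_lookup_set:nn #1#2
  {
    \tl_gclear_new:c { g_tobisbs_lookup_ #1 _tl }
    \tl_gset:cn { g_tobisbs_lookup_ #1 _tl } {#2}
    \seq_gput_right:Nn \g_tobisbs_lookup_keys_seq {#1}
  }
% Retrieve one item: a single csname construction, independent of the
% number of stored items (errors if the key was never set).
\cs_new:Npn \tobisbs_lookup_item:n #1
  { \tl_use:c { g_tobisbs_lookup_ #1 _tl } }
% Loop over all stored keys; the individual variables alone cannot
% be iterated over.
\cs_new_protected:Npn \tobisbs_lookup_map_inline:n #1
  { \seq_map_inline:Nn \g_tobisbs_lookup_keys_seq {#1} }
\ExplSyntaxOff

Access time then no longer depends on the size of the table, at the cost of one macro per entry and the extra bookkeeping needed for iteration.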
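
As an alternative that keeps the prop, here is a sketch of the non-expandable access discussed in the comments (again with illustrative names and error handling): \prop_get:NnNTF locates the entry with a delimited macro rather than a linear search, and its two branches already answer the "is the key present?" question, so no separate test against \q_no_value is needed:

\ExplSyntaxOn
\tl_new:N \l_tobisbs_lookup_result_tl
\msg_new:nnn { tobisbs } { unknown-key } { Unknown~lookup~key~'#1'. }
% Non-expandable lookup: fetches the value and tests for existence in one go.
\cs_new_protected:Npn \tobisbs_lookup_get:n #1
  {
    \prop_get:NnNTF \c_tobisbs_lookup_prop {#1} \l_tobisbs_lookup_result_tl
      { \tl_use:N \l_tobisbs_lookup_result_tl } % key found: deliver the value
      { \msg_error:nnn { tobisbs } { unknown-key } {#1} } % key not present
  }
\ExplSyntaxOff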

1 Answer


I don't know if this would be faster, but if the database is really very large, you could organize it in a MySQL or MariaDB server.

Then you could run LuaLaTeX with "--shell-escape" and, via io.popen, call a command-line SQL client to retrieve values as strings from the database. The strings in turn can be passed back to the TeX level. If the strings contain only things that shall be tokenized as character tokens, or if you don't mind relying on the backslash having category code 0, you can probably use tex.sprint for this:

\documentclass{article}
\usepackage{verbatim}

\makeatletter \let\percentchar=\@percentchar \makeatother

\newcommand\CallExternal[1]{%
  \directlua{
    function runcommand(cmd)
      local fout = assert(io.popen(cmd, 'r'))
      local str = assert(fout:read('*a'))
      fout:close()
      return str
    end
    tex.sprint(runcommand("#1"))
  }%
}%

\begin{document}

% Instead of the echo command, a command-line SQL/MariaDB client
% could be called for retrieving values from the database.

Let's catch the output of \verb|echo \LaTeX| between parentheses:

(\CallExternal{echo \string\\string\LaTeX\percentchar})

Let's catch the output of \verb|echo \LaTeX| into a macro definition:

\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\test
\expandafter\expandafter\expandafter{%
  \CallExternal{echo \string\\string\LaTeX\percentchar}%
}

\texttt{(\string\test=\meaning\test)}

\end{document}

[Screenshot of the compiled output]

If you rely on restricted shell escape rather than "--shell-escape", you may need to add the command-line SQL client to shell_escape_commands in texmf.cnf.


Another approach might be to have knitr create the .tex file for you from an .Rnw or .Rtex file in which R code and TeX code are mixed, with R code chunks that call the local command-line SQL client to retrieve values from the database and place them into the .tex file. (knitr works out of the box with Overleaf if the file is named .Rtex.)

<<templates, include=FALSE, cache=FALSE, echo=FALSE, results='asis'>>=
knitr::opts_template$set(
  CallExternalApp = list(include=TRUE, cache=FALSE, echo=FALSE, results='asis')
)
@
<<CallExternal, include=TRUE, cache=FALSE, echo=FALSE, results='asis'>>=
CallExternalApp <- function(A) {
    return(cat(system(A, intern = TRUE), sep="", fill=FALSE))
}
@

\documentclass{article}

\begin{document}

Instead of calling \verb|echo| there could be a call to a command-line SQL client.

\bigskip

In Linux, with the \verb|echo| command you need to escape the backslash, i.e., you need to type two backslashes to get one. The same holds for R code chunks. Thus, if you wish R to send an \verb|echo| command that produces a backslash, you need to type four backslashes.

\bigskip

Let's catch the output of \verb|echo \LaTeX| between parentheses:

(%
<<, opts.label='CallExternalApp'>>=
<<CallExternal>>
CallExternalApp("echo \\LaTeX\%")
@
)

\bigskip

Let's catch the output of \verb|echo \LaTeX| into a macro definition:

\def\test{%
<<, opts.label='CallExternalApp'>>=
<<CallExternal>>
CallExternalApp("echo \\LaTeX\%")
@
}

\texttt{(\string\test=\meaning\test)}

\end{document}

[Screenshot of the compiled output]

Ulrich Diez
  • Thank you Ulrich, nice solution which I might use in the future for really large chunks of data. But as this is going to be shipped out as part of a LaTeX package, I will stick to the LaTeX3 approach and use an l3prop. – TobiBS Jan 12 '22 at 21:24
  • @TobiBS In case you also accept solutions not based on LaTeX3 you might consider taking a look at the datatool package: you can create databases from within LaTeX or by reading an external .csv file and, e.g., loop through them via \DTLforeach and do all kinds of things. ;-) – Ulrich Diez Jan 13 '22 at 03:08
  • This is an interesting package! However, while I might try it for my next project, I'd still stay with LaTeX3 for this. – TobiBS Jan 13 '22 at 12:23