The 164 paragraphs in kantlipsum are stored in an expl3 sequence, with a total size of 147957 bytes.
I tried concatenating five copies of it, getting a sequence with 820 items and a total size of 739785 bytes.
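For illustration, here is a minimal sketch of how such a sequence could be built and then concatenated into five copies; the variable names (\g_my_kant_seq, \g_my_copy_seq) are made up for this example and are not what kantlipsum actually uses internally.
\ExplSyntaxOn
% hypothetical storage sequence; kantlipsum has its own internal names
\seq_new:N \g_my_kant_seq
\seq_gput_right:Nn \g_my_kant_seq { As~any~dedicated~reader~can~clearly~see,~... }
% ... one \seq_gput_right:Nn per paragraph ...
% keep one pristine copy and append it four more times, ending with five copies
\seq_new:N \g_my_copy_seq
\seq_gset_eq:NN \g_my_copy_seq \g_my_kant_seq
\int_step_inline:nn { 4 }
  { \seq_gconcat:NNN \g_my_kant_seq \g_my_kant_seq \g_my_copy_seq }
\ExplSyntaxOff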
The impact on memory is
9810 strings out of 492609
192626 string characters out of 6129049
3060841 words of memory out of 5000000
13800 multiletter control sequences out of 15000+600000
Just loading expl3 shows
9774 strings out of 492609
191933 string characters out of 6129049
210796 words of memory out of 5000000
13768 multiletter control sequences out of 15000+600000
You can probably load the strings for a language on demand by storing them in separate files, so the memory impact would not be so big, about 21 kiB. Having a separate file for each language would also ease maintenance.
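A sketch of what on-demand loading could look like; the file naming scheme (mytext-<language>.def) and the sequence names are invented here purely for illustration.
\ExplSyntaxOn
% load the data file for a language only the first time it is needed;
% each file is assumed to define \g_my_text_<language>_seq
\cs_new_protected:Npn \my_load_language:n #1
  {
    \seq_if_exist:cF { g_my_text_#1_seq }
      { \file_input:n { mytext-#1.def } }
  }
\ExplSyntaxOff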
Accessing items in a 300-item sequence is not so fast, but you can use csnames instead.
Here's a comparison: I first define a 300-item sequence and 300 token lists, then benchmark accessing random items.
\documentclass{article}
\usepackage{xparse}
%\usepackage{kantlipsum}
\usepackage{l3benchmark}
\ExplSyntaxOn
% build a 300-item sequence: item number i holds just "i"
\int_step_inline:nn { 300 } { \seq_put_right:Nn \l_tmpa_seq { #1 } }
% build 300 token lists with predictable csnames, one per item
\int_step_inline:nn { 300 }
  {
    \tl_new:c { l_test_#1_tl }
    \tl_set:cn { l_test_#1_tl } { #1 }
  }
\begin{document}
% retrieve a random item from the sequence
\benchmark:n { \seq_item:Nn \l_tmpa_seq { \int_rand:n { 300 } } }
% retrieve a token list through a constructed csname
\benchmark:n { \tl_use:c { l_test_ \int_rand:n { 300 } _tl } }
\end{document}
The result is
3.47e-4 seconds (1.08e3 ops)
5.6e-6 seconds (17.5 ops)
so there is a factor of about 60 in favor of the second method. Accessing the last item in the sequence takes essentially the same time as accessing a random one; with the csname method, access time is the same whichever item is requested.
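If you want to verify the claim about the last item yourself, a couple of extra benchmark lines in the same test document (before \end{document}, reusing \l_tmpa_seq and the token lists defined above) are enough:
\benchmark:n { \seq_item:Nn \l_tmpa_seq { 300 } } % last item of the sequence
\benchmark:n { \seq_item:Nn \l_tmpa_seq { 1 } } % first item, for comparison
\benchmark:n { \tl_use:c { l_test_300_tl } } % fixed csname access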