
A follow-up to my earlier question: Macro to replace text with random string of same length

Thanks to @Mico's answer, we now have a Lua-based macro that replaces a UTF-8 string with random characters. However, when the argument contains a macro, the code counts the characters of the control sequence \... as well as the braces { and } towards the obfuscated output. This is a problem for wireframing, because the random string comes out longer than the ordinary text. Is there a way to get xyz and \textit{xyz} to produce randomised ASCII output of the same length?
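To make the mismatch concrete, here is a minimal sketch (assuming LuaLaTeX and the luacode package, as in the MWE below) that prints the number of characters Lua sees in each case. The first call should print 3; the second prints a larger number, because the control sequence name and the braces arrive in Lua as ordinary characters of the string.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{luacode} % for the '\luastring' macro
\begin{document}
% Characters Lua sees for the plain text:
\directlua{tex.sprint(unicode.utf8.len(\luastring{xyz}))}

% Characters Lua sees once a macro wraps the same text
% (the macro name and the braces are counted as well):
\directlua{tex.sprint(unicode.utf8.len(\luastring{\textit{xyz}}))}
\end{document}
```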

The MWE (credit to @Mico) is below:

% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{luacode} % for 'luacode' environment and '\luastring' macro
\begin{luacode}
function rndstring ( inputstring )
  local outputstring, choices, mm, nn
  mm = unicode.utf8.len(inputstring) -- no. of utf8-encoded characters in input string

  -- Place candidate replacement characters in a Lua table:
  choices = { "0" } -- substantially simplified to reduce size
  -- Number of rows in 'choices' table
  nn = #choices

  -- Generate the outputstring in a 'for' loop:
  outputstring = ""
  for i = 1 , mm do
    if unicode.utf8.sub ( inputstring , i , i ) == " " then
      outputstring = outputstring .. " " -- preserve space char.
    else -- choose a new char randomly from 'choices' table
      outputstring = outputstring .. choices[ math.random ( nn ) ]
    end
  end

  return ( outputstring )
end

\end{luacode}

%% Define a LaTeX macro to invoke the Lua function
\newcommand\rndstring[1]{\directlua{tex.sprint(rndstring(\luastring{#1}))}}

\begin{document}
\ttfamily
\rndstring{This is a string.}
\rndstring{\textit{This is a String}}
%%%% These two Strings should be (but aren't) the same length
\end{document}
Niranjan
  • 3,435

2 Answers


I think a pure LaTeX solution is better.

\documentclass{article}
\usepackage[T1]{fontenc}

\begin{document}

\ExplSyntaxOn

% specify what candidates are in the random replacement
\def\RandomStringASCIIRanges{
    %33-47,
    %48-57,
    58-64,
    65-90,
    91-96,
    97-122,
    %123-126
}

\seq_new:N \l_chrepl_all_repl_seq
\clist_new:N \l_chrepl_tmpa_clist
\int_new:N \l_chrepl_tmpa_int
\tl_new:N \l_chrepl_tmpa_tl
\tl_new:N \g_chrepl_tmpa_tl
\tl_new:N \g_chrepl_tmpb_tl
\tl_new:N \l_chrepl_rand_charcode_tl
\tl_new:N \l_chrepl_head_tl

\cs_set:Npn \__chrepl_parse_ascii_range:w |#1-#2| {
    \int_step_inline:nnn {#1} {#2} {
        \seq_put_right:Nn \l_chrepl_all_repl_seq {##1}
    }
}

\cs_set:Npn \__chrepl_parse_ascii_range:n #1 {
    \__chrepl_parse_ascii_range:w |#1|
}

% parse the ranges
\clist_set:NV \l_chrepl_tmpa_clist \RandomStringASCIIRanges
\clist_map_function:NN \l_chrepl_tmpa_clist \__chrepl_parse_ascii_range:n

% construct an intarray for fast access
\intarray_new:Nn \g_chrepl_repl_intarray {\seq_count:N \l_chrepl_all_repl_seq}
\int_set:Nn \l_chrepl_tmpa_int {1} % loop index
\seq_map_inline:Nn \l_chrepl_all_repl_seq {
    \intarray_gset:Nnn \g_chrepl_repl_intarray {\l_chrepl_tmpa_int} {#1}
    \int_incr:N \l_chrepl_tmpa_int
}

\cs_set:Npn \__chrepl_temp_var:n #1 {
    __g_chrepl_temp_#1_tl
}

\cs_set:Npn \__chrepl_group:n #1 {
    \exp_not:n { {#1} }
}

% a recursive replacement algorithm
\cs_set:Npn \chrepl_repl:Nnn #1#2#3 {
    \group_begin:
    \tl_if_empty:nF {#2} {
        % check if head is space
        % if head is space, insert it back
        \tl_if_head_is_space:nTF {#2} {
            \tl_gput_right:Nn #1 {\ }
            % recursive call (skip spaces)
            \exp_args:Nnx \chrepl_repl:Nnn #1 {\tl_trim_spaces:n {#2}} {#3}
        } {
            \tl_if_head_is_group:nTF {#2} {
                % the results in this group needs to be written to a unique temp variable
                % clear the temp var. corresponding to this level
                \tl_gclear:c {\__chrepl_temp_var:n {#3}}
                \chrepl_repl:cxx {\__chrepl_temp_var:n {#3}} {\tl_head:n {#2}} {\int_eval:n {#3 + 1}}
                \tl_set_eq:Nc \l_chrepl_tmpa_tl {\__chrepl_temp_var:n {#3}}
                \tl_gput_right:Nx #1 {
                    \exp_args:NV \__chrepl_group:n \l_chrepl_tmpa_tl
                }
            } {
                % extract the head
                \tl_set:Nx \l_chrepl_head_tl {\tl_head:n {#2}}
                \tl_if_empty:NF \l_chrepl_head_tl {
                    % if head is control sequence, insert it back
                    \exp_args:NV \token_if_cs:NTF \l_chrepl_head_tl {
                        \tl_show:N \l_chrepl_head_tl
                        \tl_gput_right:NV #1 \l_chrepl_head_tl
                    } {
                        % otherwise, do replacement
                        % randomly pick a charcode from the intarray
                        \tl_set:Nx \l_chrepl_rand_charcode_tl {\intarray_rand_item:N \g_chrepl_repl_intarray}
                        % generate the corresponding character
                        \tl_gput_right:Nx #1 {\char_generate:nn {\l_chrepl_rand_charcode_tl} {12}}
                    }
                }
            }
            % recursive call
            \exp_args:Nnx \chrepl_repl:Nnn #1 {\tl_tail:n {#2}} {#3}
        }
    }
    \group_end:
}

\cs_generate_variant:Nn \chrepl_repl:Nnn {cxx}

% user function
\newcommand{\rndstr}[1]{
    \tl_gclear:N \g_chrepl_tmpa_tl % used to store results
    \chrepl_repl:Nnn \g_chrepl_tmpa_tl {#1} {1}
    \tl_show:N \g_chrepl_tmpa_tl
    \tl_use:N \g_chrepl_tmpa_tl
}

\ExplSyntaxOff

\texttt{\rndstr{Hello World}}

\texttt{\rndstr{Hello Владимир öäüß}}

\texttt{\rndstr{this \textsl{ab{\huge\bfseries cdef}gh}} nested groups.}

\texttt{\rndstr{this {ab{cdef}gh}} nested groups.}

\texttt{\rndstr{this abcdefgh nested groups.}}

\texttt{\rndstr{this {abcdefgh} nested groups.}}

\texttt{\rndstr{Once upon a time, there was ...}}

\end{document}

Alan Xiang
  • 5,227

Unfortunately, people have the habit of processing TeX input as regular Lua strings, which will always fail when TeX tokens come into play.

What makes this even more sad to see is that LuaTeX actually does already come with a builtin library to process TeX tokens. With this the code not only becomes a lot more compact but it's also absolutely trivial to differentiate between different types of tokens.

\documentclass{article}
\usepackage{luacode}
\begin{luacode}
local function rndstring()
    local toks = token.scan_toks()
    for n, t in ipairs(toks) do
        if t.cmdname == "letter" then
            -- random number from printable ASCII range
            local r = math.random(33, 126)
            -- create new token with that character and catcode 12
            local letter = token.create(r, 12)
            -- replace old token
            toks[n] = letter
        end
    end

    token.put_next(toks)
end

local lft = lua.get_functions_table()
lft[#lft + 1] = rndstring
token.set_lua("rndstring", #lft, "global")

\end{luacode}

\begin{document}
\ttfamily
\rndstring{This is a string.}
\rndstring{\textit{This is a String}}
\end{document}
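To see how the token library tells token types apart, the following minimal sketch uses the same calls as above (token.scan_toks, token.put_next, lua.get_functions_table, token.set_lua) but only reports what it finds. The macro name \showtokens and the Lua function showtokens are hypothetical, chosen for illustration. It writes the cmdname of every token in its argument to the terminal and log, then hands the unchanged tokens back to TeX. Letters should report "letter", which is what the replacement code above tests for; the names for other token types are best checked in the log rather than assumed.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{luacode}
\begin{luacode}
-- Hypothetical helper: list the command name of each token in the argument.
local function showtokens()
    -- read the braced argument as a table of token objects
    local toks = token.scan_toks()
    local names = {}
    for _, t in ipairs(toks) do
        names[#names + 1] = t.cmdname
    end
    -- report the names on the terminal and in the log file
    texio.write_nl("term and log", table.concat(names, " "))
    -- push the untouched tokens back into TeX's input so they get typeset
    token.put_next(toks)
end

-- register the Lua function and bind it to the control sequence \showtokens
local lft = lua.get_functions_table()
lft[#lft + 1] = showtokens
token.set_lua("showtokens", #lft, "global")
\end{luacode}

\begin{document}
\showtokens{\textit{ab c}}
\end{document}
```

In this sketch token.put_next is what makes the argument appear in the output at all: without it the scanned tokens would simply be consumed. token.set_lua is what turns the registered function slot into a macro callable from TeX.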


Henri Menke
  • 109,596
  • Thanks for this solution. Can you explain a little the meaning of token.put_next(toks) and token.set_lua? – projetmbc Feb 02 '22 at 23:03
  • Brilliant, thank you!!! – luaplaying Feb 03 '22 at 01:09
  • @projetmbc This is described in “10.6 The token library” in the LuaTeX manual: https://www.pragma-ade.nl/general/manuals/luatex.pdf#%232338 – Henri Menke Feb 03 '22 at 08:41
  • Note that # on non-sequence tables is undefined in Lua 5.2 https://stackoverflow.com/questions/23590885/why-does-luas-length-operator-return-unexpected-values?noredirect=1 (although currently I see lualatex uses Lua 5.3 so no problem there) – user202729 Jul 17 '22 at 14:46
  • @user202729 It should also be safe to assume that \luadef registers will be allocated incrementally, same as \count, \dimen, \skip, etc. – Henri Menke Jul 17 '22 at 17:35