LaTeX3: efficient way to remove spaces in front of a command

Question

If I have a LaTeX3 string containing:

    def f():
        return 42
def g():
    return 43

I would like to turn it into:

def f():
    return 42
def g():
    return 43

i.e., remove the maximum number of leading spaces/tab (forgetting about empty lines containing only spaced/tabs, I count TAB and space as the same unit) to make sure that I removed the same number of spaces on every line without removing actual text.

What is the most efficient solution to do that? (ideally in LaTeX3) I was thinking to use regexp like ^[ \t]$ to check if the line contains only spaces, and then to count on each line the number of leading space with another regex like ^[ \t], but regexp are so inefficient in LaTeX that it is not really a viable solution.

MWE

\documentclass[]{article}
\usepackage{verbatim}
\usepackage{xsimverb}
\begin{document}
\ExplSyntaxOn
\ior_new:N \g__robExt_read_ior%
\NewDocumentEnvironment{getString}{}{%
  \XSIMfilewritestart*{\jobname-tmp-file-you-can-remove.tmp}%
}{%
  \XSIMfilewritestop%
  \ior_open:Nn \g__robExt_read_ior {\jobname-tmp-file-you-can-remove.tmp}%
  %% Put the file in l__robExt_tmp_contain_file_str
  \str_clear_new:N \l__robExt_tmp_str%
  \ior_str_map_inline:Nn \g__robExt_read_ior {%
    \str_gput_right:Nx \l__robExt_tmp_str {##1^^J}%
  }%
  \str_gset_eq:cN {l__my_string_str} \l__robExt_tmp_str%
}%
\NewDocumentCommand{\printVerbatimString}{}{%
  % For some reasons, newlines are displayed as \Omega. We need to replace them with \
  % https://tex.stackexchange.com/questions/694716/print-latex3-string-verbatim/694717
  \tl_set_eq:NN \l__robExt_tmp_tl \l__my_string_str
  \tl_replace_all:Nnn \l__robExt_tmp_tl {^^J} { \mbox{}\par }%mbox needed to print empty lines
  \tl_replace_all:Nnn \l__robExt_tmp_cl { ~ } { \  }
  \begin{flushleft}\ttfamily%
    \l__robExt_tmp_tl
  \end{flushleft}%
}
\ExplSyntaxOff
\begin{getString}
    def f():
        return 42
def g():
    return 43

\end{getString}
\noindent Before:
\printVerbatimString
Goal:
\begin{getString}
def f():
    return 42
def g():
    return 43
\end{getString}
\printVerbatimString
\end{document}

Can we assume this is read in the normal way? If so, tabs are spaces and it's trivial — Joseph Wright, Feb 19 '24 at 15:20
If tabs are too hard to deal with, we can start with spaces only. But in practice I get this string from xsim, by writting it first to a file and reading in latex3 that file (maybe not the more efficient solution, but at least it works, is robust, and is simple). I use it in my robust-externalize library. But I'm very surprised to see that it is trivial, is there a builtin function for that? — tobiasBora, Feb 19 '24 at 15:26
Why are you using the suffix _str for what must be a _tl? — egreg, Feb 19 '24 at 16:09
@egreg Why should it be a _tl? I want to consider it as a string (i.e. only a sequence of basic letters), and I thought that \ior_str_map_inline was always providing str so that tl_to_str is not useful? — tobiasBora, Feb 19 '24 at 16:17
if these are strings then why are you using \tl_set_eq:NN on them? — Ulrike Fischer, Feb 19 '24 at 16:43
Just naming a token list with the suffix _str doesn't make it into a string. And if you try \str_show:N \l__robExt_tmp_str you'll see an error message. And that token list contains much more than a simple list of characters. — egreg, Feb 19 '24 at 17:00
@UlrikeFischer I use \tl_set_eq only in the printing part, my goal being to convert the string to a token list, as this time I want to add tokens like \par for new lines etc… — tobiasBora, Feb 19 '24 at 17:17
@egreg oh, do you mean in the printing part or in the getString environment? In the printing part this is indeed a typo I made (fixed). On the getString however, I would expect this to be a string, what makes it a non-string? — tobiasBora, Feb 19 '24 at 17:18
@tobiasBora How is it fixed? \tl_set_eq:NN \l__robExt_tmp_tl \l__my_string_str is wrong .... — cfr, Feb 19 '24 at 17:30
@cfr why? I thought str was a "subset" of tl, so that I can just create a tl from a str using \tl_set_eq. I can't find any \str_to_tl (white \tl_to_str does exist), so I have no idea how to make this correct otherwise. — tobiasBora, Feb 19 '24 at 18:33
@tobiasBora I'm not sure whether \tl_set_eq:NN \l_tmpa_tl \l_tmpa_str is correct, however, I can guarantee that \tl_set:NV \l_tmpa_tl \l_tmpa_str is correct (though slower). — Skillmon, Feb 19 '24 at 19:29
You're using global assignments on multiple local variables. Please note that a variable starting with l should only be assigned to locally, a global variable should be named with a leading g and only be assigned to globally. (\str_gset_eq:cN and \str_gput_right:Nx are global assignments!) — Skillmon, Feb 19 '24 at 19:33
The variable \l__my_string_str isn't declared, the fact that the first assignment to it is done via c-expansion doesn't change that. Same is true for other variables. Please declare variables before using them. — Skillmon, Feb 19 '24 at 19:34

score 4 · Accepted Answer · answered Feb 19 '24 at 21:08

The following does what you want. It might not be the fasted method to do it, but it should be rather straight forward.

I count the leading symbols to strip by counting the spaces in the string before and after removing all leading ones. The space remover simply loops for the minimum count of leading spaces removing one space or tab at a time.

\documentclass[]{article}
\usepackage{verbatim}
\usepackage{xsimverb}
\ExplSyntaxOn
\ior_new:N \g__robExt_read_ior
\tl_new:N \l__robExt_tmp_tl
\str_new:N \l__robExt_tmp_str
\int_new:N \l__robExt_whitespaces_int
\seq_new:N \l__robExt_lines_seq
\str_new:N \g__my_string_str
\group_begin:
\char_set_catcode_other:N ^^I
\cs_new_protected:Npn __robExt_count_leading_whitespace:n #1
  {
    \str_set:Nn \l__robExt_tmp_str {#1}
    \str_replace_all:Nnn \l__robExt_tmp_str { ^^I } { ~ }
    \tl_if_blank:VF \l__robExt_tmp_str
      {
        \int_set:Nn \l__robExt_whitespaces_int
          {
            \int_min:nn
              { \l__robExt_whitespaces_int }
              {
                \str_count_spaces:N \l__robExt_tmp_str -
                \str_count_spaces:e
                  { \exp_last_unbraced:NV \use:n \l__robExt_tmp_str {} }
              }
          }
      }
  }
\cs_generate_variant:Nn \str_count_spaces:n { e }
\cs_new:Npn __robExt_whitespace_trimmer:n #1
  {
    \use:e
      {
        \exp_not:n { __robExt_whitespace_trimmer:nN {#1} }
        \prg_replicate:nn
          { \l__robExt_whitespaces_int - 1 }
          { \exp_not:N __robExt_whitespace_trimmer:nN }
      }
    \use:n
    \prg_break_point:
  }
\cs_new:Npn __robExt_whitespace_trimmer:nN #1#2
  {
    \tl_if_empty:nT {#1} \prg_break:
    \tl_if_head_is_space:nTF {#1}
      { \exp_args:No #2 { __robExt_whitespace_trimmer:w #1 } }
      {
        \tl_if_head_eq_charcode:nNTF {#1} ^^I
          { \exp_args:No #2 { \use_none:n #1 } }
          { \prg_break:n {#1} }
      }
  }
\exp_last_unbraced:NNo \cs_new:Npn __robExt_whitespace_trimmer:w \c_space_tl {}
\group_end:
\NewDocumentEnvironment { getString } {}
  { \XSIMfilewritestart*{\jobname-tmp-file-you-can-remove.tmp} }
  {
    \XSIMfilewritestop
    \ior_open:Nn \g__robExt_read_ior { \jobname-tmp-file-you-can-remove.tmp }
    %% Put the file in l__robExt_tmp_contain_file_str
    \seq_clear:N \l__robExt_lines_seq
    \int_set_eq:NN \l__robExt_whitespaces_int \c_max_int
    \ior_str_map_inline:Nn \g__robExt_read_ior
      {
        __robExt_count_leading_whitespace:n {##1}
        \seq_put_right:Nn \l__robExt_lines_seq {##1}
      }
    \int_compare:nNnT \l__robExt_whitespaces_int > \c_zero_int
      {
        \seq_set_map_e:NNn \l__robExt_lines_seq \l__robExt_lines_seq
          { __robExt_whitespace_trimmer:n {##1} }
      }
    \str_gset:Nx \g__my_string_str { \seq_use:Nn \l__robExt_lines_seq { ^^J } }
  }
\NewDocumentCommand \printVerbatimString {}
  {
    % For some reasons, newlines are displayed as \Omega. We need to replace them with \
    % https://tex.stackexchange.com/questions/694716/print-latex3-string-verbatim/694717
    \tl_set_eq:NN \l__robExt_tmp_tl \g__my_string_str
    \tl_replace_all:Nnn \l__robExt_tmp_tl {^^J} { \mbox{}\par }%mbox needed to print empty lines
    \tl_replace_all:Nnn \l__robExt_tmp_tl { ~ } { \  }
    \begin{flushleft}\ttfamily
      \l__robExt_tmp_tl
    \end{flushleft}
  }
\ExplSyntaxOff
\begin{document}
\begin{getString}
    def f():
        return 42
def g():
    return 43

\end{getString}
\noindent Before:
\printVerbatimString
Goal:
\begin{getString}
def f():
    return 42
def g():
    return 43
\end{getString}
\printVerbatimString
\end{document}

Who awesome, thanks a lot! I'm still trying to wrap my head around this solution. First, I'm not sure to understand how you count leading spaces, notably, what is \exp_last_unbraced:NV \use:n \l__robExt_tmp_str {} exactly doing? For the formula to work, I guess it should output the part of the string after the first spaces, but I have to admit I don't see why it works… Also, once you have the number of spaces to trim, why do you need this complex whitespace trimmer? Can't you "just" use \str_range? — tobiasBora, Feb 20 '24 at 09:49
@tobiasBora it assumes that the number stored in \l__robExt_whitespaces_int might be wrong and does never trim a non-whitespace character, hence the complicated trimmer. Most likely in your use case \str_range:nnn is fine and you can use it instead. — Skillmon, Feb 20 '24 at 09:57
I see, thanks. And what about \exp_last_unbraced:NV \use:n \l__robExt_tmp_str {}? — tobiasBora, Feb 20 '24 at 10:06
\exp_last_unbrace:NV expands the value \l__robExt_tmp_str before \use:n runs. Then \use:n grabs the first none-space token of the string and reinserts it (because of TeX-rules that an undelimited parameter ignores spaces), the {} at the end is used if the string is blank but doesn't change the count of spaces, hence is fine. — Skillmon, Feb 20 '24 at 10:11
I see, thanks a lot! Actually, I'm just realizing: you directly plug yourself on \ior_str_map_inline:Nn to iterate over the lines of the file, but in practice I do this operation much later (the file has already been converted to a single string). Can I easily & efficiently loop over all lines of a string? (the only guarantee I have is that all of them end with ^^J) The str_map operate character by character and your solution assumes that we can directly parse the whole string (I assume it will also be more efficient if we can directly parse the whole line at once instead of per char). — tobiasBora, Feb 20 '24 at 11:51
\seq_set_split:Nnn with ^^J as separator, then use \seq_map_... to count the whitespaces before the \seq_set_map_e:NNn, rebuild the string by \str_set:Ne with \seq_use:Nn with ^^J. — Skillmon, Feb 20 '24 at 12:09
Oh good point, I was trying to find this function among the \str_* part of the documentation. I'll try that, thanks a lot! — tobiasBora, Feb 20 '24 at 12:15
(actually, I guess I want \seq_set_split_keep_spaces to keep the spaces) — tobiasBora, Feb 20 '24 at 12:16
@tobiasBora yes, you want the keep_spaces variant, sorry my bad :) — Skillmon, Feb 20 '24 at 12:30

score 1 · Answer 2 · answered Feb 19 '24 at 22:33

1

You can use autogobble option in minted which does exactly this. minted requires --shell-escape though.

\documentclass{article}
\usepackage{minted}
\begin{document}
\begin{minted}{python}
    def f():
        return 42
def g():
    return 43

\end{minted}
\bigskip
\begin{minted}[autogobble]{python}
    def f():
        return 42
def g():
    return 43

\end{minted}
\end{document}

answered Feb 19 '24 at 22:33

David Carlisle

757,742

Thanks, but the problem is that I would need to do it without shell escape (I ideally need to compute the hash of the string in pure LaTeX, this is needed for my library robust-externalize) – tobiasBora Feb 20 '24 at 08:17
1

@tobiasBora No problem I suspected as much but I thought I'd post anyway as others may land here and minted does have that specific option. – David Carlisle Feb 20 '24 at 08:58

LaTeX3: efficient way to remove spaces in front of a command

2 Answers2