l3regex - Line by line splitting

Question

I'd like to analyse the contents of an environment line by line, but the following minimal test code fails: it prints false.

\documentclass{article}
\ExplSyntaxOn
\NewDocumentEnvironment{linebyline}{b}{
    \seq_new:N \l_temp_seq
    \regex_split:nnNTF { \n } { #1 } \l_tmpa_seq
        { true }
        { false }
}{}
\ExplSyntaxOff
\begin{document}
\begin{linebyline}
% Comment
    Line 1
    Line 2
% Comment
    Line 3
\end{linebyline}
\end{document}

tex is tex..., you need \obeylines or set \endlinechar or ... — David Carlisle, Dec 17 '23 at 22:56
I will investigate this. Thanks for showing me the light... :-) — projetmbc, Dec 17 '23 at 22:57
And don't use regex for this. Splitting at a single character can be done with \seq_set_split:Nnn or \seq_set_split_keep_spaces:Nnn. — Skillmon, Dec 18 '23 at 08:45
@projetmbc still shows a tool too complicated and wasteful for the job... Your request never needed regex if it's just about splitting lines, and I wanted to make sure you realise this. l3regex is brilliant code, don't get me wrong, but it is used "in the wild" way too often for things simpler tools can do just as well yet hundreds of times faster. — Skillmon, Dec 18 '23 at 13:23
Due to a precise design choice of TeX, how the input is split across lines is completely irrelevant, so long as line breaks correspond to spaces in output. So it's essentially meaningless to “analyze the input line-by-line”. — egreg, Dec 19 '23 at 15:19
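
To illustrate the point made in the comments above: by the time the b argument is grabbed, every line end has already been turned into a space, so the token list holds no character that \n could match. Below is a minimal sketch of an environment that keeps the line ends by giving ^^M category code 12 before the body is read. It assumes the body contains no nested \end, and the names \__lbl_grab:w, \l__lbl_lines_seq and \c__lbl_cr_tl are hypothetical, chosen only for this example. Comment lines disappear entirely here, because % keeps its usual meaning.

\documentclass{article}
\ExplSyntaxOn
% Hypothetical names, used only in this sketch.
\seq_new:N \l__lbl_lines_seq
% A single character token: code 13 (carriage return), category code 12.
\tl_const:Nx \c__lbl_cr_tl { \char_generate:nn { 13 } { 12 } }
% Does nothing if the kernel already provides this variant.
\cs_generate_variant:Nn \seq_set_split:Nnn { NVn }
% Grab everything up to the next \end (assumption: no nested environment).
\cs_new_protected:Npn \__lbl_grab:w #1 \end #2
  {
    \seq_set_split:NVn \l__lbl_lines_seq \c__lbl_cr_tl {#1}
    \seq_use:Nn \l__lbl_lines_seq { :: }
    \end {#2} % close the environment that was opened
  }
\NewDocumentEnvironment { linebyline } { }
  {
    % Line ends become ordinary characters before the body is tokenized.
    \char_set_catcode_other:N \^^M
    \__lbl_grab:w
  }
  { }
\ExplSyntaxOff
\begin{document}
\begin{linebyline}
% Comment
    Line 1
    Line 2
% Comment
    Line 3
\end{linebyline}
\end{document}

The category code change is local to the environment group, so text after \end{linebyline} is read normally again.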

projetmbc · Answer 1 · 2023-12-18T22:54:34.900

Here's a solution with a command rather than an environment, which is not what I need, and using regular expressions instead of \seq_set_split:Nnn or \seq_set_split_keep_spaces:Nnn.

Any advice on a \seq_set_split:Nnn-based solution is welcome.

\documentclass{article}
\ExplSyntaxOn
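\NewDocumentCommand{\linebyline}{+v}{
    % \l_tmpa_seq is a predefined scratch variable, so no \seq_new:N is needed.
    % In a +v argument each line end is a character with code 13, which \r matches.
    \regex_split:nnN { \r } {#1} \l_tmpa_seq
    \seq_use:Nn \l_tmpa_seq { :: }
}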
\ExplSyntaxOff
\begin{document}
\linebyline{
% Comment
    Line 1
    Line 2
% Comment
    Line 3
}
\end{document}

This code outputs:

::% Comment::Line 1::Line 2::% Comment::Line 3::
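
For comparison, here is a minimal sketch of the \seq_set_split:Nnn route suggested in the comments, still with a +v command rather than an environment. The names \l__lbl_lines_seq and \c__lbl_cr_tl are hypothetical. The sketch assumes that +v delivers each line end as a character with code 13 and category code 12, so the delimiter is built with \char_generate:nn.

\documentclass{article}
\ExplSyntaxOn
% Hypothetical names, used only in this sketch.
\seq_new:N \l__lbl_lines_seq
% The delimiter: character code 13 with category code 12.
\tl_const:Nx \c__lbl_cr_tl { \char_generate:nn { 13 } { 12 } }
% Does nothing if the kernel already provides this variant.
\cs_generate_variant:Nn \seq_set_split:Nnn { NVn }
\NewDocumentCommand { \linebyline } { +v }
  {
    % Split the verbatim argument at each line end.
    \seq_set_split:NVn \l__lbl_lines_seq \c__lbl_cr_tl {#1}
    \seq_use:Nn \l__lbl_lines_seq { :: }
  }
\ExplSyntaxOff
\begin{document}
\linebyline{
% Comment
    Line 1
    Line 2
% Comment
    Line 3
}
\end{document}

\seq_set_split:Nnn trims spaces around each item. If the indentation of a line matters, \seq_set_split_keep_spaces:Nnn can be used in the same way, after generating the matching NVn variant.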

Steven B. Segletes · Answer 2 · 2023-12-19T20:43:59.770

I'm not really sure what you are looking for, but with \obeylines in place within the environment, a token cycle can be used to search for the line ends and insert (as in your example) a :: between lines.

\documentclass{article}
\usepackage{tokcycle}
{\obeylines
\gdef\mycr{
}}
\def\myenvname{linebyline}
\newenvironment{\myenvname}{\obeylines\catcode`\%=12 \tokencycle
  {\addcytoks{##1}}
  {\processtoks{##1}}
  {%
  \expandafter\ifx\mycr##1\addcytoks{::}\else
    \ifx\end##1
      \tcpop\z\tcpushgroup\z%
      \ifx\z\myenvname
        \tcpush{\noexpand\endtokcycraw##1}%
      \else\addcytoks{##1}\fi
    \else\addcytoks{##1}\fi
  \fi}
 {\addcytoks{##1}}}{}
\begin{document}
\begin{linebyline}
% Comment
    Line 1
    Line 2
% Comment
    Line 3
\end{linebyline}
Back to
normal
text.
\begin{linebyline}
% Comment
    Line 1 \today
    Line 2\begin{itemize} \item xxx\end{itemize}
% Comment
    Line 3
\end{linebyline}
Back to % absolutely
normal
text.
\end{document}

If the parsed content should be detokenized rather than executed, with the content of each line collected separately, one can do this:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{tokcycle}
{\obeylines
\gdef\mycr{
}}
\def\myenvname{linebyline}
\newenvironment{\myenvname}{\obeylines\catcode`\%=12 \tokencycle
  {\addcytoks{\string##1}}
  {\addcytoks{\{}\processtoks{##1}\addcytoks{\}}}
  {%
  \expandafter\ifx\mycr##1
    \mbox{}\\Input line: ``\the\cytoks''% <-CURRENT INPUT LINE
    \cytoks{}%
  \else
    \ifx\end##1
      \tcpop\z\tcpushgroup\z%
      \ifx\z\myenvname
        \tcpush{\noexpand\endtokcycraw##1}%
      \else\addcytoks{\detokenize{##1}}\fi
    \else\addcytoks{\detokenize{##1}}\fi
  \fi}
  {\addcytoks{##1}}}{}
\begin{document}
\begin{linebyline}
% Comment
    Line 1
    Line 2
% Comment
    Line 3
\end{linebyline}
Back to
normal
text.
\begin{linebyline}
% Comment
    Line 1 \today
    Line 2\begin{itemize} \item xxx\end{itemize}
% Comment
    Line 3
\end{linebyline}
Back to % absolutely
normal
text.
\end{document}

Thanks for this suggestion. Concretely, I want to use a DSL (Domain Specific Language) to easily type variation tables and/or sign tables of real functions. In my partial answer, the l3 sequence will serve to analyse each line of the DSL. — projetmbc, Dec 19 '23 at 07:24
@projetmbc In my implementation, the code of each line is actually executed (as in the itemize). However, I could just as easily have \stringed the output, making it more verbatim-like, if that were preferable. — Steven B. Segletes, Dec 19 '23 at 19:24
