
In order to improve the \sameword feature of reledmac, I would like to find a way to automatically apply a command to every word. For example, in plain TeX,

\def\foo#1{#1 (#1)}
foo bar foo bar foo bar foo, foo? foo. foo;
\end

should be automatically transformed to

\def\foo#1{#1 (#1)}
\foo{foo} \foo{bar} \foo{foo} \foo{bar} \foo{foo} \foo{bar} \foo{foo}, \foo{foo}? \foo{foo}. \foo{foo};
\end

My constraints are:

  • punctuation marks must be excluded;
  • some commands must be excluded (for example, foo\footnote{bar} should become \foo{foo}\footnote{bar} and not \foo{foo\footnote{bar}});
  • if possible, it should work with pdfTeX, XeTeX and LuaTeX, but a LuaTeX-only solution would also be fine;
  • any existing LaTeX package may be used (a pure plain TeX solution is also acceptable).
Maïeul
  • Your second constraint is the difficult one... how to exclude some/all macros. – Steven B. Segletes Dec 01 '15 at 12:24
  • Yes, I know. Note that the list of excluded macros could be defined manually. – Maïeul Dec 01 '15 at 12:27
  • Related: http://tex.stackexchange.com/questions/253203/how-to-repeat-over-all-characters-in-a-string. My answer there also shows how to do "something" to each word. However, excluding macros would be an issue. – Steven B. Segletes Dec 01 '15 at 12:28
  • Your example caused my answer to fail!! finally realised it was the example that is wrong, plain tex \footnote takes (effectively) two arguments not one:-) see the usage in my update answer. – David Carlisle Dec 01 '15 at 12:59
  • Just wondering whether preprocessing the source could be an alternative …? The script would only have to check for duplicate words that are max. 80 or so characters apart, so I'd imagine it wouldn't take too long to run. – Florian Dec 01 '15 at 13:26
  • @Florian: that is what I suggested to people who asked me. But the question. – Maïeul Dec 01 '15 at 13:29
  • @DavidCarlisle Oh ! My bad. I will update my question – Maïeul Dec 01 '15 at 13:29
  • I wouldn't mind running an extra script, as long as I don't have to find all problematic words by hand. Such a script is still not trivial for non-programmers to write, so maybe a compromise would be to offer a ready-made script as part of the reledmac-distribution? Possibly with the option to let reledmac trigger it in a similar manner as e.g. imakeidx does for the index-processing? But any automatisation of this problem would be very nice! – Florian Dec 01 '15 at 13:36
  • @why not. It should not be very difficult for me to write such a script in Python (indeed, I already have one). However, it would not be possible to have the same feature as imakeidx, as the script would have to run before the first run of LaTeX in order to produce a correct .tex input. – Maïeul Dec 01 '15 at 13:44

3 Answers


It would be so easy to break this, but..


\def\foo#1{#1 (#1)}
\def\xfoo#1 {%
\def\tmp{#1}%
\ifx\tmp\endfoo
\let\next\relax
\else
\ftnfoo#1\footnote\empty\empty\empty\relax
\let\next\xfoo
\fi
\next}
\def\ftnfoo#1\footnote#2#3#4\relax{%
\ifx\footnote#4\foo{#1}\footnote{#2}{#3} \else\afoo#1?\relax\fi}
\def\afoo#1?#2\relax{\ifx?#2\foo{#1}? \else\bfoo#1,\relax\fi}
\def\bfoo#1,#2\relax{\ifx,#2\foo{#1}, \else\cfoo#1;\relax\fi}
\def\cfoo#1;#2\relax{\ifx;#2\foo{#1}; \else\dfoo#1.\relax\fi}
\def\dfoo#1.#2\relax{\ifx.#2\foo{#1}. \else\foo{#1} \relax\fi}


\vsize=5\baselineskip

% assumes a space before the par
\def\yfoo#1\par{\xfoo#1!@ \par}

\def\endfoo{!@}

\yfoo
foo bar foo bar foo\footnote{$^1$}{bar} bar foo, foo? foo. foo;

\bye
David Carlisle

IMHO you need something like this:

\def\everyword#1#2{\let\domacro=#1\everywordA#2 {} }
\def\everywordA#1 {\ifx^#1^\else
   \def\tmp{}\everywordB #1\end
   \expandafter\everywordA \fi
}
\def\everywordB{\futurelet\next\everywordC}
\def\everywordC{\ifcat\noexpand\next A\expandafter\everywordD
                \else \expandafter\everywordE \fi}
\def\everywordD#1{\edef\tmp{\tmp#1}\everywordB}
% the space after #1 puts back the interword space consumed as the delimiter of \everywordA
\def\everywordE#1\end{\expandafter\domacro\expandafter{\tmp}#1 }

\def\foo#1{#1 (#1)}
\everyword\foo{foo bar foo bar foo bar foo, foo? foo. foo;}
\end

The main idea of this macro is that the text is first split into words at spaces, and each word is then divided into two parts: the leading letter tokens (catcode 11), which are passed to the macro, and all remaining tokens of any other type, which are left unchanged.
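To make the split concrete, here is a minimal plain TeX sketch that reuses the macros above with a hypothetical marker macro \bracket (not part of the answer), so that the boundary of each argument is visible. The copy of \everywordE in this sketch ends with an explicit space, which puts back the interword space consumed by the delimiter of \everywordA. The results in the comments come from tracing the macros by hand.

```latex
% Plain TeX; process with tex, pdftex, xetex or luatex.
\def\everyword#1#2{\let\domacro=#1\everywordA#2 {} }
% Grab one space-delimited word; the empty word ({}) ends the loop.
\def\everywordA#1 {\ifx^#1^\else
   \def\tmp{}\everywordB #1\end
   \expandafter\everywordA \fi
}
% Peek at the next token without absorbing it.
\def\everywordB{\futurelet\next\everywordC}
% Letter (catcode 11): collect it; anything else: finish the word.
\def\everywordC{\ifcat\noexpand\next A\expandafter\everywordD
                \else \expandafter\everywordE \fi}
\def\everywordD#1{\edef\tmp{\tmp#1}\everywordB}
% Apply the macro to the collected letters, then emit the rest and a space.
\def\everywordE#1\end{\expandafter\domacro\expandafter{\tmp}#1 }

\def\bracket#1{[#1]}% hypothetical marker macro, for illustration only

\everyword\bracket{foo, don't (bar)}
% foo,   ->  [foo],    letters "foo", rest ","
% don't  ->  [don]'t   the apostrophe is not a letter, so the word is cut there
% (bar)  ->  [](bar)   no leading letters, so the macro gets an empty argument
\bye
```

The last two cases show the limits of the approach: only the leading run of letters is passed to the macro, so words with an internal apostrophe or an opening parenthesis would need extra handling, for instance a test for an empty \tmp before calling the macro.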

wipet

Here is an implementation with xparse and the l3regex module of expl3. First, the occurrences of \footnote are set aside; then, in each remaining piece, runs of characters that are neither spaces nor punctuation are passed as the argument to a macro whose meaning can be set with \setxsamewordformat:

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\NewDocumentCommand{\xsameword}{m}
 {
  \maieul_xsameword:n { #1 }
 }

\tl_new:N \l__maieul_xsameword_list_tl

\cs_new_protected:Nn \maieul_xsameword:n
 {
  \tl_set:Nn \l__maieul_xsameword_list_tl
   {
    \__maieul_xsameword_start:n { #1 }
   }
  \regex_replace_all:nnN
   { (\c{footnote}\cB..*?\cE.) }
   { \cE\} \1 \c{__maieul_xsameword_start:n} \cB\{ }
   \l__maieul_xsameword_list_tl
  \tl_use:N \l__maieul_xsameword_list_tl
 }
\cs_new_protected:Nn \__maieul_xsameword_start:n
 {
  \tl_set:Nn \l__maieul_xsameword_list_tl { #1 }
  \regex_replace_all:nnN
   { ([^\s,.!?]+) }
   { \c{maieul_xsameword_format:n} \cB\{ \1 \cE\} }
   \l__maieul_xsameword_list_tl
  \tl_use:N \l__maieul_xsameword_list_tl
 }

\NewDocumentCommand{\setxsamewordformat}{m}
 {
  \cs_set_protected:Nn \maieul_xsameword_format:n { #1 }
 }
\ExplSyntaxOff

\setxsamewordformat{#1 (#1)}

\textheight=3cm

\begin{document}

\xsameword{foo \textit{bar} baz? foo\footnote{footnote footnote} bar}

\setxsamewordformat{\textbf{#1}}

\xsameword{foo \textit{bar} baz? foo\footnote{footnote footnote} bar}

\end{document}


Note that non-ASCII characters are also handled.
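The character class [^\s,.!?] used above does not list the semicolon, so in the question's sample text "foo;" would reach the format macro with the semicolon attached. The following is a minimal sketch of the word-wrapping step alone, with ; added to the class. The names \wrapwords and \l_demo_words_tl are hypothetical and introduced only for illustration, the format is fixed to \textbf, and the \footnote handling is left out.

```latex
\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn
\tl_new:N \l_demo_words_tl % hypothetical scratch variable

% \wrapwords{<text>}: wrap every run of characters that are neither
% spaces nor one of , . ; ! ? in \textbf{...}
\NewDocumentCommand{\wrapwords}{m}
 {
  \tl_set:Nn \l_demo_words_tl { #1 }
  \regex_replace_all:nnN
   { ([^\s,.;!?]+) }               % same class as above, plus the semicolon
   { \c{textbf} \cB\{ \1 \cE\} }   % \textbf{<match>}
   \l_demo_words_tl
  \tl_use:N \l_demo_words_tl
 }
\ExplSyntaxOff

\begin{document}

% intended result: \textbf{foo} \textbf{bar}, \textbf{foo}; \textbf{baz}?
\wrapwords{foo bar, foo; baz?}

\end{document}
```

Other punctuation marks can be added to the class in the same way; which marks count as punctuation is a choice to be made for each document.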

egreg
  • I think your answer, being based on regular expressions, is the best one. I will look at it more carefully later. – Maïeul Dec 01 '15 at 15:43
  • @Maïeul There are limitations, of course: something like \textit{foo\footnote{bar}} is not going to work. Also efficiency is out of the question, I'm afraid. More testing with real use cases would be needed to fine tune it. The list of “exceptions” can be extended. – egreg Dec 01 '15 at 15:49
  • yes of course. For now, I have to understand your code well. – Maïeul Dec 01 '15 at 16:04
  • However, maybe the solution proposed by @florian will be the best one: using an external script. – Maïeul Dec 01 '15 at 16:12
  • \usepackage{xparse,l3regex} doesn't work, see: https://tex.stackexchange.com/questions/479650/l3regex-sty-not-found#comment1292576_479650 – AndréC Oct 06 '19 at 09:26
  • @AndréC Thanks, removed. – egreg Oct 06 '19 at 12:11