28

\MakeUppercase and \uppercase use TeX's uccode, and they are not purely expandable. For example,

\edef\temp{\MakeUppercase{abc}}

will fail.

Sometimes purely expandable macros are very useful; for example, they are robust. I think string substitution could be used to implement such a function. The ideal result would be:

\Uppercase{abc} % expanded to -> ABC

Or, alternatively:

\GetUppercase{abc} % \result is expanded to -> ABC

So far, this is what I can do:

% \getupperchar{abc} => \result :-> A
\def\getupperchar#1{%
  \edef\result{\ifcase`#1\relax
     0\or 1\or 2\or 3\or 4\or 5\or 6\or 7\or 8\or 9\or
    10\or11\or12\or13\or14\or15\or16\or17\or18\or19\or
    20\or21\or22\or23\or24\or25\or26\or27\or28\or29\or
    30\or31\or32\or33\or34\or35\or36\or37\or38\or39\or
    40\or41\or42\or43\or44\or45\or46\or47\or48\or49\or
    50\or51\or52\or53\or54\or55\or56\or57\or58\or59\or
    60\or61\or62\or63\or64\or A\or B\or C\or D\or E\or
     F\or G\or H\or I\or J\or K\or L\or M\or N\or O\or
     P\or Q\or R\or S\or T\or U\or V\or W\or X\or Y\or
     Z\or91\or92\or93\or94\or95\or96\or A\or B\or C\or
     D\or E\or F\or G\or H\or I\or J\or K\or L\or M\or
     N\or O\or P\or Q\or R\or S\or T\or U\or V\or W\or
     X\or Y\or Z\or123\or124\or125\or126\or127\or128\or129\or
    \fi}}

And this can be used to implement a naive variant of mfirstuc:

% \getfirstupper{abc} => \result :-> Abc
\def\getfirstupper#1{%
  \getupperchar{#1}%
  \edef\result{\result\gobble#1}}
\def\gobble#1{}

However, I cannot implement the full \Uppercase or \GetUppercase this way. I wonder if there is a smart way to define such commands.

Any information is welcome. BTW, I know LuaTeX can be used, but I'm looking for a pure TeX solution.

lockstep
Leo Liu
  • 8
    While \uppercase isn't expandable, e.g. \edef\temp{\uppercase{abc}} doesn't work, it is normally used like this \uppercase{\def\temp{abc}}, which works fine. But you probably know this already. – Martin Scharrer Feb 11 '11 at 00:26
  • 3
    @Martin: Thanks. I'm not familiar with uppercase/lowercase tricks. This should be an answer. – Leo Liu Feb 11 '11 at 01:03
  • Please note that \uppercase doesn't work with special letters like German Umlauts and such. It simply changes all character tokens to uppercase and ignores all macros and active characters. – Martin Scharrer Feb 11 '11 at 01:18
  • 1

    ... I'm looking for a pure TeX solution. I do not know LuaTeX's implementation, but in TeX, \uppercase and \lowercase are executed by the execution processor. They aren't macro expansions like \number. I don't think you will find a pure-TeX solution.

    – Ahmed Musa Feb 11 '11 at 01:27
  • 2
    @Ahmed: Indeed. That's why I (and @Bruno) do not use \uppercase here. – Leo Liu Feb 11 '11 at 05:31

7 Answers

19

Updated answer

For expl3, based partly on ideas raised here (my original approach and Bruno's method), we have now developed a set of expandable case-changing functions that implement the case mappings described by the Unicode Consortium:

  • \str_foldcase:n
  • \text_uppercase:n(n)
  • \text_lowercase:n(n)
  • \text_titlecase:n(n)

One important point to note is that they work with 'engine native' input, which means just ASCII for pdfTeX (the upper half of the 8-bit range is tricky). For XeTeX/LuaTeX the full Unicode range is covered.

The direct answer to the question is to use \text_uppercase:n: it expands its input selectively, can deal with entries such as \aa, and will work inside an expansion context, including 'f-type' methods (expansion using \romannumeral). The current implementation has features very similar to those of the textcase package, for example selective skipping of input, skipping over math mode material, etc.
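
As a minimal sketch of how this addresses the original \edef problem (the wrapper name \ExpandableUppercase is made up for this illustration and is not part of expl3; a reasonably recent expl3 is assumed):

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Hypothetical document-level wrapper around \text_uppercase:n
\cs_new:Npn \ExpandableUppercase #1 { \text_uppercase:n {#1} }
\ExplSyntaxOff
\begin{document}
% The case change happens inside the \edef, which \MakeUppercase cannot do
\edef\temp{\ExpandableUppercase{abc}}
\show\temp % should report ABC
\end{document}

The wrapper is only there to avoid switching on expl3 syntax at the point of use.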

There are four types of function to cover different use cases:

  • 'Removal' of case for use in non-text contexts. This looks rather like 'lower casing' and is a one-to-one mapping. As the data is string-like, the function is called \str_foldcase:n and does not skip or expand any input.

  • Uppercasing

  • Lowercasing

  • Making 'titlecase' (Unicode description): it covers only the first 'letter' of some text, not the first letter of every word (the latter is what is usually called title case in English).

The code includes the ability to handle context dependence (e.g. final-sigma in Greek) and also language-dependent versions such as \text_lowercase:nn { tr } { I } to apply Turkish rules (here producing a dotless-i).
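
A small sketch of the language-dependent interface, assuming LuaLaTeX or XeLaTeX and a font that contains the dotless i; the only claim about the result is the one made above for Turkish:

\documentclass{article}
\usepackage{fontspec}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% Default rules: I lowercases to the ordinary dotted i
\text_lowercase:n { I } \par
% Turkish rules: I lowercases to the dotless i, as described above
\text_lowercase:nn { tr } { I }
\ExplSyntaxOff
\end{document}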

At the implementation level, the approach taken is to map over the input using a two-part strategy, first working out if the next token is a space, something braced or something else (what we call N-type). Each type can be grabbed properly and then case changed as appropriate using a lookup table.
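
The classification step can be sketched with the public l3tl head tests. This only illustrates the idea and is not the actual l3text code; \demo_head_type:n and \DemoHeadType are names invented for the example.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Report what kind of item starts the argument (illustrative only)
\cs_new:Npn \demo_head_type:n #1
  {
    \tl_if_head_is_N_type:nTF {#1}
      { N-type }
      {
        \tl_if_head_is_group:nTF {#1}
          { group }
          {
            \tl_if_head_is_space:nTF {#1}
              { space }
              { empty }
          }
      }
  }
\cs_set_eq:NN \DemoHeadType \demo_head_type:n
\ExplSyntaxOff
\begin{document}
\DemoHeadType{abc},     % typesets N-type
\DemoHeadType{{a}bc},   % typesets group
\DemoHeadType{ abc}     % typesets space
\end{document}

Once the type of the head is known, it can be grabbed with a matching argument specification (an undelimited argument for N-type and groups, a space-delimited one for spaces), which is why the three cases are distinguished first.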

Note that using Lua in LuaTeX offers only a partial solution, for two reasons. First, Lua does not work with TeX tokens, meaning that, for example, skipping math mode input requires more effort. Secondly, the current Lua Unicode library available in LuaTeX is poorly documented and does not cover context-dependent issues, mappings that are not one-to-one, and so on. For example, a simple test case is

\documentclass{article}
\usepackage{fontspec}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\text_uppercase:n { Fußball }
\ExplSyntaxOff

\directlua{tex.print(unicode.utf8.upper("Fußball"))}
\end{document}

where no case changing occurs in the Lua-based case. (It is also not clear what Unicode version the Lua library follows.)


Original answer

For expl3, I wrote the following as the most robust approach I could find

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new:Npn \tl_to_upper_case:n #1
  { \exp_args:Nf \__tl_to_upper_case:n {#1} }
\cs_new:Npn \__tl_to_upper_case:n #1
  { \__tl_to_upper_case:w #1 ~ \q_no_value \q_stop }
\cs_new:Npn \__tl_to_upper_case:w #1 ~ #2 \q_stop
  {
    \quark_if_no_value:nTF {#2}
      { 
        \tl_map_function:nN {#1} \__tl_to_upper_case_aux:N 
        \tl_trim_spaces:n { }
      }
      { \__tl_to_upper_case:w #1 { ~ } #2 \q_stop }
  }
\cs_new:Npn \__tl_to_upper_case_aux:N #1
  {
    \str_case:nnF {#1} % called \prg_case_str:nnn in older expl3 releases
      {
        { a } { \__tl_to_case_aux:nw { A } }
        { b } { \__tl_to_case_aux:nw { B } }
        { c } { \__tl_to_case_aux:nw { C } }
        { d } { \__tl_to_case_aux:nw { D } }
        { e } { \__tl_to_case_aux:nw { E } }
        { f } { \__tl_to_case_aux:nw { F } }
        { g } { \__tl_to_case_aux:nw { G } }
        { h } { \__tl_to_case_aux:nw { H } }
        { i } { \__tl_to_case_aux:nw { I } }
        { j } { \__tl_to_case_aux:nw { J } }
        { k } { \__tl_to_case_aux:nw { K } }
        { l } { \__tl_to_case_aux:nw { L } }
        { m } { \__tl_to_case_aux:nw { M } }
        { n } { \__tl_to_case_aux:nw { N } }
        { o } { \__tl_to_case_aux:nw { O } }
        { p } { \__tl_to_case_aux:nw { P } }
        { q } { \__tl_to_case_aux:nw { Q } }
        { r } { \__tl_to_case_aux:nw { R } }
        { s } { \__tl_to_case_aux:nw { S } }
        { t } { \__tl_to_case_aux:nw { T } }
        { u } { \__tl_to_case_aux:nw { U } }
        { v } { \__tl_to_case_aux:nw { V } }
        { w } { \__tl_to_case_aux:nw { W } }
        { x } { \__tl_to_case_aux:nw { X } }
        { y } { \__tl_to_case_aux:nw { Y } }
        { z } { \__tl_to_case_aux:nw { Z } }
      }
      { \__tl_to_case_aux:nw {#1 } }
  }
\cs_new:Npn \__tl_to_case_aux:nw #1#2 \tl_trim_spaces:n #3
  {
   #2
   \tl_trim_spaces:n { #3 #1 }
  }
\cs_set_eq:NN \MakeExpandableUppercase \tl_to_upper_case:n
\ExplSyntaxOff
\begin{document}
\MakeExpandableUppercase{Hello World}
\edef\test{\MakeExpandableUppercase{Hello World}}
\show\test
\MakeExpandableUppercase{Hello {World}}
\edef\test{\MakeExpandableUppercase{Hello {World}}}
\show\test
\edef\test{Hello\space\space World}
\MakeExpandableUppercase{\test}
\edef\test{\MakeExpandableUppercase{\test}}
\end{document}

The reason for the space stripping at the end of the input is that you can't avoid it at the start of the string, so I felt the best you could do was say 'spaces at the ends are stripped'. Spaces should be retained within the input. You can implement a lower case function in the same way, and if you do nesting

 \MakeExpandableUppercase{\MakeExpandableLowercase{Hello} World}

should work correctly. As illustrated by the last example, material is expanded before doing the case change. That applies even to protected macros, as the underlying expansion uses \romannumeral. So the argument needs to be made up of purely expandable material.

(As a note, this can of course be implemented without expl3.)


For completeness, a LuaTeX solution might read

\documentclass{article}
\usepackage{fontspec}
\newcommand*\MakeExpandableUppercase[1]{%
  \scantokens\expandafter{%
    \directlua{
      tex.write(string.upper("\luatexluaescapestring{\unexpanded{#1}}"))
    }%
  \noexpand
  }%
}%
\begin{document}
\MakeExpandableUppercase{hello world \oe}
\end{document}

(I'm no Lua expert: there may be a more efficient approach.)

Joseph Wright
  • Thank you @Joseph. I was searching it in source3. Although I know little about LaTeX3 syntax, it is of great help. – Leo Liu Feb 11 '11 at 07:03
  • 1
    @Leo. The above is currently in source3, as while I've proposed it as useful we have not yet got a consensus on that from all of the team. – Joseph Wright Feb 11 '11 at 07:07
  • @Joseph: What can we do with non-English alphabets like \oe? – Leo Liu Feb 11 '11 at 08:38
  • @Leo: I've thought about this, but with an expandable solution we are rather limited. With XeLaTeX/LuaLaTeX, UTF-8 characters can be handled by extending the list in my approach. However, for macros like \oe, the problem is that they don't expand nicely. If you look at how \MakeUppercase works, it's done by doing \let\oe\OE, etc., which is not an option for an expandable solution. You also see the same with things like \'e, which again has an expansion that is hard to deal with. – Joseph Wright Feb 11 '11 at 08:55
  • 1
    @Leo: The bottom line is that the only way to do this expandably and deal with accents, etc., is at the engine level. That means only LuaTeX can offer a truly general solution, as this problem is trivial in Lua. – Joseph Wright Feb 11 '11 at 08:57
  • @Joseph: Yes, \MakeUppercase's method is limited here. After all, we can use this old method to get an expanded result (\reserved@a in \MakeUppercase's method) for future use. – Leo Liu Feb 11 '11 at 11:14
  • @Joseph: to avoid stripping spaces at the start, can you simply remove \exp_args:Nf from the first definition? (I don't understand why it's there.) – Bruno Le Floch Feb 15 '11 at 00:33
  • @Bruno. I was aiming for a 'function-like' approach, such that \Uppercase{\Lowercase{ab}c} would yield ABC and not abC. – Joseph Wright Feb 15 '11 at 07:03
  • @Joseph: but you only expand the first token completely? – Bruno Le Floch Feb 15 '11 at 07:59
  • @Bruno: I said 'aiming at' :-) Perhaps it was not such a good idea. I still need to look again at your approach – Joseph Wright Feb 15 '11 at 08:03
  • @Joseph: I updated my answer to reflect the work I did tonight. The code is the same as the one mentioned on LaTeX-L. It leads to quite a few useful commands. – Bruno Le Floch Feb 15 '11 at 09:05
  • You are better using unicode.utf8.upper() instead of string.upper() since the later is 8-bit only. – خالد حسني Feb 15 '11 at 10:20
  • @Khaled: I did say "I'm not Lua expert": I'd assumed that Lua was intrinsically UTF-8. There's also an issue with needing to escape things, which is wrong: it ends up case-changing macro names. Perhaps you can suggest a better implementation as an answer? – Joseph Wright Feb 15 '11 at 10:21
  • @joseph: I might know some lua but I know almost no TeX, basically anything with the word "expansion" in it just scares me out; I have totally and absolutely no idea what is being discussed here :). – خالد حسني Feb 15 '11 at 23:12
  • Lua itself knows nothing about Unicode and encodings, the string functions just delegate to the system, so it depends from the hosting environment (which is the LuaTeX engine in this case). Since LuaTeX claims to work on Unicode, we should assume it works here, but I didn't test it (nor read the code). – Paŭlo Ebermann Feb 16 '11 at 12:00
  • @Paŭlo: I see. That surprises me a little, as you might guess, but then I'm used to TeX engines which have 'native' encodings. – Joseph Wright Feb 16 '11 at 12:03
  • 1
    Programming in Lua, chapter 20: Both string.upper and string.lower follow the current locale. Therefore, if you work with the European Latin-1 locale, the expression string.upper("ação") results in "AÇÃO". And I suppose LuaTeX redefines this current locale thing to mean Unicode. – Paŭlo Ebermann Feb 16 '11 at 12:10
  • Make sure you use unicode.utf8.* functions. – topskip Aug 08 '11 at 11:50
  • Note: still an active area of development: updates likely! – Joseph Wright Feb 19 '15 at 12:39
12

Edit3: now the token list module in LaTeX3 provides \text_uppercase:n and \text_lowercase:n, which stem from that discussion but are more robust and much less greedy on the number of control sequences. Slower, as well.

EDIT2: after a first version that ate spaces and choked when it saw braces, and a second that would crash on more than 600 or so tokens, I spent some time writing clean code that still works with >5k tokens, although it gets slow. The new code actually lends itself to all sorts of generalizations (see near the bottom of the code). I got rid of the expansion control that was the cause of a "too many levels of expansion" error, and the code is now much less tricky.

(Sorry, code and explanations are long.) Now, after exactly three expansion steps, \Uppercase{ Hel{l }o\error World } expands to HEL{L }O\error WORLD, with spaces, braces, and macros kept (and not expanded).

Two ideas:

  • Check for braces and for spaces by using a delimited argument (see \UL_brace_check:nw and \UL_space_check:nw), after having placed a {\q_mark} \q_stop after all the tokens, to ensure that there is at least one brace or space after the argument.

  • Define tables of case change. For example, \UL_table_upper_p is a macro which expands to P, and \UL_table_lower_A expands to a. If the relevant entry of the table is not defined, then the token which is being read is not altered. See \UL_convert_token:nn for this. The last part of the code is all about setting up these tables ("case-tables"?).
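
The table idea on its own can be sketched in a few lines, assuming an e-TeX engine for \ifcsname and \detokenize. The names \up@a, \up@b, \up@c and \upperchar are invented for this illustration and are much simpler than the \UL_table_... macros in the full code below: there is no handling of spaces or brace groups, and only one token is converted at a time.

\catcode`\@=11
% Table entries: one macro per character that has an uppercase form
\def\up@a{A}\def\up@b{B}\def\up@c{C}
% Look up one token; if the table has no entry, leave the token unchanged
\def\upperchar#1{%
  \ifcsname up@\detokenize{#1}\endcsname
    \csname up@\detokenize{#1}\expandafter\endcsname
  \else
    #1%
  \fi}
\catcode`\@=12
\edef\test{\upperchar{a}\upperchar{b}\upperchar{!}}
\show\test % should report AB!

Because \ifcsname and \csname are both expandable, the lookup works inside \edef, which is the property the question asks for.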

We need to step inside brace groups and expand \UL_to_case:nn entirely before continuing. For this, we use \romannumeral-`\0, closed by a space, which is introduced at the very end.
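
A stand-alone sketch of that \romannumeral trick (the macros \stepA, \stepB, \stepC and \finish exist only for this example): TeX keeps expanding while it looks for the end of the number -`\0, and since the number is negative, \romannumeral itself contributes nothing.

\def\stepA{\stepB}\def\stepB{\stepC}\def\stepC{xyz}
% A single expansion of \romannumeral expands \stepA all the way to xyz
\expandafter\def\expandafter\test\expandafter{\romannumeral-`\0\stepA}
\show\test % should report xyz

% A macro whose replacement text starts with a space ends the number;
% that space is consumed, so a leading space in the result survives
\def\finish#1{ #1}
\expandafter\def\expandafter\test\expandafter{\romannumeral-`\0\finish{ abc}}
\show\test % should report " abc" (with the leading space)

The second case mirrors the role of \UL_to_case_end:n in the code below, which expands to a space followed by the accumulated result.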

A few macros deserve some explanation.

  • \UL_expand_csname:n{...} replaces every \csname abc\endcsname construction by the corresponding \abc. I need this in one place to make explicit a csname that sits quite deep in a definition.

  • \expandafter:nw{...}\foo will expand \foo before ....

  • \expandsome{\foo\expandthis\bar\baz\expandthis\foo{ABC}} will expand the macro following \expandthis once (that macro is allowed to take any kind of argument: in fact, we simply \expandafter it).

The code can also be found online. Finally, the code, with some tests at the end, and a few comments.

\catcode`\_=11\relax
\catcode`\:=11\relax

% ======================== Generic macros

% A few standard commands to manipulate arguments
\long\gdef\use_none:n#1{}
\long\gdef\use_none:nn#1#2{}
\long\gdef\use_i:nn#1#2{#1}
\long\gdef\use_ii:nn#1#2{#2}
\long\gdef\use_ii_i:nn#1#2{#2#1}
\long\gdef\use_ii_iii_i:nnn#1#2#3{#2#3#1}
\long\gdef\use_i_bbii:nn#1#2{#1{{#2}}}
\long\gdef\use_bii_bi:nn#1#2{{#2}{#1}}

% What expl3 calls "quarks", useful for |\ifx| comparisons.
\gdef\q_stop{\q_stop}
\gdef\q_mark{\q_mark}
\gdef\q_nil{\q_nil}
\long\gdef\use_none_until_q_stop:w#1\q_stop{}

% Two tests 
\long\gdef\UL_if_empty:nTF#1{%
  \expandafter\ifx\expandafter\q_nil\detokenize{#1}\q_nil%
  \expandafter\use_i:nn%
  \else%
  \expandafter\use_ii:nn%
  \fi}

\expandafter\long\expandafter\gdef\expandafter\UL_if_detok_qmark:wTF%
\expandafter#\expandafter1\detokenize{\q_mark}#2\q_stop{% 
  \UL_if_empty:nTF{#1}}

% ======================== Main command: |\UL_to_case:nn|
% Usage:       |\UL_to_case:nn{<table>}{<text>}|
% Expands in:  2 steps.
\long\gdef\UL_to_case:nn{\romannumeral\UL_to_case_aux:nn}
\long\gdef\UL_to_case_aux:nn#1#2{-`\0%
  \UL_brace_check:nw{#1}#2{\q_mark} \q_stop\UL_to_case_end:n{}}%

% Initially, I used |\q_mark{} \q_stop|: the braces and space are there
% to avoid runaway arguments in |\UL_brace_check:nw| and 
% |\UL_space_check:nw|, whose "w" arguments are delimited respectively
% by an open brace, and by a space. I changed to |{\q_mark} \q_stop|:
% then we only do the check for |\q_mark| in the case of a brace group, 
% and not at every step.

% |\UL_to_case_output:n| appends its argument to the argument of
% |\UL_to_case_end:n|.
\long\gdef\UL_to_case_output:n#1#2\UL_to_case_end:n#3{%
                                    #2\UL_to_case_end:n{#3#1}}
\long\gdef\UL_to_case_end:n#1{ #1}
% And |\UL_to_case_end:n| expands to 
% - a space, which stops the expansion of |\romannumeral-`\0|,
% - followed by its argument, which is the result we want.


% First, we check whether the next token is a brace. 
\long\gdef\UL_brace_check:nw#1#2#{%
  \UL_if_empty:nTF{#2}%
  {\UL_brace_yes:nn{#1}}%
  {\UL_space_check:nw{#1}#2}%
}
% If there is a brace, we might have reached {\q_mark}.
\long\gdef\UL_brace_yes:nn#1#2{%
  \expandafter\UL_if_detok_qmark:wTF \detokenize{#2 \q_mark}\q_stop{% 
    \use_none_until_q_stop:w% 
  }{% 
    \csname UL_table_#1_braces\endcsname{#1}{#2}%
    \UL_brace_check:nw{#1}%
  }%
}

% Then check whether the next token is a space.
\long\gdef\UL_space_check:nw#1#2 {%
  \UL_if_empty:nTF{#2}%
  {\UL_convert_token:nn{#1}{ }}%
  {\UL_convert_token:nn{#1}#2 }% we put the space back!
}

\long\gdef\UL_convert_token:nn#1#2{%
  \ifcsname UL_table_#1_\detokenize{#2}\endcsname%
  \expandafter\use_i:nn%
  \else%
  \expandafter\use_ii:nn%
  \fi% 
  {\csname UL_table_#1_\detokenize{#2}\endcsname}%
  {\csname UL_table_#1_default\endcsname{#2}}%
  \UL_brace_check:nw{#1}% Do the next token.
}


% ======================== Casecode tables.
% ============ Generic setup.
% Typical use:
% - |\UL_setup:nnn{u}{a}{A}| to define |a| uppercased as |A|.
% - |\UL_setup_cmd:nnpn{ULnil}{\NoCaseChange}#1{%
%      \UL_to_case_output:n{#1}}|
% Note that for the second, we have to grab all the arguments in one go.
% Also note that the second should not be used until we define the ULec 
% and ULea tables below.
%
% - |\UL_set_eq:nnnn{tableA}{tokenA}{tableB}{tokenB}| sets the entry
% |tokenA| of the table |tableA| to be equal to the entry |tokenB| of the
% table |tableB|.
% - |\UL_new_table:nn{tableA}{tableB}| creates a new table, |tableA|, 
% which is a copy of |tableB|.

\protected\long\gdef\UL_content_of_table_add:nn#1#2{%
  \long\expandafter\gdef\csname UL_table_#1%
  \expandafter\expandafter\expandafter\endcsname%
  \expandafter\expandafter\expandafter{%
    \csname UL_table_#1\endcsname{#2}}%
}

\protected\long\gdef\UL_setup:nnn#1#2#3{%
  \UL_content_of_table_add:nn{#1}{#2}%
  \expandafter\long\expandafter\gdef%
  \csname UL_table_#1_\detokenize{#2}\endcsname%
  {\UL_to_case_output:n{#3}}%
}

\protected\long\gdef\UL_setup_cmd:nnpn#1#2#3#{%
  \UL_content_of_table_add:nn{#1}{#2}%
  \UL_expand_csname:n{%
    \long\gdef\csname UL_table_#1_\detokenize{#2}\endcsname##1##2{%
      \expandafter:nw{\use_ii_i:nn{##1{##2}}}%
      \csname UL_table_#1_\detokenize{#2}_aux\endcsname}%
  }%
  \use_i_bbii:nn{\expandafter\long\expandafter\gdef%
    \csname UL_table_#1_\detokenize{#2}_aux\endcsname#3}%
}

\protected\long\gdef\UL_set_eq:nnnn#1#2#3#4{%
  \UL_content_of_table_add:nn{#1}{#2}%
  {\expandafter}\expandafter\global\expandafter\let%
  \csname UL_table_#1_\detokenize{#2}\expandafter\endcsname%
  \csname UL_table_#3_\detokenize{#4}\endcsname%
}

\long\gdef\UL_new_table:nn#1#2{%
  \ifcsname UL_table_#1\endcsname%
  \PackageError{ULcase}{Table \detokenize{#1} already defined!}{}%
  \fi%
  \long\expandafter\gdef\csname UL_table_#1\endcsname{}%
  %
  \def\UL_tmpA{#1}%
  \def\UL_tmpB{#2}%
  \expandafter\expandafter\expandafter\UL_new_table_aux:nnn%
  \csname UL_table_#2\endcsname{}%
}
\long\gdef\UL_new_table_aux:nnn#1{%
  \UL_if_empty:nTF{#1}{}{%
    \UL_set_eq:nnnn{\UL_tmpA}{#1}{\UL_tmpB}{#1}%
    \UL_new_table_aux:nnn%
  }%
}%
\long\gdef\UL_new_table:n#1{\UL_new_table:nn{#1}{ULnil}}




% ============ Table ULea, \expandafter:nw
% 
% The |ULea| table puts |\expandafter| after each token (including braces
% and spaces). Allows us to define |\expandafter:nw|, which expands what
% follows its first argument once. 
% 
% |\expandafter:nw| takes 2-steps to act. For a 1-step version, use 
% |\MEA_trigger:f\MEA_expandafter:nw|. 

\long\gdef\UL_table_ULea_default#1{\UL_to_case_output:n{\expandafter#1}}%
\long\gdef\UL_table_ULea_braces#1#2{%
  \expandafter\expandafter\expandafter\UL_to_case_output:n%
  \expandafter\expandafter\expandafter{%
    \expandafter\expandafter\expandafter\expandafter%
    \expandafter\expandafter\expandafter{%
      \UL_to_case:nn{#1}{#2}\expandafter}%
  }%
}
\let\MEA_trigger:f\romannumeral
\def\MEA_expandafter:nw{\UL_to_case_aux:nn{ULea}}
\def\expandafter:nw{\MEA_trigger:f\MEA_expandafter:nw}


% ============ Table |ULec|, |\UL_expand_csname:n|
% The |ULec| table expands only the 
% |\csname ...\endcsname| constructions.
% 
\long\gdef\UL_table_ULec_default{\UL_to_case_output:n}%
\long\gdef\UL_table_ULec_braces#1#2{%
  \expandafter\expandafter\expandafter\UL_to_case_output:n%
  \expandafter\expandafter\expandafter{%
    \expandafter\expandafter\expandafter{\UL_to_case:nn{#1}{#2}}%
  }%
}
\long\expandafter\gdef\csname%
  UL_table_ULec_\detokenize{\csname}\endcsname#1#2{%
  \expandafter:nw{\use_ii_iii_i:nnn{#1{#2}}}%
  \expandafter\UL_to_case_output:n\csname%
}%

\def\UL_expand_csname:n{\MEA_trigger:f\UL_to_case_aux:nn{ULec}}


% ============ Table |ULexpandsome|, |\expandsome|
% The |ULexpandsome| table expands only the tokens following |\expandthis|.
% 
\long\gdef\UL_table_ULexpandsome_default{\UL_to_case_output:n}%
\long\gdef\UL_table_ULexpandsome_braces#1#2{%
  \expandafter\expandafter\expandafter\UL_to_case_output:n%
  \expandafter\expandafter\expandafter{%
    \expandafter\expandafter\expandafter{\UL_to_case:nn{#1}{#2}}%
  }%
}
\long\expandafter\gdef\csname%
  UL_table_ULexpandsome_\detokenize{\expandthis}\endcsname#1#2{%
  \expandafter:nw{#1{#2}}%
  %\expandafter\UL_to_case_output:n\csname%
}%

\def\expandsome{\MEA_trigger:f\UL_to_case_aux:nn{ULexpandsome}}


% ============ The default table, ULnil
\long\gdef\UL_table_ULnil{{default}{braces}{$}}%$
\long\gdef\UL_table_ULnil_default{\UL_to_case_output:n}
\long\gdef\UL_table_ULnil_braces#1#2{%
  \expandafter\expandafter\expandafter\UL_to_case_output:n%
  \expandafter\expandafter\expandafter{%
    \expandafter\expandafter\expandafter{\UL_to_case:nn{#1}{#2}}%
  }%
}
\UL_setup_cmd:nnpn{ULnil}{\NoCaseChange}#1{%
  \UL_to_case_output:n{#1}}


% ============ Working on math mode.
% 
% We add \q_mark so that \UL_dollar_aux:nw can read to the next dollar
% without unbracing the argument, so that ${...}$ --x-> $...$
\long\expandafter\gdef\csname UL_table_ULnil_\detokenize{$}\endcsname#1#2{%$
    \UL_dollar_aux:nw{#1{#2}}\q_mark%
}
% Grab until the next dollar, so #2={\q_mark Math Stuff}. 
% If \use_none:n #2 is empty, then we had only grabbed `\q_mark`, 
% which means there was $$, and we need to redo the same business. 
% Otherwise, we output, after stripping the \q_mark.
\long\gdef\UL_dollar_aux:nw#1#2${%$%
  \expandafter\UL_if_empty:nTF\expandafter{\use_none:n#2}{% eats \q_mark
    \UL_bidollar:nw{#1}\q_mark%
  }{%
    \expandafter\UL_to_case_output:n\expandafter{%
      \expandafter$\use_none:n#2$}#1%
  }%
}
\long\gdef\UL_bidollar:nw#1#2$${%
  \expandafter\UL_to_case_output:n\expandafter{%
    \expandafter$\expandafter$\use_none:n#2$$}#1}



% =========== Lowercase, Uppercase, Caesar
\long\gdef\Lowercase{\UL_to_case:nn{lower}}
\long\gdef\Uppercase{\UL_to_case:nn{upper}}
\long\gdef\CaesarCipher{\UL_to_case:nn{caesar}}

% Setup the uppercase and lowercase tables.
\UL_new_table:n{lower}
\UL_new_table:n{upper}

\protected\long\gdef\UL_setup_lower_upper:n#1{%
  \UL_if_empty:nTF{#1}{}{%
    \UL_setup:nnn{upper}#1%
    \expandafter:nw{\UL_setup:nnn{lower}}\use_bii_bi:nn#1%
    \UL_setup_lower_upper:n%
  }%
}
% should become user-friendly.
\UL_setup_lower_upper:n {{a}{A}} {{b}{B}} {{c}{C}} {{d}{D}} {{e}{E}} 
{{f}{F}} {{g}{G}} {{h}{H}} {{i}{I}} {{j}{J}} {{k}{K}} {{l}{L}} {{m}{M}} 
{{n}{N}} {{o}{O}} {{p}{P}} {{q}{Q}} {{r}{R}} {{s}{S}} {{t}{T}} {{u}{U}} 
{{v}{V}} {{w}{W}} {{x}{X}} {{y}{Y}} {{z}{Z}} {{\ae}{\AE}} {{\oe}{\OE}} 
{}


% Just for fun, we define the Caesar cipher.
\UL_new_table:n{caesar}
\begingroup
  \lccode`\x=1\relax
  \loop
    \lccode`\X=\numexpr\lccode`\x+2\relax
    \lowercase{\UL_setup:nnn{caesar}{x}{X}}%
    \lccode`\x=\numexpr\lccode`\x+1\relax
  \unless\ifnum\lccode`\x>126\relax
  \repeat
\endgroup
\UL_setup:nnn{caesar}{ }{ }




% ====== Various tests
\long\gdef\checkoutput{\ifx\a\b\message{Correct}\else\show\WRONG\fi}

\long\gdef\expandonce#1{% redefines #1 as #1 expanded once.
  \long\xdef#1{\unexpanded\expandafter\expandafter\expandafter{#1}}}
\def\0{\1}\def\1{\2}\def\2{\3}\def\3{\4}\def\4{\5}


% \Uppercase, \Lowercase, \NoCaseChange work (+ nesting)
% Spaces and braces are fine.
\long\gdef\a{\Uppercase{ Hello, { } W\Lowercase{O}r\NoCaseChange{lD}! }}
\expandonce\a\expandonce\a\expandonce\a
\long\gdef\b{ HELLO, { } W\Lowercase{O}RlD! }
\checkoutput

% Another test.
\long\gdef\a{\Lowercase{He l%
    \NoCaseChange{\Uppercase{ Lp\NoCaseChange{ o}}}o }}
\expandonce\a\expandonce\a\expandonce\a
\long\gdef\b{he l\Uppercase{ Lp\NoCaseChange{ o}}o }
\checkoutput
\long\edef\a{\a}
\long\gdef\b{he l LP oo }
\checkoutput

% Math works (both $$ and $). Nesting does not break, 
% although we would wish for better (i.e. "Letter"-> "letter").
\long\gdef\a{\Lowercase{{t}ExT, $$\frac{A}{B}$$ and $(\mbox{Letter $A$})$}}
\expandonce\a\expandonce\a\expandonce\a
\long\gdef\b{{t}ext, $$\frac{A}{B}$$ and $(\mbox{Letter $A$})$}
\checkoutput

\edef\a{\CaesarCipher{a{b}cdef@ ABCX}}
\edef\b{c{d}efghB CDEZ}
\checkoutput


\long\gdef\a{\Uppercase{%
    \0{ a${} {{abd}+cd}$\0{b$${\d $0$}$$ }}%
    \NoCaseChange{ Ac dD\relax\0ii}i cd }%
}
\expandonce\a\expandonce\a\expandonce\a
\long\gdef\b{\0{ A${} {{abd}+cd}$\0{B$${\d $0$}$$ }} %
  Ac dD\relax\0iiI CD }%
\checkoutput



% More on braces, spaces, and expansion (nothing is expanded, 
% as we expect).
\long\gdef\a{\Lowercase{ {} \0 { b{C} {dB\AE~}} \0{\0} }}
\expandonce\a\expandonce\a\expandonce\a
\long\gdef\b{ {} \0 { b{c} {db\ae ~}} \0{\0} }
\checkoutput

% Testing the ULec table (expanding only \csname)
\long\gdef\a{\UL_expand_csname:n{ \hello 
    {\csname Hdsf\endcsname}##1\space \csname el\endcsname{ }lo, my name}}
\expandonce\a\expandonce\a
\long\gdef\b{ \hello {\Hdsf}##1\space \el{ }lo, my name}
\checkoutput


% Custom table.
\UL_new_table:n{mytable}
\UL_setup:nnn{mytable}{h}{Hello}
\long\gdef\a{\UL_to_case:nn{mytable}{h{ h} {}\space \h}}
\expandonce\a\expandonce\a\expandonce\a\expandonce\a
\long\gdef\b{Hello{ Hello} {}\space \h}
\checkoutput


\def\mydo#1#2{(#1)-(#2)}
\long\gdef\a{\expandsome{\0\0{\expandthis\mydo{\0\expandthis\0}\0\0}}}
\expandonce\a\expandonce\a
\long\gdef\b{\0\0{(\0\1)-(\0)\0}}
\checkoutput

\long\gdef\a{\Uppercase{\NoCaseChange{The quick brown fox jumps over the lazy dog.} The quick brown fox jumps over the lazy dog. \NoCaseChange{The quick brown fox jumps over the lazy dog.} The quick brown fox jumps over the lazy dog. \NoCaseChange{The quick brown fox jumps over the lazy dog.} The quick brown fox jumps over the lazy dog. \NoCaseChange{The quick brown fox jumps over the lazy dog.} The quick brown fox jumps over the lazy dog. \NoCaseChange{The quick brown fox jumps over the lazy dog.} The quick brown fox jumps over the lazy dog. \NoCaseChange{The quick brown fox jumps over the lazy dog.} The quick brown fox jumps over the lazy dog. \NoCaseChange{The quick brown fox jumps over the lazy dog.} The quick brown fox jumps over the lazy dog. }}
\begingroup\tracingall\tracingonline=0\relax
\expandonce\a\expandonce\a\expandonce\a
\endgroup
\long\gdef\b{The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. The quick brown fox jumps over the lazy dog. THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. }
\checkoutput
Joseph Wright
  • @Bruno: Repetitive calls to \defupper doesn’t seem attractive. How many \upper@xxx would you need? All the letters of the alphabet? And I don’t see how \EA\def\EA\foo\EA{\Uppercase{abz:}} will do the whole thing. It will only expand \Uppercase once. I assume \EA stands for \expandafter. – Ahmed Musa Feb 11 '11 at 03:13
  • I am sorry, I didn't see it: \EA stands for seven \expandafter's. – Ahmed Musa Feb 11 '11 at 03:34
  • 1
    Thank you @Bruno. I like this solution. It works like toupper in C/C++, and is indeed expandable. – Leo Liu Feb 11 '11 at 05:45
  • And we can test \ifcsname upper@#1\endcsname to get rid of \defupper{:}{:} etc. – Leo Liu Feb 11 '11 at 05:46
  • @Ahmed: even with seven expandafters, the current version won't fully expand. I'm just saying that it could be done. @Leo: We could do much better by defining all of these inside a loop with some lccode (or uccode) trickery. Right now I'm busy, but later this weekend, I'll make a better version of that. In principle, we can also avoid doing bad things to braces, but that's much more tricky. – Bruno Le Floch Feb 11 '11 at 08:26
  • @Ahmed: now \let\ea\expandafter and \ea\ea\ea\def\ea\ea\ea\foo\ea\ea\ea{\Uppercase{abc}} defines \foo as ABC, so only two steps of expansion instead of the three that I thought were needed. – Bruno Le Floch Feb 12 '11 at 19:16
  • @Bruno. Interesting approach, which I'll study a bit more later today (I'd wondered about csname tables for performance reasons: I'm still unsure what is better over all). Obviously your solution poses two questions: 1) What is the 'expected' result of \Uppercase{\Lowercase{<tokens>}} (you've gone for one way round, I've gone for the other); 2) What should happen about braced material? In many ways I'd 'expect' braces to act as they do in BibTeX, and prevent case changing. That though leaves some accented characters as rather awkward. – Joseph Wright Feb 12 '11 at 19:52
  • @Bruno. Of course, as {TeX} is meant for answers, such discussion might be more appropriate elsewhere (LaTeX-L, perhaps?) – Joseph Wright Feb 12 '11 at 19:53
  • @Joseph: I posted to LaTeX-L. For \Uppercase{\Lowercase{<tokens>}} I went Knuth's way, because there is no way of knowing that the tokens are going to be \Lowercase-d before expanding it, and I wanted no expansion. – Bruno Le Floch Feb 13 '11 at 02:05
  • Thank you very much, @Bruno. I'll spend some time to understand it. – Leo Liu Feb 13 '11 at 06:02
  • @Joseph, @Bruno: What a pity, I really like to follow such comment discussions, and I don't see a real problem here since they don't actually use up any valuable answer space. – Hendrik Vogt Feb 13 '11 at 15:32
  • @Hendrik: It's partly about audience. People like Frank Mittelbach, Donald Arseneau and David Carlisle read LaTeX-L but not this site. Frank has already made some interesting observations on LaTeX-L, which we would not get in a somewhat hidden set of comments here :-) – Joseph Wright Feb 13 '11 at 15:40
  • @Joseph: Very good point, thanks. Is there any way other than subscribing to see what Frank wrote? – Hendrik Vogt Feb 13 '11 at 15:44
  • @Bruno: Finally I understand where the comment % The user commands are \MultiExpand and \MultiExpandAfter comes from. This really puzzled me a lot; you might want to adjust that line. – Hendrik Vogt Feb 13 '11 at 15:55
  • 1
    @Hendrik: Indeed there is. See http://news.gmane.org/group/gmane.comp.tex.latex.latex3 for the archive – Joseph Wright Feb 13 '11 at 15:56
  • @Bruno: I see only now that you got rid of the \MultiExpandAfter here. Somehow that's a pity ... Though you could still use some \expandtwiceafter instead of \expandafter\expandafter\expandafter. But that'll probably mystify too many people. – Hendrik Vogt Feb 18 '11 at 08:36
  • @Hendrik: I did, since only \MultiExpandAfter{2} was ever used, and the answer is already too long. I kept some (renamed) version of it in what I sent to LaTeX-L, though: with the case-changing code, it allows \expandsome{...\expthis{3}\tokenA...\expthis{2}\tokenB...} to expand the tokens that we want a given number of times. The expansion takes place from right to left, so that macros with arguments don't get spurious \expthis in their argument when expanded. – Bruno Le Floch Feb 18 '11 at 08:54
  • @Bruno: Great! I've thought already after writing my answer to the \superexpandafter that somehow the clearest solution would be what now is your \expandsome. Good that you implemented this! – Hendrik Vogt Feb 18 '11 at 08:59
5

I use the following to get a fully expanded string with the first letter capitalized. I needed it to write the string to the AUX file as part of a message. It was posted long ago by Dan Luecking on CTT. The command \makefirstcap stores the expanded string in \firstcaphold. You can make your own variants of this.

\documentclass{article}

\def\makefirstcap#1#2\nil{%
    \iffalse{\fi
    \uppercase{\edef\firstcaphold{\iffalse}\fi#1}#2}}

\begin{document}
\makefirstcap test\nil
\show\firstcaphold
\end{document} 
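
A minimal sketch of how the stored result might then be used in an expansion-only context. The \write to stream 16 (terminal and log) and the word example are assumptions for illustration; my original use wrote to the AUX file instead. The \iffalse{\fi and \iffalse}\fi pairs only keep the braces balanced inside the definition: at run time the argument of \uppercase ends right after the first letter, so only that letter is converted, and the \edef it leaves behind then absorbs the rest of the text.

% Plain TeX sketch; run with tex or pdftex.
\def\makefirstcap#1#2\nil{%
    \iffalse{\fi
    \uppercase{\edef\firstcaphold{\iffalse}\fi#1}#2}}

\makefirstcap example\nil
% \firstcaphold now holds only character tokens, so it is safe inside \write:
\immediate\write16{Stored string: \firstcaphold}
\bye

Note that \makefirstcap itself is still not expandable; only the stored \firstcaphold is.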
Danie Els
  • 19,694
3

Here is a tentative short solution. The only flaw I see so far with this solution is outer brace stripping of arguments when \ifconvertcs is false. I may find the time to look into that later. There may be more traps, e.g., spaces. Converting to lowercase uses the same \lucasemap and requires only one additional macro (\lowercase@@do). Please comment.

% Preliminaries:
\catcode`\:=11
\newcommand*\ifstrcmp:TF[2]{%
  \@nameuse{@\ifnum\pdfstrcmp{\detokenize{#1}}%
  {\detokenize{#2}}=\z@ first\else second\fi oftwo}%
}
\let\@rnnm\romannumeral
\newcommand*\ifbool:TF[1]{%
  \@nameuse{@\@nameuse{if#1}first\else second\fi oftwo}%
}
\newcommand*\ifx:TF[2]{%
  \@nameuse{@\ifx#1#2\@empty first\else second\fi oftwo}%
}
% Should control sequences (cs) also be converted to lower or uppercase?
\newif\ifconvertcs
\convertcstrue
\def\everyscantokens{%
  \everyeof{\noexpand}%
  \endlinechar\m@ne
  \makeatletter
}
% The solution:
\long\def\ExpandableUppercase#1{%
  \ifbool:TF{convertcs}{%
    \scantokens\expandafter{\expandafter\protect
    \@rnnm-`\q\expandafter\uppercase@loop\detokenize{#1}\@nnil}%
  }{%
    \expandafter\protect\@rnnm-`\q\uppercase@loop#1\@nnil
  }%
}
\def\uppercase@loop#1{%
  \ifx:TF#1\@nnil{ }{\uppercase@do{#1}\uppercase@loop}%
}
\def\lucase@do#1{\expandafter\noexpand\expandafter#1\@rnnm-`\q}
\def\lucasemap{%
  {a}{A}{b}{B}{c}{C}{d}{D}{e}{E}{f}{F}{g}{G}{h}{H}{i}{I}{j}%
  {J}{k}{K}{l}{L}{m}{M}{n}{N}{o}{O}{p}{P}{q}{Q}{r}{R}{s}{S}%
  {t}{T}{u}{U}{v}{V}{w}{W}{x}{X}{y}{Y}{z}{Z}\lu@nil\lu@nil
}
\def\uppercase@do#1{%
  \expandafter\uppercase@@do\lucasemap\cpt@nil{#1}%
}
\def\uppercase@@do#1#2#3\cpt@nil#4{%
  \ifstrcmp:TF{#1}\lu@nil{%
    \lucase@do{#4}%
  }{%
    \ifstrcmp:TF{#1}{#4}{%
      \lucase@do{#2}%
    }{%
      \uppercase@@do#3\cpt@nil{#4}%
    }%
  }%
}

% Tests:
{
\everyscantokens
\let\@display@protect\string
%\let\protect\@unexpandable@protect
%\let\protect\@typeset@protect
%\let\protect\@display@protect
\let\protect\noexpand
\edef\x{\ExpandableUppercase{{\oe}{x}a}}
\toks@\expandafter{\x}
\ExpandableUppercase{\oe} % needs document
\edef\x{\ExpandableUppercase{abcd}}
\show\x

\def\abcd{abcd}
\def\ABCD{ABCD}
\convertcstrue
\edef\x{\ExpandableUppercase{\abcd}}
\show\x
\edef\x{\ExpandableUppercase{\ABCD}}
\show\x
}

\catcode`\:=12
Ahmed Musa
  • 11,742
3

I started with the \ifcase code from the OP and added two lines in order to create an expandable \euppercase macro.

\def\euppercaseB#1{\ifcase`#1\relax
     0\or 1\or 2\or 3\or 4\or 5\or 6\or 7\or 8\or 9\or
    10\or11\or12\or13\or14\or15\or16\or17\or18\or19\or
    20\or21\or22\or23\or24\or25\or26\or27\or28\or29\or
    30\or31\or32\or33\or34\or35\or36\or37\or38\or39\or
    40\or41\or42\or43\or44\or45\or46\or47\or48\or49\or
    50\or51\or52\or53\or54\or55\or56\or57\or58\or59\or
    60\or61\or62\or63\or64\or A\or B\or C\or D\or E\or
     F\or G\or H\or I\or J\or K\or L\or M\or N\or O\or
     P\or Q\or R\or S\or T\or U\or V\or W\or X\or Y\or
     Z\or91\or92\or93\or94\or95\or96\or A\or B\or C\or
     D\or E\or F\or G\or H\or I\or J\or K\or L\or M\or
     N\or O\or P\or Q\or R\or S\or T\or U\or V\or W\or
     X\or Y\or Z\or123\or124\or125\or126\or127\or128\or129\or
    \fi
}
\def\euppercase#1{\euppercaseA#1\end}
\def\euppercaseA#1{\ifx#1\end \else\euppercaseB#1\expandafter\euppercaseA\fi}

\message{aha: \euppercase{aha}.} % Prints: aha: AHA.

Of course, the \ifcase table still needs a slight modification. For example, the characters . and , should expand to themselves rather than to the numbers 46 and 44.
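
One possible way to make that modification, as a sketch only: shift the \ifcase index so that case 0 corresponds to a, and let the \else branch return every other character unchanged. This assumes e-TeX (for \numexpr) and that each item is a single character token; control sequences and spaces in the argument are not handled. The redefinition of \euppercaseB below is my assumption of how the table could be shortened, not a tested replacement for all inputs.

\def\euppercaseB#1{%
  % index 0..25 for a..z; anything else (negative or >25) takes the \else branch
  \ifcase\numexpr`#1-`a\relax
    A\or B\or C\or D\or E\or F\or G\or H\or I\or J\or K\or L\or M\or
    N\or O\or P\or Q\or R\or S\or T\or U\or V\or W\or X\or Y\or Z\else
    #1%
  \fi}
\def\euppercase#1{\euppercaseA#1\end}
\def\euppercaseA#1{\ifx#1\end \else\euppercaseB#1\expandafter\euppercaseA\fi}

\message{aha: \euppercase{a.b,c}.} % Prints: aha: A.B,C.

Uppercase letters, digits and punctuation all fall into the \else branch, so they pass through as they are.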

wipet
  • 74,238
1
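\makeatletter % \@nil and \toks@ contain @, so this is needed outside a package or class
\def\makefirstcap#1#2\@nil{%
    \toks@{#2}%
    \uppercase{\edef\firstcaphold{#1\the\toks@}}%
}

% Test:

\makefirstcap test\@nil
\show\firstcaphold
\makeatother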
Ahmed Musa
  • 11,742
1

TOTALLY REVISED ANSWER:

It obviously has limitations in terms of what types of arguments it can digest, but it is expandable.

\documentclass{article}
\newcommand\caseupper[2]{\caseupperhelp{#1}#2\relax\relax}
\def\caseupperhelp#1#2#3\relax{%
  \ifx a#2A\else  \ifx b#2B\else  \ifx c#2C\else  \ifx d#2D\else  \ifx e#2E\else
  \ifx f#2F\else  \ifx g#2G\else  \ifx h#2H\else  \ifx i#2I\else  \ifx j#2J\else
  \ifx k#2K\else  \ifx l#2L\else  \ifx m#2M\else  \ifx n#2N\else  \ifx o#2O\else
  \ifx p#2P\else  \ifx q#2Q\else  \ifx r#2R\else  \ifx s#2S\else  \ifx t#2T\else
  \ifx u#2U\else  \ifx v#2V\else  \ifx w#2W\else  \ifx x#2X\else  \ifx y#2Y\else
  \ifx z#2Z\else  #1#2%
  \fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi\fi
  \ifx\relax#3\else\caseupperhelp{#1}#3\relax\fi
}
\begin{document}
\edef\x{\caseupper{}{abcDefGhiJkL}}
Expanded result is: \x

\edef\x{\caseupper{\noexpand}{%
  aBc1c3\#\&\$\itshape aBc\upshape\bfseries xYz\mdseries4@!f}}Expanded result is: \x

\caseupper{}{aBc1c3\#\&\$\itshape aBc\upshape\bfseries xYz\mdseries4@!f}
\end{document}


ORIGINAL ANSWER (stringstrings)

The stringstrings package produces expandable results that it places into a string named \thestring. It can be \edefed. In this MWE, the [q] "quiet" option to \caseupper says not to print out the result immediately. Whether printed or not, the expanded result resides in \thestring.

It is generally set up to handle only regular expressions, but has a very limited ability to handle macros in its arguments.

\documentclass{article}
\usepackage{stringstrings}
\begin{document}
\caseupper[q]{abc}
\edef\savedstring{\thestring}
The value is \savedstring.
\end{document}


  • Isn't this more-or-less equivalent to \uppercase{\edef\temp{Hello}}\temp, i.e. isn't expandable in itself only in the sense that the result can be saved into a macro. (I realise that stringstrings is more flexible than \uppercase, of course.) – Joseph Wright Feb 19 '15 at 07:05
  • @JosephWright You would know better than I the semantics of what it means, but I don't see it equivalent to what you just wrote at all. In your example, the uppercase doesn't happen until the end. In my case, it happens first and can be stored in a string for future processing. For example, I could grab the 5th letter of \thestring in the equivalent to your example and determine it to be a capital "O", whereas I don't see how your example offers that same option. – Steven B. Segletes Feb 19 '15 at 10:48
  • @JosephWright If I had to write a functional equivalent to what stringstrings is actually doing, it would be \edef\thestring{`expanded-version-of-uppercase`{Hello}} – Steven B. Segletes Feb 19 '15 at 11:01
  • @JosephWright Related: http://tex.stackexchange.com/questions/173481/expandably-change-letter-case-and-use-inside-csname-without-a-package – Steven B. Segletes Feb 19 '15 at 11:32
  • My point about the equivalence with \uppercase{\edef\temp{Hello}}\temp is that both \uppercase and \caseupper are not expandable. Thus in an expansion context you can't use them: you've got to have set up some 'pre-converted' macro first (\thestring in your case, \temp in the \uppercase version). – Joseph Wright Feb 19 '15 at 12:38
  • Thanks for the reminder about the closely related question! – Joseph Wright Feb 19 '15 at 12:39
  • @JosephWright Understood and agreed. – Steven B. Segletes Feb 19 '15 at 12:44
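
The distinction drawn in the comments above can be sketched with the primitives alone. This is an illustration only; the macro names \temp and \tempb are placeholders. Wrapping the \edef inside \uppercase stores the converted text, whereas putting \uppercase inside the \edef stores the unexpandable primitive itself, so nothing has been converted yet at that point.

\uppercase{\edef\temp{hello}}%  \uppercase acts first, then \edef stores the result
\edef\tempb{\uppercase{hello}}% \uppercase is unexpandable, so it is stored as is
\show\temp   % macro: ->HELLO
\show\tempb  % macro: ->\uppercase {hello}

Typesetting \tempb would still print HELLO, but using it where only expansion happens (for example inside \csname ... \endcsname) would fail, which is the sense in which neither \uppercase nor \caseupper is expandable.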