Extract first letter of each word, also after a special character like a dash

Question

This question is based on this answer.

The first letter of a word is missed when the word directly follows a dash, as the MWE below shows:

\documentclass{article}
\usepackage{readarray}
\usepackage{ifthen}
\newcounter{index}\setcounter{index}{0}
\def\firstletters#1{%
  \getargsC{#1}%
  \whiledo{\theindex<\narg}{%
    \stepcounter{index}%
    \edef\nextword{\csname arg\romannumeral\theindex\endcsname}%
    \expandafter\getfirst\nextword\relax%
  }%
}
\def\getfirst#1#2\relax{#1}
\begin{document}
\firstletters{This is a test of the Emergency Broadcast System. This-Test. for sample. This T.}
\end{document}

Nicola Talbot · Accepted Answer · 2016-07-09T09:47:54.733

The datatool package provides \DTLinitials. For example:

\documentclass{article}

\usepackage{datatool-base}

\begin{document}

\DTLinitials{This is a test of the Emergency Broadcast System.
This-Test. for sample. This T.}

\end{document}

This automatically inserts a period after each initial, but that can be prevented by redefining \DTLafterinitials, \DTLbetweeninitials and \DTLafterinitialbeforehyphen to do nothing.

\documentclass{article}

\usepackage{datatool-base}

\renewcommand*{\DTLbetweeninitials}{}
\renewcommand*{\DTLafterinitials}{}
\renewcommand*{\DTLafterinitialbeforehyphen}{}

\begin{document}

\DTLinitials{This is a test of the Emergency Broadcast System.
This-Test. for sample. This T.}

\end{document}

If you need the initials in an expandable context, you first need to use \DTLstoreinitials, which will save the initials in the command provided in the second argument:

\DTLstoreinitials{This is a test of the Emergency Broadcast System.
This-Test. for sample. This T.}{\initials}

\initials

Edit: if you also want to remove the hyphen from the initials, just redefine \DTLinitialhyphen to do nothing as well:

\renewcommand*{\DTLinitialhyphen}{}
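
Putting the pieces of this answer together, a complete document might look as follows. This is only a sketch that combines the redefinitions and commands shown above; it assumes the datatool-base interface behaves as described in this answer.

```latex
\documentclass{article}
\usepackage{datatool-base}

% Suppress all punctuation that \DTLinitials would otherwise insert
\renewcommand*{\DTLbetweeninitials}{}
\renewcommand*{\DTLafterinitials}{}
\renewcommand*{\DTLafterinitialbeforehyphen}{}
\renewcommand*{\DTLinitialhyphen}{}

\begin{document}
% Typeset the initials directly
\DTLinitials{This is a test of the Emergency Broadcast System.
This-Test. for sample. This T.}

% Store the initials first, then reuse the stored result
\DTLstoreinitials{This is a test of the Emergency Broadcast System.
This-Test. for sample. This T.}{\initials}
\initials
\end{document}
```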

Edit2: Note that \DTLinitials is designed primarily for names (its original purpose was for use with the abbreviated bibliography style provided by databib) so it assumes its argument is a series of letters separated by spaces or hyphens. Additionally from the manual:

Be careful if the initial letter has an accent. The accented letter needs to be placed in a group, if you want the initial to also have an accent, otherwise the accent command will be ignored.

So, as per your comment below:

\DTLinitials{{\"{O}}zg\"{u}r}

Alternatively, use XeLaTeX or LuaLaTeX with UTF-8 characters. This is similar to the limitations of \makefirstuc (from mfirstuc).

Also from the datatool manual:

In fact, any command which appears at the start of the name that is not enclosed in a group will be ignored.

This means that, say

\DTLinitials{\MakeUppercase{m}ary ann}

will produce m.a. not M.a.
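
By analogy with the accent case quoted above, enclosing the leading command in a group should let it survive as part of the initial. The following is an untested sketch based on that assumption, not something stated in the manual excerpt for \MakeUppercase specifically:

```latex
\documentclass{article}
\usepackage{datatool-base}
\begin{document}
% Ungrouped: the leading command is ignored, as the manual warns
\DTLinitials{\MakeUppercase{m}ary ann}

% Grouped: the whole group is expected to be taken as the first initial
% (assumption, by analogy with the grouped accent {\"{O}})
\DTLinitials{{\MakeUppercase{m}}ary ann}
\end{document}
```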

Your method doesn't appear to notice that the very first character of the string need not be a letter. — Mico, Jul 08 '16 at 21:35
\DTLinitials{\"{O}zg\"{u}r} produces O., whereas \DTLinitials{{\"{O}}zg\"{u}r} produces the proper output with Ö. Any advice on this? – Kumaresh PS, Jul 09 '16 at 08:59
@Mico Yes, this is because \DTLinitials is intended primarily for names so it assumes the string is in the form of a name. — Nicola Talbot, Jul 09 '16 at 09:33

score 7 · Answer 2 · answered Jul 09 '16 at 05:59

Here is a solution based on classical TeX only:

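% Open a group; give - and ( category code 10 (space) so they act as word separators
\def\firstletters{\bgroup \catcode`-=10 \catcode`(=10 \filA}
% Append the sentinel {\end} to mark the end of the input
\def\filA#1{\filB#1 {\end} }
% #1: first token of a word, #2: rest up to the next space; loop until \end
\def\filB#1#2 {\ifx\end#1\egroup \else#1\expandafter\filB\fi}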

\firstletters{This is a test of the Emergency Broadcast System. 
   This-Test. for sample (per se). This T.}

\bye

wipet

I like this simplicity! Is there a way to preserve specific other characters? For example, given the input: "Hello, you wonderful world!" I would like the output: "H, yww!" (preserving the ',' and the '!') – Dan Cranston Jun 05 '23 at 14:52
Also, I am much less familiar with TeX than with LaTeX, so could you offer a little explanation of what your solution is doing? (I get the general idea of \def and #1#2, but don't understand the \bgroup, \egroup, and \catcode) Thanks! – Dan Cranston Jun 05 '23 at 14:54

Manuel · Answer 3 · 2016-07-09T11:41:19.577

\documentclass{scrartcl}

\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand \firstletters { m } { \kumaresh_firstletters:n { #1 } }
\cs_new_protected:Npn \kumaresh_firstletters:n #1
 {
  \tl_set:Nn \l_tmpa_tl { #1 }
  \tl_replace_all:Nnn \l_tmpa_tl { - } { ~ }
  \seq_set_split:NnV \l_tmpa_seq { ~ } \l_tmpa_tl
  \seq_map_inline:Nn \l_tmpa_seq { \tl_head:n { ##1 } }
 }
\ExplSyntaxOff

\begin{document}

\firstletters{This is a test of the Emergency Broadcast System. This-Test. for sample. This T.}

\end{document}

Here's a version that copes with traditional TeX accents. I did not include the whole list of accents, just a few; add whatever you need to the definition. This is probably at the limit of complexity for which the predefined expl3 scratch variables are acceptable; it is recommended to define your own variables rather than use the default \l_tmpa_tl and so on.

This version also copes, in a basic way, with commands of the form \emph{words here}, which it converts to \emph{wh}. It likewise handles [brackets and (parenthesis)] (and whatever other symbols you add to the list), which it converts to bap.

\documentclass{scrartcl}

\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand \firstletters { m } { \kumaresh_firstletters:n { #1 } }
\cs_new_protected:Npn \kumaresh_firstletters:n #1
 {
  \tl_set:Nn \l_tmpa_tl { #1 }
  \tl_replace_all:Nnn \l_tmpa_tl { - } { ~ } % here we convert dashes into spaces for our function
  \tl_map_inline:nn { [( } % here we remove certain symbols (and whatever you add) so that it doesn't interfere
   { \tl_remove_all:Nn \l_tmpa_tl { ##1 } }
  \seq_set_split:NnV \l_tmpa_seq { ~ } \l_tmpa_tl
  \seq_map_inline:Nn \l_tmpa_seq { \kumaresh_firstletters_head:n { ##1 } }
 }
\cs_generate_variant:Nn \tl_if_in:NnTF { NV }
\tl_const:Nn \c_kumaresh_accents_tl
 { \^ \" \' \` \H \. \d \~ \v } % here should be all accents
\tl_new:N \g_kumaresh_fl_exceptions_tl
\tl_gset:Nn \g_kumaresh_fl_exceptions_tl
 { \MakeUppercase \emph \textbf } % add here functions for your exceptions
\cs_new_protected:Npn \kumaresh_firstletters_head:n #1
 {
  \tl_set:Nx \l_tmpa_tl { \tl_head:n { #1 } }
  \tl_if_in:NVTF \c_kumaresh_accents_tl \l_tmpa_tl
   { \kumaresh_firstletter_accent:NNw #1 \q_stop }
   {
    \tl_if_in:NVTF \g_kumaresh_fl_exceptions_tl \l_tmpa_tl
     { \kumaresh_firstletter_exception:Nnw #1 \q_stop }
     { \tl_use:N \l_tmpa_tl }
   }
 }
\cs_new_protected:Npn \kumaresh_firstletter_accent:NNw #1 #2 #3 \q_stop
 { #1 {#2} }
\cs_new_protected:Npn \kumaresh_firstletter_exception:Nnw #1 #2 #3 \q_stop
 { #1 { \kumaresh_firstletters:n { #2 } } }
\ExplSyntaxOff

\begin{document}

\firstletters{\"{O}zg\"{u}r \MakeUppercase{This is} a \emph{test of} the \textbf{Emergency Broadcast} System. (This-Test). [for sample]. This \'T.}

\end{document}
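
The simpler first version can also be adapted along the lines mentioned in this answer and its comments: dedicated variables instead of \l_tmpa_tl, plus a step that removes punctuation before splitting. A minimal sketch, assuming a LaTeX release from 2020-10 or later (so that \ExplSyntaxOn and \NewDocumentCommand are available without loading packages); the names \firstlettersdemo, \l__demo_fl_input_tl and \l__demo_fl_words_seq are illustrative and not part of the answer:

```latex
\documentclass{article}
\ExplSyntaxOn
% Dedicated scratch variables (illustrative names)
\tl_new:N  \l__demo_fl_input_tl
\seq_new:N \l__demo_fl_words_seq
\NewDocumentCommand \firstlettersdemo { m }
 {
  \tl_set:Nn \l__demo_fl_input_tl { #1 }
  % Treat hyphens as word separators
  \tl_replace_all:Nnn \l__demo_fl_input_tl { - } { ~ }
  % Remove punctuation that could otherwise be picked up as a first "letter"
  \tl_map_inline:nn { ( ) [ ] . ; }
   { \tl_remove_all:Nn \l__demo_fl_input_tl { ##1 } }
  % Split at spaces and output the first item of each word
  \seq_set_split:NnV \l__demo_fl_words_seq { ~ } \l__demo_fl_input_tl
  \seq_map_inline:Nn \l__demo_fl_words_seq { \tl_head:n { ##1 } }
 }
\ExplSyntaxOff

\begin{document}
\firstlettersdemo{.This is a (test). This-Test}
\end{document}
```

With the leading period removed before splitting, the first initial comes from This rather than from the punctuation. The sketch does not handle accents or commands such as \emph; for those, the longer version above is needed.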

Your macro correctly returns nothing, i.e., an empty string, if the argument of \firstletters is empty. However, the macro doesn't appear to guard against the possibility that the very first few characters of the string might not be letters. E.g., \firstletters{.This} returns . rather than T. — Mico, Jul 08 '16 at 21:42
This is a general approach that separates each string at spaces and takes the first character. And only one extra step is caring about -. If one needs to have more things in account one might want to do \tl_map_inline:nn { ().; } { \tl_remove_all:Nn \l_tmpa_tl { ##1 } } and that will remove all those symbols from the equation. — Manuel, Jul 08 '16 at 21:46
\firstletters{\"{O}zg\"{u}r} produces O, whereas \firstletters{{\"{O}}zg\"{u}r} produces the proper output Ö. Any advice on this? – Kumaresh PS, Jul 09 '16 at 09:05
@KumareshPS That's simple. I will add in a few minutes. But coping with plain \firstletters{Özgür} requires more work. — Manuel, Jul 09 '16 at 11:00
@KumareshPS Done. I think it could easily be generalized further to work on \firstletters{\MakeUppercase{This type} and \emph{that too}} and output \MakeUppercase{Tt}a\emph{tt}, in case you are interested. – Manuel, Jul 09 '16 at 11:16
@KumareshPS Added, plus taking care of ([ but you can add whatever to that list. — Manuel, Jul 09 '16 at 11:42

score 4 · Answer 4 · answered Jul 08 '16 at 15:06

With a regex, we keep the first letter of each word and remove everything after it, up to and including the next space or hyphen.

\documentclass{article}
\usepackage{xparse,l3regex}

\ExplSyntaxOn
\NewDocumentCommand{\firstletters}{m}
 {
  \kumaresh_firstletters:n { #1 }
 }

\tl_new:N \l_kumaresh_fl_input_tl

\cs_new_protected:Nn \kumaresh_firstletters:n
 {
  \tl_set:Nn \l_kumaresh_fl_input_tl { #1 ~ }
  \regex_replace_all:nnN { ([A-Za-z]).*?[-\s]} { \1 } \l_kumaresh_fl_input_tl
  \tl_use:N \l_kumaresh_fl_input_tl
 }
\ExplSyntaxOff

\begin{document}
\firstletters{This is a test of the Emergency Broadcast System. This-Test. for sample. This T.}
\end{document}
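
The comments below point out that leading non-letters are kept and that an empty argument yields a space. One possible variant (not egreg's code, and a different technique) uses two passes: first reduce every run of letters to its first letter, then delete everything that is not a letter. This is a sketch under several assumptions: a LaTeX release from 2020-10 or later, plain-text input without commands or braces, and ASCII letters only. The names \firstlettersrx and \l__demo_rx_input_tl are illustrative.

```latex
\documentclass{article}
\ExplSyntaxOn
\tl_new:N \l__demo_rx_input_tl
\NewDocumentCommand \firstlettersrx { m }
 {
  \tl_set:Nn \l__demo_rx_input_tl { #1 }
  % Pass 1: keep the first letter of each run of ASCII letters
  \regex_replace_all:nnN { ([A-Za-z])[A-Za-z]* } { \1 } \l__demo_rx_input_tl
  % Pass 2: delete everything that is not an ASCII letter
  \regex_replace_all:nnN { [^A-Za-z] } { } \l__demo_rx_input_tl
  \tl_use:N \l__demo_rx_input_tl
 }
\ExplSyntaxOff

\begin{document}
\firstlettersrx{()This is a test. This-Test}

% An empty argument leaves the token list empty
a\firstlettersrx{}b
\end{document}
```

Because any non-letter ends a word here, an apostrophe also splits a word: don't would contribute both d and t. Whether that is acceptable depends on the input.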

egreg

Your method doesn't appear to notice that the very first characters of the string may not be alphabetical characters. E.g., if the string is given by "()This ...", your macro returns "()T" rather than "T". Also, it looks like if the string is entirely empty, a single space rather than an empty string is returned. – Mico Jul 08 '16 at 21:32
@Mico There are no real specifications, so I just assumed letters, spaces and hyphens; it's easy to cope with empty strings if needed. – egreg Jul 08 '16 at 21:51

Mico · Answer 5 · 2016-07-09T13:38:41.777

Here's another LuaLaTeX-based solution. It tests whether the string contains any alphanumeric characters (Lua's %w class) and does nothing if none are found. It does not assume that the first character of the string is a letter. The proposed solution can handle non-ASCII letters such as ä, Ä, and Å.

\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode} % for 'luacode' env. and '\luaexec' macro
\begin{luacode}
local i, w , wstring
function fl ( s )
   i = unicode.utf8.find ( s , "%w")
   -- Do nothing if i == nil, i.e., if 's' doesn't
   -- contain at least one alphanumeric character:
   if i ~= nil then
      -- Pick up the first letter of first word:
      wstring = unicode.utf8.sub ( s , i , i ) 
      s = unicode.utf8.sub ( s , i+1 )
      -- Pick up the first letters of all remaining words:
      for w in unicode.utf8.gmatch ( s , "%W%w" ) do
         wstring = wstring .. unicode.utf8.sub ( w , 2 )
      end
      tex.sprint ( wstring )
   end
end
\end{luacode}
\newcommand{\firstletter}[1]{\luaexec{fl(\luastring{#1})}}

\begin{document}
\firstletter{This is a test of the Emergency Broadcast System. This-Test. for sample. This T. per se}

% Same string, but with additional non-letter characters
\firstletter{@--?#&$() []<>^_ This is a test of the 
   Emergency    Broadcast System. This--Test. 
   for sample. This T. 
   (per se)}

% Words that start with non-ASCII-encoded characters
\firstletter{$$$ähnlich "öffentlich *übrigens !?<>Äpfel 
   Özgür  ((((^Übung    .ßcheusslich+++ ,===Ångstrom}

\firstletter{!@#$^&*()!@#$^&*()_+-={}[]|\\;<>?Ö} 

% Two strings without any "words"
a\firstletter{"("§$&/)@@=}b\firstletter{}c 

\end{document}

@KumareshPS - Is something preventing you from adopting a different work flow? — Mico, Jul 09 '16 at 09:20

Steven B. Segletes · Answer 6 · 2016-07-10T19:04:09.407

This takes the earlier, insufficient answer you cite (which was mine, by the way) and augments it: the - is made active and defined to be a space before the earlier code is executed. The dash-made-space then allows the following letter to be detected as the beginning of a new word.

\documentclass{article}
\usepackage{readarray}
\usepackage{ifthen}
\newcounter{index}\setcounter{index}{0}
\catcode`-=\active %
\def-{ }
\catcode`-=12 %
\def\firstletters{\catcode`-=\active \firstlettersX}
\def\firstlettersX#1{%
  \getargsC{#1}%
  \whiledo{\theindex<\narg}{%
    \stepcounter{index}%
    \edef\nextword{\csname arg\romannumeral\theindex\endcsname}%
    \expandafter\getfirst\nextword\relax%
  }%
  \catcode`-=12 %
}
\def\getfirst#1#2\relax{#1}
\begin{document}
\firstletters{This is a test of the Emergency Broadcast System. This-Test. for sample. This T.}
- - -Dash restored
\end{document}

The same approach can be used if you need to pick up the letter that follows other punctuation, such as ( or [. For example:

\documentclass{article}
\usepackage{readarray}
\usepackage{ifthen}
\newcounter{index}\setcounter{index}{0}
\catcode`-=\active %
\def-{ }
\catcode`-=12 %
\catcode`(=\active %
\def({}
\catcode`(=12 %
\def\newpunct{%
  \catcode`-=\active %
  \catcode`(=\active %
}
\def\oldpunct{%
  \catcode`-=12 %
  \catcode`(=12 %
}
\def\firstletters{\newpunct\firstlettersX}
\def\firstlettersX#1{%
  \getargsC{#1}%
  \whiledo{\theindex<\narg}{%
    \stepcounter{index}%
    \edef\nextword{\csname arg\romannumeral\theindex\endcsname}%
    \expandafter\getfirst\nextword\relax\relax%
  }%
  \oldpunct%
}
\def\getfirst#1#2\relax{#1}
\begin{document}
\firstletters{This is a test of the Emergency Broadcast System (per se). 
    This-Test. for sample. This T.}
- - -Dash restored (and paren too)
\end{document}

Your methods appear not to pick up the possibility that the very first character of the string needn't be a letter, and they crash if the first character is a space. :-( — Mico, Jul 08 '16 at 21:27
@Mico As to the leading space issue, I just edited to add an extra \relax, giving \getfirst\nextword\relax\relax in the code, which fixes that issue. As to the first problem with non-alphabetic lead characters, that is the exact problem that is addressed by this solution. I have shown how to do it with dashes and a left paren; other characters can be added as well. – Steven B. Segletes, Jul 10 '16 at 19:06
