
I need access to Emoji characters for a little package I am currently writing. Is there a way of having a fallback font in XeLaTeX for whenever a random character is not available in the standard font?

In my MWE, I tried something similar to what is described here: https://tex.stackexchange.com/a/224585/205359

My MWE:

\documentclass{scrartcl}

\usepackage{fontspec}

\usepackage{newunicodechar}
\newfontfamily{\fallbackfont}{Segoe UI Emoji}
\DeclareTextFontCommand{\textfallback}{\fallbackfont}
\newcommand{\fallbackchar}[2][\textfallback]{%
    \newunicodechar{#2}{#1{#2}}%
}
\fallbackchar{}

\begin{document}

This is very mysterious !

\end{document}

This leads to this (correct) output:

[image: output of the MWE]

Of course, it is inconvenient and infeasible to write a definition manually for every emoji (the set is also updated from time to time). The link above hints at this post: https://tex.stackexchange.com/a/171872/205359. That approach uses a \foreach loop over a list of Unicode numbers, but I can't find a way to make it loop over a whole passage of text instead of a list.

All in all, I want to be able to input text containing emojis as in the MWE above (but not (!) as Unicode numbers). XeLaTeX should then use the regular font for "standard" characters and a specified font for "non-standard" characters (i.e. emojis). Is this possible to achieve?

TiMauzi
  • I just found this package: https://ctan.org/pkg/combofont - could this help, perhaps? – TiMauzi Mar 31 '22 at 01:34
  • There is no 'standard' coverage. If font A has emojis, do you want the emojis to come from font B instead? Unrelated: you can define your own Unicode charclasses, and what happens when transitioning into, and out of, them. Do you need to print complex scripts, like Devanagari? – Cicada Mar 31 '22 at 04:09
  • Looping over everything in an environment (including the document environment) is possible - e.g., via an expl3 token list, and using regex. What is the coverage of your "font space" - can the user input anything, or a restricted set? आઍଋஊඋชນ༆ሙᐚᝌᭈ襤⤖ – Cicada Mar 31 '22 at 04:15
  • Null case: If the user does not have the fallback font, what do you want to happen? – Cicada Mar 31 '22 at 04:18
  • fallbacks can be defined with luaotfload see https://tex.stackexchange.com/a/572220/2388, but with xelatex it is rather difficult. – Ulrike Fischer Mar 31 '22 at 08:29
  • @TiMauzi combofont is for lualatex, and if lualatex is used then using the in-built fallback option of luaotfload is better here. – Ulrike Fischer Mar 31 '22 at 08:30
  • @Cicada The use case is that I want to create a pretty format for Tweets. Thus, basically every character that can be written in Tweets should be printable. – TiMauzi Mar 31 '22 at 13:55
  • FYI, my current approach is using the ucharclasses package. With this, I defined a font for the Unicode sets Dingbats, Emoticons, Miscellaneous Symbols, Miscellaneous Symbols And Arrows, Supplemental Symbols And Pictographs, Symbols And Pictographs Extended A, and Transport And Map Symbols (those are the ones with the most Emojis). But this doesn't seem to be the perfect solution, either. – TiMauzi Mar 31 '22 at 13:57
  • @TiMauzi The twitter developer site says (in a note in the context of widgets): "Tweet text is always displayed in its originally authored language." So that basically sounds like anything in Unicode - Japanese, hieroglyphs, runes, dingbats, mahjong tiles, etc. – Cicada Apr 05 '22 at 06:41

1 Answer


A solution for XeLaTeX and LuaLaTeX, using expl3.

The method defines "default" fonts: it stores codepoint block information as a set of records in an expl3 sequence variable, aggregates those blocks into blocksets via an expl3 property list acting as a lookup table, and then assigns a font to each blockset via another property list.

So any glyph that doesn't belong to the defined set can then be typeset in the fallback.

First, the defined set.

Load up some sequences with codeblock information. For faster lookup, ranges of 2-digit codepoints go into a 2-digit list, 3-digit codepoints into a 3-digit list, and so on, for as many ranges as required. The record format is <start codepoint> <delimiter> <end codepoint> <delimiter> <block name>, where <delimiter> is ;.

\mfsloadaseq{list2digits}{
32;99;Basic Latin
}

\mfsloadaseq{list3digits}{
100;127;Basic Latin
,128;255;Latin-1 Supplement
,256;383;Latin Extended-A
,384;591;Latin Extended-B
,592;687;IPA Extensions
,688;767;Spacing Modifier Letters
,768;879;Combining Diacritical Marks
,880;999;Greek and Coptic
}

\mfsloadaseq{list4digits}{
1000;1023;Greek and Coptic
,1024;1279;Cyrillic
,1328;1423;Armenian
,2304;2431;Devanagari
,4256;4351;Georgian
,7312;7359;Georgian Extended
,7936;8063;Greek Extended
}

\mfsloadaseq{list5digits}{
11520;11567;Georgian Supplement
,43232;43263;Devanagari Extended
,66432;66463;Ugaritic
,77824;78895;Egyptian Hieroglyphs
}

\mfsloadaseq{list6digits}{
129280;129535;Supplemental Symbols and Pictographs
}
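
To see how a single record of this form can be taken apart and tested, here is a minimal sketch. It is not the code from the MWE further down: the command \demoinblock and the two scratch variables are hypothetical names made up for this illustration. It splits one record at the semicolons and checks whether a given codepoint falls inside the range.

```latex
\documentclass{article}
\usepackage{xparse} % harmless on recent kernels, needed on older ones
\ExplSyntaxOn
\seq_new:N \l_demo_record_seq   % hypothetical scratch variable
\int_new:N \l_demo_code_int     % hypothetical scratch variable
% #1 = codepoint (decimal), #2 = record "start;end;block name"
\NewDocumentCommand \demoinblock { m m }
  {
    \int_set:Nn \l_demo_code_int { #1 }
    % Split the record at the semicolons: item 1 = start, 2 = end, 3 = name.
    \seq_set_split:Nnn \l_demo_record_seq { ; } { #2 }
    \int_compare:nTF
      {
        \seq_item:Nn \l_demo_record_seq { 1 }
          <= \l_demo_code_int
          <= \seq_item:Nn \l_demo_record_seq { 2 }
      }
      { #1 ~ is ~ in ~ \seq_item:Nn \l_demo_record_seq { 3 } }
      { #1 ~ is ~ outside ~ \seq_item:Nn \l_demo_record_seq { 3 } }
  }
\ExplSyntaxOff
\begin{document}
\demoinblock{945}{880;999;Greek and Coptic}\par
\demoinblock{1040}{880;999;Greek and Coptic}
\end{document}
```

Codepoint 945 (α) lies inside 880 to 999, so the first call reports a match, while 1040 (Cyrillic А) does not. Sorting records into lists by digit count, as above, only reduces how many such tests have to run per character.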

Next, group the blocks into blocksets:

\mfsloadaprop{block2blockset}{
Basic Latin=latin
,Latin-1 Supplement=latin
,Latin Extended-A=latin
,Latin Extended-B=latin
... etc
}

The blockset name will become part of the font switch command.

Next, link the blockset name to its font.

\mfsloadaprop{blockset2font}{
latin=Noto Serif
,greek=Alexander
,cyrillic=Charis SIL
,armenian=Noto Serif Armenian
,georgian=Noto Serif Georgian
,symbols=Segoe UI Emoji
,devanagari=Shobhika
,ugaritic=Noto Sans Ugaritic
,egyptian=Noto Sans Egyptian Hieroglyphs
}

Next, some fonts will need fontspec font options set. These go into a property list too, since they are one-to-one between blockset name and option settings. To get past the = and , in the property list's key=value,... syntax, the value uses ; in place of =, and - in place of ,. ; and - are converted back to = and , when actually used in the font command.

\mfsloadaprop{blockset2fontoptions}{
latin=Colour;blue-Scale;2
,devanagari=Script;Devanagari
,greek=Colour;violet-Scale;1.5
}
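
The decoding step can be tried in isolation. The following is a minimal sketch, assuming a hypothetical command \demodecode and a hypothetical scratch variable; it applies the same two regex replacements that the MWE uses and prints the resulting option string.

```latex
\documentclass{article}
\usepackage{xparse} % harmless on recent kernels, needed on older ones
\ExplSyntaxOn
\tl_new:N \l_demo_options_tl   % hypothetical scratch variable
% #1 = encoded options, e.g. Colour;blue-Scale;2
\NewDocumentCommand \demodecode { m }
  {
    \tl_set:Nn \l_demo_options_tl { #1 }
    % Turn the stand-ins back into fontspec's key=value,key=value syntax.
    \regex_replace_all:nnN { ; } { = } \l_demo_options_tl
    \regex_replace_all:nnN { - } { , } \l_demo_options_tl
    \texttt { \tl_to_str:N \l_demo_options_tl }
  }
\ExplSyntaxOff
\begin{document}
\demodecode{Colour;blue-Scale;2}
\end{document}
```

This typesets Colour=blue,Scale=2. One consequence of the encoding is that an option value that itself contains a hyphen or a semicolon cannot be written this way, because every - and ; is converted.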

The \ftext command takes text input and applies the fonts as requested. It also accepts some basic formatting switches (and \space), protects them from the stringification that happens, and re-inserts them into the output afterwards.

So, given the definitions,

\ftext{abc αβγδабв \\ гдs{աა} {T}he \itshape cat\upshape\space sat on the mat. абвгд αβγδ \bfseries абвгд\mdseries\space\itshape абвгд\upshape\space ა,./ \hashtag   स्कूल

}

produces:

[image: e1]
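
The protection of formatting switches mentioned above can be illustrated with a much smaller sketch. It assumes ASCII placeholders (@ and |) so that it also compiles with pdfLaTeX, whereas the MWE uses rarely needed Unicode characters; \demoprotect, \demo_restore:n and the scratch variable are hypothetical names. Each switch is swapped for a placeholder character, the text is processed character by character as a string, and the placeholder is turned back into the switch on output.

```latex
\documentclass{article}
\usepackage{xparse} % harmless on recent kernels, needed on older ones
\ExplSyntaxOn
\tl_new:N \l_demo_text_tl   % hypothetical scratch variable
% Called once per character of the stringified text.
\cs_new_protected:Npn \demo_restore:n #1
  {
    \str_case:nnF { #1 }
      {
        { @ } { \itshape }   % placeholder back to its switch
        { | } { \upshape }
      }
      { #1 }                 % any other character is printed as is
  }
\NewDocumentCommand \demoprotect { +m }
  {
    \tl_set:Nn \l_demo_text_tl { #1 }
    % Hide the control sequences behind single characters.
    \regex_replace_all:nnN { \c{itshape} } { @ } \l_demo_text_tl
    \regex_replace_all:nnN { \c{upshape} } { | } \l_demo_text_tl
    % Walk over the text as a string, one character at a time.
    \exp_args:NV \str_map_function:nN \l_demo_text_tl \demo_restore:n
  }
\ExplSyntaxOff
\begin{document}
\demoprotect{The \itshape cat \upshape sat on the mat.}
\end{document}
```

Any literal @ or | in the input would be misread as a switch in this sketch, which is why placeholders that are unlikely to occur in real text are the safer choice.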

MWE

\documentclass[12pt]{article}
\usepackage{xcolor}
\usepackage{fontspec}
\usepackage{xparse}

%....
\ExplSyntaxOn

    \cs_generate_variant:Nn 
        \seq_set_split:Nnn 
        { cno }


    \cs_generate_variant:Nn 
        \seq_gset_split:Nnn 
        { cno }

\cs_generate_variant:Nn 
        \tl_count_tokens:n 
        { V }



\seq_new:N 
        \l_fc_rweq_seq
\seq_new:N 
        \l_fc_rweqz_seq
\seq_new:N 
        \l_fc_rweqy_seq
\tl_new:N 
        \l_fc_rweqz_tl
\seq_new:N 
        \l_fc_rweqyy_seq
\int_new:N 
        \l_fc_rweqz_int
\int_new:N 
        \l_fc_rweqy_int

\tl_new:N 
        \g_fc_rwenamespace_tl

\tl_gset:Nn 
        \g_fc_rwenamespace_tl
        {mfs}

\tl_new:N 
        \l_fc_rwenamespacea_tl
\tl_new:N 
        \l_fc_rwenamespaceb_tl
\tl_new:N 
        \l_fc_rwenamespacec_tl



%------------------
\cs_set:Npn \ic_funcsortseq:cn #1#2
{ % 1=seq, 2=order (<:descending, >:ascending)
    \seq_sort:cn { #1 } %\l_fc_rweq_seq
    {
        \seq_clear:N \l_fc_rweqz_seq
        \seq_clear:N \l_fc_rweqy_seq
        \seq_set_split:Nnn \l_fc_rweqz_seq { ; } { ##1 }
        \seq_set_split:Nnn \l_fc_rweqy_seq { ; } { ##2 }
        \tl_set:Nx \l_fc_rweqz_tl { \seq_item:Nn \l_fc_rweqz_seq {1} }
        \tl_set:Nx \l_fc_rweqy_tl { \seq_item:Nn \l_fc_rweqy_seq {1} }

        \int_set:Nn 
                \l_fc_rweqz_int 
                { 
                    \tl_count_tokens:V 
                            \l_fc_rweqz_tl 
                }
        \int_set:Nn 
                \l_fc_rweqy_int 
                { 
                    \tl_count_tokens:V 
                    \l_fc_rweqy_tl 
                }
        \int_compare:nNnTF 
                { \l_fc_rweqz_int } { #2 } { \l_fc_rweqy_int }
                { \sort_return_swapped: }
                { \sort_return_same: }
    }

}

%------------------
\cs_set:Npn \ic_funcsortseqnum:cn #1#2
{ % 1=seq, 2=order (<:descending, >:ascending)
    \seq_sort:cn { #1 } % A;B are both numbers
    {
        \seq_clear:N \l_fc_rweqz_seq
        \seq_clear:N \l_fc_rweqy_seq
        \seq_set_split:Nnn \l_fc_rweqz_seq { ; } { ##1 }
        \seq_set_split:Nnn \l_fc_rweqy_seq { ; } { ##2 }
        \tl_set:Nx \l_fc_rweqz_tl { \seq_item:Nn \l_fc_rweqz_seq {1} }
        \tl_set:Nx \l_fc_rweqy_tl { \seq_item:Nn \l_fc_rweqy_seq {1} }

        \int_set:Nn 
                \l_fc_rweqz_int 
                { 
                    \l_fc_rweqz_tl 
                }
        \int_set:Nn 
                \l_fc_rweqy_int 
                { 
                    \l_fc_rweqy_tl 
                }
        \int_compare:nNnTF 
                { \l_fc_rweqz_int } { #2 } { \l_fc_rweqy_int }
                { \sort_return_swapped: }
                { \sort_return_same: }
    }

}

%--------------------
\NewDocumentCommand { \mfsloadaseq } { m +m }
{ % 1=seq name, 2=data

\cs_if_free:cT
        { g_fc_rwe #1 _seq }
        { \seq_new:c
                { g_fc_rwe #1 _seq } 
        }
\seq_gclear:c 
        { g_fc_rwe #1 _seq } 
\seq_gset_split:cno 
        { g_fc_rwe #1 _seq } 
        { , } 
        { #2 }



}

%--------------------
\NewDocumentCommand { \mfsloadaprop } { m +m }
{ % 1=prop name, 2=data

\cs_if_free:cT
        { g_fc_rwe #1 _prop }
        { \prop_new:c
                { g_fc_rwe #1 _prop } 
        }
\prop_gclear:c 
        { g_fc_rwe #1 _prop } 
\prop_gset_from_keyval:cn 
        { g_fc_rwe #1 _prop } 
        { #2 }

}

%--------------------
\NewDocumentCommand { \mfssortaseq } { m m }
{ %1=seqname, 2=><, asc/desc
    \ic_funcsortseq:cn { g_fc_rwe #1 _seq } { #2 }

}

%--------------------
\NewDocumentCommand { \mfssortaseqnum } { m m }
{ %1=seqname, 2=><, asc/desc
    \ic_funcsortseqnum:cn { g_fc_rwe #1 _seq } { #2 }
}

\seq_new:N \g_fc_current_seq
\seq_new:N \g_fc_currentb_seq
\tl_new:N \g_fc_current_tl
\tl_new:N \l_fc_currentb_tl
\seq_new:N \g_fc_fontsn_seq
\seq_new:N \g_fc_fontln_seq

%-------------------- Vars
\tl_new:N \g_fc_ftext_tl
\int_new:N \g_fc_glyphcode_int
\tl_new:N \g_fc_glyphcode_tl
\int_new:N \g_fc_numdigits_int
\tl_new:N \g_fc_numdigits_tl

\tl_new:N \g_fc_myblockname_tl % for lookup
\tl_new:N \g_fc_myblocksetname_tl % for lookup; will become font switch
\tl_new:N \g_fc_myfontname_tl
\tl_new:N \g_fc_prevblocksetname_tl

%-------------------- Functions
\cs_set:Npn \fc_funcappplyfont:n #1
{ % 1=text
    \int_gset:Nn \g_fc_glyphcode_int { `#1 }

        \tl_gset:NV 
                \g_fc_glyphcode_tl
                \g_fc_glyphcode_int

        \tl_gset:Nn 
                \g_fc_numdigits_tl
                { 
                    \tl_count_tokens:V 
                            \g_fc_glyphcode_tl 
                }


        \seq_map_function:cN 
                { g_fc_rwe 
                  list
                  \tl_use:N 
                        \g_fc_numdigits_tl
                  digits
                   _seq } 
                \fc_funcgetblock:n 

    \int_case:nnF { \g_fc_glyphcode_int }
    {
        { `\僜 } { \par }
        { 127026 } { \itshape }
        { 127027 } { \upshape }
        { 127028 } { \c_space_token }
        { 127029 } { \c_space_token }
        { 127030 } { \bfseries }
        { 127031 } { \mdseries }
        { 127032 } { \hashtag }
        { 127033 } { \ }
    }
    { \symbol{ \int_use:N \g_fc_glyphcode_int } }

}

%------------------
\cs_set:Npn \fc_funcgetblock:n #1
{ % 1=seq item

    \tl_gset:Nx \g_fc_current_tl { #1 }
    \seq_gset_split:NnV \g_fc_currentb_seq { ; } \g_fc_current_tl

    \fc_funcgetfontnameb:nn
    { 
    \seq_item:Nn
            \g_fc_currentb_seq 
            { 1 }
     } 
    {
    \seq_item:Nn
            \g_fc_currentb_seq 
            { 2 }
     } 


}

\tl_new:N \g_fc_fontoptionsa_tl

%------------------
\cs_set:Npn \fc_funcgetfontnameb:nn #1#2
{ % 1=startcode in block range
  % 2=finishcode in block range

\bool_if:nT
{

\int_compare_p:n { #1 <= \g_fc_glyphcode_int } && \int_compare_p:n { #2 >= \g_fc_glyphcode_int }
} {

                \tl_set:Nn 
                        \g_fc_myblockname_tl
                        {

%                           found:
                            \seq_item:Nn \g_fc_currentb_seq { 3 }
                        }

        \exp_args:Nnx
                \prop_get:cnN%TF
                        { g_fc_rwe block2blockset _prop }
                        { \g_fc_myblockname_tl }
                        \g_fc_myblocksetname_tl

% { T } % { F }

        \exp_args:Nnx
                \prop_get:cnN
                        { g_fc_rwe blockset2font _prop }
                        { \g_fc_myblocksetname_tl }
                        \g_fc_myfontname_tl


        \exp_args:Nnx
                \prop_get:cnNTF
                        { g_fc_rwe blockset2fontoptions _prop }
                        { \tl_use:N \g_fc_myblocksetname_tl }
                        \g_fc_fontoptionsa_tl

                        {

%                           T:
                            \regex_replace_all:nnN { ; } { = } \g_fc_fontoptionsa_tl
                            \regex_replace_all:nnN { - } { , } \g_fc_fontoptionsa_tl
                        }{
%                           F:
                            \tl_clear:N \g_fc_fontoptionsa_tl
                        }

        \tl_if_eq:NNF
        \g_fc_prevblocksetname_tl
        \g_fc_myblocksetname_tl
        {
            \cs_if_free:cT
                { ffc\g_fc_myblocksetname_tl }
                { 
                    \exp_args:Nxx
                        \newfontfamily
                        {   \use:c { ffc\g_fc_myblocksetname_tl } }
                        { \tl_use:N \g_fc_myfontname_tl }
                        [
                            \tl_use:N \g_fc_fontoptionsa_tl     

%                           NFSSFamily = \tl_use:N \l_fc_rwenamespaceb_tl
%                           ,
%                           \l_fc_rwenamespacec_tl
                        ]

                }
            \use:c { ffc\g_fc_myblocksetname_tl }

            \tl_set_eq:NN
                \g_fc_prevblocksetname_tl
                \g_fc_myblocksetname_tl

        }

                 } % if glyphcode match found

}

\tl_new:N \l_fc_uregex_tl
%-------------------- FText
\NewDocumentCommand \ftext { o +m }
{ % 1 = text
    \tl_gset:Nn \g_fc_ftext_tl { #2 }
    \IfNoValueTF { #1 }
        { }
        { \tl_to_str:N \g_fc_ftext_tl #1\par }
    \regex_replace_all:nnN
            { \c{par} }
            { 僜 }
            \g_fc_ftext_tl

            % The placeholders are the domino tiles U+1F032..U+1F039
            % (decimal 127026..127033), matching the \int_case:nnF table above.
            \regex_replace_all:nnN
                    { \c{itshape} }
                    { 🀲 } % 127026
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{upshape} }
                    { 🀳 } % 127027
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c[S](.) }
                    { 🀴 } % 127028
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{space} }
                    { 🀵 } % 127029
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{bfseries} }
                    { 🀶 } % 127030
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{mdseries} }
                    { 🀷 } % 127031
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{hashtag} }
                    { 🀸 } % 127032
                    \g_fc_ftext_tl

            \tl_set:Nn \l_fc_uregex_tl { \\ }       
            \regex_replace_all:nnN
                    { \u { l_fc_uregex_tl } }
                    { 🀹 } % 127033
                    \g_fc_ftext_tl




        \exp_args:Nx
        \str_map_function:nN 
                { \g_fc_ftext_tl }
                \fc_funcappplyfont:n


}

%-------------------- hashtag
\NewDocumentCommand \hashtag { } { \# }

\ExplSyntaxOff

%---

%Data

\mfsloadaseq{list2digits}{
32;99;Basic Latin
}

\mfsloadaseq{list3digits}{
100;127;Basic Latin
,128;255;Latin-1 Supplement
,256;383;Latin Extended-A
,384;591;Latin Extended-B
,592;687;IPA Extensions
,688;767;Spacing Modifier Letters
,768;879;Combining Diacritical Marks
,880;999;Greek and Coptic
}

\mfsloadaseq{list4digits}{
1000;1023;Greek and Coptic
,1024;1279;Cyrillic
,1328;1423;Armenian
,2304;2431;Devanagari
,4256;4351;Georgian
,7312;7359;Georgian Extended
,7936;8063;Greek Extended
}

\mfsloadaseq{list5digits}{
11520;11567;Georgian Supplement
,43232;43263;Devanagari Extended
,66432;66463;Ugaritic
,77824;78895;Egyptian Hieroglyphs
}

\mfsloadaseq{list6digits}{
129280;129535;Supplemental Symbols and Pictographs
} %129379

\mfsloadaprop{block2blockset}{
Basic Latin=latin
,Latin-1 Supplement=latin
,Latin Extended-A=latin
,Latin Extended-B=latin
,IPA Extensions=latin
,Spacing Modifier Letters=latin
,Combining Diacritical Marks=latin
,Greek and Coptic=greek
,Greek Extended=greek
,Cyrillic=cyrillic
,Armenian=armenian
,Georgian=georgian
,Georgian Extended=georgian
,Georgian Supplement=georgian
,Supplemental Symbols and Pictographs=symbols
,Devanagari=devanagari
,Devanagari Extended=devanagari
,Ugaritic=ugaritic
,Egyptian Hieroglyphs=egyptian
}

\mfsloadaprop{blockset2font}{
latin=Noto Serif
,greek=Alexander%;Noto Serif
,cyrillic=Charis SIL%Asimov%BBC Reith Serif Light%Noto Serif
,armenian=Noto Serif Armenian
,georgian=Noto Serif Georgian
,symbols=Segoe UI Emoji%EmojiOne Color
,devanagari=Shobhika
,ugaritic=Noto Sans Ugaritic
,egyptian=Noto Sans Egyptian Hieroglyphs
}

\mfsloadaprop{blockset2fontoptions}{
latin=Colour;blue-Scale;2
,devanagari=Script;Devanagari
,greek=Colour;violet-Scale;1.5
}

%===============================================================
\begin{document}

\ftext{abc αβγδабв \\ гдs{աა} {T}he \itshape cat\upshape\space sat on the mat. абвгд αβγδ \bfseries абвгд\mdseries\space\itshape абвгд\upshape\space ა,./ \hashtag स्कूल

}

\end{document}


Version 2

Adding a namespace component to the data model allows multiple mapping definitions (independent sets of sets) to exist in the same document. This is done by passing the namespace as an optional argument to the commands.

Example:

%Data

\mfsloadaseq{list2digits}{ 32;99;Basic Latin }

\mfsloadaseq{list3digits}{ 100;127;Basic Latin ,592;687;IPA Extensions ,969;969;Greek ω }

\mfsloadaseq{list4digits}{ 1080;1080;Cyrillic и ,1377;1377;Armenian ա }

\mfsloadaprop{block2blockset}{ Basic Latin=ipa ,IPA Extensions=ipa ,Greek ω=ipa ,Cyrillic и=ipa ,Armenian ա=ipa2 }

\mfsloadaprop{blockset2font}{ ipa=Noto Sans Mono ,ipa2=Noto Sans Armenian }

\mfsloadaprop{blockset2fontoptions}{ ipa=Colour;blue-Scale;1.1 ,ipa2=Colour;blue-Scale;1.1 }

is in the default namespace (=nothing), while:

%Data

\mfsloadaseq[x2]{list2digits}{ 32;99;Basic Latin }

\mfsloadaseq[x2]{list3digits}{ 100;127;Basic Latin ,592;687;IPA Extensions ,969;969;Greek ω }

\mfsloadaseq[x2]{list4digits}{ 1080;1080;Cyrillic и ,1377;1377;Armenian ա }

\mfsloadaprop[x2]{block2blockset}{ Basic Latin=ipa ,IPA Extensions=ipa ,Greek ω=ipa ,Cyrillic и=ipa ,Armenian ա=ipa2 }

\mfsloadaprop[x2]{blockset2font}{ ipa=Noto Serif ,ipa2=Noto Serif Armenian }

\mfsloadaprop[x2]{blockset2fontoptions}{ ipa=Colour;red-Scale;2.1 ,ipa2=Colour;red-Scale;2.1 }

is in the x2 namespace (arbitrary name).

As a result, the input a vessel [\ftext{ա ωesʹiиl}, \ftext[x2]{ա ωesʹiиl}]. produces:

[image: e2]

More practical example - adding Latin quotation marks to Devanagari text:

[image: e3]
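
The namespace takes effect by becoming part of each variable name: the commands in the MWE below build names of the form g_fc_rwe<namespace><list name>_seq. The following minimal sketch only prints the name that would be constructed; \demoseqname and \g_demo_namespace_tl are hypothetical stand-ins, not part of the answer's code.

```latex
\documentclass{article}
\usepackage{xparse} % harmless on recent kernels, needed on older ones
\ExplSyntaxOn
% Hypothetical variable playing the role of \g_fc_namespace_tl.
\tl_new:N \g_demo_namespace_tl
% #1 = optional namespace, #2 = list name
\NewDocumentCommand \demoseqname { o m }
  {
    \IfNoValueTF { #1 }
      { \tl_gclear:N \g_demo_namespace_tl }
      { \tl_gset:Nn \g_demo_namespace_tl { #1 } }
    % Print the name that a c-type argument would turn into a variable.
    \texttt
      {
        \tl_to_str:n { g_fc_rwe }
        \tl_to_str:N \g_demo_namespace_tl
        \tl_to_str:n { #2 _seq }
      }
  }
\ExplSyntaxOff
\begin{document}
\demoseqname{list2digits}\par
\demoseqname[x2]{list2digits}
\end{document}
```

The first call prints g_fc_rwelist2digits_seq and the second g_fc_rwex2list2digits_seq, so data loaded under different namespaces ends up in separate variables and cannot overwrite each other.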

MWE

\documentclass[12pt]{article}
\usepackage{xcolor}
\usepackage{fontspec}
\usepackage{xparse}

%...............................................................................................
\ExplSyntaxOn

    \cs_generate_variant:Nn 
        \seq_set_split:Nnn 
        { cno }


    \cs_generate_variant:Nn 
        \seq_gset_split:Nnn 
        { cno }

\cs_generate_variant:Nn 
        \tl_count_tokens:n 
        { V }



\seq_new:N 
        \l_fc_rweq_seq
\seq_new:N 
        \l_fc_rweqz_seq
\seq_new:N 
        \l_fc_rweqy_seq
\tl_new:N 
        \l_fc_rweqz_tl
\seq_new:N 
        \l_fc_rweqyy_seq
\int_new:N 
        \l_fc_rweqz_int
\int_new:N 
        \l_fc_rweqy_int

\tl_new:N 
        \g_fc_rwenamespace_tl

\tl_gset:Nn 
        \g_fc_rwenamespace_tl
        {mfs}

\tl_new:N 
        \l_fc_rwenamespacea_tl
\tl_new:N 
        \l_fc_rwenamespaceb_tl
\tl_new:N 
        \l_fc_rwenamespacec_tl



%****************************************************
%*
%****************************************************
%------------------
\cs_set:Npn \ic_funcsortseq:cn #1#2
{ % 1=seq, 2=order (<:descending, >:ascending)
    \seq_sort:cn { #1 } %\l_fc_rweq_seq
    {
        \seq_clear:N \l_fc_rweqz_seq
        \seq_clear:N \l_fc_rweqy_seq
        \seq_set_split:Nnn \l_fc_rweqz_seq { ; } { ##1 }
        \seq_set_split:Nnn \l_fc_rweqy_seq { ; } { ##2 }
        \tl_set:Nx \l_fc_rweqz_tl { \seq_item:Nn \l_fc_rweqz_seq {1} }
        \tl_set:Nx \l_fc_rweqy_tl { \seq_item:Nn \l_fc_rweqy_seq {1} }

        \int_set:Nn 
                \l_fc_rweqz_int 
                { 
                    \tl_count_tokens:V 
                            \l_fc_rweqz_tl 
                }
        \int_set:Nn 
                \l_fc_rweqy_int 
                { 
                    \tl_count_tokens:V 
                    \l_fc_rweqy_tl 
                }
        \int_compare:nNnTF 
                { \l_fc_rweqz_int } { #2 } { \l_fc_rweqy_int }
                { \sort_return_swapped: }
                { \sort_return_same: }
    }

}

%****************************************************
%*
%****************************************************
%------------------
\cs_set:Npn \ic_funcsortseqnum:cn #1#2
{ % 1=seq, 2=order (<:descending, >:ascending)
    \seq_sort:cn { #1 } % A;B are both numbers
    {
        \seq_clear:N \l_fc_rweqz_seq
        \seq_clear:N \l_fc_rweqy_seq
        \seq_set_split:Nnn \l_fc_rweqz_seq { ; } { ##1 }
        \seq_set_split:Nnn \l_fc_rweqy_seq { ; } { ##2 }
        \tl_set:Nx \l_fc_rweqz_tl { \seq_item:Nn \l_fc_rweqz_seq {1} }
        \tl_set:Nx \l_fc_rweqy_tl { \seq_item:Nn \l_fc_rweqy_seq {1} }

        \int_set:Nn 
                \l_fc_rweqz_int 
                { 
                    \l_fc_rweqz_tl 
                }
        \int_set:Nn 
                \l_fc_rweqy_int 
                { 
                    \l_fc_rweqy_tl 
                }
        \int_compare:nNnTF 
                { \l_fc_rweqz_int } { #2 } { \l_fc_rweqy_int }
                { \sort_return_swapped: }
                { \sort_return_same: }
    }

}

%****************************************************
%*
%****************************************************
%--------------------
\NewDocumentCommand { \mfsloadaseq } { o m +m }
{ % 1=NS, 2=seq name, 3=data

            \IfNoValueTF { #1 } 
                    { \tl_clear:N \g_fc_namespace_tl } 
                    { \tl_gset:Nn \g_fc_namespace_tl { #1 } }


\cs_if_free:cT
        { g_fc_rwe \g_fc_namespace_tl #2 _seq }
        { \seq_new:c
                { g_fc_rwe \g_fc_namespace_tl #2 _seq } 
        }
\seq_gclear:c 
        { g_fc_rwe \g_fc_namespace_tl #2 _seq } 
\seq_gset_split:cno 
        { g_fc_rwe \g_fc_namespace_tl #2 _seq } 
        { , } 
        { #3 }

% \seq_show:c % { g_fc_rwe \g_fc_namespace_tl #2 _seq }

}

%****************************************************
%*
%****************************************************
%--------------------
\NewDocumentCommand { \mfsloadaprop } { o m +m }
{ % 1=NS, 2=prop name, 3=data

            \IfNoValueTF { #1 } 
                    { \tl_clear:N \g_fc_namespace_tl } 
                    { \tl_gset:Nn \g_fc_namespace_tl { #1 } }

\cs_if_free:cT
        { g_fc_rwe \g_fc_namespace_tl #2 _prop }
        { \prop_new:c
                { g_fc_rwe \g_fc_namespace_tl #2 _prop } 
        }
\prop_gclear:c 
        { g_fc_rwe \g_fc_namespace_tl #2 _prop } 
\prop_gset_from_keyval:cn 
        { g_fc_rwe \g_fc_namespace_tl #2 _prop } 
        { #3 }

}

%****************************************************
%*
%****************************************************
%--------------------
\NewDocumentCommand { \mfssortaseq } { o m m }
{ % 1=NS, 2=seqname, 3=><, asc/desc

            \IfNoValueTF { #1 } 
                    { \tl_clear:N \g_fc_namespace_tl } 
                    { \tl_gset:Nn \g_fc_namespace_tl { #1 } }

\ic_funcsortseq:cn { g_fc_rwe \g_fc_namespace_tl #2 _seq } { #3 }

}

%****************************************************
%*
%****************************************************
%--------------------
\NewDocumentCommand { \mfssortaseqnum } { o m m }
{ % 1=NS, 2=seqname, 3=><, asc/desc

            \IfNoValueTF { #1 } 
                    { \tl_clear:N \g_fc_namespace_tl } 
                    { \tl_gset:Nn \g_fc_namespace_tl { #1 } }

\ic_funcsortseqnum:cn { g_fc_rwe \g_fc_namespace_tl #2 _seq } { #3 } }

\seq_new:N \g_fc_current_seq
\seq_new:N \g_fc_currentb_seq
\tl_new:N \g_fc_current_tl
\tl_new:N \l_fc_currentb_tl
\seq_new:N \g_fc_fontsn_seq
\seq_new:N \g_fc_fontln_seq

%****************************************************
%*
%****************************************************
%-------------------- Meta
\NewDocumentCommand \mm { m }
{ % 1 = text
    \vspace{2\baselineskip}#1

    \str_set:Nn \l_tmpa_str { #1 }
    \str_use:N \l_tmpa_str

}

%****************************************************
%*
%****************************************************
%-------------------- Vars
\tl_new:N \g_fc_ftext_tl
\int_new:N \g_fc_glyphcode_int
\tl_new:N \g_fc_glyphcode_tl
\int_new:N \g_fc_numdigits_int
\tl_new:N \g_fc_numdigits_tl

\tl_new:N \g_fc_myblockname_tl % for lookup
\tl_new:N \g_fc_myblocksetname_tl % for lookup; will become font switch
\tl_new:N \g_fc_myfontname_tl
\tl_new:N \g_fc_prevblocksetname_tl

%****************************************************
%*
%****************************************************
%-------------------- Functions
%------------------
\cs_set:Npn \fc_funcappplyfont:n #1
{ % 1=text
    \int_gset:Nn \g_fc_glyphcode_int { `#1 }

        \tl_gset:NV 
                \g_fc_glyphcode_tl
                \g_fc_glyphcode_int

        \tl_gset:Nn 
                \g_fc_numdigits_tl
                { 
                    \tl_count_tokens:V 
                            \g_fc_glyphcode_tl 
                }

% \seq_show:c % { g_fc_rwe % list % \tl_use:N % \g_fc_namespace_tl % \tl_use:N % \g_fc_numdigits_tl % digits % _seq }

        \seq_map_function:cN 
                { g_fc_rwe 
                  \tl_use:N
                        \g_fc_namespace_tl 
                  list
                  \tl_use:N 
                        \g_fc_numdigits_tl
                  digits
                   _seq } 
                \fc_funcgetblock:n 

    \int_case:nnF { \g_fc_glyphcode_int }
    {
        { `\僜 } { \par }
        { 127026 } { \itshape }
        { 127027 } { \upshape }
        { 127028 } { \c_space_token }
        { 127029 } { \c_space_token }
        { 127030 } { \bfseries }
        { 127031 } { \mdseries }
        { 127032 } { \hashtag }
        { 127033 } { \ }
    }
    { \symbol{ \int_use:N \g_fc_glyphcode_int } }

}

%****************************************************
%*
%****************************************************
%------------------
\cs_set:Npn \fc_funcgetblock:n #1
{ % 1=seq item

    \tl_gset:Nx \g_fc_current_tl { #1 }
    \seq_gset_split:NnV \g_fc_currentb_seq { ; } \g_fc_current_tl

    \fc_funcgetfontnameb:nn
    { 
    \seq_item:Nn
            \g_fc_currentb_seq 
            { 1 }
     } 
    {
    \seq_item:Nn
            \g_fc_currentb_seq 
            { 2 }
     } 


}

\tl_new:N \g_fc_fontoptionsa_tl

%****************************************************
%*
%****************************************************
%------------------
\cs_set:Npn \fc_funcgetfontnameb:nn #1#2
{ % 1=startcode in block range
  % 2=finishcode in block range

\bool_if:nT
{

\int_compare_p:n { #1 <= \g_fc_glyphcode_int } && \int_compare_p:n { #2 >= \g_fc_glyphcode_int }
} {

                \tl_set:Nn 
                        \g_fc_myblockname_tl
                        {

%                           found:
                            \seq_item:Nn \g_fc_currentb_seq { 3 }
                        }

%       \prop_show:c
%               { g_fc_rwe \g_fc_namespace_tl block2blockset _prop }
        \exp_args:Nxx
                \prop_get:cnN%TF
                        { g_fc_rwe \g_fc_namespace_tl block2blockset _prop }
                        { \g_fc_myblockname_tl }
                        \g_fc_myblocksetname_tl
%                       { T }
%                       { F }

        \exp_args:Nxx
                \prop_get:cnN
                        { g_fc_rwe \g_fc_namespace_tl blockset2font _prop }
                        { \g_fc_myblocksetname_tl }
                        \g_fc_myfontname_tl


        \exp_args:Nxx
                \prop_get:cnNTF
                        { g_fc_rwe \g_fc_namespace_tl blockset2fontoptions _prop }
                        { \tl_use:N \g_fc_myblocksetname_tl }
                        \g_fc_fontoptionsa_tl

                        {

%                           T:
                            \regex_replace_all:nnN { ; } { = } \g_fc_fontoptionsa_tl
                            \regex_replace_all:nnN { - } { , } \g_fc_fontoptionsa_tl
                        }{
%                           F:
                            \tl_clear:N \g_fc_fontoptionsa_tl
                        }

        \tl_if_eq:NNF
        \g_fc_prevblocksetname_tl
        \g_fc_myblocksetname_tl
        {
            \cs_if_free:cT
                { ffc\g_fc_namespace_tl\g_fc_myblocksetname_tl }
                { 
                    \exp_args:Nxx
                        \newfontfamily
                        {   \use:c { ffc\g_fc_namespace_tl\g_fc_myblocksetname_tl } }
                        { \tl_use:N \g_fc_myfontname_tl }
                        [
                            \tl_use:N \g_fc_fontoptionsa_tl     

%                           NFSSFamily = \tl_use:N \l_fc_rwenamespaceb_tl
%                           ,
%                           \l_fc_rwenamespacec_tl
                        ]

                }
            \use:c { ffc\g_fc_namespace_tl\g_fc_myblocksetname_tl }

            \tl_set_eq:NN
                \g_fc_prevblocksetname_tl
                \g_fc_myblocksetname_tl

        }

                 } % if glyphcode match found

}

\tl_new:N \l_fc_uregex_tl
\tl_new:N \g_fc_namespace_tl

%****************************************************
%*
%****************************************************
%-------------------- FText
\NewDocumentCommand \ftext { o +m }
{ % 1 = text
    \tl_gset:Nn \g_fc_ftext_tl { #2 }
    \IfNoValueTF { #1 }
        { \tl_clear:N \g_fc_namespace_tl }
        { \tl_gset:Nn \g_fc_namespace_tl { #1 } }
    %\tl_show:N \g_fc_namespace_tl
    \regex_replace_all:nnN
            { \c{par} }
            { 僜 }
            \g_fc_ftext_tl

            % The placeholders are the domino tiles U+1F032..U+1F039
            % (decimal 127026..127033), matching the \int_case:nnF table above.
            \regex_replace_all:nnN
                    { \c{itshape} }
                    { 🀲 } % 127026
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{upshape} }
                    { 🀳 } % 127027
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c[S](.) }
                    { 🀴 } % 127028
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{space} }
                    { 🀵 } % 127029
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{bfseries} }
                    { 🀶 } % 127030
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{mdseries} }
                    { 🀷 } % 127031
                    \g_fc_ftext_tl

            \regex_replace_all:nnN
                    { \c{hashtag} }
                    { 🀸 } % 127032
                    \g_fc_ftext_tl

            \tl_set:Nn \l_fc_uregex_tl { \\ }       
            \regex_replace_all:nnN
                    { \u { l_fc_uregex_tl } }
                    { 🀹 } % 127033
                    \g_fc_ftext_tl



    \group_begin:               
        \exp_args:Nx
        \str_map_function:nN 
                { \g_fc_ftext_tl }
                \fc_funcappplyfont:n
    \group_end:

% \tl_use:N % \g_fc_ftext_tl

}

%****************************************************
%*
%****************************************************
%-------------------- hashtag
\NewDocumentCommand \hashtag { } { \# }

\ExplSyntaxOff

%------------------------------------------------------------------------------

%Data

\mfsloadaseq{list2digits}{ 32;99;Basic Latin }

\mfsloadaseq{list3digits}{ 100;127;Basic Latin ,592;687;IPA Extensions ,969;969;Greek ω }

\mfsloadaseq{list4digits}{ 1080;1080;Cyrillic и ,1377;1377;Armenian ա }

\mfsloadaprop{block2blockset}{ Basic Latin=ipa ,IPA Extensions=ipa ,Greek ω=ipa ,Cyrillic и=ipa ,Armenian ա=ipa2 }

\mfsloadaprop{blockset2font}{ ipa=Noto Sans Mono ,ipa2=Noto Sans Armenian }

\mfsloadaprop{blockset2fontoptions}{ ipa=Colour;blue-Scale;1.1 ,ipa2=Colour;blue-Scale;1.1 }

%Data

\mfsloadaseq[x2]{list2digits}{ 32;99;Basic Latin }

\mfsloadaseq[x2]{list3digits}{ 100;127;Basic Latin ,592;687;IPA Extensions ,969;969;Greek ω }

\mfsloadaseq[x2]{list4digits}{ 1080;1080;Cyrillic и ,1377;1377;Armenian ա }

\mfsloadaprop[x2]{block2blockset}{ Basic Latin=ipa ,IPA Extensions=ipa ,Greek ω=ipa ,Cyrillic и=ipa ,Armenian ա=ipa2 }

\mfsloadaprop[x2]{blockset2font}{ ipa=Noto Serif ,ipa2=Noto Serif Armenian }

\mfsloadaprop[x2]{blockset2fontoptions}{ ipa=Colour;red-Scale;2.1 ,ipa2=Colour;red-Scale;2.1 }

%Data

\mfsloadaseq[dq]{list2digits}{ 32;99;Basic Latin }

\mfsloadaseq[dq]{list3digits}{ 100;127;Basic Latin }

\mfsloadaseq[dq]{list4digits}{ 2304;2431;Devanagari ,8216;8223;quotes }

\mfsloadaseq[dq]{list5digits}{ 43232;43263;Devanagari Extended }

\mfsloadaprop[dq]{block2blockset}{ Basic Latin=dql ,Devanagari=dq ,Devanagari Extended=dq ,quotes=dql }

\mfsloadaprop[dq]{blockset2font}{ dql=Noto Serif ,dq=Noto Serif Devanagari }

\mfsloadaprop[dq]{blockset2fontoptions}{ dq=Script;Devanagari }

%===============================================================
\begin{document}

a vessel [\ftext{ա ωesʹiиl}, \ftext[x2]{ա ωesʹiиl}].

\bigskip \ftext[dq]{“कखागीघ्ङूचिः”}

\end{document}

Cicada
  • Can we use commands inside \ftext like \ftext{$\sin$ $\alpha$}? Since, I am getting this output $\protect \mathop {\mathgroup \symoperators sin}\nolimits $ $\alpha $ while using the same. – Kumaresh PS Apr 26 '23 at 06:35
  • @KumareshPS Ask a separate question. I'm not sure what you mean ($\sin$ will print in the maths font, not the text font). Here, simple text formatting commands (tokens, like \itshape) are being hidden from the regex parser's replacement function by being converted to arbitrary Unicode characters unlikely to be used (domino tiles) then re-inserted as tokens again. If you use unicode-math package, you can define custom math fonts and use them to print in math mode. Math mode and text mode are separate processing modes in TeX. – Cicada May 05 '23 at 09:09