The ucharclasses package is very helpful for automatic font switching based on Unicode blocks. However, it is very, very slow.

One way to speed it up is to restrict its scope to certain Unicode blocks or scripts by passing the blocks or block ranges as package options. Unfortunately, this can have negative side effects when the document contains blocks that were not selected at package load time (see also the discussion around my previous question here: How can I use ucharclasses to change the font for a special script and then restore to what it was before?).

Ideally, I'd like to avoid restricting ucharclasses to certain Unicode blocks and load the package without any options. Is there any conceivable way to speed up the package, potentially by rewriting parts of it?

kongo09
  • I can't test right at the moment. Is loading the package slow, xor are things slow when compiling the main part of the document? – Bruno Le Floch Oct 22 '11 at 14:33
  • I guess it is loading the package. There are a few loops that assign stuff to all (relevant) Unicode characters, as far as I understand. And these are quite a few... – kongo09 Oct 22 '11 at 17:27
  • Then it can definitely be improved (a quick look at the code tells me that they are not using optimal loops). – Bruno Le Floch Oct 22 '11 at 17:53
  • I provide a solution to speed up font switching. – Leo Liu Oct 22 '11 at 18:40
  • With a completely stripped-down version of the package and a file that loads all blocks and defines transitions for each (with \setDefaultTransitions), I get a compilation time of 4.06 seconds (with the original package it is 43.41 seconds). – egreg Oct 23 '11 at 21:00

2 Answers

The setup code in ucharclasses uses the awfully slow \forloop for speed-critical parts of the code. Using a \loop ... \repeat construction instead is roughly 100 times faster. I agree that it is not quite as clean, but since we are talking about setting up every Unicode character, there is a lot of work to be done for each document.

Hence, it would be better if the code were

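% Run the body (4th argument) for each value of the LaTeX counter
% named in the 1st argument, from the 2nd to the 3rd argument inclusive.
\newcommand{\@ucc@forloop}[1]
  {\expandafter\@ucc@forloop@\csname c@#1\endcsname}
% Same, but the 1st argument is the underlying count register.
\newcommand{\@ucc@forloop@}[4]{%
  #1=#2\relax
  \loop
    #4\relax
  \ifnum#1<#3\relax
    \advance#1 by \@ne
  \repeat
}
\newcounter{glyphcounter}
% Allocate a new class and assign it to every character in the range.
\newcommand{\@defineUnicodeClass}[3]{%
  \newXeTeXintercharclass#1
  \@ucc@forloop {glyphcounter}{#2}{#3}
    {\XeTeXcharclass\value{glyphcounter}=#1}
}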

instead of the current definition of \@defineUnicodeClass, which is

\newcounter{glyphcounter}
\newcommand{\@defineUnicodeClass}[3]{%
  \newXeTeXintercharclass#1
  %\message{Package ucharclasses Message: #1 was assigned \the#1}
  \forloop{glyphcounter}{#2}{\value{glyphcounter}<#3}{\XeTeXcharclass\value{glyphcounter}=#1}
  \XeTeXcharclass#3=#1}

So essentially, changing the \forloop line into a faster variant would increase performance of the setup code by a sizeable factor.
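As a further sketch (not what the package ships, and only an illustration of the same idea), the loop can also run directly on the kernel scratch register \count@ rather than on a LaTeX counter, which avoids the \value and \csname steps. This is meant as a drop-in replacement for the definition inside the .sty file, where @ is already a letter; the \makeatletter/\makeatother pair is there only so the fragment can be pasted into a test file that does not load ucharclasses.

```latex
\makeatletter
% Sketch only: assign class #1 to every character from #2 to #3
% (inclusive), looping on the scratch register \count@.
\newcommand{\@defineUnicodeClass}[3]{%
  \newXeTeXintercharclass#1%
  \count@=#2\relax
  \loop
    \XeTeXcharclass\count@=#1\relax
  \ifnum\count@<#3\relax
    \advance\count@ by \@ne
  \repeat
}
\makeatother
```

The arguments are used exactly as in the versions above: a control sequence for the class, then the first and last code points of the block.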

However, I should warn you that the license of this package is non-free, so I am not sure whether modifying your copy of the .sty file, even under a different name, is legal.

  • I emailed the XeTeX mailing list to propose that change, and ask about the license issue. – Bruno Le Floch Oct 22 '11 at 19:14
  • Using \count@ instead of a LaTeX counter would save some expansions. – egreg Oct 22 '11 at 21:46
  • @egreg: you are entirely right. In order to maximize the chances of this fix going into the code, I tried to make it as small a change as possible. Of course, I think we can probably squeeze a factor of 2 to 5 by optimizing here and there in the package, but I'd first like to get the big stuff down :). – Bruno Le Floch Oct 22 '11 at 22:44
  • Avoiding \newcommand\enableX and saying \DeclareOption{X}{\overrideClassLoading\let\enableX\@empty} and then testing with \ifdefined\enableX might save some loading time. All those control sequences are just flags. There's also a question here: http://tex.stackexchange.com/questions/30742/how-best-to-run-through-a-series-of-elements-a-1-i-b-2-ii-etc – egreg Oct 22 '11 at 23:00
  • Ah, now that you point to it, I remember seeing that question some time ago. That's good. It means that the package maintainer is ready to modify the code, and so perhaps to bring in speed improvements. – Bruno Le Floch Oct 22 '11 at 23:10
  • I've succeeded in shrinking ucharclasses.sty to 31805 bytes from 84279, and the loading of all blocks is very fast. All block definitions are in one list, so this list can be reused many times. Some other optimizations can be done by acting on block groups. – egreg Oct 23 '11 at 10:16
  • @egreg, that's good, but the main question is whether the package's maintainer will care. The discussion I started on the XeTeX mailing list diverged to become a licensing discussion :(. – Bruno Le Floch Oct 23 '11 at 21:34
  • I've emailed Max directly. – egreg Oct 23 '11 at 21:50
  • Thanks for this wealth of insight and contribution in such a short time. This helped me (and certainly quite a few others) a lot. Let's hope Max will put this in (and maybe even loosen his license so that the package can make it into TeX Live...) – kongo09 Oct 24 '11 at 08:40
  • The latest version (2.3 from 2017, with Unicode 10 support) says the license is public domain. The \loop ... \repeat construction was adopted in v2.0. Load time for all classes is now a matter of seconds. – Cicada Sep 05 '19 at 10:06

\fontspec is quite slow. If you use \setTransitionTo, use low-level font commands instead; that will be much faster.

Example:

\documentclass{article}
\usepackage[Devanagari]{ucharclasses}
\font\mangal="Mangal"
\setTransitionsFor{Devanagari}{\begingroup\mangal}{\endgroup}

\begin{document}
text and ताजा धनिया के साथ अनायास and text
\end{document}

It is much faster to use the low-level font \mangal instead of \fontspec{Arial Unicode MS} or a command defined by \newfontfamily. The difference becomes obvious when a document contains many such transitions.

However, this approach is not compatible with LaTeX2e's NFSS, which is a serious drawback: the font does not follow changes of series, shape, or size. For a better solution, see below.


(For advanced users)

In xeCJK, we use a font cache mechanism. A similar thing can be done here: when changing to a new Unicode block for the first time, cache the font by calling \fontspec and reading \external@font, then use the low-level font command afterwards.

Full code:

\documentclass{article}
\usepackage[Devanagari]{ucharclasses}
\usepackage{fontspec}
\newfontfamily\mangal{Mangal}

\makeatletter
% inherited from \xeCJK@setfont of xeCJK
\def\sethindifont{% Only one family here, it is simpler than xeCJK
  \ifcsname hindi@\f@series/\f@shape/\f@size\endcsname
    \@nameuse{hindi@\f@series/\f@shape/\f@size}%
  \else
    \mangal
    \get@external@font
    \expandafter\global\expandafter\font
      \csname hindi@\f@series/\f@shape/\f@size\endcsname=\external@font
  \fi}
\makeatother

% proof of code only, should have a loop
% but the code in ucharclasses has too many extra spaces
\def\ResetTransitionTo#1{%
  \XeTeXinterchartoks 255 \csname#1Class\endcsname{\relax}}

\setTransitionsFor{Devanagari}
  {\begingroup\ResetTransitionTo{Devanagari}\sethindifont}
  {\endgroup}

\begin{document}
text and ताजा धनिया के साथ अनायास and text \textbf{and ताजा धनिया के साथ अनायास and text}
\end{document}
Leo Liu
  • I think that the main speed problem is in the setup phase: if you don't give any option to the ucharclasses package, it loads every Unicode block, and does so in a very inefficient way (I gave up after 60 seconds). – Bruno Le Floch Oct 22 '11 at 19:03
  • @Bruno: Indeed. \forloop is very slow. I couldn't imagine that it would slow down loading the package so much. A font cache can speed up compiling large documents considerably, but it is not that useful for small documents. – Leo Liu Oct 22 '11 at 19:17
  • What exactly do you mean by proof of code only? What would I have to add/change to make production use of it? – kongo09 Oct 24 '11 at 08:46
  • @kongo09: You should reset the tokens for every transition from state i to state \DevanagariClass, where i is not equal to \DevanagariClass. That is to say, you need an improved version of \setTransitionTo{Devanagari}{\relax} after \begingroup. Personally, I don't like the ucharclasses package: we often need only 2 or 3 character classes, and the package is too heavy for that. – Leo Liu Oct 24 '11 at 13:24
  • Thanks. I have to admit, I don't really understand the inner workings here and wouldn't know how to use your reply. Since in my case it seems to be working without this resetting and ucharclasses overall seems to ignore it as well, I hope I'm not going awfully wrong without it for the time being... – kongo09 Oct 24 '11 at 13:39
  • @kongo09: My code will fail if you use textताजा (no space between the different languages). That may not matter much in some cases. – Leo Liu Oct 24 '11 at 14:11
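
The loop described in the last few comments could look roughly like the sketch below. \ResetTransitionsTo is a hypothetical name, not a ucharclasses command, and the upper bound of 255 is an assumption taken from the boundary class 255 used in the answer's code; newer XeTeX releases may use a larger range of class numbers. The sketch has not been checked against the textताजा case mentioned above.

```latex
\makeatletter
% Hypothetical helper (not part of ucharclasses): clear the tokens for
% every transition from class i into class "#1Class", for all i other
% than that class itself. Assumption: class numbers run from 0 to 255.
\newcommand*\ResetTransitionsTo[1]{%
  \count@=\z@
  \loop
    \unless\ifnum\count@=\csname #1Class\endcsname
      \XeTeXinterchartoks\count@\csname #1Class\endcsname={}%
    \fi
  \ifnum\count@<255
    \advance\count@ by \@ne
  \repeat
}
\makeatother
```

It would be called right after \begingroup, in place of \ResetTransitionTo{Devanagari} in the answer's code, so that both the cleared token lists and the change to \count@ are undone again by the matching \endgroup.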