
Context

I am using Sphinx and latexpdf to build reStructuredText documentation for my project. The documentation draws in remote data that I cannot control (I can only escape or parse it).

My current build command is:

make latexpdf; make html;

I recently encountered errors that stopped my Bitbucket Pipelines build whenever latexpdf met certain UTF-8 characters in the rst files it was parsing.

I found a solution to the UTF-8 issue online, which solved the problem with the following DeclareUnicodeCharacter 'commands':

#conf.py
...
latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    # 'papersize': 'letterpaper',

    # The font size ('10pt', '11pt' or '12pt').
    #
    # 'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    #
    #'preamble': r'\DeclareUnicodeCharacter{FF08}{$\bullet$}',
    'preamble': r'''
        \DeclareUnicodeCharacter{FF08}{$\bullet$}
        \DeclareUnicodeCharacter{FF09}{$\bullet$}
        \DeclareUnicodeCharacter{FF0C}{$\bullet$}
        \DeclareUnicodeCharacter{2161}{$\bullet$}
        ''',


    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',

    ...
}
...

The setting above converts the following Unicode characters to a bullet point:

  • U+FF08
  • U+FF09
  • U+FF0C
  • U+2161

So, I have fixed the problem by suppressing it.

The only reason why I replace it with a bullet is that I borrowed that code from one of the solutions found on the net.

Aside from {$\bullet$}, what other values can be inserted into the second 'parameter' of the DeclareUnicodeCharacter 'commands'?

Where can I discover the list of values that are available here?

David Carlisle

1 Answer


That parameter takes arbitrary LaTeX code; you can put {my name is John Walker} or anything else you want there.

But the settings you show are rather odd. U+FF08 is FULLWIDTH LEFT PARENTHESIS, a form of ( intended for full-width CJK typesetting, yet you have set it to $\bullet$.

U+FF09 is the matching )

U+FF0C is FULLWIDTH COMMA, which again would not normally typeset as a bullet.

And U+2161 is ROMAN NUMERAL TWO, which should typeset as II, not as a bullet.
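
As a rough sketch of what more faithful replacements could look like in conf.py: the mapping below swaps each character for a plain ASCII look-alike. The choice of look-alikes is an assumption about what reads acceptably in the PDF, not the only valid option, and the other latex_elements keys are omitted here.

```python
# conf.py (sketch): replace each character with an ordinary LaTeX
# equivalent instead of a bullet. The replacements are an assumption
# about what looks acceptable; any valid LaTeX code could be used.
latex_elements = {
    'preamble': r'''
\DeclareUnicodeCharacter{FF08}{(}
\DeclareUnicodeCharacter{FF09}{)}
\DeclareUnicodeCharacter{FF0C}{,}
\DeclareUnicodeCharacter{2161}{II}
''',
}
```

The full-width forms will lose their CJK spacing this way; typesetting them properly would need a CJK-capable font setup, which is a separate matter.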

David Carlisle
  • Thanks. I agree. The only reason why I use a $\bullet$ is that I copied it from someone on the internet.

    It is handy to know that the bullet is appropriate for some of these.

    My question is really about how you would know that $\bullet$ can be used rather than, say, $\bulletpoint$. Do you know where the keyword bullet is published as an option for that second parameter of \DeclareUnicodeCharacter?

    For example, if I wanted to replace U+FF08 with a normal left parenthesis, I guess that I should use $\leftparenthesis$, but that is a stab in the dark.

    – John Walker Dec 31 '19 at 14:50
  • @JohnWalker the fact that the command is called \bullet and not \bulletpoint is unrelated to \DeclareUnicodeCharacter; it is just asking "what LaTeX commands are defined?" (Actually, \textbullet would be better than $\bullet$ if you wanted a bullet, as it would then use the current font if that had a bullet character.) – David Carlisle Dec 31 '19 at 14:53
  • @JohnWalker the command to get a normal left parenthesis is just ( – David Carlisle Dec 31 '19 at 14:53
  • @JohnWalker to clarify, bullet is not a keyword or option to \DeclareUnicodeCharacter; $\bullet$ may be used there simply because $\bullet$ is valid LaTeX code that does not generate an error. Any LaTeX code can be used. – David Carlisle Dec 31 '19 at 15:05
  • Thanks. I am getting a bit more familiar with this.

    I thought that \DeclareUnicodeCharacter{FF08}{$\bullet$} was saying "replace FF08 with a character defined as $\bullet$". Hence I was wondering what other options were available.

    From your comment, I now understand that it means "replace FF08 with a character defined by the output of the $\bullet$ command".

    So, where is the $\bullet$ command defined? Do I have to define it myself? If so, where?

    Sorry for all these questions; as you can see, I have not found my bearings in the LaTeX world.

    – John Walker Dec 31 '19 at 15:06
  • @JohnWalker no, it means replace the character FF08 by the TeX source $\bullet$; it is literally doing a string replacement of the first argument by the second. \bullet is defined in the LaTeX format, but you can use any LaTeX code that you like there. But we just keep saying the same thing over and over; I am not sure how else to say this. – David Carlisle Dec 31 '19 at 16:42
  • Thanks. I did not realize that the syntax of the preamble was LaTeX code. So, that was the key thing I was missing.

    I am coming at this from ReadTheDocs, where the system writes all my LaTeX for me, so I have not dabbled in this. I am a complete newbie.

    – John Walker Dec 31 '19 at 17:21
  • I think this might be the kind of thing I was seeking: https://artofproblemsolving.com/wiki/index.php/LaTeX:Symbols

    and this: http://mirror.ox.ac.uk/sites/ctan.org/info/symbols/comprehensive/symbols-a4.pdf

    It appears to list the kinds of values I can use.

    – John Walker Dec 31 '19 at 17:29
  • @JohnWalker well, you can use those, but if you want to make U+FF08 into a section heading you can do \DeclareUnicodeCharacter{FF08}{\section{this is a section}}. The lists that you reference have no connection with \DeclareUnicodeCharacter other than that they list fragments of valid LaTeX syntax. – David Carlisle Dec 31 '19 at 17:48
  • Great. Thanks for your patience. I have a much better understanding now. – John Walker Dec 31 '19 at 18:01
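
For anyone arriving with the same Sphinx problem: before choosing replacements, it helps to know what the offending characters actually are. Below is a minimal sketch using only Python's standard `unicodedata` module. The function name `describe_non_ascii` and the sample string are made up for illustration; in practice the text would come from the rst sources.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (hex code point, Unicode name) pairs for each distinct
    non-ASCII character in text, sorted by code point."""
    found = {ch for ch in text if ord(ch) > 0x7F}
    return [
        (f"{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in sorted(found)
    ]


# Hypothetical sample containing the four characters from the question.
sample = "Sizes\uFF08S\uFF0CM\uFF09 part \u2161"

for code, name in describe_non_ascii(sample):
    print(f"U+{code}  {name}")
```

The hex code in the first column is what goes into the first argument of \DeclareUnicodeCharacter, and the name suggests what LaTeX code makes sense for the second.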