137

This follows on from my previous question, "Package clash in multilingual report".

\documentclass[11pt,table,a4paper]{article}
\usepackage{lmodern}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

\usepackage{CJKutf8}
\usepackage[english,russian]{babel}

\newenvironment{SChinese}{%
 \CJKfamily{gbsn}%
 \CJKtilde
 \CJKnospace}{}

 \begin{document}
 \selectlanguage{russian}
  Это мой первый многоязычный докладе.
  Инфантильный гипертрофический стеноз привратника - это серьёзное 
 \selectlanguage{english}
  This is my first multilingual report.

 \begin{CJK}{UTF8}{}
  \begin{SChinese}
    这是我的第一个多语种的报告。
  \end{SChinese}
  \end{CJK}

  \end{document}

When I try to compile it, I get the following error message:

LaTeX Warning: Unused global option(s):
    [table].

(./data.aux
(/usr/local/texlive/2011/texmf-dist/tex/latex/cyrillic/t2acmr.fd))
(/usr/local/texlive/2011/texmf-dist/tex/latex/lm/t1lmr.fd)

LaTeX Font Warning: Font shape `T2A/lmr/m/n' undefined
(Font)              using `T2A/cmr/m/n' instead on input line 15.


! Package inputenc Error: Unicode char \u8:  not set up for use with
LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...

l.18 ...�ный гипертрофический стеноз привра...

How can I avoid this error?

Manish

13 Answers

101

The error you get is due to a "no-break space" character, according to what I can gather by copying and pasting your message.

This character is not usually set up by the [utf8] option and it's invisible to many editors, so it can slip in a document without the typist knowing it.

Solution: add in your preamble

\DeclareUnicodeCharacter{00A0}{ }

if you don't mean to type a no-break space, or

\DeclareUnicodeCharacter{00A0}{~}

if you want the character to stand for what its name says.
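
For reference, here is a minimal sketch of a complete file with the second declaration in place, assuming pdfLaTeX and the inputenc setup from the question. The `^^c2^^a0` notation is only an ASCII-visible way of writing the two UTF-8 bytes of U+00A0, so that the otherwise invisible character shows up in the source; the sample sentence is made up.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

% Treat U+00A0 (no-break space) as a tie, i.e. an unbreakable space.
\DeclareUnicodeCharacter{00A0}{~}

\begin{document}
% ^^c2^^a0 stands for the UTF-8 bytes C2 A0, that is, U+00A0.
See Figure^^c2^^a01 for details.
\end{document}
```

Replacing `~` with a plain space in the declaration gives the first variant, where the character behaves like an ordinary breakable space.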

UPDATE

Recent (after 2015-01-01) versions of the UTF8 configuration file for inputenc do define U+00A0 as \nobreakspace, so this should be of no concern, now.

egreg
  • Sorry, I was wrong in my last comment. The fffe I got was put there by the command I used to see the encoding of the pasted text. Anyway, I cannot get the 00A0 either, so my question stands. How did you arrive at it? – JLDiaz Mar 18 '13 at 11:36
  • @JLDiaz Copy and paste the error message; converting the "char \u8:  not" bit gives the following sequence of Unicode code points: 0063 0068 0061 0072 0020 005C 0075 0038 003A 00A0 0020 006E 006F 0074, where you can clearly see 00A0. – egreg Mar 18 '13 at 11:36
  • @egreg Thanks. That's what I did, but I got 0020 instead. I guess that my browser was "too smart" when copying characters to the clipboard. I then pasted the text into several online Unicode converters, and also into a terminal where I used xxd, but I always got 0020 for the 00A0 char. – JLDiaz Mar 18 '13 at 12:29
  • Does this solution also apply/work for LuaLaTeX ? – nutty about natty Jul 08 '13 at 20:45
  • @nuttyaboutnatty You can use \newunicodechar{^^a0}{~} after loading \usepackage{newunicodechar} – egreg Jul 08 '13 at 20:51
  • @nuttyaboutnatty What does babel have to do with this? – egreg Jul 08 '13 at 21:00
  • Thank you so much for this answer. I had already gone crazy trying to find where the error lay, then gone crazy once again when I saw that deleting the problematic part and writing it again, exactly like before, solved the problem... Invisible characters! Who'd have thought it would be something like that...! – Bruno Stonek Oct 30 '14 at 14:41
  • @egreg Where did you find the Unicode character code for LuaLaTeX? My error message says: "! Package inputenc Error: Unicode char \u8:édi not set up for use with LaTeX." I deduced from previous discussions (but I could be wrong) that the symbol "é" is troublesome for LuaLaTeX. Can you help me? And sorry for using this very old post of yours, but this is the top Google hit for this error message, and the others are not helpful... – Nicolas May 18 '16 at 16:38
  • @Nicolas You don't load inputenc with LuaLaTeX – egreg May 18 '16 at 16:52
  • @egreg It works! I do thank you for your help! – Nicolas May 18 '16 at 17:30
  • A life saver. Thanks! – stephanmg Apr 01 '20 at 09:59
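
Putting the two LuaLaTeX comments above together, a minimal sketch might look like the following. It assumes compilation with lualatex, so inputenc is not loaded at all; the `^^a0` notation stands for the single character U+00A0, and the sample sentence is made up.

```latex
% Compile with: lualatex
\documentclass{article}
\usepackage{newunicodechar}

% Map U+00A0 (no-break space) to a tie.
\newunicodechar{^^a0}{~}

\begin{document}
% ^^a0 is an ASCII-visible way of writing U+00A0 under LuaLaTeX.
See Figure^^a01 for details.
\end{document}
```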
71

As this is one of the top Google hits for this error message, here's a more general answer with an example:

The cause is a Unicode character in one of your input files that isn't mapped to an output. It may be in your bibliography, especially if you're using the (Unicode-supporting) biblatex/biber system. That is a good place to look, as .bib files downloaded from publishers' websites are often malformed. You can tell that the error comes from the .bib file when the line number in the error message is that of your \end{document}, which makes tracking down the actual error rather tricky (inspecting the various aux files doesn't appear to help).

Some of these errors are subtle, like the non-breaking space in the question, or the hyphen (U+2010) character given to me by one journal, which looks identical to the hyphen-minus produced by the keyboard.

Copying the offending character from the error message (it appears right after \u8:) and searching for it should help, unless your command window or editor "helps" by converting it to the more common equivalent or by replacing Unicode with blanks; in that case, copy it from your .log and search all the input files.

(I'm happy to expand this in response to comments or watch it grow -- it's just an attempt to be helpful to searchers)
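
Once the character has been identified, one possible fix that leaves the input files untouched is to tell inputenc what to do with it. A minimal sketch for the U+2010 hyphen mentioned above, assuming pdfLaTeX with \usepackage[utf8]{inputenc} already loaded and that an ordinary hyphen-minus is an acceptable rendering:

```latex
% Preamble addition, after \usepackage[utf8]{inputenc}.
% U+2010 HYPHEN: typeset it as the ordinary hyphen-minus.
\DeclareUnicodeCharacter{2010}{-}
```

The same pattern applies to other unmapped characters: put the code point in the first argument and whatever LaTeX should typeset in the second.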

Chris H
  • I often mistakenly type this character in Vim on Mac OS X by hitting ALT+SHIFT+SPACE, often after writing {\some-command ...}. Thank you for the explanation! – csl Apr 14 '14 at 18:15
  • It's a good strategy. It's often necessary to clean the output files after fixing the problem. – user2821 May 13 '15 at 12:49
  • That is one hell of a subtle error. I indeed had a hyphen (U+2010) instead of the hyphen-minus. Thanks. – Roald Dec 06 '16 at 16:31
  • How can one generally see which character is causing problems? In my case I somehow got U+FEFF 'ZERO WIDTH NO-BREAK SPACE' (hunted down by modifying the file in a git repository and using git diff), but is there a more general solution? – reducing activity Dec 22 '16 at 07:44
  • @MateuszKonieczny for symbols that are invisible or very similar to existing ones it's tricky. Assuming an editor that supports different encodings (I use jEdit, notepad++) if you don't otherwise use unicode, reloading (the log, tex, bib files) with a different (ASCII) encoding and looking for garbage can work. I have an idea for a script that might help - maybe I'll have time to fiddle with it later. – Chris H Dec 22 '16 at 08:02
  • This solution is not ideal for me; in Polish, accented characters are normal... Maybe a hex editor would work? But that is purely a blind guess. – reducing activity Dec 22 '16 at 09:29
  • @MateuszKonieczny, someone had solved this on SO for *nix. You can use grep to give you the line number and highlight the offending character (note on my system you get two red missing character symbols per error by default, as my terminal isn't set up for unicode). The downside is that if you use unicode for accented characters they will show up as well. I'm still thinking about blocks and code points – Chris H Dec 22 '16 at 09:29
  • @MateuszKonieczny I know. How much do you know about Unicode? With Perl-compatible regular expressions in Unicode mode it should be possible to define a range of characters that should work, but I don't know enough about Polish to help very much. I assume you're using \usepackage[utf8]{inputenc} to get your accented chars; I do, even though I use very few. – Chris H Dec 22 '16 at 09:32
  • Hopefully I will remember this the next time I have trouble with unusual characters; I failed to notice that a regexp would be helpful. "How much do you know about unicode?" Far more than I want :) But it is necessary, given that in the 21st century many programs and libraries still assume that text will be limited to ASCII (and many are completely unable to support characters outside ASCII). – reducing activity Dec 22 '16 at 09:45
  • @MateuszKonieczny here's something worth trying next time (an attempted whitelist): grep --colour='auto' -P -n "(*UTF8)[^\p{Latin}-,.:()?#/\$_\d\\\ \*\{\}\[\]~;]" SPI_photodiode.tex. I'm told XeLaTeX is better for unicode support, but I don't use it myself. In general LaTeX support for unicode is not bad, but certain rare characters and control characters cause it to choke. Spacing is a particular issue as you've seen. While this question has become the go-to for the error message, maybe there's a more generic mystery-unicode troubleshooting question waiting to be asked. – Chris H Dec 22 '16 at 09:52
  • Well, so far in all cases where LaTeX crashed on weird characters, the weird character was not supposed to be there anyway. The main irritation for me is that in 2016 software still needs to be told to use utf8 (\usepackage[utf8]{inputenc}), but backward compatibility has its price. – reducing activity Dec 22 '16 at 10:03
  • Agreed, this is a general case, as many different Unicode characters could cause the issue. In a recent case it was &lowast;, which interestingly is an HTML entity but for some reason needs to be declared as Unicode &#8727; (U+2217). – jeffmcneill Oct 08 '19 at 13:20
32

\usepackage[utf8x]{inputenc}

Ubuntu:

You must install texlive-latex-extra before using it.

Fedora:

You must install texlive-collection-latexextra before using it.

Jeeppler
Evgenii
4

I had two similar problems:

  1. "Unicode char \u8: "
  2. "Unicode char \u8:." (with dot)

The problems were related to the .bib file (references list).

The first problem was solved with the \DeclareUnicodeCharacter{00A0}{ } declaration given by @egreg.

For the second one, @Chris H's answer helped. I opened the generated .log file and looked for errors. I found:

! Package inputenc Error: Unicode char \u8:\C3. not set up for use with LaTeX.

Then I searched the generated .bbl file for the "\C3" string and found that the letter "Ó" (the first letter of an author's name, Óscar Oballe-Peinado) was the problem. So I replaced it in the bibliography file with {\'{O}} and voilà!

Even though I'm using "\usepackage[utf8]{inputenc}", it does not seem to work for the accented first letter of an author's first name.

cgnieder
  • The name "Oscar" in Spanish doesn't carry an accent on the first syllable. – user26732 Jun 19 '16 at 18:17
  • OK, but that author's name really does carry an accent:

    https://www.researchgate.net/profile/Oscar_Oballe-Peinado

    http://www.mdpi.com/1424-8220/15/8/20409

    – Fernando Aug 08 '16 at 08:07
3

You may also get this error if you use a different language for BibTeX. In that case project.bbl may contain characters in a different encoding (e.g. latin2).

What you need to do is switch the encoding to latin2 while the bibliography is typeset and switch back to utf8 afterwards.

\inputencoding{latin2}
\bibliography{mybib}
\inputencoding{utf8}

Hope this helps.

zub0r
  • Welcome to TeX.SX! This is an answer to an (old) question. You should elaborate a little bit on your post and show a fully working example –  Nov 23 '14 at 20:06
  • I installed texlive-lang-czechslovak package to use slovak translations of entries in bibtex, but was getting errors while compiling project in utf8. I found out that it is caused by different encoding of .bbl file (evidently czechslovak package uses latin2 for .bbl files by default, bc. before using it I had no problems). – zub0r Nov 25 '14 at 00:11
3

This happened to me when I saved my .tex file as UTF-8 but forgot to save the .bib file in the same encoding (it was still in ANSI).

Instead of going back to ANSI for my .tex file, I just opened the .bib file with Notepad++ and chose to convert it to UTF-8.

After that, everything compiled fine.

Werner
David
3

I had the same problem, but none of the above answers solved it. In the end, I found the code \'{\i} in my .bib file. This was supposed to yield í but was producing a crazy Unicode char that broke compilation. The .bib file was exported from CiteULike, based on a reference that I had entered manually or copied and pasted from somewhere else. I suppose something went wrong while converting the manually entered or pasted í to \'{\i}.

M.B.
2

If this happens in the bibliography, try specifying the language explicitly to bibtexu:

bibtexu -l ru my_paper_with_russian_bibliography

This fixed it for me.

Igor
2

I had this error because I accidentally saved an included .tex file as ANSI while the master file was in UTF-8.

You can change the file encoding in Notepad++, for example. But you will need to copy the text from the ANSI version and paste it into the UTF-8 version.

Tobi G.
2

XeTeX is better suited to Unicode than most other TeX engines: replace the lines

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

with

\usepackage{fontspec}

and compile with xelatex myfile.tex.
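
As a rough illustration of that replacement, here is a minimal sketch of a file meant for xelatex. The commented-out font line is an assumption, not part of the answer: the default font covers accented Latin letters, but other scripts such as Cyrillic or Chinese need a font that actually contains them, and the name shown is only a placeholder for whatever is installed on your system.

```latex
% Compile with: xelatex myfile.tex
\documentclass{article}
\usepackage{fontspec} % used instead of inputenc and fontenc

% Assumption: uncomment and adapt if you need scripts beyond Latin;
% the font name is a placeholder for a font installed on your system.
% \setmainfont{DejaVu Serif}

\begin{document}
Óscar, café, naïve: accented Latin letters need no extra setup here.
\end{document}
```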

1

Inserting

% !TEX encoding = UTF-8 Unicode

at the beginning of the file solved this issue for me.

user27221
1

My error was related, but slightly different:

! Package inputenc Error: Unicode character ʳ (U+02B3)

or

! Package inputenc Error: Unicode character ᵉ (U+1D49)  
(inputenc)                not set up for use with LaTeX.

I had these errors because of bibliographic entries generated with French babel, which typesets the edition field, e.g. 1ʳᵉ éd. or 3ᵉ éd., using a raised "re" or "e". This results in U+02B3 and U+1D49, which apparently are not set up, even though (so far) all the other accented characters, e.g. the é in édition, are fine...

Because of my complex pandoc settings from Markdown with the Libertine font, I don't want to switch to another TeX engine...

So, my band-aid fix was to use the math mode to raise these letters:

\DeclareUnicodeCharacter{1D49}{$^\text{e}$}
\DeclareUnicodeCharacter{02B3}{$^\text{r}$}

Also, if you're using pandoc, put them in your header-includes: part of the front matter.
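
A possible variant of the same band-aid, shown only as a sketch: \text comes from amsmath, so the declarations above rely on that package being loaded. Using \textsuperscript instead stays in text mode and needs no extra package.

```latex
% Text-mode alternative to the math-mode declarations above.
\DeclareUnicodeCharacter{1D49}{\textsuperscript{e}} % U+1D49, raised e
\DeclareUnicodeCharacter{02B3}{\textsuperscript{r}} % U+02B3, raised r
```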

0

An answer for a slightly different case:

! Package inputenc Error: Unicode character ° (U+B0)
(inputenc)                not set up for use with LaTeX.

Here, the answer is very simple:

\usepackage{textcomp}

This package defines the \textdegree macro and also sets up ° to use it.
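
A minimal sketch of a complete file using this fix, assuming pdfLaTeX with utf8 input; the sample sentence is made up.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provides \textdegree and maps ° (U+00B0) to it

\begin{document}
Water boils at 100\,°C at sea level.
\end{document}
```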

Vincent Fourmond