inputenc Error: Unicode char \u8: not set up for use with LaTeX

Question

This relates to my previous question, "Package clash in multilingual report".

\documentclass[11pt,table,a4paper]{article}
\usepackage{lmodern}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

\usepackage{CJKutf8}
\usepackage[english,russian]{babel}

\newenvironment{SChinese}{%
 \CJKfamily{gbsn}%
 \CJKtilde
 \CJKnospace}{}

 \begin{document}
 \selectlanguage{russian}
  Это мой первый многоязычный докладе.
  Инфантильный гипертрофический стеноз привратника - это серьёзное 
 \selectlanguage{english}
  This is my first multilingual report.

 \begin{CJK}{UTF8}{}
  \begin{SChinese}
    这是我的第一个多语种的报告。
  \end{SChinese}
  \end{CJK}

  \end{document}

When I try to compile it, I get the following error message.

LaTeX Warning: Unused global option(s):
    [table].

(./data.aux
(/usr/local/texlive/2011/texmf-dist/tex/latex/cyrillic/t2acmr.fd))
(/usr/local/texlive/2011/texmf-dist/tex/latex/lm/t1lmr.fd)

LaTeX Font Warning: Font shape `T2A/lmr/m/n' undefined
(Font)              using `T2A/cmr/m/n' instead on input line 15.


! Package inputenc Error: Unicode char \u8:  not set up for use with
LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...

l.18 ...�ный гипертрофический стеноз привра...

How can I avoid this error message?

Try loading fontenc with \usepackage[T1,T2A]{fontenc}. The T2A option is needed for Russian letters. Also note that lmodern doesn't provide any Cyrillic characters, hence the name "Latin Modern" (if you didn't already know this). – user2473, Nov 20 '12 at 04:01
I used \usepackage[T1,T2A]{fontenc} but still get the same error message. – Manish, Nov 20 '12 at 04:04
Your example compiles fine on my machine. I see you're using TeX Live 2011. Something seems to be acting funny with your fonts. I suggest upgrading to TeX Live 2012. In the meantime, perhaps removing \usepackage{lmodern} might help? – user2473, Nov 20 '12 at 04:07
I installed TeX Live 2012 and removed lmodern, but I'm still getting the same error message. – Manish, Nov 20 '12 at 06:11
The problem seems to be in the "no-break space" character. Add \DeclareUnicodeCharacter{00A0}{~} to your preamble. – egreg, Nov 20 '12 at 07:36
Did you copy and paste something into your text editor!?!?!? \shame on you! – Dale, Jun 11 '13 at 01:02
Possible duplicate of "inputenc Error: Unicode char \u8" error while trying to write a degree symbol (invisible character) – Antal Spector-Zabusky, Jul 21 '13 at 22:45
@AntalS-Z While the problem is similar, the space here is wanted. – egreg, Jul 21 '13 at 23:23
I got this error when copying and pasting from MS Word. The solution is to delete and retype the offending characters immediately prior to the error. – Marc, Nov 07 '17 at 14:36

egreg · Accepted Answer · 2017-07-04T11:32:09.637

101

The error you get is due to a "no-break space" character, according to what I can gather by copying and pasting your message.

This character is not usually set up by the [utf8] option and it's invisible in many editors, so it can slip into a document without the typist knowing it.

Solution: add in your preamble

\DeclareUnicodeCharacter{00A0}{ }

if you don't mean to type a no-break space, or

\DeclareUnicodeCharacter{00A0}{~}

if you want the character to do what its name says.
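
As a minimal sketch of where the declaration goes, assuming a pdfLaTeX document that loads inputenc with the utf8 option (the body text is only a placeholder, not taken from the question):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

% Place the declaration after inputenc: on older installations the
% command is only defined once the utf8 option has been loaded.
% U+00A0 (NO-BREAK SPACE) is typeset as a tie, i.e. an unbreakable space.
\DeclareUnicodeCharacter{00A0}{~}

\begin{document}
Placeholder text: a U+00A0 pasted into the source is now typeset as a tie.
\end{document}
```

Swapping `~` for an ordinary space in the second argument gives the first variant instead.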

UPDATE

Recent (after 2015-01-01) versions of the UTF8 configuration file for inputenc do define U+00A0 as \nobreakspace, so this should be of no concern, now.

edited Jul 04 '17 at 11:32

answered Nov 20 '12 at 07:46

egreg

1,121,712

Sorry, I was wrong in my last comment. The fffe I got was put there by the command I used to inspect the encoding of the pasted text. Anyway, I cannot get the 00A0 either, so my question stands. How did you arrive at it? – JLDiaz Mar 18 '13 at 11:36
@JLDiaz Copy and paste the error message; converting the "char \u8: not" part gives the following sequence of Unicode code points: 0063 0068 0061 0072 0020 005C 0075 0038 003A 00A0 0020 006E 006F 0074, where you clearly see 00A0. – egreg Mar 18 '13 at 11:36
@egreg Thanks. That's what I did, but I got 0020 instead. I guess my browser was "too smart" when copying characters to the clipboard. I then pasted the text into several online Unicode converters, and also into a terminal with xxd, but I always got 0020 for the 00A0 char. – JLDiaz Mar 18 '13 at 12:29
Does this solution also work for LuaLaTeX? – nutty about natty Jul 08 '13 at 20:45
@nuttyaboutnatty You can use \newunicodechar{^^a0}{~} after loading \usepackage{newunicodechar} – egreg Jul 08 '13 at 20:51
@nuttyaboutnatty What does babel have to do with this? – egreg Jul 08 '13 at 21:00
Thank you so much for this answer. I had already gone crazy trying to find where the error lay, then gone crazy once again when I saw that deleting the problematic part and writing it again, exactly like before, solved the problem... Invisible characters! Who'd have thought it would be something like that...! – Bruno Stonek Oct 30 '14 at 14:41
@egreg Where did you find the Unicode character code for LuaLaTeX? My error message says: "! Package inputenc Error: Unicode char \u8:édi not set up for use with LaTeX." I deduced from previous discussions (but I could be wrong) that the symbol "é" is troublesome for LuaLaTeX. Can you help me? And sorry for using this very old post of yours, but this is the top Google hit for this error message, and the others are not helpful... – Nicolas May 18 '16 at 16:38
@Nicolas You don't load inputenc with LuaLaTeX – egreg May 18 '16 at 16:52
@egreg It works! I do thank you for your help! – Nicolas May 18 '16 at 17:30
A life saver. Thanks! – stephanmg Apr 01 '20 at 09:59

Chris H · Answer 2 · 2016-05-10T14:24:53.523

71

As this is one of the top Google hits for this error message, here's a more general answer with an example:

The cause is a Unicode character in one of your input files that isn't mapped to an output. It may be in your bibliography, especially if you're using the (Unicode-supporting) biblatex/biber system. This is a good place to look for errors, as .bib files downloaded from publishers' websites are often malformed. You can tell that the error comes from the .bib file when the line number in the error message is that of your \end{document}, which makes tracking down the actual error rather tricky (inspecting various aux files doesn't appear to help).

Some of these errors are subtle, like the non-breaking space in the question, or the hyphen (U+2010) character given to me by one journal, which looks identical to the hyphen-minus produced by the keyboard.

Copying the character after the hyphen and searching for it should help - unless your command window or editor "helps" by converting it to the more common equivalent or replacing unicode with blanks - in that case copy it from your .log and search all the input files.
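
If the culprit turns out to have an obvious ASCII equivalent, such as the U+2010 hyphen mentioned above, one option is to map it in the preamble rather than edit every input file. This is only a sketch under that assumption, for pdfLaTeX with inputenc; the body text is a placeholder:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}

% Typeset U+2010 (HYPHEN) as the ordinary hyphen-minus.
\DeclareUnicodeCharacter{2010}{-}

\begin{document}
Placeholder text: a U+2010 coming from a .bib file now prints as a plain hyphen.
\end{document}
```

Cleaning the source file is still the more robust fix, since the mapping only hides the stray character.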

(I'm happy to expand this in response to comments or watch it grow -- it's just an attempt to be helpful to searchers)

edited May 10 '16 at 14:24

answered Feb 12 '14 at 13:26

Chris H

8,705

I often mistakenly type this character in Vim on Mac OS X by hitting ALT+SHIFT+SPACE, often after writing {\some-command ...}. Thank you for the explanation! – csl Apr 14 '14 at 18:15
It's a good strategy. It's often necessary to clean the output files after fixing the problem. – user2821 May 13 '15 at 12:49
That is one hell of a subtle error. I indeed had a hyphen (U+2010) instead of the hyphen-minus. Thanks. – Roald Dec 06 '16 at 16:31
How can one see, in general, which character is causing problems? In my case I somehow got U+FEFF 'ZERO WIDTH NO-BREAK SPACE' (hunted down by modifying the file in a git repository and using git diff, but is there a more general solution?) – reducing activity Dec 22 '16 at 07:44
@MateuszKonieczny For symbols that are invisible or very similar to existing ones it's tricky. Assuming an editor that supports different encodings (I use jEdit, Notepad++), if you don't otherwise use Unicode, reloading the log, tex and bib files with a different (ASCII) encoding and looking for garbage can work. I have an idea for a script that might help - maybe I'll have time to fiddle with it later. – Chris H Dec 22 '16 at 08:02
This solution is not ideal for me; in Polish, accented characters are normal... Maybe a hex editor would work? But that is purely a blind guess. – reducing activity Dec 22 '16 at 09:29
@MateuszKonieczny, someone has solved this on SO for *nix. You can use grep to give you the line number and highlight the offending character (note that on my system you get two red missing-character symbols per error by default, as my terminal isn't set up for Unicode). The downside is that if you use Unicode for accented characters they will show up as well. I'm still thinking about blocks and code points. – Chris H Dec 22 '16 at 09:29
@MateuszKonieczny I know. How much do you know about Unicode? With Perl-compatible regular expressions in Unicode mode it should be possible to define a range of characters that should work, but I don't know enough about Polish to help very much. I assume you're using \usepackage[utf8]{inputenc} to get your accented chars -- I do, even though I use very few. – Chris H Dec 22 '16 at 09:32
Hopefully I will remember this the next time I have trouble with unusual characters - I failed to notice that a regexp would be helpful. "How much do you know about unicode?" - far more than I want :) But it is necessary given that in the 21st century many programs/libraries still assume that text will be limited to ASCII (and many are completely unable to support characters outside ASCII). – reducing activity Dec 22 '16 at 09:45
@MateuszKonieczny Here's something worth trying next time (an attempted whitelist): grep --colour='auto' -P -n "(*UTF8)[^\p{Latin}-,.:()?#/\$_\d\\\ \*\{\}\[\]~;]" SPI_photodiode.tex. I'm told XeLaTeX is better for Unicode support, but I don't use it myself. In general LaTeX support for Unicode is not bad, but certain rare characters and control characters cause it to choke. Spacing is a particular issue, as you've seen. While this question has become the go-to for the error message, maybe there's a more generic mystery-Unicode troubleshooting question waiting to be asked. – Chris H Dec 22 '16 at 09:52
Well, so far in all cases where LaTeX crashed on weird characters, the weird character was not supposed to be there anyway. The main irritation for me is that in 2016 software still needs to be told to use utf8 (\usepackage[utf8]{inputenc}), but backward compatibility has its price. – reducing activity Dec 22 '16 at 10:03
Agreed, this is a general case as it could be many different Unicode characters causing the issue. In a recent case it was &lowast; which interestingly is an HTML entity but for some reason needs to be declared as unicode &#8727. – jeffmcneill Oct 08 '19 at 13:20

score 32 · Answer 3 · edited Jan 21 '19 at 19:50

\usepackage[utf8x]{inputenc}

Ubuntu:

You must install texlive-latex-extra before using it.

Fedora:

You must install texlive-collection-latexextra before using it.

edited Jan 21 '19 at 19:50

Jeeppler

103

answered Mar 18 '13 at 10:34

Evgenii

428

Welcome to TeX.SX! Maybe you can add which OS this is for (Ubuntu?) and how to install it. – texenthusiast Mar 18 '13 at 10:36
See http://tex.stackexchange.com/q/13067/15925 for discussion of utf8x. – Andrew Swann Mar 18 '13 at 11:05
Yes, it's for Ubuntu. – Evgenii Mar 18 '13 at 18:23
Now I get "Unknown Unicode character" – Geremia Apr 11 '14 at 01:59
@Geremia, please show the string that triggers this error – Evgenii Apr 11 '14 at 05:59
@Evgeny: ſ = U+017F = LATIN SMALL LETTER LONG S. Utf8x recognizes what it is, but it prompts me to input a glyph for it. Am I having a font issue? – Geremia Apr 11 '14 at 17:48
Thanks. I was using pandoc to convert markdown to pdf (which uses latex) and my conversion was crashing due to a → character. Installing the texlive-latex-extra package solved the problem. – alexg Mar 10 '16 at 07:57
This is the only right answer :) – fedvasu May 10 '20 at 05:35
@fedvasu: even if this is the only right answer, I still have problems with emojis. Now what? – Thomas Weller Aug 18 '21 at 10:56

score 4 · Answer 4 · edited May 16 '16 at 19:59

I had two similar problems:

"Unicode char \u8: "
"Unicode char \u8:." (with dot)

The problems were related to the .bib file (the list of references).

The first problem was solved with the \DeclareUnicodeCharacter{00A0}{ } line given by @egreg.

The second one was solved with the help of @Chris H's answer. I opened the generated .log file and looked for errors. I found:

! Package inputenc Error: Unicode char \u8:\C3. not set up for use with LaTeX.

Then I searched for the "\C3" string in the generated .bbl file and found that the letter "Ó" (the first letter of an author's name, Óscar Oballe-Peinado) was the problem. So I replaced it in the bibliography file with {\'{O}}, and voilà!

Although I'm using "\usepackage[utf8]{inputenc}", it does not seem to work for an accented first letter in an author's first name.

The name "Oscar" in Spanish doesn't carry an accent on the first syllable. – user26732, Jun 19 '16 at 18:17
OK, but that author really does use an accent: https://www.researchgate.net/profile/Oscar_Oballe-Peinado and http://www.mdpi.com/1424-8220/15/8/20409 – Fernando, Aug 08 '16 at 08:07

score 3 · Answer 5 · answered Nov 23 '14 at 20:02

You may also get this error if you use a different language for BibTeX. In that case project.bbl may contain characters in a different encoding (e.g. latin2).

What you need to do is switch the input encoding to latin2 while the bibliography is typeset, and switch back to utf8 afterwards.

\inputencoding{latin2}
\bibliography{mybib}
\inputencoding{utf8}

Hope this helps.

answered Nov 23 '14 at 20:02

zub0r

131

Welcome to TeX.SX! This is an answer to an (old) question. You should elaborate a little bit on your post and show a fully working example – Nov 23 '14 at 20:06
I installed the texlive-lang-czechslovak package to use Slovak translations of entries in BibTeX, but was getting errors while compiling the project in utf8. I found out that it is caused by a different encoding of the .bbl file (evidently the czechslovak package uses latin2 for .bbl files by default, because before using it I had no problems). – zub0r Nov 25 '14 at 00:11

score 3 · Answer 6 · edited Feb 14 '15 at 01:11

This happened to me when I saved my .tex file as UTF-8 but forgot to save the .bib file with the same encoding (it was still in ANSI).

Instead of reverting my .tex file to ANSI, I just opened the .bib file with Notepad++ and chose to convert it to UTF-8.

After that, everything compiled fine.

edited Feb 14 '15 at 01:11

Werner

603,163

answered Feb 14 '15 at 00:42

David

31

score 3 · Answer 7 · answered Feb 04 '16 at 22:29

I ran into the same problem, but none of the above answers solved it. In the end, I found the code \'{\i} in my .bib file. This was supposed to yield í but was producing a crazy Unicode char that broke compilation. This .bib file was exported from CiteULike based on a reference that I had entered manually or copied and pasted from somewhere else. I suppose something went wrong while converting a manually entered or pasted í to \'{i}.

score 2 · Answer 8 · answered Jul 06 '15 at 11:38

If this happens in the bibliography, try specifying the language explicitly to bibtexu:

bibtexu -l ru my_paper_with_russian_bibliography

This fixed it for me.

answered Jul 06 '15 at 11:38

Igor

21

Tobi G. · Answer 9 · 2015-10-19T16:48:47.027

2

I had this error because I accidentally saved an included .tex file as ANSI while the master file was in UTF-8.

You can change the file encoding in Notepad++, for example. But you will need to copy the text from the ANSI version and paste it into the UTF-8 version.

edited Oct 19 '15 at 16:48

answered Oct 19 '15 at 15:44

Tobi G.

151

Skippy le Grand Gourou · Answer 10 · 2017-07-04T10:47:59.653

2

XeTeX is more suitable than most other TeX engines for Unicode: replace the lines

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

with

\usepackage{fontspec}

and compile with xelatex myfile.tex.
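
A minimal sketch of such a document, assuming the default font that fontspec selects (Latin Modern), which covers accented Latin letters but not Cyrillic or CJK; for those scripts a font containing the glyphs would have to be chosen with \setmainfont:

```latex
% Compile with: xelatex myfile.tex
\documentclass{article}
\usepackage{fontspec}  % used instead of fontenc and inputenc under XeLaTeX

\begin{document}
% The source file itself must be saved as UTF-8.
Óscar, naïve, café: accented Latin letters are read directly from the source.
\end{document}
```

A character still fails to print if the selected font has no glyph for it, so this moves the problem from input encoding to font coverage.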

edited Jul 04 '17 at 10:47

answered Jul 04 '17 at 10:37

Skippy le Grand Gourou

1

You mean “XeTeX is more suitable than pdfTeX”, right? Because XeLaTeX is still LaTeX… – cgnieder Jul 04 '17 at 10:39
@clemens Thanks for the correction. My bad, I had latex in mind. pdfTeX would be too narrow though. I hope the current formulation is better. – Skippy le Grand Gourou Jul 04 '17 at 10:48

score 1 · Answer 11 · answered Jun 03 '16 at 17:27

Inserting

% !TEX encoding = UTF-8 Unicode

at the beginning of the file solved this issue for me.

answered Jun 03 '16 at 17:27

user27221

121

Didn't work in my case, where I wanted to render a Unicode accented character from this author's name, Rui M´ximo – Micah Stubbs Sep 17 '19 at 19:48

Fuhrmanator · Answer 12 · 2020-07-30T17:13:10.000

My error was related, but slightly different:

! Package inputenc Error: Unicode character ʳ (U+02B3)

or

! Package inputenc Error: Unicode character ᵉ (U+1D49)
(inputenc)                not set up for use with LaTeX.

I had these errors because of bibliography entries generated with French babel, which typesets the edition field, e.g. 1ʳᵉ éd. or 3ᵉ éd., using a raised "re" or "e". This produces U+02B3 and U+1D49, which apparently are not set up, even though (so far) all the other accented characters, e.g. é in édition, are fine...

Because of my complex pandoc settings from Markdown with the Libertine font, I don't want to swap to another TeX engine...

So, my band-aid fix was to use the math mode to raise these letters:

\DeclareUnicodeCharacter{1D49}{$^\text{e}$}
\DeclareUnicodeCharacter{02B3}{$^\text{r}$}

Also, if you're using pandoc, put them in your header-includes: part of the front matter.
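
The \text command used above comes from amsmath, so those two declarations assume that package is loaded. A possible variant, offered as a sketch rather than what this answer used, relies on \textsuperscript instead and stays in text mode:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

% \textsuperscript is a LaTeX kernel command, so amsmath is not required.
\DeclareUnicodeCharacter{1D49}{\textsuperscript{e}} % U+1D49, raised e
\DeclareUnicodeCharacter{02B3}{\textsuperscript{r}} % U+02B3, raised r

\begin{document}
1ʳᵉ éd. and 3ᵉ éd.
\end{document}
```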

score 0 · Answer 13 · answered Jan 22 '21 at 12:31

An answer for a slightly different case:

! Package inputenc Error: Unicode character ° (U+B0)
(inputenc)                not set up for use with LaTeX.

Here, the answer is very simple:

\usepackage{textcomp}

This package defines the \textdegree macro and also sets up ° to use it.
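
A minimal sketch of this fix, assuming pdfLaTeX with inputenc. On recent LaTeX releases the textcomp symbols are part of the kernel, so the package line may be unnecessary there:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}  % provides \textdegree

\begin{document}
% Both spellings should give the same output once textcomp is loaded.
Water boils at 100°C, which can also be written 100\textdegree C.
\end{document}
```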
