I encountered a weird issue when I tried to convert a markdown file to PDF using pandoc. My markdown file contains both Chinese and English characters. The command I used is:
pandoc --pdf-engine=xelatex -V CJKmainfont=KaiTi test.md -o test.pdf
The error message is:
Error producing PDF.
! Undefined control sequence.
pandoc: Cannot decode byte '\xbd': Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream
In fact, the error has nothing to do with UTF-8 encoding. After long hours of wrestling with the problem, I finally found that it is because my markdown file contains backslashes followed by text, which pandoc treats as LaTeX commands under its default settings. Knowing this, I was finally able to fix the problem. More information can be found in this pandoc issue.
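To locate the offending text in a long markdown file, something like the following may help. It is only a sketch: `find_tex_like` is a hypothetical helper name (not part of pandoc), the regular expression is a rough approximation of what looks like a LaTeX command, and the sample file is made up for the demonstration.

```shell
# Hypothetical helper: print (with line numbers) every line that contains
# a backslash immediately followed by letters, e.g. \textwidth.
# This is only an approximation of what pandoc may read as raw LaTeX.
find_tex_like() {
  grep -nE '\\[A-Za-z]+' "$1"
}

# Made-up sample file, used only to demonstrate the helper.
sample=$(mktemp)
printf '%s\n' 'a plain line' 'replace \textwidth with something else' > "$sample"

find_tex_like "$sample"
```

On a real document this would be run as `find_tex_like test.md`. Once the lines are found, the backslashes can be escaped or wrapped in inline code. Another option, which I assume matches the situation here but which is not confirmed as the fix described above, is to turn off pandoc's `raw_tex` extension by adding `-f markdown-raw_tex` to the pandoc command.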
Someone suggested in that issue that this may be a problem with xelatex, because if we use
pandoc --pdf-engine=lualatex test.md -o test.pdf
the error message becomes something like this:
Error producing PDF.
! Undefined control sequence. l.416
...宽度有问题,应该把\textwidth换成
If the error message from the xelatex engine had been similar to the one above, I would have solved this problem long ago. So it appears to me that the misleading error message may indeed be related to xelatex.
However, if we split PDF generation into two steps, i.e., first generate a tex file and then generate the PDF from it, with something like the following:
pandoc -s -t latex -V CJKmainfont=KaiTi test.md -o test.tex # first step
xelatex test.tex # second step
Then the error message changes and looks just like the one from the lualatex engine. This suggests that the problem may not be related to xelatex, so we arrive at contradictory conclusions.
I am new to pandoc and do not know the internals of xelatex. Can anyone more knowledgeable point out which is causing the problem here: pandoc, xelatex, or both?
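With the two-step approach, xelatex also writes the error to a log file next to the tex file (`test.log` here), so the message can be read without going through pandoc's decoding. A minimal sketch for pulling out the error lines follows; `show_tex_errors` is a hypothetical helper name, and the sample log is an abbreviated stand-in built from the message quoted above, not real xelatex output.

```shell
# Hypothetical helper: print each TeX error line (TeX errors start with "!")
# together with the line that follows it, which usually carries the
# source line number (l.NNN).
show_tex_errors() {
  grep -A 1 '^!' "$1"
}

# Abbreviated, made-up sample log, used only for demonstration.
log=$(mktemp)
printf '%s\n' 'some preamble output' \
  '! Undefined control sequence.' \
  'l.416 ...\textwidth' \
  'some trailing output' > "$log"

show_tex_errors "$log"
```

After running the second step, the same helper would be called as `show_tex_errors test.log`.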
system and pandoc version info
I have tested the file on both Windows and Linux (CentOS 7). The exact versions of the system, pandoc, TeX Live and xelatex are listed below.
Windows
- system version: Windows 8.1 32bit
- Pandoc version: 2.0.5
- TeX Live: 2016/W32TeX
- xelatex: XeTeX 3.14159265-2.6-0.99996
Linux
- system version: CentOS 7.2.1511
- Pandoc version: 1.12.3.1
- TeX Live: 2017
- xelatex: 3.14159265-2.6-0.99998
update 2017.12.29
With the release of Pandoc 2.0.6, this behaviour is handled more properly:
Allow lenient decoding of latex error logs, which are not always properly UTF8-encoded
Now it is easier to debug this kind of issue.
chcp 65001. Beside this: the first line does show the begin of the correct error message (! Undefined control sequence.) – Ulrike Fischer Dec 27 '17 at 15:21