How to convert characters of type á ê õ ção to \'a \^e \~o \c{c} \~a automatically?
5 Answers
Here is a quick Python script that does the trick. It handles both combining accents and precomposed characters, but it only takes a string of text; some extra work is needed to handle complete TeX files:
#!/usr/bin/env python3
import sys
import unicodedata

# Combining code points mapped to the names of the LaTeX accent commands.
accents = {
    0x0300: '`', 0x0301: "'", 0x0302: '^', 0x0308: '"',
    0x030B: 'H', 0x0303: '~', 0x0327: 'c', 0x0328: 'k',
    0x0304: '=', 0x0331: 'b', 0x0307: '.', 0x0323: 'd',
    0x030A: 'r', 0x0306: 'u', 0x030C: 'v',
}

def uni2tex(text):
    out = ""
    i = 0
    while i < len(text):
        char = text[i]
        code = ord(char)
        decomposition = unicodedata.decomposition(char).split()
        # combining marks: the accent is applied to the character that follows
        if (unicodedata.category(char) in ("Mn", "Mc") and code in accents
                and i + 1 < len(text)):
            out += "\\%s{%s}" % (accents[code], text[i + 1])
            i += 1
        # precomposed characters: exactly one base plus one accent code point
        elif len(decomposition) == 2 and not decomposition[0].startswith("<"):
            base, acc = (int(part, 16) for part in decomposition)
            if acc in accents:
                out += "\\%s{%s}" % (accents[acc], chr(base))
            else:
                out += char
        else:
            out += char
        i += 1
    return out

if __name__ == '__main__':
    print(uni2tex(sys.argv[1]))
and invoked as:
$ python uni2tex.py "á ê õ ção ̆ a ă ̆a"
which outputs \'{a} \^{e} \~{o} \c{c}\~{a}o \u{ }a \u{a} \u{a}.
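For comparison, here is a minimal sketch of a variant that follows the Unicode convention, in which a combining mark comes after its base character (the script above applies a mark to the character that follows it). The text is normalized to NFD first, so precomposed and combining input take the same path and stacked accents end up nested. The name `nfd2tex` and the reduced `ACCENTS` table are assumptions made for illustration, not part of the script above.

```python
import unicodedata

# Hypothetical, reduced table: combining code point -> LaTeX accent command.
ACCENTS = {
    "\u0300": "`", "\u0301": "'", "\u0302": "^", "\u0303": "~",
    "\u0308": '"', "\u0327": "c", "\u0306": "u", "\u0307": ".",
    "\u0323": "d",
}

def nfd2tex(text):
    """Wrap each base character in the accents that follow it in NFD form."""
    out = []
    for char in unicodedata.normalize("NFD", text):
        if char in ACCENTS and out:
            # Apply the accent to the most recent piece of output.
            out[-1] = "\\%s{%s}" % (ACCENTS[char], out[-1])
        else:
            out.append(char)
    # Recompose anything whose marks were not in the table.
    return unicodedata.normalize("NFC", "".join(out))

if __name__ == "__main__":
    print(nfd2tex("á ê õ ção"))  # prints: \'{a} \^{e} \~{o} \c{c}\~{a}o
```

With this ordering, input such as ṩ comes out as \.{\d{s}}, while a mark at the very start of the text is left untouched because it has no base character.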
Characters like æ ø Æ Ø ð Ð are not translated by the script. – Finn Årup Nielsen Feb 16 '17 at 15:16
@FinnÅrupNielsen: You can always extend it, I don’t know the LaTeX csnames for these myself. – خالد حسني Feb 22 '17 at 14:14
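Following up on the two comments above, here is a minimal sketch of one way such letters could be handled. They have no Unicode decomposition, so they need an explicit lookup table. `\ae`, `\AE`, `\o`, `\O` and `\ss` are standard LaTeX commands; `\dh` and `\DH` need the T1 font encoding (`\usepackage[T1]{fontenc}`). The `SPECIALS` table and the name `specials2tex` are assumptions made for illustration.

```python
# Hypothetical lookup table for letters without a Unicode decomposition.
SPECIALS = {
    "æ": r"\ae", "Æ": r"\AE", "ø": r"\o", "Ø": r"\O",
    "ß": r"\ss", "ð": r"\dh", "Ð": r"\DH",
}

def specials2tex(text):
    """Replace each listed letter with its LaTeX command."""
    # The trailing {} ends the command name, so that following
    # letters or spaces are not absorbed by it.
    return "".join(SPECIALS[c] + "{}" if c in SPECIALS else c for c in text)

if __name__ == "__main__":
    print(specials2tex("smørrebrød"))  # prints: sm\o{}rrebr\o{}d
```

A pass like this could run before or after the accent conversion, since the two tables do not overlap.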
You probably want to keep a version of your document with the unreplaced characters, as it is much easier to read. If you use makefiles to process your documents, you might write something along these lines:
#! -*- coding: utf-8 -*-
SHELL = /bin/sh
DOCUMENT = doc

$(DOCUMENT).pdf : $(DOCUMENT).tex
	cp $(DOCUMENT).tex temp_$(DOCUMENT).tex
	sed -i "s/é/\\\'{e}/g" temp_$(DOCUMENT).tex
	sed -i 's/ç/\\c{c}/g' temp_$(DOCUMENT).tex
	# more substitutions to add...
	pdflatex temp_$(DOCUMENT).tex
	cp temp_$(DOCUMENT).pdf $(DOCUMENT).pdf
Actually all my makefiles copy doc.tex to temp_doc.tex before doing anything. This way, I can easily clean up any machine generated files via rm -f temp*.
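The same copy-then-substitute step can also be sketched in Python, which avoids the shell quoting around the backslashes. This is only an illustration of the idea: the `SUBSTITUTIONS` table and the function name `make_temp_copy` are assumptions, and the table holds just the two replacements from the makefile.

```python
import sys
from pathlib import Path

# Hypothetical substitution table; extend as needed.
SUBSTITUTIONS = {"é": r"\'{e}", "ç": r"\c{c}"}

def make_temp_copy(source, table=SUBSTITUTIONS):
    """Write temp_<name> next to source with the substitutions applied."""
    source = Path(source)
    target = source.with_name("temp_" + source.name)
    text = source.read_text(encoding="utf-8")
    for char, macro in table.items():
        text = text.replace(char, macro)
    target.write_text(text, encoding="utf-8")
    return target  # the original file is left unchanged

if __name__ == "__main__" and len(sys.argv) > 1:
    print(make_temp_copy(sys.argv[1]))
```

Run on doc.tex, this writes temp_doc.tex, which can then be passed to pdflatex as in the makefile.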
However, are you really sure you want to do this? Replacing these characters with their respective macros doesn't really give you any advantage. (At least none I could readily see.) But it comes at the cost of poor kerning. (See also section 2.2.6 of l2tabuen.)
Best
If you use Emacs, use the iso- functions; in your case, iso-iso2tex. I posted more details in a previous answer: emacs-accented-letters
Starting from the work of Hosny, I created a script that can convert all Unicode to LaTeX:
https://github.com/mennucc/unicode2latex
It will convert accents, e.g. è → \`{e}; also multiple accents, e.g. ṩ → \.{\d{s}}.
It will express fonts, e.g. ⅅ → \symbbit{D}.
It will convert math symbols, e.g. ∩ → \cap.
Note that to compile a file that uses a command such as \symbbit{D}, you need to use xelatex or lualatex with the fontspec and unicode-math packages.
Several tools exist. (The tools are collected from answers by other users of this site. Credit: 1 2 3)
Standalone command-line tools
- pandoc-unicode-math
- pylatexenc
- recode
- Several scripts floating on the Internet: 1, 3, 4, 5
Library
- LaTeX::Recode (Perl library)
- pylatexenc (Python library)
Editor plugins
- Sublime Text plugin: https://github.com/neilanderson/UnicodeTeX
- Vim plugin: https://github.com/joom/latex-unicoder.vim
- Vim plugin: https://github.com/Konfekt/vim-latexencode
- Emacs plugin: https://gist.github.com/kbauer/e8fee6514d124d5961f51fd7ba571bfd
recode usage instructions
recode is usually already installed on your computer.
To use it, run something like this:
$ echo á | recode UTF-8..LaTeX
\'a
However, as shown in the source code, only a very small number of characters are supported; in particular, only characters in the Latin1 encoding are supported.
\usepackage[utf8]{inputenc} you can keep them in your source. – Alan Munn Jul 18 '11 at 17:24
\include{...}, \includeonly{...} and/or \input{...}? If all of the latter hold, then the stream editor sed would work. Otherwise a simple find-and-replace would suffice. – Werner Jul 18 '11 at 20:42