10

How to convert characters of type á ê õ ção to \'a \^e \~o \c{c} \~a automatically?

Martin Scharrer
  • 262,582
Regis Santos
  • 14,463

5 Answers

8

Here is a quick Python script that does the trick. It handles both combining accents and precomposed characters, but it takes only a string of text; some extra work is needed to handle complete TeX files:

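#!/usr/bin/env python3
import sys
import unicodedata

# Unicode combining marks and the LaTeX accent macros they map to.
accents = {
    0x0300: '`', 0x0301: "'", 0x0302: '^', 0x0308: '"',
    0x030B: 'H', 0x0303: '~', 0x0327: 'c', 0x0328: 'k',
    0x0304: '=', 0x0331: 'b', 0x0307: '.', 0x0323: 'd',
    0x030A: 'r', 0x0306: 'u', 0x030C: 'v',
}

def uni2tex(text):
    out = ""
    i = 0
    while i < len(text):
        char = text[i]
        code = ord(char)

        # combining marks: the accent is applied to the character that
        # follows the mark, which is then skipped (note that Unicode text
        # normally stores the mark after its base character)
        if (unicodedata.category(char) in ("Mn", "Mc") and code in accents
                and i + 1 < len(text)):
            out += "\\%s{%s}" % (accents[code], text[i + 1])
            i += 1
        # precomposed characters
        elif unicodedata.decomposition(char):
            parts = unicodedata.decomposition(char).split()
            # only plain "base accent" pairs; compatibility decompositions
            # carry a tag such as <compat> and are left untouched
            if len(parts) == 2 and not parts[0].startswith("<"):
                base, acc = (int(p, 16) for p in parts)
            else:
                base, acc = None, None
            if acc in accents:
                out += "\\%s{%s}" % (accents[acc], chr(base))
            else:
                out += char
        else:
            out += char

        i += 1

    return out

if __name__ == '__main__':
    # Python 3 already decodes command-line arguments to str
    print(uni2tex(sys.argv[1]))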

and invoked as:

$ python uni2tex.py "á ê õ ção ̆ a ă ̆a"

which outputs \'{a} \^{e} \~{o} \c{c}\~{a}o \u{ }a \u{a} \u{a}.
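As for the "extra work" for complete files, here is a minimal sketch of one way it might look. It is not the script above: the names `ACCENTS`, `convert_line` and `convert_file` are made up for illustration, the table is only a small subset, and it assumes the file is UTF-8 with precomposed characters. It does not try to skip verbatim environments or comments.

```python
import unicodedata

# Hypothetical subset of the accent table; extend as needed.
ACCENTS = {0x0301: "'", 0x0302: "^", 0x0303: "~", 0x0327: "c"}

def convert_line(line):
    """Replace precomposed accented letters in one line, leave the rest alone."""
    out = []
    for ch in line:
        parts = unicodedata.decomposition(ch).split()
        # Canonical decompositions look like "0061 0301" (base, mark).
        if len(parts) == 2 and not parts[0].startswith("<"):
            base, mark = int(parts[0], 16), int(parts[1], 16)
            if mark in ACCENTS:
                out.append("\\%s{%s}" % (ACCENTS[mark], chr(base)))
                continue
        out.append(ch)
    return "".join(out)

def convert_file(src, dst):
    """Read src as UTF-8 and write the converted text to dst."""
    with open(src, encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(convert_line(line))
```

Something like `convert_file("doc.tex", "temp_doc.tex")` would then leave the original file untouched and write the converted copy next to it.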

3

You probably want to keep a version of your document with the unreplaced characters, as it is much easier to read. If you use makefiles to process your documents, you might write something along these lines:

# -*- coding: utf-8 -*-

SHELL = /bin/sh
DOCUMENT = doc

# Work on a temporary copy so that the source keeps its accented characters.
# Recipe lines must start with a tab character.
$(DOCUMENT).pdf : $(DOCUMENT).tex
	cp $(DOCUMENT).tex temp_$(DOCUMENT).tex
	sed -i "s/é/\\\'{e}/g" temp_$(DOCUMENT).tex
	sed -i 's/ç/\\c{c}/g' temp_$(DOCUMENT).tex
	# more substitutions to add...
	pdflatex temp_$(DOCUMENT).tex
	cp temp_$(DOCUMENT).pdf $(DOCUMENT).pdf

Actually, all my makefiles copy doc.tex to temp_doc.tex before doing anything. This way, I can easily clean up any machine-generated files via rm -f temp*.
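If you would rather not maintain one sed call per character, the same copy-then-substitute idea could be sketched in Python roughly as follows. The table `SUBSTITUTIONS` and the helpers `replace_accents` and `write_temp_copy` are hypothetical names for this sketch; the two entries just mirror the two sed rules above, and the input is assumed to be UTF-8 with precomposed characters.

```python
from pathlib import Path

# Hypothetical table mirroring the two sed rules; keys must be single
# code points (precomposed characters).
SUBSTITUTIONS = {
    "\u00e9": r"\'{e}",   # e with acute accent
    "\u00e7": r"\c{c}",   # c with cedilla
    # more substitutions to add...
}

def replace_accents(text, table=SUBSTITUTIONS):
    """Apply every substitution of the table in a single pass."""
    return text.translate(str.maketrans(table))

def write_temp_copy(source):
    """Write the substituted text to temp_<name> next to source."""
    source = Path(source)
    target = source.with_name("temp_" + source.name)
    target.write_text(
        replace_accents(source.read_text(encoding="utf-8")),
        encoding="utf-8",
    )
    return target
```

A makefile rule could then call such a script in place of the cp and sed lines, keeping the temp_ naming so that rm -f temp* still cleans everything up.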

However, are you really sure you want to do this? Replacing these characters with their respective macros doesn't really give you any advantage (at least none I could readily see), and it comes at the cost of poor kerning (see also section 2.2.6 of l2tabuen).

Best

3

If you use Emacs, use the iso- functions; in your case, iso-iso2tex. I posted more details in a previous answer: emacs-accented-letters

YuppieNetworking
  • 5,919
1

Starting from the work of Hosny, I created a script that can convert all Unicode to LaTeX:

https://github.com/mennucc/unicode2latex

It will convert accents, e.g. è → \`{e}; also multiple accents, e.g. ṩ → \.{\d{s}}.

It will express fonts, e.g. ⅅ → \symbbit{D}.

It will convert math symbols, e.g. ∩ → \cap.

Note that to compile a file that uses a command such as \symbbit{D}, you need to use xelatex or lualatex with the fontspec and unicode-math packages.
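To give an idea of how stacked accents like ṩ → \.{\d{s}} can come out nested, here is a small sketch based on Unicode NFD normalization. This is not the code of unicode2latex, just an illustration under my own assumptions: `ACCENTS` is a hypothetical subset table and `nest_accents` is a made-up name. Characters whose marks are not in the table are left in decomposed form.

```python
import unicodedata

# Hypothetical subset of combining marks and their LaTeX accent macros.
ACCENTS = {
    0x0300: "`", 0x0301: "'", 0x0302: "^", 0x0303: "~",
    0x0307: ".", 0x0308: '"', 0x0323: "d", 0x0327: "c",
}

def nest_accents(text):
    """Wrap each base character in one macro per combining mark."""
    out = []        # finished pieces
    current = None  # LaTeX built so far for the current base character
    # NFD splits every character into its base followed by its marks.
    for ch in unicodedata.normalize("NFD", text):
        macro = ACCENTS.get(ord(ch))
        if macro is not None and current is not None:
            # A known mark: wrap whatever has been built for the base.
            current = "\\%s{%s}" % (macro, current)
        else:
            # A new base (or an unknown character): flush the previous one.
            if current is not None:
                out.append(current)
            current = ch
    if current is not None:
        out.append(current)
    return "".join(out)
```

With this, `nest_accents("ṩ")` yields `\.{\d{s}}`, because NFD orders the dot below before the dot above, so the inner macro is applied first.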

am70
  • 153
0

Several tools exist. (They are collected from answers by other users of this site. Credit: 1 2 3)

Standalone command-line tools

Library

Editor plugins

recode usage instructions

Usually, this will already be installed on your computer.

To use it, do something like this for example:

$ echo á | recode UTF-8..LaTeX
\'a

However, as shown in the source code, only a very small number of characters are supported; in particular, only characters in the Latin-1 encoding are supported.
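Given that limitation, it may be worth checking a text before piping it through recode. The following is only a sketch with a made-up helper name (`outside_latin1`); it relies on the fact that Latin-1 covers exactly the code points up to U+00FF, and it does not check whether recode actually handles every Latin-1 character.

```python
def outside_latin1(text):
    """Return the distinct characters that Latin-1 cannot represent,
    in order of first appearance."""
    seen = []
    for ch in text:
        # Latin-1 covers exactly the code points U+0000..U+00FF.
        if ord(ch) > 0xFF and ch not in seen:
            seen.append(ch)
    return seen
```

An empty result means every character is within Latin-1; anything listed would need one of the other tools.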

user202729
  • 7,143