How to convert characters to LaTeX code?

Question

How can I automatically convert characters such as á ê õ ção to \'a \^e \~o \c{c} \~a?

Why do you need to do this? If you save your source file as UTF-8 and use \usepackage[utf8]{inputenc}, you can keep them in your source. — Alan Munn, Jul 18 '11 at 17:24
Also, the answer likely depends on how you'd like to do it. In your editor? In LaTeX? In a preprocessing step? — You, Jul 18 '11 at 17:31
Alan Munn, I use [utf8]{inputenc}, but is it possible to do the conversion directly while editing file.tex? — Regis Santos, Jul 18 '11 at 17:39
If you use an editor that has automatic spellchecking rules, you can probably define new rules that would do that, the same way "i" is often automatically changed to "I". But a detailed answer is highly dependent on the editor you're using. — Lexiel, Jul 18 '11 at 18:02
These links might be useful: Python - Replace all accented characters by their LaTeX equivalent and UTF-8 support via inputenc. — Paulo Cereda, Jul 18 '11 at 18:36
I'm tempted to reply to your comment with a solution using TeXShop for the Mac, but that might be obnoxious... If you want to do this in your editor, you really need to tell us which one. — Alan Munn, Jul 18 '11 at 18:39
I would tend to agree with @Alan and @You's first comments and add that you haven't described the extent of the replacement. Are you only referring to the characters in your post, or a more general setting of 'all accented' characters? Also are these accented characters contained in a single .tex file, or possibly dispersed among several that you use via \include{...}, \includeonly{...} and/or \input{...}? If all of the latter hold, then the stream editor sed would work. Otherwise a simple find-and-replace would suffice. — Werner, Jul 18 '11 at 20:42
Reverse direction: unicode - Convert LaTex accented characters to UTF8 equivalent with Python - TeX - LaTeX Stack Exchange — user202729, Dec 04 '23 at 04:02

score 8 · Answer 1 · answered Jul 18 '11 at 21:44

Here is a quick Python script that does the trick. It handles both combining accents and precomposed characters, but it takes only a string of text; some extra work is needed to handle complete TeX files:

#!/usr/bin/env python3
import sys
import unicodedata

# Unicode combining marks mapped to LaTeX accent commands
accents = {
    0x0300: '`', 0x0301: "'", 0x0302: '^', 0x0308: '"',
    0x030B: 'H', 0x0303: '~', 0x0327: 'c', 0x0328: 'k',
    0x0304: '=', 0x0331: 'b', 0x0307: '.', 0x0323: 'd',
    0x030A: 'r', 0x0306: 'u', 0x030C: 'v',
}

def uni2tex(text):
    out = ""
    i = 0
    while i < len(text):
        char = text[i]
        code = ord(char)
        decomp = unicodedata.decomposition(char).split()

        # combining marks: applied to the character that follows them
        # (skipped if the mark is the last character of the string)
        if (unicodedata.category(char) in ("Mn", "Mc") and code in accents
                and i + 1 < len(text)):
            out += "\\%s{%s}" % (accents[code], text[i + 1])
            i += 1
        # precomposed characters with a plain "base accent" decomposition;
        # compatibility decompositions such as "<compat> ..." are left alone
        elif len(decomp) == 2 and not decomp[0].startswith("<"):
            base, acc = (int(part, 16) for part in decomp)
            if acc in accents:
                out += "\\%s{%s}" % (accents[acc], chr(base))
            else:
                out += char
        else:
            out += char

        i += 1

    return out

if __name__ == '__main__':
    # Python 3 already decodes command-line arguments to str
    print(uni2tex(sys.argv[1]))

and invoked as:

$ python uni2tex.py "á ê õ ção ̆ a ă ̆a"

which outputs \'{a} \^{e} \~{o} \c{c}\~{a}o \u{ }a \u{a} \u{a}.
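For the "complete TeX files" part, here is a minimal sketch of one possible approach, not the method used in the script above. It swaps in a different technique: the text is first normalized to NFD, so every accented letter becomes a base character followed by its combining marks, and each mark then wraps whatever precedes it. The names text_to_latex and convert_file are hypothetical, and the sketch assumes the file is UTF-8 and that every part of it (including comments and verbatim material) should be converted.

```python
import unicodedata
from pathlib import Path

# Combining marks and their LaTeX accent commands (same table as the script above).
ACCENTS = {
    0x0300: "`", 0x0301: "'", 0x0302: "^", 0x0308: '"',
    0x030B: "H", 0x0303: "~", 0x0327: "c", 0x0328: "k",
    0x0304: "=", 0x0331: "b", 0x0307: ".", 0x0323: "d",
    0x030A: "r", 0x0306: "u", 0x030C: "v",
}

def text_to_latex(text):
    """Wrap each base character in the accent commands of the marks that follow it."""
    out = []
    # NFD splits precomposed letters into base character + combining marks.
    for char in unicodedata.normalize("NFD", text):
        if ord(char) in ACCENTS and out:
            # Wrap the previous item, which may already be an accent command.
            out[-1] = "\\%s{%s}" % (ACCENTS[ord(char)], out[-1])
        else:
            out.append(char)
    return "".join(out)

def convert_file(src, dst):
    """Hypothetical helper: read src as UTF-8 and write the converted text to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(text_to_latex(text), encoding="utf-8")
```

Something like convert_file("doc.tex", "temp_doc.tex") would then produce a converted copy. Characters whose marks are not in the table are not translated and stay in their decomposed form.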

Characters like æ ø Æ Ø ð Ð are not translated by the script. — Finn Årup Nielsen, Feb 16 '17 at 15:16
@FinnÅrupNielsen: You can always extend it, I don’t know the LaTeX csnames for these myself. — خالد حسني, Feb 22 '17 at 14:14
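As a possible extension along those lines, letters such as æ and ø have no Unicode decomposition, so they need an explicit lookup table. The sketch below is an assumption about how one might add it, not part of the original script; the name special2tex is hypothetical. The commands \ae, \AE, \oe, \OE, \o, \O, \ss, \l and \L are standard LaTeX, while \dh, \DH, \th and \TH assume \usepackage[T1]{fontenc}.

```python
# Letters without a canonical decomposition, mapped to LaTeX commands.
# The trailing {} ends the command name so that following letters
# are not read as part of it.
# \dh, \DH, \th and \TH assume \usepackage[T1]{fontenc}.
SPECIAL_LETTERS = {
    "æ": r"\ae{}", "Æ": r"\AE{}",
    "œ": r"\oe{}", "Œ": r"\OE{}",
    "ø": r"\o{}", "Ø": r"\O{}",
    "ß": r"\ss{}",
    "ł": r"\l{}", "Ł": r"\L{}",
    "ð": r"\dh{}", "Ð": r"\DH{}",
    "þ": r"\th{}", "Þ": r"\TH{}",
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SPECIAL_LETTERS)

def special2tex(text):
    """Replace the letters in SPECIAL_LETTERS; leave everything else untouched."""
    return text.translate(_TABLE)
```

Running the text through special2tex before or after the accent conversion should work either way, since the two tables cover disjoint sets of characters.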

score 3 · Answer 2 · answered Jul 18 '11 at 20:02

You probably want to keep a version of your document with the unreplaced characters, as it is much easier to read. If you use makefiles to process your documents, you might write something along these lines:

#! -*- coding: utf-8 -*-

SHELL = /bin/sh
DOCUMENT = doc

$(DOCUMENT).pdf : $(DOCUMENT).tex
    cp $(DOCUMENT).tex temp_$(DOCUMENT).tex
    sed -i "s/é/\\\'{e}/g" temp_$(DOCUMENT).tex
    sed -i 's/ç/\\c{c}/g' temp_$(DOCUMENT).tex
    # more substitutions to add...
    pdflatex temp_$(DOCUMENT).tex
    cp temp_$(DOCUMENT).pdf $(DOCUMENT).pdf

Actually all my makefiles copy doc.tex to temp_doc.tex before doing anything. This way, I can easily clean up any machine generated files via rm -f temp*.

However, are you really sure you want to do this? Replacing these characters with their respective macros doesn't really give you any advantage. (At least none I could readily see.) But it comes at the cost of poor kerning. (See also section 2.2.6 of l2tabuen.)

Best

score 3 · Answer 3 · edited Apr 13 '17 at 12:35

If you use Emacs, use the iso- functions. In your case: iso-iso2tex. I posted more details in a previous answer: emacs-accented-letters

answered Jul 19 '11 at 13:39

YuppieNetworking

score 1 · Answer 4 · answered Nov 22 '21 at 08:45

Starting from the work of Hosny, I created a script that can convert all Unicode to LaTeX:

https://github.com/mennucc/unicode2latex

It will convert accents, e.g. è → \`{e} ; also multiple accents ṩ → \.{\d{s}} .

It will express fonts, e.g. ⅅ → \symbbit{D} .

It will convert math symbols, e.g. ∩ → \cap .

Note that to compile a file that uses a command such as \symbbit{D}, you need to use xelatex or lualatex with the fontspec and unicode-math packages.

user202729 · Answer 5 · 2023-12-04T04:16:39.277

Several tools exist. (The tools are collected from answers by other users on this site. Credit: 1 2 3 )

Standalone command-line tools

pandoc-unicode-math
pylatexenc
recode
Several scripts floating on the Internet: 1, 3, 4, 5

Library

LaTeX::Recode (Perl library)
pylatexenc (Python library)

Editor plugins

Sublime Text plugin: https://github.com/neilanderson/UnicodeTeX
Vim plugin: https://github.com/joom/latex-unicoder.vim
Vim plugin: https://github.com/Konfekt/vim-latexencode
Emacs plugin: https://gist.github.com/kbauer/e8fee6514d124d5961f51fd7ba571bfd

`recode` usage instruction

Usually, this will already be installed on your computer.

To use it, do something like this:

$ echo á | recode UTF-8..LaTeX
\'a

However, as shown in the source code, only a very small number of characters are supported; in particular, only characters in the Latin1 encoding are supported.
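Given that limitation, it may help to check a text before passing it to recode. The following is a rough sketch that assumes "Latin1" means ISO 8859-1, i.e. code points up to U+00FF. It only reports characters outside that range and does not verify that recode handles every character inside it. The function name unsupported_by_latin1 is hypothetical.

```python
def unsupported_by_latin1(text):
    """Return the distinct characters in text that Latin-1 cannot encode."""
    seen = []
    for char in text:
        # Latin-1 (ISO 8859-1) covers exactly the code points U+0000..U+00FF.
        if ord(char) > 0xFF and char not in seen:
            seen.append(char)
    return seen
```

An empty result means every character is at least within the Latin-1 range. Any character that is reported would need one of the other tools listed above.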
