
I think this should be something standard, but I'm having a hard time finding anything that works.

Given a plain string containing LaTeX accented characters, does anyone know of a Python library that can convert the accents to UTF-8?

I understand that one could put together a large dictionary by hand (a translation table) and simply apply it to the string. But that is a rather tedious task, and before doing so, I'd much appreciate it if someone could point out whether a working solution already exists.

Hayk
  • This might be of some use: Converting accents to UTF-8 characters. But even better, the answer to your question may be yes: https://pypi.org/project/pylatexenc/. If you get this working, please add an answer yourself. – Alan Munn Aug 26 '18 at 15:16
  • @AlanMunn, thanks for the suggestions! The Stack Overflow thread has no conclusion; there are some suggestions there, but none seems to resolve the problem. And pylatexenc doesn't seem to convert accents to Unicode. As far as I understand, it's more for separating LaTeX markup from plain text (which I have already written myself). – Hayk Aug 27 '18 at 04:47
  • The recode command-line tool can convert in both directions between UTF-8 and old-style LaTeX encodings (as well as many others). There is a Python interface called python-recode (available via PyPI). – Eric Marsden Sep 04 '18 at 20:33
  • @EricMarsden, thanks for the info, I'll check that out. After failing to find a solution on Google (perhaps I was not diligent enough in my search), I just wrote it myself: a simple script that does only the conversion, nothing more. It's all straightforward string manipulation, based on simple regexes and a translation dictionary. The code is available online. – Hayk Sep 04 '18 at 20:41
  • I also found something in JavaScript. The dictionary is given as a simple JSON file, so it can easily be used from other languages. See https://github.com/paperhive/arxiv-latex-to-utf8 – gcousin Mar 23 '20 at 22:28

1 Answer


Well, it does not feel particularly great to answer my own question, but I thought I'd share the resolution here, as it might be useful to others. A Python script that does the transformation is now available at this GitHub repository. It's self-contained, (hopefully) easy to use, and can be modified to add new features. The current version is a prototype intended for use in SciLag (possibly after some modifications), a community project for open problems in mathematics. So please feel free to share your comments.
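For anyone who just wants the idea, here is a minimal sketch of the regex-plus-translation-table approach described in the question. It is not the code from the repository: the names `COMBINING` and `latex_accents_to_unicode` are made up for illustration, only a handful of accents are covered, and instead of a large hand-written dictionary it leans on Unicode combining marks and NFC normalization from the standard library.

```python
import re
import unicodedata

# Hypothetical, deliberately small table: LaTeX accent command -> Unicode
# combining mark. A real converter would need many more entries.
COMBINING = {
    "'": "\u0301",  # acute       \'e
    "`": "\u0300",  # grave       \`e
    "^": "\u0302",  # circumflex  \^e
    '"': "\u0308",  # diaeresis   \"o
    "~": "\u0303",  # tilde       \~n
    "c": "\u0327",  # cedilla     \c{c}
    "v": "\u030C",  # caron       \v{s}
}


def latex_accents_to_unicode(text):
    """Replace simple LaTeX accent commands with precomposed characters."""

    def repl(match):
        mark = COMBINING[match.group(1)]
        # The letter is either braced (group 2) or bare (group 3).
        letter = match.group(2) or match.group(3)
        # NFC folds "letter + combining mark" into one code point if it exists.
        return unicodedata.normalize("NFC", letter + mark)

    # Symbol accents: \'e or \'{e}
    text = re.sub(r"""\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))""", repl, text)
    # Letter accents: \c{c} or "\c c". A brace or whitespace is required
    # after the command so that macros such as \cite are left untouched.
    text = re.sub(r"\\([cv])(?:\{([A-Za-z])\}|\s+([A-Za-z]))", repl, text)
    return text
```

With this sketch, `latex_accents_to_unicode(r"Poincar\'e and G\"{o}del")` gives `Poincaré and Gödel`. It does not handle dotless `\i`, an outer brace group such as `{\'e}` (the braces are left in place), or accents not listed in the table.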

Hayk
  • Hi. I happened to write code to do the opposite, converting Unicode to LaTeX, and I'm planning to add the LaTeX-to-Unicode part. I will see if I can integrate your code with mine, which is at https://github.com/mennucc/unicode2latex – am70 Jan 24 '22 at 16:10