
The Encyclopedia of Mathematics is based on an encyclopedia whose formulae were written in TeX and saved as PNG images. On its way through the various publishers, the original TeX source code was lost. Thus, the encyclopedia now contains a large number of PNG images of computer-generated TeX formulas. Example:

https://www.encyclopediaofmath.org/legacyimages/a/a130/a130040/a130040761.png.

I therefore want to convert all of these images (about 150,000 of them) back to TeX code. I have found the Mathpix API, but it is not open source. Do you know of any open source alternatives?

  • Hi. Why do you want to convert all of them to TeX code? –  Apr 06 '19 at 14:28
  • So you can easily edit, expand and index the articles. Right now, if you want to change a formula in the Encyclopedia, you have to retype the whole thing (also, the images look worse than MathJax rendering, but that’s not so important) – Maximilian Janisch Apr 06 '19 at 14:29
  • I don't know of a script that can do the job, but there are some programs, extensions, etc. that let you convert an equation to TeX code by cropping it from the screen. –  Apr 06 '19 at 14:30
  • Related: https://tex.stackexchange.com/q/155166/35864 (but unfortunately the only real solution for this task mentioned there seems to be Mathpix, which you know already :-(). More distantly related: https://tex.stackexchange.com/q/1443/35864. Google also returned http://www.inftyproject.org/en/index.html – moewe Apr 06 '19 at 14:33
  • That said, the graphics from the Encyclopedia of Mathematics are quite low-quality (https://www.encyclopediaofmath.org/legacyimages/r/r077/r077380/r07738034.png, https://www.encyclopediaofmath.org/legacyimages/r/r077/r077380/r077380107.png), and your link (https://www.encyclopediaofmath.org/legacyimages/a/a130/a130040/a130040761.png) shows that they use quite a variety of symbols, so this is at the more complicated end of the OCR scale, I imagine. – moewe Apr 06 '19 at 14:47
  • $600 does not sound like a lot for, say, a few days' work, but it would take me, you, or 1000 volunteers longer than 3 days to process them. If they guarantee to rectify 150,000 errors for $600, then crowdfunding on the pages may work. Just my 2 $ worth of cents. –  Apr 06 '19 at 15:14
  • You are of course right; Mathpix is super cheap for the time they save. I was also asking out of curiosity if there is any open source “effort” in this area... I edited it so it doesn’t sound like I don’t want to pay any money – Maximilian Janisch Apr 06 '19 at 20:24
  • Have you tried appealing to their better nature? It may be that, in return for a logo, name check, or link on each page, they would be willing to assist. –  Apr 06 '19 at 22:24
  • This sounds like a great honours project! – Will Robertson Apr 07 '19 at 00:16

1 Answer


The project was indeed finished successfully: the formulas were first translated automatically using Mathpix and then corrected by hand.

More details on the methodology can be found here and more details on the manual corrections here.
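
For readers who want to try a similar two-stage workflow (automatic recognition first, manual correction afterwards), here is a minimal Python sketch. It is not the code used in the project. The function `transcribe_batch`, the `recognize` callable, the CSV layout, and the confidence threshold are all hypothetical names and assumptions made for illustration. `recognize` stands in for whatever OCR backend you use (Mathpix or an open source tool) and is assumed to return a `(tex, confidence)` pair for an image URL.

```python
import csv
from typing import Callable, Iterable, Tuple

# Hypothetical recognizer type: takes an image URL and returns
# (tex_string, confidence between 0 and 1). Wrap your actual OCR
# backend in a function with this shape.
Recognizer = Callable[[str], Tuple[str, float]]


def transcribe_batch(
    image_urls: Iterable[str],
    recognize: Recognizer,
    out_path: str,
    min_confidence: float = 0.9,  # assumed threshold, tune for your data
) -> int:
    """Run `recognize` on every image and write a review sheet as CSV.

    Rows whose confidence is below `min_confidence`, or whose recognition
    raised an error, are marked for manual correction.
    Returns the number of rows marked for review.
    """
    flagged = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["image_url", "tex", "confidence", "needs_review"])
        for url in image_urls:
            try:
                tex, confidence = recognize(url)
            except Exception:
                # Any backend failure: keep the row so a human can
                # transcribe the formula by hand.
                tex, confidence = "", 0.0
            needs_review = confidence < min_confidence
            if needs_review:
                flagged += 1
            writer.writerow(
                [url, tex, f"{confidence:.2f}", "yes" if needs_review else "no"]
            )
    return flagged
```

The resulting CSV can be sorted by the `needs_review` column so that human correctors start with the least reliable transcriptions. With about 150,000 images you would probably also want to process them in chunks and resume after failures, which this sketch leaves out.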