Convert different LaTeX styles into one single format

Question

I'm working on a project that converts images of scientific formulas into LaTeX strings.

During development, I found that the same formula can be written as a LaTeX string in several different ways.

For example:

\left(A\right)\frac{125}{300};
\text { (A) } \frac{125}{300} \text {; }
(A)\frac{125}{300};

The three LaTeX strings above all describe the same mathematical formula image.

Is there any way to convert all these different LaTeX styles into one single format? If so, I could evaluate LaTeX OCR with metrics such as CER/WER and accuracy, or compare different API services more precisely and conveniently.
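To make the metric side concrete: CER and WER are both edit distances divided by the length of the reference, counted over characters and over whitespace-separated words respectively. The following is only a minimal sketch with a plain Levenshtein implementation; the function names `edit_distance`, `cer` and `wer` are my own and do not come from any particular library.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits per whitespace-separated reference word."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


reference = r"(A)\frac{125}{300};"
hypothesis = r"\left(A\right)\frac{125}{300};"
print(cer(reference, hypothesis))
```

On the raw strings, taking the third form as the reference and the first as the hypothesis gives a CER of 11/19 (about 0.58), even though both describe the same formula, which is exactly the problem.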

Updated: In my case, what I currently want is to automatically compare, in relative terms, the rendered outputs of different OCR API services. However, that is impossible if I rely only on their LaTeX strings (because of the differences listed above). Of course, when I develop models and evaluate in-house solutions, all details matter.

They don't describe the same formula: a text A has a different meaning from a math A. They don't even look the same, e.g. the spacing will be different and the A will be upright or italic. – Ulrike Fischer, Nov 14 '21 at 09:28
If you mean you are generating LaTeX by OCR of images, then you should avoid generating the first two, as the third is the correct form in almost all cases. – David Carlisle, Nov 14 '21 at 10:01
@UlrikeFischer I understand the problem you have mentioned above. But in my case, what I currently want is to automatically compare, in relative terms, the rendered outputs of different OCR API services. However, that is impossible if I rely only on their LaTeX strings (because of the differences listed above). Of course, when I develop models and evaluate in-house solutions, all details matter. – nguyendhn, Nov 17 '21 at 02:58
(By the way, there are similar projects for OCRing LaTeX, e.g. inftyreader / mathpix, for comparison; see also https://tex.stackexchange.com/questions/8503/how-to-convert-pdf-to-latex ) – user202729, Nov 17 '21 at 04:25
For this one it seems that the proper way is to use the enumitem package to generate (A), (B) etc. automatically. In the general case there obviously isn't any way, because LaTeX is a programming language that allows very complex structures. If you restrict yourself to a subset of LaTeX it might be possible (but you would have to define exactly what "equivalent" means), though I don't think any such program exists yet. (There are, however, some tools for parsing (a subset of) LaTeX from other programming languages such as Python.) – user202729, Nov 17 '21 at 04:27
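
As a rough illustration of the "restrict to a subset" idea from the last comment, here is a minimal Python sketch that maps the three example strings to one token sequence. The rules are assumptions chosen for this one example, not an established canonical form: it drops `\left`/`\right`, unwraps `\text{...}` without nested braces, and re-joins tokens with single spaces. The name `normalize_latex` is made up for illustration. Note that unwrapping `\text` deliberately throws away the text/math distinction pointed out in the first comment, so this only suits a relative comparison between OCR services.

```python
import re

# A token is a control word (\frac), a control symbol (\{), or any other
# single non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def normalize_latex(s):
    """Reduce a LaTeX string to a space-separated token sequence."""
    # Drop \left and \right but keep the delimiter that follows them;
    # the lookahead leaves \leftarrow, \rightarrow etc. untouched.
    s = re.sub(r"\\(?:left|right)(?![A-Za-z])", "", s)
    # Unwrap \text{...} when the argument contains no nested braces.
    s = re.sub(r"\\text\s*\{([^{}]*)\}", r"\1", s)
    # Tokenise and re-join, which makes the spacing uniform.
    return " ".join(TOKEN.findall(s))


samples = [
    r"\left(A\right)\frac{125}{300};",
    r"\text { (A) } \frac{125}{300} \text {; }",
    r"(A)\frac{125}{300};",
]
for sample in samples:
    print(normalize_latex(sample))
```

All three samples come out as `( A ) \frac { 1 2 5 } { 3 0 0 } ;`. Joining tokens with spaces rather than deleting whitespace keeps a control word such as `\alpha` separate from a following letter, and the same token sequence can be used directly for a word-level metric.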

0 Answers