Is there a way to extract math from a pdf in LaTeX format using SESHAT?

Asked May 07 '22 at 16:42

Active May 07 '22 at 16:50

Viewed 1,236 times

I've been tasked with using SESHAT to extract the math problems from a large PDF into LaTeX format, but Seshat requires an InkML or SCGink file as its input. Is there a way to convert the PDF to one of those file types before feeding it into Seshat to get the output?

Alternatively, is there a better way to do this with a tool other than Seshat? I can't use Mathpix.

Seshat: seshat

edited May 07 '22 at 16:50

asked May 07 '22 at 16:42

JustLearning321

You can use several tools, e.g. Mathpix (even if you can't use it), to get TeX from a PDF, although usually it's better to start from the source rather than the PDF. But what is Seshat? – David Carlisle May 07 '22 at 16:46
this? https://github.com/falvaro/seshat/blob/master/README.md – David Carlisle May 07 '22 at 16:50
@DavidCarlisle yeah, that's the one – JustLearning321 May 07 '22 at 16:50
@DavidCarlisle what tools could I use other than mathpix? – JustLearning321 May 07 '22 at 16:52
Well, it depends on how the math is encoded in the PDF. If it has a good Unicode mapping you may get away with pdf2text and then adding some TeX markup. There are other commercial OCR tools I haven't used; it depends on why you can't use Mathpix. But basically I'd always try to look at the PDF properties for the source of the PDF, then use that. If the PDF is generated from TeX, starting from the TeX source simplifies the problem.... – David Carlisle May 07 '22 at 17:13
Some approaches are mentioned on the related question https://tex.stackexchange.com/questions/8503/how-to-convert-pdf-to-latex and the list of linked questions from there (https://tex.stackexchange.com/questions/linked/8503?lq=1). – Marijn May 08 '22 at 10:12
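
The "extract the text, then add some TeX markup" route from the comments could be sketched roughly as follows. This is only an illustration: it assumes the text has already been pulled out of the PDF by some text-extraction tool and that the PDF has a good Unicode mapping. The mapping table and the function name `unicode_to_tex` are hypothetical, and the table is deliberately tiny.

```python
# Hypothetical, deliberately small mapping from Unicode math characters
# to TeX macros; extend it for the symbols in your own document.
UNICODE_TO_TEX = {
    "α": r"\alpha",
    "β": r"\beta",
    "π": r"\pi",
    "≤": r"\leq",
    "≥": r"\geq",
    "≠": r"\neq",
    "×": r"\times",
    "∞": r"\infty",
}


def unicode_to_tex(text: str) -> str:
    """Replace known Unicode math characters with TeX macros."""
    out = []
    for i, ch in enumerate(text):
        macro = UNICODE_TO_TEX.get(ch)
        if macro is None:
            out.append(ch)  # not a mapped symbol: copy unchanged
            continue
        out.append(macro)
        # Insert a space when an ASCII letter follows, so the macro name
        # does not fuse with it (e.g. "\pi r", not "\pir").
        nxt = text[i + 1:i + 2]
        if nxt.isalpha() and nxt.isascii():
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    print(unicode_to_tex("2πr ≤ α × β"))
```

A character-level substitution like this only handles symbols. Fractions, superscripts, and other two-dimensional layout are generally lost by plain text extraction, which is one reason the comments recommend starting from the TeX source where it is available.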


0 Answers