
I'm working on a project that converts PDF documents into a format that is easy to edit and display in a web interface. For this I am using Marker, which converts PDFs to Markdown and emits tables as LaTeX. I have run into difficulties displaying the parsed PDF on a webpage with React: when I render the output with JavaScript libraries like remarkMath, reactMarkdown, and remarkGfm, the LaTeX does not show at all, especially the tables.

As far as I know, this is all plain LaTeX. Here is a snippet of what I am trying to render:

\begin{table}
\begin{tabular}{l c c c} \hline \hline
**Model** & **ROUGE-L** & **BLEU-1** & **BLEU-4** & **METEOR** \\ \hline BiDAF (Kosisky et al., 2018) & \(6.2\) & \(5.7\) & \(0.3\) & \(3.7\) \\ BM25 + BERT (Mou et al., 2020) & \(15.5\) & \(14.5\) & \(1.4\) & \(5.0\) \\ Recursively Summarizing Books (Wu et al., 2021) & \(21.6\) & \(22.3\) & \(4.2\) & \(10.6\) \\ Retrieval + Reader (Izacard and Grave, 2022) & **32.0** & **35.3** & **7.5** & \(11.1\) \\
**RAPTOR + UnifiedQA** & 30.8 & 23.5 & 6.4 & **19.1** \\ \hline \hline \end{tabular}
\end{table}
Table 6: Performance comparison on the NarrativeQA dataset across multiple models, focusing on four metrics: ROUGE-L, BLEU-1, BLEU-4, and METEOR. RAPTOR, when paired with UnifiedQA 3B, not only surpasses retrieval methods like BM25 and DPR but also sets a new state-of-the-art in the METEOR metric.

\begin{table}
\begin{tabular}{c c c} \hline \hline
\multirow{2}{}{Model} & \multicolumn{2}{c}{Accuracy} \\ \cline{2-3} & Test Set* & Hard Subset \\ \hline Longformer-base (Beltagy et al., 2020) & \(39.5\) & \(35.3\) \\ DPR and DeBERTaV3-large (Pang et al., 2022) & \(55.4\) & \(46.1\) \\ CoLISA (DeBERTaV3-large) (Dong et al., 2023) & \(62.3\) & \(54.7\) \\
RAPTOR + GPT-4 & 82.6 & 76.2 \\ \hline \hline \end{tabular}
\end{table}

**Comparison to State-of-the-art Systems** Building upon our controlled comparisons, we examine RAPTOR's performance relative to other state-of-the-art models. As shown in Table 5, RAPTOR with GPT-4 sets a new benchmark on QASPER, with a 55.7% F-1 score, surpassing the CoLT5 XL's score of 53.9%.

### Contribution of the tree structure

This is just a small snippet; the solution is supposed to render a large document such as a research paper.

Considered Solutions: I was thinking of using pandoc on the backend to convert our Markdown and LaTeX documents into something readable, then fetching the result and showing it in the frontend. I also thought of using Python libraries for the conversion, but found that they may not be 100% compatible with LaTeX. I've also tried JavaScript libraries like react-Markdown, remarkGfm, and remarkMath, but none of them were able to render LaTeX tables.
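To make the pandoc idea concrete, here is a minimal sketch of what I have in mind, not a tested solution. It assumes pandoc is installed and on the PATH, that every table sits inside a `\begin{table}` ... `\end{table}` block as in the snippet above, and that pandoc's LaTeX reader can parse the `tabular` inside it. The function names `latex_to_html_with_pandoc` and `replace_latex_tables` are my own, not part of any library.

```python
import re
import subprocess
from typing import Callable

# Matches one \begin{table} ... \end{table} block, possibly spanning lines.
TABLE_RE = re.compile(r"\\begin\{table\}.*?\\end\{table\}", re.DOTALL)


def latex_to_html_with_pandoc(latex: str) -> str:
    """Convert a LaTeX fragment to HTML by shelling out to pandoc.

    Assumption: the `pandoc` executable is available on PATH.
    """
    result = subprocess.run(
        ["pandoc", "--from", "latex", "--to", "html"],
        input=latex,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pandoc fails
    )
    return result.stdout


def replace_latex_tables(
    markdown: str,
    convert: Callable[[str], str] = latex_to_html_with_pandoc,
) -> str:
    """Replace each LaTeX table block in a Markdown string with converted output.

    The converter is injectable so the regex logic can be tried without pandoc.
    """
    def _swap(match: re.Match) -> str:
        # Blank lines around the result keep it a separate Markdown block.
        return "\n\n" + convert(match.group(0)).strip() + "\n\n"

    return TABLE_RE.sub(_swap, markdown)
```

The output would be Markdown with embedded HTML tables. As far as I understand, react-markdown does not render raw HTML by default, so the frontend would additionally need something like the rehype-raw plugin. I also expect the Markdown-style `**bold**` markers inside the tables to come through as literal asterisks, since pandoc would be reading those blocks as LaTeX.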

Questions: If anyone has worked with something like this before, what would you recommend? Are there any better approaches or libraries (JavaScript or Python) that I might have missed? What would pandoc convert the Markdown and LaTeX to?

Thank you so much! Any feedback would be great.

Current Approach: I have tried JavaScript libraries like remarkMath, rehypeKatex, reactMarkdown, and remarkGfm. The Markdown conversion works well, but LaTeX tables are not converted at all. I was hoping to get some insight into how to do this.
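An alternative I could imagine, since remarkGfm already handles pipe tables, is to rewrite simple `tabular` blocks as GFM tables on the backend before the Markdown reaches React. The sketch below is only an illustration under strong assumptions: one `tabular` per input, rows ended by `\\`, cells separated by `&`, and no `\multirow` or `\multicolumn` (so it would not handle my second table). The name `tabular_to_gfm` is hypothetical.

```python
import re

# Captures the body of the first tabular environment; the column spec is ignored.
TABULAR_RE = re.compile(
    r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", re.DOTALL
)


def tabular_to_gfm(latex: str) -> str:
    """Convert a simple LaTeX tabular into a GFM pipe table (first row = header)."""
    match = TABULAR_RE.search(latex)
    if match is None:
        raise ValueError("no tabular environment found")

    # Horizontal rules have no GFM equivalent, so drop them.
    body = re.sub(r"\\hline|\\cline\{[^}]*\}", "", match.group(1))

    rows = []
    for raw_row in body.split(r"\\"):  # rows end with a double backslash
        if not raw_row.strip():
            continue
        cells = []
        for cell in re.split(r"(?<!\\)&", raw_row):  # split on unescaped &
            cell = re.sub(r"\\\((.*?)\\\)", r"$\1$", cell)  # \(x\) -> $x$
            cells.append(cell.replace(r"\&", "&").strip())
        rows.append(cells)
    if not rows:
        raise ValueError("tabular environment has no rows")

    # Pad short rows so every row has the same number of columns.
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]

    def fmt(cells):
        return "| " + " | ".join(cells) + " |"

    header, *data = rows
    return "\n".join([fmt(header), fmt(["---"] * width)] + [fmt(r) for r in data])
```

Inline math is rewritten to `$...$` on the assumption that remarkMath with rehypeKatex will pick it up; cells containing a literal `|` would still break the table and would need escaping.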

  • I guess that if you have a mix of Markdown + LaTeX, the first step should be to export it with pandoc to LaTeX only, so that you have a single language, and then to whatever you want. But with so many exports the results could be the computer version of the broken telephone game. – Fran Mar 26 '24 at 23:16
