I'm working on a project that converts PDF documents into a format that is easy to edit and display in a web interface. For this I am using Marker, which converts PDFs to Markdown and outputs tables as LaTeX. I've run into trouble displaying the parsed PDF on a webpage with React. When I render it with JavaScript libraries such as `react-markdown`, `remark-math`, and `remark-gfm`, the LaTeX doesn't show at all, especially in the tables.
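One likely reason even the inline numbers don't render is that remark-math, by default, only recognizes dollar-sign delimiters (`$...$` and `$$...$$`), while Marker emits `\(...\)`. Below is a minimal sketch of a preprocessing step. It assumes the Markdown passes through a Python backend before it reaches React, and `normalize_math_delimiters` is a hypothetical helper name. It does not fix the tables themselves, because KaTeX (which rehype-katex uses) implements math environments such as `array` but not the text-mode `tabular` environment.

```python
import re

# remark-math parses $...$ and $$...$$ by default, so rewrite Marker's \(...\) and \[...\].
# The (?<!\\) lookbehind leaves LaTeX row breaks such as "\\[2pt]" untouched.
DISPLAY_RE = re.compile(r"(?<!\\)\\\[(.+?)\\\]", re.S)
INLINE_RE = re.compile(r"(?<!\\)\\\((.+?)\\\)", re.S)


def normalize_math_delimiters(markdown: str) -> str:
    r"""Rewrite \[...\] as $$...$$ and \(...\) as $...$ (hypothetical helper)."""
    markdown = DISPLAY_RE.sub(r"$$\1$$", markdown)
    return INLINE_RE.sub(r"$\1$", markdown)
```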
As far as I know, this is standard LaTeX. Here is a snippet of what I am trying to render:
```markdown
\begin{table}
\begin{tabular}{l c c c} \hline \hline
**Model** & **ROUGE-L** & **BLEU-1** & **BLEU-4** & **METEOR** \\ \hline BiDAF (Kosisky et al., 2018) & \(6.2\) & \(5.7\) & \(0.3\) & \(3.7\) \\ BM25 + BERT (Mou et al., 2020) & \(15.5\) & \(14.5\) & \(1.4\) & \(5.0\) \\ Recursively Summarizing Books (Wu et al., 2021) & \(21.6\) & \(22.3\) & \(4.2\) & \(10.6\) \\ Retrieval + Reader (Izacard and Grave, 2022) & **32.0** & **35.3** & **7.5** & \(11.1\) \\
**RAPTOR + UnifiedQA** & 30.8 & 23.5 & 6.4 & **19.1** \\ \hline \hline \end{tabular}
\end{table}
Table 6: Performance comparison on the NarrativeQA dataset across multiple models, focusing on four metrics: ROUGE-L, BLEU-1, BLEU-4, and METEOR. RAPTOR, when paired with UnifiedQA 3B, not only surpasses retrieval methods like BM25 and DPR but also sets a new state-of-the-art in the METEOR metric.
\begin{table}
\begin{tabular}{c c c} \hline \hline \multirow{2}{*}{Model} & \multicolumn{2}{c}{Accuracy} \\ \cline{2-3} & Test Set* & Hard Subset \\ \hline Longformer-base (Beltagy et al., 2020) & \(39.5\) & \(35.3\) \\ DPR and DeBERTaV3-large (Pang et al., 2022) & \(55.4\) & \(46.1\) \\ CoLISA (DeBERTaV3-large) (Dong et al., 2023) & \(62.3\) & \(54.7\) \\
RAPTOR + GPT-4 & 82.6 & 76.2 \\ \hline \hline \end{tabular}
\end{table}
**Comparison to State-of-the-art Systems** Building upon our controlled comparisons, we examine RAPTOR's performance relative to other state-of-the-art models. As shown in Table 5, RAPTOR with GPT-4 sets a new benchmark on QASPER, with a 55.7% F-1 score, surpassing the CoLT5 XL's score of 53.9%.
### Contribution of the tree structure
```
This is just a small snippet. The app needs to render a whole document, such as a full research paper.
Considered solutions: I was thinking of using Pandoc on the backend to convert our Markdown and LaTeX documents into something the browser can display, then fetching the result and showing it in the frontend. I also considered Python libraries for the conversion, but found that they may not be fully compatible with LaTeX. I've also tried JavaScript libraries such as `react-markdown`, `remark-gfm`, and `remark-math`, but none of them rendered the LaTeX tables.
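If Pandoc does the conversion, HTML is the natural target for a React frontend. Below is a minimal sketch. It assumes Pandoc is installed on the server and called from Python via `subprocess`, and all function names are hypothetical. The sketch splits Marker's output into Markdown segments and `\begin{table}...\end{table}` blocks, then sends each through the matching Pandoc reader. As I understand it, Pandoc's Markdown reader keeps such environments as raw TeX, which the HTML writer drops rather than turning into a table. The `**...**` to `\textbf{...}` rewrite is an assumption about Marker's mixed syntax. Whether spans like `\multirow` survive may depend on your Pandoc version, so test with your own tables.

```python
import re
import subprocess

# One \begin{table}...\end{table} block, matched lazily across newlines.
TABLE_RE = re.compile(r"\\begin\{table\}.*?\\end\{table\}", re.S)


def split_segments(doc: str) -> list[tuple[str, str]]:
    """Split Marker output into ("markdown", text) and ("latex", text) pieces, in order."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for m in TABLE_RE.finditer(doc):
        if doc[pos:m.start()].strip():
            segments.append(("markdown", doc[pos:m.start()]))
        segments.append(("latex", m.group(0)))
        pos = m.end()
    if doc[pos:].strip():
        segments.append(("markdown", doc[pos:]))
    return segments


def pandoc_args(kind: str) -> list[str]:
    """Command line for one segment; the extension makes the Markdown reader treat \\( \\) as math."""
    reader = "latex" if kind == "latex" else "markdown+tex_math_single_backslash"
    return ["pandoc", "--from", reader, "--to", "html", "--mathjax"]


def to_html(doc: str) -> str:
    """Convert every segment with Pandoc and join the HTML fragments (needs pandoc on PATH)."""
    parts = []
    for kind, text in split_segments(doc):
        if kind == "latex":
            # Assumption: Marker mixes Markdown bold into LaTeX cells, which the LaTeX
            # reader would keep as literal asterisks, so rewrite **x** as \textbf{x}.
            text = re.sub(r"\*\*(.+?)\*\*", r"\\textbf{\1}", text)
        result = subprocess.run(
            pandoc_args(kind), input=text, capture_output=True, text=True, check=True
        )
        parts.append(result.stdout)
    return "\n".join(parts)
```

On the server this would be called as `html = to_html(marker_output)`, and the fragment sent to the frontend. With `--mathjax`, Pandoc leaves math such as `\(6.2\)` inside `<span class="math inline">` elements, so the page still needs MathJax (or KaTeX's auto-render) to typeset it.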
Questions: If anyone has worked with something like this before, what would you recommend? Are there better approaches or libraries (JavaScript or Python) that I might have missed? And what output format would Pandoc convert the Markdown and LaTeX to?
Thank you so much! Any feedback would be great.
Current approach: I have tried the JavaScript libraries `remark-math`, `rehype-katex`, `react-markdown`, and `remark-gfm`. The Markdown is converted well, but the LaTeX tables are not converted at all. I'd appreciate any insight into how to handle them.
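If adding Pandoc to the stack is undesirable, the tables Marker emits look simple enough that a small hand-rolled converter may be enough. Below is a minimal sketch, not a LaTeX parser. `tabular_to_html` is a hypothetical helper, and the sketch assumes one `tabular` per call, no nested braces inside `\multicolumn`/`\multirow`, and `\multirow` spanning a single column. Math such as `\(6.2\)` is left in place so that MathJax or KaTeX's auto-render can typeset it afterward.

```python
import html
import re

# Assumes no nested braces inside these arguments and that \multirow spans one column.
MULTICOL_RE = re.compile(r"\\multicolumn\{(\d+)\}\{[^}]*\}\{([^}]*)\}")
MULTIROW_RE = re.compile(r"\\multirow\{(\d+)\}\{[^}]*\}\{([^}]*)\}")
BODY_RE = re.compile(r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", re.S)
RULE_RE = re.compile(r"\\(?:hline|cline\{[^}]*\})")  # horizontal rules carry no cell data


def cell_to_html(raw: str) -> str:
    """Escape a cell for HTML and turn Marker's Markdown-style **bold** into <strong>."""
    text = html.escape(raw.strip().replace(r"\&", "&"), quote=False)
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)


def tabular_to_html(latex: str) -> str:
    """Convert one simple LaTeX tabular (hypothetical helper) into an HTML <table>."""
    m = BODY_RE.search(latex)
    if m is None:
        raise ValueError("no tabular environment found")
    body = RULE_RE.sub("", m.group(1))
    rows = [r for r in re.split(r"\\\\", body) if r.strip()]  # rows end with \\
    pending: dict[int, int] = {}  # column -> rows still covered by an earlier \multirow
    out = ["<table>"]
    for row in rows:
        out.append("  <tr>")
        col = 0
        for raw in re.split(r"(?<!\\)&", row):  # split on &, but not on an escaped \&
            if pending.get(col, 0) > 0:
                # LaTeX leaves an empty placeholder under a \multirow; HTML must omit it.
                pending[col] -= 1
                col += 1
                continue
            attrs, span = "", 1
            if mc := MULTICOL_RE.search(raw):
                span = int(mc.group(1))
                attrs += f' colspan="{span}"'
                raw = mc.group(2)
            if mr := MULTIROW_RE.search(raw):
                pending[col] = int(mr.group(1)) - 1
                attrs += f' rowspan="{mr.group(1)}"'
                raw = mr.group(2)
            out.append(f"    <td{attrs}>{cell_to_html(raw)}</td>")
            col += span
        out.append("  </tr>")
    out.append("</table>")
    return "\n".join(out)
```

Note that `react-markdown` ignores raw HTML by default, so a Markdown document with these tables spliced back in would also need the `rehype-raw` plugin to display them.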