
What toolchains are available to convert scans to LaTeX?

I am looking for a freeware way to convert PDFs (via OCR and other methods) to LaTeX.

The scanned documents

  • are in PDF form.
  • have a resolution of about 300 to 600dpi.
  • contain only a few mathematical expressions.
  • contain pictures and text.

I can work with Windows or Ubuntu equally well. My research so far has turned up InftyReader and writer2latex.

Balaji
  • This is probably the wrong place to ask, but nonetheless: it might be helpful if you could show us some of your "Image PDFs", so we can judge the possibilities. Are they images or pages from a document which you need to convert? – Habi Feb 06 '13 at 10:09
  • Are you talking about reading in text from a scanned document into a LaTeX document or including images that are pdfs in a LaTeX document? – Nathanael Farley Feb 06 '13 at 10:35
  • @Habi: The pages are copyrighted, and the resolution of the PDF pages is 300 to 600 dpi. I need to convert pages as well as images. – Balaji Feb 06 '13 at 11:16
  • Do you have lots of mathematical expressions in your pages? Or is it mostly text? Can you make a typical example with content other than (your, I guess?) copyrighted content, so that we can see what a typical use case would look like? For instance, it would be useful to know how "clean" the image is, how elaborate the layout and mathematics are, etc. – Bruno Le Floch Feb 06 '13 at 12:43
  • @BrunoLeFloch There are very few mathematical expressions, and the images are simple grayscale only. I have the PDF import plugin in OpenOffice, and I have now found that writer2latex can convert ODT to LaTeX format. – Balaji Feb 06 '13 at 12:58
  • You haven't answered whether those are scans of printed text or were obtained from, e.g., Word/OpenOffice/other documents. – Bruno Le Floch Feb 06 '13 at 13:10
  • If these are scans (and they probably are), I think the most you'll get would be OCR of the text. I don't see any way of getting useful LaTeX code for font sizes, margins, etc. – Mike Renfro Feb 06 '13 at 13:18
  • @Bala I really don't see why you'd need to convert copyrighted pages to LaTeX (i.e. I see it, but it might be for the wrong reasons). I think your best bet is InftyReader, mentioned in your question, or any other OCR software such as Tesseract, followed by manual editing. – Habi Feb 07 '13 at 09:43
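
The OCR-then-edit route suggested in the comments could be sketched roughly as follows. This is only a minimal sketch, assuming Poppler's `pdftoppm` and the `tesseract` command-line tool are installed and on the PATH; the function names (`ocr_pdf`, `escape_latex`, `text_to_latex`) are hypothetical helpers made up for this illustration, not part of any tool mentioned in the thread. It recovers plain text only: pictures, layout and mathematics would still need manual work.

```python
import re
import subprocess
from pathlib import Path

# Characters with special meaning in LaTeX that must be escaped in OCR output.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_ESCAPE_RE = re.compile("|".join(re.escape(c) for c in _LATEX_ESCAPES))


def escape_latex(text):
    """Escape LaTeX special characters in a single pass."""
    return _ESCAPE_RE.sub(lambda m: _LATEX_ESCAPES[m.group(0)], text)


def text_to_latex(pages):
    """Wrap a list of per-page OCR strings in a minimal LaTeX document.

    Blank lines are treated as paragraph breaks; line breaks inside a
    paragraph are collapsed to single spaces.
    """
    body = []
    for page in pages:
        paragraphs = [p for p in re.split(r"\n\s*\n", page) if p.strip()]
        body.append(
            "\n\n".join(escape_latex(" ".join(p.split())) for p in paragraphs)
        )
    return (
        "\\documentclass{article}\n"
        "\\usepackage[utf8]{inputenc}\n"
        "\\begin{document}\n"
        + "\n\\clearpage\n".join(body)
        + "\n\\end{document}\n"
    )


def ocr_pdf(pdf_path, workdir, dpi=300, lang="eng"):
    """Rasterize a scanned PDF and OCR each page (assumes pdftoppm and tesseract exist)."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # pdftoppm writes page-1.png, page-2.png, ... (zero-padded for longer documents).
    subprocess.run(
        ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(workdir / "page")],
        check=True,
    )
    pages = []
    for image in sorted(workdir.glob("page-*.png")):
        # "stdout" as the output base makes tesseract print the recognized text.
        result = subprocess.run(
            ["tesseract", str(image), "stdout", "-l", lang],
            check=True,
            capture_output=True,
            text=True,
        )
        pages.append(result.stdout)
    return pages
```

With both tools installed, something like `Path("out.tex").write_text(text_to_latex(ocr_pdf("scan.pdf", "pages")), encoding="utf-8")` would produce a first draft to clean up by hand; `scan.pdf`, `pages` and `out.tex` are placeholder names.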

0 Answers