Caltech has placed Volume I of the classic Feynman Lectures on Physics online in HTML format, using MathJax to render the LaTeX formulas and SVG for the figures. It's pretty cool, and it looks like they did a decent job converting the original book:
Still, I would prefer to read these lectures as a PDF typeset by TeX, with TeX fonts and TeX's fine typesetting. What would be involved in programmatically extracting the formulas (in their TeX source form), text, and images from the web pages and converting them into LaTeX source files?
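A minimal sketch of the extraction step, using only Python's standard-library `html.parser`. It assumes the pages keep the TeX source either between MathJax delimiters in the text (`$...$`, `$$...$$`, `\(...\)`, `\[...\]`) or inside MathJax 2-style `<script type="math/tex">` elements. The tag mapping (`h2` to `\section`, `h3` to `\subsection`, `img` to `\includegraphics`) and the names `FeynmanToLatex` and `html_to_latex` are hypothetical; the real pages' markup would need checking first.

```python
import re
from html.parser import HTMLParser

# Math spans MathJax typesets from text: $$...$$, \[...\], \(...\), $...$.
# Assumption: the raw HTML keeps TeX source between these delimiters.
MATH_RE = re.compile(r"(\$\$.+?\$\$|\\\[.+?\\\]|\\\(.+?\\\)|\$.+?\$)", re.DOTALL)

# LaTeX special characters that must be escaped in ordinary text.
SPECIALS = {
    "\\": r"\textbackslash{}", "{": r"\{", "}": r"\}", "&": r"\&",
    "#": r"\#", "%": r"\%", "_": r"\_", "^": r"\^{}", "~": r"\~{}",
    "$": r"\$",
}
SPECIAL_RE = re.compile(r"[\\{}&#%_^~$]")


def escape_text(text):
    """Escape LaTeX specials outside math; copy math spans verbatim."""
    parts = MATH_RE.split(text)
    # re.split with one capture group puts the math spans at odd indices.
    return "".join(
        part if i % 2 else SPECIAL_RE.sub(lambda m: SPECIALS[m.group()], part)
        for i, part in enumerate(parts)
    )


class FeynmanToLatex(HTMLParser):
    """Hypothetical converter mapping a handful of HTML tags to LaTeX."""

    HEADINGS = {"h2": r"\section{", "h3": r"\subsection{"}
    INLINE = {"em": r"\emph{", "i": r"\emph{", "b": r"\textbf{", "strong": r"\textbf{"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip = 0           # depth inside non-math <script>/<style>
        self.math_close = None  # closing delimiter while inside a math/tex script

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        mtype = attrs.get("type") or ""
        if tag == "script" and mtype.startswith("math/tex"):
            # MathJax 2 stores TeX as <script type="math/tex[; mode=display]">.
            display = "mode=display" in mtype
            self.out.append(r"\[" if display else "$")
            self.math_close = r"\]" if display else "$"
        elif tag in ("script", "style"):
            self.skip += 1
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "img" and attrs.get("src"):
            self.out.append("\n\\includegraphics{%s}\n" % attrs["src"])

    def handle_endtag(self, tag):
        if tag == "script" and self.math_close:
            self.out.append(self.math_close)
            self.math_close = None
        elif tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag in self.HEADINGS or tag in self.INLINE:
            self.out.append("}")

    def handle_data(self, data):
        if self.math_close:
            self.out.append(data)  # TeX source: copy verbatim
        elif not self.skip:
            self.out.append(escape_text(data))


def html_to_latex(html):
    """Convert one page's HTML to a LaTeX body fragment."""
    parser = FeynmanToLatex()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip() + "\n"
```

The SVG figures can't be passed to `\includegraphics` directly under pdfLaTeX; they would need converting to PDF first (for example with Inkscape), or loading through the `svg` package's `\includesvg`. Equation numbering, cross-references, and any site-specific macros would still need hand-tuning.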
There is a nice list of tools in one of the answers to this question:
Before exploring those, though, I'm wondering whether anyone can recommend a solution just by eyeballing the HTML source of the Feynman conversion.
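Eyeballing the source mostly comes down to one question: in what form does the TeX survive in the raw HTML? A small sketch, assuming a page has already been saved locally and read into a string (the function name `math_markup_census` is hypothetical), counts the common MathJax encodings. Single-dollar inline math is left out because a lone `$` is too ambiguous to count reliably with a regex.

```python
import re
from collections import Counter

# Regexes for the usual ways MathJax-rendered pages carry their math.
PATTERNS = {
    "script math/tex": r"<script[^>]*type=[\"']math/tex",
    "display $$...$$": r"\$\$.+?\$\$",
    "display \\[...\\]": r"\\\[.+?\\\]",
    "inline \\(...\\)": r"\\\(.+?\\\)",
    "MathML <math>": r"<math[\s>]",
}


def math_markup_census(html):
    """Count occurrences of each math encoding in a page's raw HTML."""
    return Counter(
        {name: len(re.findall(pat, html, re.DOTALL)) for name, pat in PATTERNS.items()}
    )
```

If the delimiter counts are high, the math can likely be copied through verbatim; if every count is zero, the math may be injected by JavaScript or stored in some other form, and the page source needs a closer look.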