It is possible to achieve what you want, provided you do not need TeX itself to act as a parser. In my opinion, part of TeX's success is that it has adapted over the years into a language transformation tool: first it was TeX->PostScript and now it is TeX->pdf. Tralics has been fairly successful at producing TeX->XML.
But I think one needs to look at the problem from a different angle. With today's technologies, what one needs is a "Universal Mark-up Language". Markdown and YAML are scaled-down tools that cannot serve as full document description languages, so going that route will limit one's efforts.
Some time back, I designed a CMS based on text files. All mark-up was plain text plus fragments borrowed from Wikipedia's markup language. I would load the text file via PHP, filter the input, and produce the HTML page.
<!--
{{feature-image: http://localhost/images/sample102.jpg }}
{{feature: A collection is like a puzzle...}}
-->
The feature-image became a div and the feature text its caption. I had similar commands for image credits and the like.
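As a rough illustration of that filtering step, here is a minimal sketch in Python (the original CMS used PHP, and its code is not shown here). The function name `render_feature`, the CSS class names, and the exact tag grammar accepted by the regular expression are all assumptions made for this example; only the two tag names come from the sample above.

```python
import html
import re

# Matches one tag of the form {{name: value}}. Only the tag names used below
# come from the sample; the rest of the grammar is an assumption.
TAG = re.compile(r"\{\{\s*([\w-]+)\s*:\s*(.*?)\s*\}\}")


def render_feature(source: str) -> str:
    """Turn feature-image / feature tags into a div holding image and caption."""
    fields = {name: value for name, value in TAG.findall(source)}
    image = html.escape(fields.get("feature-image", ""), quote=True)
    caption = html.escape(fields.get("feature", ""))
    return (
        '<div class="feature-image">'  # hypothetical class name
        f'<img src="{image}" alt="">'
        f'<p class="caption">{caption}</p>'  # hypothetical class name
        "</div>"
    )


sample = """
{{feature-image: http://localhost/images/sample102.jpg }}
{{feature: A collection is like a puzzle...}}
"""
print(render_feature(sample))
```

Escaping the values before they reach the HTML is a choice made for the sketch; a real filter would also need to decide what to do with unknown tags.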
Now, this is not difficult to produce with TeX. So my proposal is to use TeX to write an intermediate mark-up to a text file, then parse that file with your language of choice to achieve what you wish.
Depending on the target, the workflow can be one of the following:
TeX->Intermediate MarkUp->HTML
TeX->pdf
TeX->plain text
Intermediate MarkUp->Translator (javascript, perl, python, ruby, php, your language)->TeX
In a nutshell: retain TeX and have it output a new mark-up language. Markdown and other technologies can be subsets of this.
\documentclass{article}
\usepackage[demo]{graphicx}
\usepackage{verbdef}
\begin{document}
\makeatletter
%% create file and open it to write
\newwrite\file
\immediate\openout\file=wikimark.wiki
\newif\if@wikimark
\newif\if@html
\@wikimarktrue
\def\image#1#2{%
\if@wikimark
\image@@{#1}{#2}
\else
\includegraphics{dummy.png}
\fi
}
\def\Section#1{%
\if@wikimark
\section@@{#1}\relax
\else
\section{#1}
\fi
}
\def\image@@#1#2{%
\immediate\write\file{\string{\string{img:#1\string}\string}}
\immediate\write\file{\string{\string{img-caption:#2\string}\string}}
}
%% \hash@@ expands to two literal hash characters (the heading marker)
\edef\hash@@{\string#\string#}
\def\section@@#1{%
\immediate\write\file{\hash@@\space#1}
}
\makeatother
\Section{Test Section}
\image{http://tex.stackexchange.com/questions/15440/parsing-files-through-lua-tex}{This is the caption}
%% close immediately, matching the \immediate writes above
\immediate\closeout\file
\end{document}
This minimal example is just a proof of concept. The main idea is not to redefine the LaTeX commands but rather to add new ones with switches for other mark-up.
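To close the loop on the last workflow (Intermediate MarkUp->Translator->TeX), the following is a minimal sketch of a translator in Python. It assumes the file written by the example contains one heading per line starting with `##`, and an `{{img:...}}` line followed by an `{{img-caption:...}}` line for each image. The function name `wiki_to_tex` is hypothetical, and no escaping of TeX special characters is attempted.

```python
import re

# Line shapes assumed to appear in wikimark.wiki (one construct per line).
HEADING = re.compile(r"^##\s*(.+)$")
IMG = re.compile(r"^\{\{img:(.*)\}\}$")
CAPTION = re.compile(r"^\{\{img-caption:(.*)\}\}$")


def wiki_to_tex(text: str) -> str:
    """Translate the intermediate mark-up back into \\Section and \\image calls."""
    out = []
    pending_image = None  # URL waiting for its caption line
    for raw in text.splitlines():
        line = raw.strip()
        if (m := HEADING.match(line)):
            out.append(f"\\Section{{{m.group(1)}}}")
        elif (m := IMG.match(line)):
            pending_image = m.group(1).strip()
        elif (m := CAPTION.match(line)) and pending_image is not None:
            out.append(f"\\image{{{pending_image}}}{{{m.group(1).strip()}}}")
            pending_image = None
        elif line:
            out.append(line)  # anything unrecognised is passed through unchanged
    return "\n".join(out)


sample = """##Test Section
{{img:http://tex.stackexchange.com/questions/15440/parsing-files-through-lua-tex}}
{{img-caption:This is the caption}}
"""
print(wiki_to_tex(sample))
```

The heading pattern accepts the marker with or without a following space, so it does not depend on exactly how the TeX side writes it. The walrus operator requires Python 3.8 or later.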
pdftotext postprocessor. Extra stuff can sneak in by accident in whatever solution you try. – Aditya May 31 '11 at 18:59
back-exp.lua. IIUC, it builds a tree of the entire document in memory... each macro and environment defined using \define and \definestartstops hooks into that tree; all the core environments (itemize, enumerate, section, etc) hook into the tree. Then, at the end of the document, ConTeXt simply serializes the tree and writes it to a separate text file. – Aditya May 31 '11 at 21:34
pdftotext is pretty reliable. Just create a pdf with teletype font, no headers and footers (AND no math and no graphics). Think of the pdf output as your text file. – Aditya May 31 '11 at 21:38
\ttfamily at the start and that seems to deal with hyphenation and ligatures. I shan't try Hebrew! My sticking point now is getting newlines in to the outputted text, in particular double newlines (in place of \par). – Andrew Stacey Jun 01 '11 at 20:04
\setupwhitespace[line] (which, in LaTeX parlance, sets the \parskip to be equal to \baselineskip) and then use pdftotext -nopgbrk -layout. The -layout option also does line wrapping, so you may need to play with the paper width if you want to prevent line wrapping. – Aditya Jun 03 '11 at 04:57
pdftotext as the last stage ... and it works! I was able to produce the source text to this page: http://ncatlab.org/nlab/show/equivariant+tubular+neighbourhoods by writing a LaTeX document with a special style file. So could you assemble your various comments in to an answer which I can accept? (If it's alright by you, after you've done that then I might add some details on exactly what I did, but I'd like to give you the credit for the solution.) – Andrew Stacey Jun 21 '11 at 21:45