
I have one input file that is used both for LaTeX and for HTML (via a Java program). The input file contains HTML markup such as:

<b></b> for bold.
<li></li> for list items.

How can I convert this markup into LaTeX markup when generating the PDF?

edited by doncherry
asked by manish
  • I don't want to convert HTML into LaTeX in general; I just need to modify some text. – manish Jan 10 '13 at 05:59
  • So you want LaTeX to convert (say) <b>hello</b> to {\bfseries hello} or \textbf{hello} within a certain piece of your code? That is, keep the code as HTML but interpret it differently? I guess this is because you don't want to change the HTML markup to LaTeX, so that you can easily move between the two formats. – Werner Jan 10 '13 at 06:04

1 Answer


Did you try sed?

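```console
$ cat test
<b>Bold text</b>. Regular text, <i>italics</i>, <tt>teletype</tt>, regular.
Nested: <b>Bold <i>bold italics</i> again bold</b>

$ sed -e 's|<b>|\\textbf{|g' -e 's|</b>|}|g' \
      -e 's|<i>|\\textit{|g' -e 's|</i>|}|g' \
      -e 's|<tt>|\\texttt{|g' -e 's|</tt>|}|g' test
\textbf{Bold text}. Regular text, \textit{italics}, \texttt{teletype}, regular.
Nested: \textbf{Bold \textit{bold italics} again bold}
```

Each opening and closing tag is replaced on its own, so nested tags and several tags of the same kind on one line are all converted correctly, provided the tags are balanced.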
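Since the question is about generating a PDF, the same substitutions can be wrapped so that the converted text ends up inside a complete LaTeX document. This is a minimal sketch, not part of the original answer: the function names html_inline_to_latex and html_to_tex, the article document class, and the assumption that the input only uses balanced <b>, <i> and <tt> tags are all hypothetical choices. It replaces each opening and closing tag separately rather than capturing the text between them with `.*`.

```shell
# Hypothetical helper: convert balanced <b>, <i>, <tt> tags to LaTeX commands.
# Each tag is replaced independently, so nesting and repeated tags work.
html_inline_to_latex() {
    sed -e 's|<b>|\\textbf{|g'  -e 's|</b>|}|g' \
        -e 's|<i>|\\textit{|g'  -e 's|</i>|}|g' \
        -e 's|<tt>|\\texttt{|g' -e 's|</tt>|}|g'
}

# Hypothetical wrapper: read HTML-marked text on stdin and print a
# complete LaTeX document (article class assumed) on stdout.
html_to_tex() {
    # printf '%s\n' prints its arguments verbatim, backslashes included
    printf '%s\n' '\documentclass{article}' '\begin{document}'
    html_inline_to_latex
    printf '%s\n' '\end{document}'
}
```

For example, `html_to_tex < input.txt > input.tex && pdflatex input.tex` would then produce input.pdf. Characters that are special to LaTeX (such as %, # or _) are not escaped by this sketch.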
answered by Eddy_Em
  • @Eddy_Em Your answer is very useful. Can we also convert list markup into LaTeX markup?
    – manish Jan 16 '13 at 04:26
  • Yes: sed -e 's|<ol>\(.*\)</ol>|\\begin{enumerate}\n\1\\end{enumerate}|g' -e 's|<li>\([^<]*\)</li>|\t\\item \1\n|g' (the \n and \t escapes in the replacement need GNU sed, and the whole <ol> ... </ol> must be on one line, since sed works line by line). This will not work for nested lists, and if there can be other tags between <li> and </li>, the regexp becomes more complex. Another option is to delete every </li> and replace only <li> with \item. – Eddy_Em Jan 16 '13 at 05:47
  • The script above does not convert ul or ol into an itemize environment. – manish Jan 16 '13 at 08:48
  • echo "<ul><li>1a</li><li>2a</li></ul>" | sed -e 's|<ul>|\\begin{itemize}|g' -e 's|</ul>|\\end{itemize}|g' -e 's|<li>|\\item |g' -e 's|</li>||g'
    prints
    \begin{itemize}\item 1a\item 2a\end{itemize}
    – Eddy_Em Jan 16 '13 at 09:14
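The tag-by-tag variant described in the comments (delete every </li>, turn each <li> into \item) can be sketched as a small filter that also maps <ol> to enumerate and <ul> to itemize. This is a hedged sketch: the function name html_lists_to_latex is hypothetical, and it assumes lower-case tags without attributes (something like <ol start="3"> is left untouched). Because every tag is replaced on its own, nested HTML lists simply become nested LaTeX list environments.

```shell
# Hypothetical filter: map HTML list tags to LaTeX list environments.
# Assumes lower-case tags with no attributes.
html_lists_to_latex() {
    sed -e 's|<ul>|\\begin{itemize}|g'   -e 's|</ul>|\\end{itemize}|g' \
        -e 's|<ol>|\\begin{enumerate}|g' -e 's|</ol>|\\end{enumerate}|g' \
        -e 's|<li>|\\item |g'            -e 's|</li>||g'
}
```

Feeding it the echo example from the comments gives \begin{itemize}\item 1a\item 2a\end{itemize}; an <ol> nested inside an <li> comes out as an enumerate environment inside the itemize.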