
I have extracted subtitle text in .srt format.

It is essentially plain text with a few tags that indicate which formatting to apply to different parts.

Formatting is derived from HTML tags for bold, italic, underline and color:

Bold – <b> ... </b> or {b} ... {/b}

Italic – <i> ... </i> or {i} ... {/i}

Underline – <u> ... </u> or {u} ... {/u}

Font color – <font color="color name or #code"> ... </font> (as in HTML)

Now, I would like to convert it to LaTeX format. Does anyone know how to do this?

Thanks.

EDIT: Here is some sample data:

37
00:03:28,544 --> 00:03:32,544
Maintenant une équation linéaire à deux inconnues

38
00:03:32,544 --> 00:03:36,544
est de la forme : 
<i>c·aᵢ₁(α₁ - β₁) + c·aᵢ₂(α₂ - β₂) + ... + c·aₙ₁(αₙ - βₙ)) = 0.</i>,

39
00:03:37,841 --> 00:03:44,091
je nomme cette fois les inconnues <i>x</i> et <i>y</i>, où <i>a</i>, <i>b</i> et <i>c</i> sont des nombres réels.

I extract the text using pysubs2, which gives:

Maintenant une équation linéaire à deux inconnues est de la forme :

<i>c·aᵢ₁(α₁ - β₁) + c·aᵢ₂(α₂ - β₂) + ... + c·aₙ₁(αₙ - βₙ)) = 0.</i>, 

je nomme cette fois les inconnues <i>x</i> et <i>y</i>, où <i>a</i>, 
<i>b</i> et <i>c</i> sont des nombres réels.
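For readers who want to reproduce this extraction step without an extra library, here is a minimal standard-library sketch. It is not what pysubs2 does internally; the function name `srt_text_lines` and the `SAMPLE` string are made up for illustration, and the sketch assumes well-formed blocks (index line, timing line, text lines) separated by blank lines.

```python
import re
from typing import List

# Shortened sample in the same layout as the data above (hypothetical content).
SAMPLE = """37
00:03:28,544 --> 00:03:32,544
Maintenant une équation linéaire à deux inconnues

38
00:03:32,544 --> 00:03:36,544
est de la forme :
<i>x</i>
"""

# Matches an SRT timing line such as "00:03:28,544 --> 00:03:32,544".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt_text_lines(srt: str) -> List[str]:
    """Return only the subtitle text lines, dropping indices and timings."""
    lines = []
    # Subtitle blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        rows = block.splitlines()
        # Drop the leading index and timing lines when both are present.
        if len(rows) >= 2 and rows[0].strip().isdigit() and TIMING.match(rows[1].strip()):
            rows = rows[2:]
        lines.extend(row.strip() for row in rows if row.strip())
    return lines


if __name__ == "__main__":
    print("\n".join(srt_text_lines(SAMPLE)))
```

The formatting tags are deliberately left untouched at this stage, so that they can be converted in a separate step.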

I would like to have exactly the same text in LaTeX form, without the subtitle formatting tags (<i>, </i>, etc.) but with the equivalent LaTeX formatting, for instance \textit{c} instead of <i>c</i>.
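To make the desired conversion concrete, here is a minimal Python sketch of the tag replacement. It is an illustration under assumptions, not a complete converter: the function name `srt_tags_to_latex` is made up, tags are assumed to be properly paired, colors are assumed to be rendered with `\textcolor` from the xcolor package, and LaTeX special characters in the subtitle text (%, &, _, #) are not escaped.

```python
import re

# The paired tags listed in the question, mapped to LaTeX macros.
SIMPLE_TAGS = {"b": r"\textbf", "i": r"\textit", "u": r"\underline"}

FONT_OPEN = re.compile(r'<font\s+color\s*=\s*"?(#?\w+)"?\s*>', re.IGNORECASE)
FONT_CLOSE = re.compile(r"</font\s*>", re.IGNORECASE)


def _font_open(match):
    """Turn <font color="..."> into the opening part of \\textcolor (xcolor assumed)."""
    color = match.group(1)
    if color.startswith("#"):
        # Hex codes go through xcolor's HTML model; upper-cased to be safe.
        return r"\textcolor[HTML]{%s}{" % color[1:].upper()
    return r"\textcolor{%s}{" % color


def srt_tags_to_latex(text):
    """Replace <b>, <i>, <u> (and their {b}, {i}, {u} spellings) and <font color>."""
    for tag, macro in SIMPLE_TAGS.items():
        opening = re.compile(r"<%s>|\{%s\}" % (tag, tag), re.IGNORECASE)
        closing = re.compile(r"</%s>|\{/%s\}" % (tag, tag), re.IGNORECASE)
        # A callable replacement avoids re.sub treating "\t" in "\textit" as a tab.
        text = opening.sub(lambda m, macro=macro: macro + "{", text)
        text = closing.sub("}", text)
    text = FONT_OPEN.sub(_font_open, text)
    text = FONT_CLOSE.sub("}", text)
    return text


if __name__ == "__main__":
    print(srt_tags_to_latex("les inconnues <i>x</i> et <i>y</i>"))
```

Because each opening tag becomes `\macro{` and each closing tag becomes `}`, nested tags come out nested in LaTeX as long as the input is well-formed.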

Henri Menke
james
  • What is srt format? – Johannes_B Mar 03 '19 at 06:42
  • @Johannes_B please see my edited question. – james Mar 03 '19 at 06:44
  • Why do you want to convert this to a pdf? Isn't it completely useless in this format? – Johannes_B Mar 03 '19 at 06:46
  • @Johannes_B It depends for what you need it. In my case, it is not useless. – james Mar 03 '19 at 06:55
  • Isn't srt just text? You can just copy/paste it. LaTeX is also just text. – Johannes_B Mar 03 '19 at 06:56
  • @Johannes_B Yes, but I want to preserve formatting. – james Mar 03 '19 at 07:05
  • https://tex.stackexchange.com/questions/45550/i-just-want-to-have-the-text-in-my-pdf-the-same-way-i-have-it-in-my-editor – Johannes_B Mar 03 '19 at 07:06
  • @james If verbatim or listings are not OK for you, then explain your question further and show what you want. –  Mar 03 '19 at 07:16
  • @JouleV Please have a look at my updated question. The srt file is essentially text with keywords indicating which formatting to apply. – james Mar 03 '19 at 09:00
  • @Johannes_B Thanks for the link. Please have a look at my updated question. – james Mar 03 '19 at 09:01
  • @james (+1) Now I can see that your question is interesting! This is related (not a duplicate, of course): Converting Markdown to LaTeX, in LaTeX. –  Mar 03 '19 at 09:02
  • Have you looked on the pandoc side? https://pandoc.org – AndréC Mar 03 '19 at 09:07
  • @AndréC Sounds interesting, but I cannot find an srt conversion. – james Mar 03 '19 at 15:40
  • Since they are HTML tags, convert from HTML to LaTeX – AndréC Mar 03 '19 at 15:41
  • @AndréC Good point. It seems a good path to take. If it is not too much to ask, maybe you could create an answer with a sample srt file in which you guide us step by step through the conversion using your envisioned method. – james Mar 03 '19 at 15:45
  • I have never created an .srt file; you should indicate in your question how you created it and give the result. – AndréC Mar 03 '19 at 15:52
  • @AndréC I added a sample of srt code. I hope this helps. :) – james Mar 03 '19 at 19:43
  • @james The first problem we face here is character encoding. Indeed, β₂ is not the way LaTeX writes this character; LaTeX writes it as $\beta_2$. So you need to find a way to convert characters encoded in this way to LaTeX code. There are specialists in encodings and fonts; perhaps they can explain how to proceed. Ask a new question. – AndréC Mar 03 '19 at 20:00
  • Listings etc. could easily handle some parts of the conversion, but the ordering of the replacements would need care, and it would take multiple samples to get it right. So a simple answer to your MWE would not work instantly with the next sample, but over a few cycles it could. Starting with {<i>} replace with {\textit{}{8} and replace {</i>} with {}}{1}, then replacing {β₂} with {$\beta_2$}{9}, is easy, but the large number of variations would be time-consuming. It may be easier, as said, to use a dedicated string-replacement program to do the conversion and then simply \include that output. –  Mar 03 '19 at 20:26
  • Looks like a LuaLaTeX task to me..... – JPi Mar 11 '19 at 01:14
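The character-encoding issue raised in the comments (β₂ versus $\beta_2$) can be handled by the same kind of string replacement. The following Python sketch is only an assumption about how one might do it: the function name `unicode_math_to_latex` is made up, and the two tables cover just the characters that occur in the sample data, so they would have to be extended for real files. The output is meant to be placed inside math mode.

```python
import re

# Unicode subscript characters seen in the sample, mapped to plain characters.
SUBSCRIPTS = {
    "₀": "0", "₁": "1", "₂": "2", "₃": "3", "₄": "4",
    "₅": "5", "₆": "6", "₇": "7", "₈": "8", "₉": "9",
    "ᵢ": "i", "ₙ": "n",
}

# Symbols seen in the sample, mapped to LaTeX math macros.
SYMBOLS = {"α": r"\alpha", "β": r"\beta", "·": r"\cdot"}


def unicode_math_to_latex(formula):
    """Rewrite Unicode subscripts and symbols as LaTeX math source."""
    sub_chars = "".join(SUBSCRIPTS)
    # Collapse each run of subscript characters into a single _{...} group.
    formula = re.sub(
        "[%s]+" % sub_chars,
        lambda m: "_{" + "".join(SUBSCRIPTS[ch] for ch in m.group(0)) + "}",
        formula,
    )
    for char, macro in SYMBOLS.items():
        # A control word needs a space before a following letter ("\cdot a").
        formula = re.sub(
            re.escape(char) + r"(?=[A-Za-z])",
            lambda m, macro=macro: macro + " ",
            formula,
        )
        formula = formula.replace(char, macro)
    return formula


if __name__ == "__main__":
    print("$" + unicode_math_to_latex("c·aᵢ₁(α₁ - β₁)") + "$")
```

Consecutive subscripts are grouped so that aᵢ₁ becomes `a_{i1}` rather than two separate subscripts, which LaTeX would reject as a double subscript.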

0 Answers