We can use b3m2a1's suggestion to extract screenshots from the existing video file using the SubRip timings for the subtitles. I'll assume each slide is on screen while its subtitle is displayed.
The subtitle timings in the SubRip file are a shortcut for locating the video frame for each slide, so there is no need to scan the video for matching frames.
Read a subRip file and locate subtitle timings
SubRip files are text files that contain subtitles and timings for a video. First, read the subRip file into a list, where each item in subRip is a line from the file. I've saved the subRip text from the question in a sample file named "subtitles.srt".
subRip = ReadList["subtitles.srt", String];
Next, extract the lines that have the times each subtitle appears and disappears on the screen. The lines for these times have a specific format that we can use to select them using: Select[subRip,StringMatchQ[#,{__~~Whitespace~~"-->"~~Whitespace~~__}]&]. The subRip times use a comma in place of a decimal point. Fix that with StringReplace[...,","->"."]. Finally, split the times at the "-->" string with StringSplit[..., Whitespace~~"-->"~~Whitespace]. Combining these steps gives the start and end times for each subtitle as strings representing hours, minutes, and seconds.
subtitleTimings =
  StringSplit[
    StringReplace[
      Select[subRip, StringMatchQ[#, {__ ~~ Whitespace ~~ "-->" ~~ Whitespace ~~ __}] &],
      "," -> "."],
    Whitespace ~~ "-->" ~~ Whitespace]
We get these strings:
{{"00:00:04.359", "00:00:07.009"}, {"00:00:07.447", "00:00:09.873"},
{"00:00:10.073", "00:00:11.948"}, {"00:00:12.148", "00:00:14.872"},
{"00:00:15.072", "00:00:16.722"}, {"00:00:17.175", "00:00:20.188"},
{"00:00:20.388", "00:00:23.276"}}
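To try the selection and splitting steps without the file, the same pipeline can be run on a short inline list. This is only a sketch: sampleLines, timingPattern and timingLines are names introduced here for illustration, and the sample holds just the first two subtitles from the question, with the blank separator lines left out.

```mathematica
(* Illustrative stand-in for the lines read from "subtitles.srt" *)
sampleLines = {
   "1", "00:00:04,359 --> 00:00:07,009",
   "Welcome to this lesson in micro and nano fabrication.",
   "2", "00:00:07,447 --> 00:00:09,873",
   "The picture behind me shows a colorful SEM image"};

(* A timing line is anything of the form  <text> --> <text>  *)
timingPattern = __ ~~ Whitespace ~~ "-->" ~~ Whitespace ~~ __;

(* Keep only the timing lines *)
timingLines = Select[sampleLines, StringMatchQ[#, timingPattern] &];

(* Replace the decimal comma, then split each line at the arrow *)
StringSplit[StringReplace[timingLines, "," -> "."],
 Whitespace ~~ "-->" ~~ Whitespace]
(* {{"00:00:04.359", "00:00:07.009"}, {"00:00:07.447", "00:00:09.873"}} *)
```

The index and text lines contain no "-->", so Select drops them and only the two timing lines remain.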
Convert the timing strings to seconds
Parse each string with DateObject, sum its hour, minute, second, and millisecond components as quantities, and convert the total to seconds. Then partition the flat list back into start/end pairs.
timesSeconds =
  Partition[
    UnitConvert[
      Total[DateValue[
        DateObject[#], {"Hour", "Minute", "Second", "Millisecond"},
        Quantity]] & /@ Flatten[subtitleTimings],
      "Seconds"],
    {2}]
These are the start and end times for each subtitle.
{{4.359s,7.009s},{7.447s,9.873s},{10.073s,11.948s},
{12.148s,14.872s},{15.072s,16.722s},{17.175s,20.188s},
{20.388s,23.276s}}
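As a cross-check that does not depend on date parsing, the same conversion can be done with plain arithmetic: seconds = 3600 h + 60 m + s. The helper toSeconds and the name timesSecondsCheck are hypothetical (they are not part of the steps above), and the sketch assumes every string has exactly the form hh:mm:ss.sss.

```mathematica
(* Hypothetical helper, assuming input of the form "hh:mm:ss.sss" *)
toSeconds[s_String] :=
  Quantity[(ToExpression /@ StringSplit[s, ":"]) . {3600, 60, 1}, "Seconds"]

toSeconds["00:00:04.359"]
(* Quantity[4.359, "Seconds"] *)

(* Map at level 2 to keep the {start, end} pair structure *)
timesSecondsCheck = Map[toSeconds, subtitleTimings, {2}]
```

ToExpression evaluates whatever it is given, so this is only reasonable for subtitle files you trust.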
Find video frames for each slide and import
To turn a time in seconds into a frame number we need the frame rate, which is available from the video file. The frame rate is a value in frames per second. This example assumes a QuickTime video file.
frameRate = Quantity[Import["file.mov", "FrameRate"], 1/"Seconds"];
Using frameRate, compute the start and end frame numbers for each subtitle.
framesStartEnd = Round[timesSeconds*frameRate];
Average the start and end frames to get the frame at the midpoint of each subtitle, where the slide is assumed to be on screen. Use frameList to import those frames from the video as slides.
frameList =
Round[(First /@ framesStartEnd + Last /@ framesStartEnd)/2];
slides = Import["file.mov", {"ImageList", frameList}];
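As a worked example of the frame arithmetic, take the first subtitle (4.359 s to 7.009 s). The frame rate of 25 frames per second is an assumption made purely for illustration; the real value comes from the video file. The names fps, tStart, tEnd, fStart and fEnd are likewise introduced only for this sketch.

```mathematica
fps = 25;                         (* assumed frame rate, illustration only *)
{tStart, tEnd} = {4.359, 7.009};  (* first subtitle, in seconds *)

(* Frame number = time * frame rate, rounded to the nearest integer *)
{fStart, fEnd} = Round[{tStart, tEnd}*fps]
(* {109, 175} *)

(* Midpoint frame, where the slide is assumed to be visible *)
Round[(fStart + fEnd)/2]
(* 142 *)
```

Taking the midpoint rather than the first frame gives some margin if the slide changes slightly before or after the subtitle appears.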
Timestamps and subtitles
SubRip allows multiple lines of text for each subtitle. For simplicity, assume one line of text per subtitle.
Get timestamps for each subtitle with:
timeStamps = First /@ timesSeconds
{4.359s,7.447s,10.073s,12.148s,15.072s,17.175s,20.388s}
Get the subtitles for each slide:
transcripts =
  Split[
    Select[subRip, ! StringMatchQ[#, {__ ~~ Whitespace ~~ "-->" ~~ Whitespace ~~ __}] &],
    DigitQ[#] &]
{{1,"Welcome to this lesson in micro and nano fabrication."},
{2,"The picture behind me shows a colorful SEM image"},
{3,"of a bi-morph MEMS activator."},
{4,"And in the next few minutes, I will show how it was fabricated"},
{5,"in our clean room at EPFL."},
{6,"Although it does not involve all possible fabrication steps"},
{7,"that are nowadays available in advanced MEMS processes."}}
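If a file does contain subtitles with more than one line of text, the grouping can be loosened so that a new group starts only at an index line. This is a sketch under assumptions: textLines is made-up sample data with the timing lines already removed, groups is a name introduced here, and a text line consisting only of digits would be mistaken for an index.

```mathematica
(* Made-up sample; subtitle 2 has two lines of text *)
textLines = {"1", "First subtitle.",
   "2", "Second subtitle,", "continued.",
   "3", "Third subtitle."};

(* Keep adjacent lines together unless the second one is all digits *)
groups = Split[textLines, ! DigitQ[#2] &];

(* Keep the index and join the remaining lines with a space *)
{First[#], StringRiffle[Rest[#], " "]} & /@ groups
(* {{"1", "First subtitle."}, {"2", "Second subtitle, continued."},
    {"3", "Third subtitle."}} *)
```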
The results are slides, timeStamps and transcripts.