Under Windows 8, I am using R and the knitr package in RStudio with an .Rnw script. My regex skills are passable, but only in R.
My document has about thirty pages, each with a different text section at the bottom. Every text section starts the same way, with \subsubsection{\textcolor{blue}{\textsf{R} Note:, and ends with \clearpage.
How can I extract all the text sections after "Note:" into a file as plain text?
Here is my MWE:
\documentclass[11pt]{article}
\begin{document}
\section{How to Find and Extract Text within Fixed Markers}
\noindent\rule{150pt}{1.7pt}
\vspace{2pt}
\subsubsection{\textcolor{blue}{\textsf{R} Note: History and Current Status of \textsf{R}}}: \textsf{R} is an open source software platform used to analyze data. Written originally in the early 1990's, it is an \textbf{object-oriented}, \textbf{interpreted} programming language blessed with a wealth of constantly-growing and improving "packages" and a vibrant user community.
\clearpage
\section{Each Note is at the bottom of a page. All have the same start up to Note: They end with clearpage}
\noindent\rule{150pt}{1.7pt}
\vspace{2pt}
\subsubsection{\textcolor{blue}{\textsf{R} Note: Different text}}: Some other text. **There are NO OTHER COMMANDS after the colon except \textsf{R} and all of it consists of ONE LINE OF TEXT.**
\clearpage
\end{document}
Several questions here looked as if they would guide me on what may well be a very simple question, but I could not work out what to do. One of them relies on the text being in an environment, which is not the case in my document.
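For illustration, here is a minimal sketch of the kind of extraction being asked for, written in Python rather than R. It assumes, as in the MWE, that every note sits on a single line starting with the fixed prefix and contains only simple one-argument commands such as \textsf{R} or \textbf{...}. The function names and the file names in the usage note are hypothetical, and this is a sketch under those assumptions, not a general LaTeX-to-text converter.

```python
import re

# Fixed prefix that starts every note line (copied from the question).
NOTE_PREFIX = r"\subsubsection{\textcolor{blue}{\textsf{R} Note:"

# Matches a one-argument command with no nested braces,
# e.g. \textsf{R} or \textbf{interpreted}; group 1 is the argument.
SIMPLE_COMMAND = re.compile(r"\\[A-Za-z]+\{([^{}]*)\}")


def extract_notes(tex_source):
    """Return the plain text following "Note:" on each note line."""
    notes = []
    for line in tex_source.splitlines():
        line = line.strip()
        if not line.startswith(NOTE_PREFIX):
            continue
        text = line[len(NOTE_PREFIX):]
        # Unwrap \cmd{...} repeatedly until nothing changes.
        previous = None
        while previous != text:
            previous = text
            text = SIMPLE_COMMAND.sub(r"\1", text)
        # Drop the unmatched closing braces left over from the heading.
        text = text.replace("{", "").replace("}", "")
        notes.append(text.strip())
    return notes


def write_notes(tex_path, out_path):
    """Read a .tex file and write one extracted note per line (hypothetical helper)."""
    with open(tex_path, encoding="utf-8") as handle:
        notes = extract_notes(handle.read())
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(notes) + "\n")
```

A call such as write_notes("soquestion.tex", "notes.txt") would then put each note on its own line of the output file; the output name is made up for the example. Because each note is assumed to be a single line, the \clearpage marker never has to be matched at all.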
Here is the log file:
This is XeTeX, Version 3.14159265-2.6-0.99991 (MiKTeX 2.9 64-bit) (preloaded format=xelatex 2015.12.3) 3 FEB 2016 20:03
entering extended mode
**soquestion.tex
(soquestion.tex
LaTeX2e <2014/05/01>
Babel <3.9l> and hyphenation patterns for 69 languages loaded.
("C:\Program Files\MiKTeX 2.9\tex\latex\base\article.cls"
Document Class: article 2014/09/29 v1.4h Standard LaTeX document class
("C:\Program Files\MiKTeX 2.9\tex\latex\base\size11.clo"
File: size11.clo 2014/09/29 v1.4h Standard LaTeX file (size option)
Requested font "cmr10" at 10.95pt
)
\c@part=\count80
\c@section=\count81
\c@subsection=\count82
\c@subsubsection=\count83
\c@paragraph=\count84
\c@subparagraph=\count85
\c@figure=\count86
\c@table=\count87
\abovecaptionskip=\skip41
\belowcaptionskip=\skip42
\bibindent=\dimen102
)
("C:\Program Files\MiKTeX 2.9\tex\latex\graphics\graphicx.sty"
Package: graphicx 2014/10/28 v1.0g Enhanced LaTeX Graphics (DPC,SPQR)
("C:\Program Files\MiKTeX 2.9\tex\latex\graphics\keyval.sty"
Package: keyval 2014/10/28 v1.15 key=value parser (DPC)
\KV@toks@=\toks14
)
("C:\Program Files\MiKTeX 2.9\tex\latex\graphics\graphics.sty"
Package: graphics 2014/10/28 v1.0p Standard LaTeX Graphics (DPC,SPQR)
("C:\Program Files\MiKTeX 2.9\tex\latex\graphics\trig.sty"
Package: trig 1999/03/16 v1.09 sin cos tan (DPC)
)
("C:\Program Files\MiKTeX 2.9\tex\latex\00miktex\graphics.cfg"
File: graphics.cfg 2007/01/18 v1.5 graphics configuration of teTeX/TeXLive
)
Package graphics Info: Driver file: xetex.def on input line 91.
("C:\Program Files\MiKTeX 2.9\tex\xelatex\xetex-def\xetex.def"
File: xetex.def 2015/03/25 v4.04 LaTeX color/graphics driver for XeTeX (TeX Liv
e/RRM/JK)
))
\Gin@req@height=\dimen103
\Gin@req@width=\dimen104
)
("C:\Program Files\MiKTeX 2.9\tex\latex\graphics\color.sty"
Package: color 2014/10/28 v1.1a Standard LaTeX Color (DPC)
("C:\Program Files\MiKTeX 2.9\tex\latex\00miktex\color.cfg"
File: color.cfg 2007/01/18 v1.5 color configuration of teTeX/TeXLive
)
Package color Info: Driver file: xetex.def on input line 137.
) (framed.sty
Package: framed 2011/10/22 v 0.96: framed or shaded text with page breaks
\OuterFrameSep=\skip43
\fb@frw=\dimen105
\fb@frh=\dimen106
\FrameRule=\dimen107
\FrameSep=\dimen108
)
("C:\Program Files\MiKTeX 2.9\tex\latex\base\alltt.sty"
Package: alltt 1997/06/16 v2.0g defines alltt environment
)
("C:\Program Files\MiKTeX 2.9\tex\latex\upquote\upquote.sty"
Package: upquote 2012/04/19 v1.3 upright-quote and grave-accent glyphs in verba
tim
)
LaTeX Warning: Unused global option(s):
[table].
(soquestion.aux)
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 52.
LaTeX Font Info: ... okay on input line 52.
LaTeX Font Info: Checking defaults for T1/cmr/m/n on input line 52.
LaTeX Font Info: ... okay on input line 52.
LaTeX Font Info: Checking defaults for OT1/cmr/m/n on input line 52.
LaTeX Font Info: ... okay on input line 52.
LaTeX Font Info: Checking defaults for OMS/cmsy/m/n on input line 52.
LaTeX Font Info: ... okay on input line 52.
LaTeX Font Info: Checking defaults for OMX/cmex/m/n on input line 52.
LaTeX Font Info: ... okay on input line 52.
LaTeX Font Info: Checking defaults for U/cmr/m/n on input line 52.
LaTeX Font Info: ... okay on input line 52.
Requested font "cmr12" at 14.4pt
Requested font "cmbx12" at 14.4pt
Requested font "cmbx10" at 10.95pt
Requested font "cmssbx10" at 10.95pt
Requested font "cmss10" at 10.95pt
[1
]
Overfull \hbox (21.13795pt too wide) in paragraph at lines 62--62
\OT1/cmr/bx/n/14.4 the same start up to Note: They end with clearpage
[]
[2
] (soquestion.aux) )
Here is how much of TeX's memory you used:
715 strings out of 428783
8144 string characters out of 3164549
61307 words of memory out of 3000000
4071 multiletter control sequences out of 15000+200000
5452 words of font info for 20 fonts, out of 3000000 for 9000
1096 hyphenation exceptions out of 8191
25i,5n,21p,416b,117s stack positions out of 5000i,500n,10000p,200000b,50000s
Output written on soquestion.pdf (2 pages).
\clearpage – egreg Feb 02 '16 at 22:34
\subsubsection and the \clearpage, regardless of how many lines there are? Do you want the \subsubsection command itself, or just the contents of that subsubsection? – Mike Renfro Feb 03 '16 at 01:26
\subsubsection should be excluded. Is there any chance that subsubsection will be followed by another \subsubsection and a \clearpage? If so, it may be difficult to restrict the matches correctly. Is there any chance there will be multiple lines in the text file between the \subsubsection and \clearpage commands? Everything to extract in your test file is a single line so far. – Mike Renfro Feb 03 '16 at 02:12
\clearpage so that one is excluded? Why aren't you using \footnote? – cfr Feb 04 '16 at 01:47