4

Situation: In the question "Table of equations like a Glossary or memory help list", user Alastor posted a LaTeX document that contains several equation environments. Assuming the real document from his teacher is too long for manual extraction, some automation is needed.

Question: How can equation environments (or other blocks) be extracted from a LaTeX document?

Bernard
  • 271,350
MS-SPO
  • 11,519
  • If you have no control over the creation of the PDF, then I cannot help. But if your goal is to create a PDF in LaTeX allowing for good copy/paste (extraction) characteristics for math, see my answer at https://tex.stackexchange.com/questions/233390/in-which-way-have-fake-spaces-made-it-to-actual-use – Steven B. Segletes Jul 15 '21 at 16:19
  • @StevenB.Segletes: Thanks, it's a good question for user Alastor ;-) – MS-SPO Jul 15 '21 at 16:23
  • Thanks. I will comment at the referenced question. – Steven B. Segletes Jul 15 '21 at 16:25
  • You might look at the endfloat package. It can extract any specific environment and copy it to a file. – John Kormylo Jul 15 '21 at 19:03
  • endfloat will bring figures and tables to the end of a document ... it does not extract equation environments/blocks, as requested. – MS-SPO Jul 20 '21 at 14:06

3 Answers

1

Solution: Use a simple Perl script.

In the question linked above I outlined what should be done and discussed some alternatives. Here is some specific Perl code that does the required extraction.

Result: From input to extraction

Step 1: create the Perl script (extractEq.pl)

Read, extract, assemble, put out. Done

#!/usr/bin/perl
use strict; use warnings;

# ~~~ reading the original Latex-doc ~~~~~~~~~
my $in  = "latexOrig.tex";
open F, '<', $in or die "can't open $in\n";
my $out = "latexEq.tex";
my @x = <F>;                 # all lines of the file
my $x = join " ", @x;        # flattened into one long string

# ~~~ finding equation environments ~~~~~~~~~~
my @l = split/begin\{equation\*?\}/, $x;   # splits at begin{...
shift @l;                                  # get rid of preamble etc. from this list (= array)

# ~~~ finding and removing text after \end{equation... ~~~~
for (my $i = 0; $i < @l; $i++) {                 # each list item
    my @s = split/end\{equation\*?\}/, $l[$i];   # now split at end
    $l[$i] = $s[0];                              # just keep the equation part
}

# ~~~ assembling output in Latex-format ~~~~~~~
my $s = '';
foreach my $l (@l) {
    $s .= "\\begin{equation}";   # we removed it above
    $s .= $l;                    # this is the equation part
    $s .= "end{equation}\n\n";   # we removed it above, and Perl left some \
}

# ~~~ put out ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
open G, '>', $out or die "can't open $out\n";
print G $s;
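The comment about Perl having "left some \" is easier to see on a tiny input. The following is only a sketch: the sample string is made up and stands in for latexOrig.tex. It runs the same two split steps and prints what remains of each piece. Because the patterns start at begin and end without the backslash, each kept piece still ends with the backslash that belonged to \end.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Made-up sample standing in for latexOrig.tex (an assumption for illustration)
my $x = 'Some text. \begin{equation} a = b \end{equation} More text. '
      . '\begin{equation*} c = d \end{equation*} The end.';

my @l = split /begin\{equation\*?\}/, $x;   # the "\" before "begin" is not part of the pattern
shift @l;                                   # drop everything before the first equation

for my $piece (@l) {                        # $piece is an alias for the array element
    my @s = split /end\{equation\*?\}/, $piece;
    $piece = $s[0];                         # body plus the "\" that preceded "end"
}

print "[$_]\n" for @l;
# prints:
# [ a = b \]
# [ c = d \]
```

This is why the script later appends "end{equation}" without a leading backslash, while "\\begin{equation}" needs one.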

Step 2: run it from the command shell (DOS, bash, ...)

> ...\TEX-forum\4. eq table>perl extractEq.pl

This writes the result to $out, which is set to latexEq.tex. The file contains just the equations, i.e. stripped of all other "noise" in the teacher's document:

\begin{equation*}\label{formula 1}
 \frac{\partial^{2} f}{\partial x \partial y}=\frac{\partial^{2} f}{\partial y \partial x}. 
 \end{equation*}

\begin{equation} F(x)=\int_{a}^{x} f(t) d t \label{formula 2} \end{equation}

\begin{equation} \int_{a}^{b} f(t) d t=g(b)-g(a) \label{formula 3} \end{equation}

Step 3: create a new Latex-doc to display the extracted equations (EQ.tex)

I.e. just replace the document's content with an \input statement:

\documentclass[12pt,a4paper]{article}
% this all remains unchanged
\usepackage[T1]{fontenc} 
\usepackage[spanish]{babel}

\usepackage{titlesec}

\titleformat{\section}[frame]
{\small}{\filcenter\small \filleft UNIDAD \thesection \ }
{3pt}{\Large\bfseries\filcenter}
\usepackage[left=2.5cm,top=2cm,right=2.5cm,bottom=1.5cm]{geometry}
\usepackage{amsthm} % to use \theoremstyle
\usepackage{xcolor}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{thmtools}

\declaretheoremstyle[
spaceabove=7pt,
spacebelow=7pt,
headfont=\normalfont\bfseries,
notefont=\mdseries\bfseries\itshape,
notebraces={(}{)},
bodyfont=\normalfont\itshape,
postheadspace=.5em,
% numberlike=section,
name=Teorema,
thmbox=M,
%shaded={bgcolor={rgb}{1,1,1}},
headformat=\NAME~\NUMBER \NOTE %
%qed=$\blacksquare$
]{Teorema}

\declaretheorem[style=Teorema]{teo}

% here the new thing starts
\begin{document}
\input{latexEq} % <<< <<< <<<
\end{document}

Peculiarities:

A ) From experience and from watching programmers, it's always a good idea to include use strict and use warnings. With use strict, variables have to be declared with my: preventive programming, failing early.

B ) Arrays (lists) start with @ in Perl. Think of a flexible array. E.g. @x is a list of all lines found in the opened file, accessible by index 0..n, while $x is its flattened counterpart, i.e. just one long string.

C ) Finding all the \begin and \end parts is done here by using them as the patterns to be matched, which breaks $x into substrings again. Fragments that are not needed are simply discarded. In the end @l holds just the equation bodies, three elements in this case, each containing however many LaTeX lines its equation spans.

Note: split/end\{equation\*?\}/ matches both end{equation} and end{equation*} ... even Perl needs backslashes from time to time.

Note: If you want to extract other environments, this is your place to change keywords, i.e. matching patterns.

D ) For this example I decided to go without numbering of equations. \label{formula XYZ} is still there for reference, but will not be printed, of course. Modify as required.
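The notes under C and D can be combined into one small sketch. The two helpers below, extract_env and unnumber, are hypothetical names and not part of the script above. The first takes the environment name as a parameter instead of hard-coding equation, and returns complete \begin...\end blocks so that a starred environment keeps its star. The second optionally converts numbered equation blocks to unnumbered ones (equation* needs amsmath, which the preamble above already loads). The sample text is made up. Nested environments of the same name and commented-out blocks are not handled.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper, not from the original script: returns every complete
# \begin{NAME}...\end{NAME} block in $text, including starred variants.
sub extract_env {
    my ($text, $name) = @_;
    my $env = quotemeta $name;      # make the name safe to use inside a regex
    my @blocks;
    # Group 2 records an optional "*"; \2 requires the same at the matching \end.
    while ($text =~ /(\\begin\{$env(\*?)\}.*?\\end\{$env\2\})/sg) {
        push @blocks, $1;
    }
    return @blocks;
}

# Hypothetical helper: turns numbered equation blocks into unnumbered ones.
sub unnumber {
    my ($text) = @_;
    $text =~ s/\\(begin|end)\{equation\}/\\${1}\{equation*\}/g;
    return $text;
}

# Made-up sample input (an assumption for illustration)
my $doc = 'Intro \begin{equation*} a=b \end{equation*} text '
        . '\begin{align} c&=d \end{align} more '
        . '\begin{equation} e=f \label{formula 2} \end{equation} end';

# Extract only the equation environments and print them unnumbered.
print unnumber($_), "\n\n" for extract_env($doc, 'equation');
```

Calling extract_env($doc, 'align') instead would pick out the align block; the rest of the workflow (writing to latexEq.tex and using \input) stays the same.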

MS-SPO
  • 11,519
1

Solution: Use the extract package. It promises to extract any environment.

The manual https://mirror.marwan.ma/ctan/macros/latex/contrib/extract/extract.pdf shows some code on page 3 which seems to match your situation. I reproduce it here without adapting it to your code, as it should be clear from the example itself.

File with "too much text": (sorry, just a screenshot, as the copy from the PDF is ... bad)

[Screenshot: document with too much information]

Result:

[Screenshot: extracted text]

MS-SPO
  • 11,519
1

Many years ago I wrote the program mathgrep to run grep-like searches on the mathematics in a LaTeX document. To extract all the maths code, simply do:

mathgrep '/.*/' document.tex

(Note: if you use dollars for delimiting maths then you should first run debuck to correct this.)

Andrew Stacey
  • 153,724