How to extract equation environments (or other blocks) from a LaTeX document?

Question

Situation: In the question "Table of equations like a Glossary or memory help list", user Alastor posted a LaTeX document that contains several equation environments. Assuming the real document from his teacher is too long for manual extraction, some automation is needed.

Question: How to extract equation environments (or other blocks) from a LaTeX document?

If you have no control over the creation of the PDF, then I cannot help. But if your goal is to create a PDF in LaTeX allowing for good copy/paste (extraction) characteristics for math, see my answer at https://tex.stackexchange.com/questions/233390/in-which-way-have-fake-spaces-made-it-to-actual-use — Steven B. Segletes, Jul 15 '21 at 16:19
@StevenB.Segletes: Thanks, it's a good question for user Alastor ;-) — MS-SPO, Jul 15 '21 at 16:23
You might look at the endfloat package. It can extract any specific environment and copy it to a file. — John Kormylo, Jul 15 '21 at 19:03
endfloat will bring figures and tables to the end of a document ... it does not extract equation environments/blocks, as requested. — MS-SPO, Jul 20 '21 at 14:06

score 1 · Accepted Answer · answered Jul 15 '21 at 16:14

Solution: Use a simple Perl script.

In the link above I outlined what should be done and discussed some alternatives. Below is some specific Perl code that performs the required extraction.

Result:

Step 1: create the Perl script (extractEq.pl)

Read, extract, assemble, put out. Done

#!/usr/bin/perl
use strict; use warnings;
# ~~~ reading the original Latex-doc ~~~~~~~~~
my $in = "latexOrig.tex"; open F, '<', $in  or die "can't open $in\n";
my $out = "latexEq.tex";
my @x = <F>;   # all lines of the file
close F;
my $x = join " ", @x;   # one long string
# ~~~ finding equation environments ~~~~~~~~~~
my @l = split/begin\{equation\*?\}/, $x; # splits at begin{...
shift @l; # get rid of preamble etc. from this list (= array)
# ~~~ finding and removing text after \end{equation... ~~~~
for (my $i = 0; $i < @l; $i++) {  # each list item
    my @s = split/end\{equation\*?\}/, $l[$i];  # now split at end
    $l[$i] = $s[0];  # just keep the equation part
}
# ~~~ assembling output in Latex-format ~~~~~~~
my $s = '';
foreach my $l (@l) {
    $s .= "\\begin{equation}";   # we removed it above; "\\" yields one literal \
    $s .= $l;                     # this is the equation part
    $s .= "end{equation}\n\n";   # we removed it above; the split left its \ at the end of $l
}
# ~~~ put out ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
open G, '>', $out or die "can't open $out\n";
print G $s;
close G;

Step 2: run it from the command shell (DOS, bash, ...)

> ...\TEX-forum\4. eq table>perl extractEq.pl

This writes the result to $out, which is set to latexEq.tex. The file contains just the equations, i.e. stripped of all other "noise" in the teacher's document:

\begin{equation*}\label{formula 1}
 \frac{\partial^{2} f}{\partial x \partial y}=\frac{\partial^{2} f}{\partial y \partial x}. 
 \end{equation*}
\begin{equation}
 F(x)=\int_{a}^{x} f(t) d t \label{formula 2}
 \end{equation}
\begin{equation}
 \int_{a}^{b} f(t) d t=g(b)-g(a)    \label{formula 3}
 \end{equation}

Step 3: create a new Latex-doc to display the extracted equations (EQ.tex)

I.e. just replace the document's content with an \input statement:

\documentclass[12pt,a4paper]{article}
% this all remains unchanged
\usepackage[T1]{fontenc} 
\usepackage[spanish]{babel}
\usepackage{titlesec}
\titleformat{\section}[frame]
{\small}{\filcenter\small
\filleft UNIDAD \thesection \ }
{3pt}{\Large\bfseries\filcenter}
\usepackage[left=2.5cm,top=2cm,right=2.5cm,bottom=1.5cm]{geometry} 
\usepackage{amsthm} %para usar \theoremstyle
\usepackage{xcolor}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{thmtools}
\declaretheoremstyle[
spaceabove=7pt,  spacebelow=7pt,
headfont=\normalfont\bfseries,
notefont=\mdseries\bfseries\itshape, notebraces={(}{)},
bodyfont=\normalfont\itshape,
postheadspace=.5em, %
numberlike=section,
name=Teorema,
thmbox=M,
%shaded={bgcolor={rgb}{1,1,1}},
headformat=\NAME~\NUMBER \NOTE %
%qed=$\blacksquare$
]{Teorema}
\declaretheorem[style=Teorema]{teo}
% here the new thing starts
\begin{document}
    \input{latexEq}   % <<< <<< <<< 
\end{document}
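If the original document changes often, the wrapper file can be generated rather than edited by hand. The following is only a sketch under assumptions: make_wrapper is a hypothetical helper (not part of the script above), and it assumes the source contains a single, uncommented \begin{document}.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper: keep everything before \begin{document} (the preamble)
# and replace the document body by a single \input of the extracted file.
sub make_wrapper {
    my ($source, $input_name) = @_;
    # limit 2: split only at the first \begin{document}
    my ($preamble) = split /\\begin\{document\}/, $source, 2;
    $preamble = '' unless defined $preamble;   # empty source: no preamble
    return $preamble
         . "\\begin{document}\n"
         . "    \\input{$input_name}\n"
         . "\\end{document}\n";
}

# Small in-memory demo; the preamble here is shortened for illustration.
my $orig = <<'TEX';
\documentclass[12pt,a4paper]{article}
\usepackage{amsmath}
\begin{document}
Long text with equations ...
\end{document}
TEX

print make_wrapper($orig, 'latexEq');
```

To use it with real files, read latexOrig.tex into one string (as in Step 1) and write the returned string to EQ.tex.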

Peculiarities:

A ) From experience, and from watching other programmers, it is always a good idea to include use strict and use warnings. With use strict, variables have to be declared, e.g. with my: preventive programming, failing early.

B ) Lists start with @ in Perl. Think of a flexible array. E.g. @x is a list of all lines found in the opened file, accessible by index 0..n, while $x is its flattened counterpart, i.e. just one long string.

C ) Finding all the \begin or \end parts is done here by using them as patterns to be matched, which breaks $x into substrings again. Fragments that are not needed are simply discarded. So in the end @l has, in this case, 3 items, each holding however many LaTeX lines its equation contains.

Note: split/end\{equation\*?\}/, matches both end{equation} and end{equation*} ... even Perl needs backslashes from time to time.

Note: If you want to extract other environments, this is your place to change keywords, i.e. matching patterns.

D ) For this example I decided to go without numbering of equations. \label{formula XYZ} is still there for reference, but will not be printed, of course. Modify as required.
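To make the note under C ) about other environments concrete, here is a minimal sketch that takes the environment name as a parameter and keeps each block whole, including its original \begin and \end lines. It is an alternative formulation, not the script above: extract_envs is a hypothetical helper, and it assumes environments of the same name are not nested and that no \begin{...} appears in comments or verbatim text.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper: returns every \begin{NAME}...\end{NAME} block
# found in $text, starred variants (NAME*) included.
sub extract_envs {
    my ($text, $name) = @_;
    my $env = quotemeta $name;   # treat the name literally inside the regex
    my @blocks;
    # Group 1 is the whole block. Group 2 remembers an optional star, and
    # \2 demands the same form at the \end. With /s, .*? matches lazily
    # across line breaks, so each match stops at the first matching \end.
    while ($text =~ /(\\begin\{${env}(\*?)\}.*?\\end\{${env}\2\})/gs) {
        push @blocks, $1;
    }
    return @blocks;
}

# Small in-memory demo document
my $doc = <<'TEX';
Some text.
\begin{equation*}\label{formula 1}
a = b
\end{equation*}
More text.
\begin{equation}
c = d \label{formula 2}
\end{equation}
TEX

print "$_\n\n" for extract_envs($doc, 'equation');
```

Calling extract_envs($doc, 'align') instead would collect align and align* blocks in the same way. Because the blocks are kept as they were written, starred environments stay starred.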

+1 Nice script, have you tried ltximg? I believe it can do the job you want it to do perfectly. — Pablo González L, Sep 21 '21 at 19:03

score 1 · Answer 2 · answered Jul 21 '21 at 18:11

Solution: Use the package extract. It promises to extract any environment.

The manual https://mirror.marwan.ma/ctan/macros/latex/contrib/extract/extract.pdf shows some code on page 3 which seems to match your situation. I replicate it here without adapting it to your code, as it should be clear from the example itself.

File with "too much text": (sorry, just a screenshot, as the copy from pdf is ... bad)

Result:

score 1 · Answer 3 · answered Jul 21 '21 at 19:00

Many years ago I wrote the program mathgrep to carry out grep-like code on mathematics in a LaTeX document. To extract all the maths code, simply do:

mathgrep '/.*/' document.tex

(Note: if you use dollars for delimiting maths then you should first run debuck to correct this.)

Andrew Stacey


nice solution :) – MS-SPO Jul 21 '21 at 19:28
