
I'm compiling a LaTeX file to PDF and then using a PDF-to-text converter to turn the result into plain text. I then use LanguageTool to do grammar checking on that plain text file. The LaTeX source has some customizations to improve the quality of the plain text file (e.g., I removed page headers, page numbers, and multicolumn environments, and I'm using a huge paper size).

Unfortunately, LanguageTool reports a lot of grammar errors in the math parts of the file, which is clearly not very useful. So I was wondering: is it possible to simply remove all math from the PDF output? Of course, I could delete all math from the LaTeX source, but that is not an acceptable option. The math should just not be rendered in the PDF.

yori
  • I was just working on a similar solution to @barbarabeeton's below. The problem (if you want to consider grammar checking, which is what makes it a very interesting question, to me) is that the math technically forms part of the sentence structure. How can math (inline or display) be replaced with some generic noun (or potentially other parts of speech, which gets even trickier) for the purposes of grammar checking, but preserve any punctuation found at the end of a displayed math environment? This is quite a challenge in my mind, but maybe it can be done. – Paul Gessler Jul 12 '14 at 18:19
  • @PaulGessler: Yes, I was thinking about this as well. However, I cannot think of a generic string that would make the following two sentences grammatically correct: "The variable $x$ is nonnegative." "It follows that $x=1$." Maybe there is a clever use of a string with a double meaning? – yori Jul 12 '14 at 19:26
  • (Using Barbara's solution below I replaced all math by the word "spam" and amusingly got the output "Let spam, and define spam, spam, spam, and spam.") – yori Jul 12 '14 at 19:35
  • @yyzz "An $n$-dimensional space" is fine but "An spam-dimensional space" should be "A ...". LyX-GC can handle this, see e.g. the online version http://mccabedj.ucc.asn.au/checktex.html – gmatht Aug 18 '19 at 05:21
  • How about using software dedicated to checking articles, such as Turnitin? – Cyriac Antony Sep 09 '19 at 10:57

3 Answers

this really should be a comment, but it's a little too complicated.

if your math is either all in-line or displayed using the \[ ... \] notation, suppressing it is quite easy:

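\documentclass{article}
% gobble display math written as \[ ... \]
\def\[#1\]{}
% gobble inline math written as \( ... \)
\def\(#1\){}
% make $ active, then gobble everything up to the next $
\catcode`\$=13
\def$#1${}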
\begin{document}
some text $xyz$ with embedded \(abc\) math.
some display math as well:
\[ def \]
more text
\end{document}
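
If you would rather keep a word in place of each formula, so that the grammar checker still sees a noun where the math was (yori reports trying this with the word "spam" in the comments on the question), the same macros can expand to a placeholder instead of to nothing. This is a minimal sketch: `\mathplaceholder` is a hypothetical macro name and the word `X` is an arbitrary choice.

```latex
\documentclass{article}
% hypothetical macro holding the word that stands in for every formula
\newcommand\mathplaceholder{X}
% each math form now expands to the placeholder instead of to nothing
\def\[#1\]{\mathplaceholder}
\def\(#1\){\mathplaceholder}
\catcode`\$=13
\def$#1${\mathplaceholder}
\begin{document}
some text $xyz$ with embedded \(abc\) math.
some display math as well:
\[ def \]
more text
\end{document}
```

As the comments on the question point out, no single word fits every sentence, so some false positives will remain whichever placeholder you pick.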

things get trickier when you use \begin{xxx} ... \end{xxx} environments, and that doesn't even consider starred environments. i haven't licked that problem yet.

many of the amsmath environments ingest the entire content between the \begin and \end markers, so could be redefined to just ignore that instead of measuring and setting.
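
Following Werner's suggestion in the comments, the environ package's `\RenewEnviron` can redefine such environments so that their collected body is simply discarded. The sketch below assumes amsmath is loaded; the list of environments is illustrative, not exhaustive. Since amsmath implements `\[ ... \]` in terms of `equation*`, the sketch also gobbles `\[ ... \]` directly with the `\def` trick from the answer, placed after amsmath is loaded, rather than relying on the redefined `equation*`.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{environ} % provides \RenewEnviron
% discard the collected body of each display environment instead of typesetting it;
% this list is illustrative, extend it with the environments your document uses
\RenewEnviron{equation}{}
\RenewEnviron{equation*}{}
\RenewEnviron{align}{}
\RenewEnviron{align*}{}
\RenewEnviron{gather}{}
\RenewEnviron{gather*}{}
\RenewEnviron{multline}{}
\RenewEnviron{multline*}{}
% amsmath maps \[ ... \] to equation*, so gobble it directly (after amsmath)
\def\[#1\]{}
\begin{document}
some text
\begin{equation}\label{eq:einstein}
  E=mc^2
\end{equation}
more text
\begin{align*}
  1+1&=2\\
  2+2&=4
\end{align*}
and \[ a^2+b^2=c^2 \] the end.
\end{document}
```

Because the gobbled bodies are never executed, any `\label` inside them is lost, so a `\ref` to one of these equations will print "??" and LaTeX will warn about undefined references. Unlike the comment-package approach, this does not require `\end{...}` to sit on a line of its own.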

this sounds like an interesting project for a package. anyone who wants to can steal the ideas laid out above.

Paul Gessler
  • Thank you! You are right about the \begin{xxx} ... \end{xxx} environments. There is actually a package (comment) that does something similar. I use it to suppress tables and figures. It's not perfect though; I sometimes do get compilation errors that I do not understand. – yori Jul 12 '14 at 19:30
  • Perhaps also consider environ with \RenewEnviron{<env>}{}. – Werner Jul 12 '14 at 20:02
  • @Werner -- somehow i didn't think there was one of those, but there is. but wouldn't it be \renewenvironment{<env>}{}{}? (i couldn't find one with the name in CamelCase.) (i'll come back to this later; right now, i'm on my way to a concert.) – barbara beeton Jul 12 '14 at 20:50
  • @barbarabeeton: That would still print the environment contents (and in the case of align and friends, this would contain math). \RenewEnviron{<env>}{} will gobble the contents of <env> in \BODY, although it's never set, similar to \renewcommand{<cmd>}[1]{}. – Werner Jul 12 '14 at 23:54
  • @PaulGessler: I am just an outsider, but the consensus in the linked meta discussion is that correcting grammatical errors (including capitalisation) is acceptable, as per the highest-voted answer: http://meta.tex.stackexchange.com/a/2103/48077 The fact that egreg picked a different accepted answer means nothing per SE functioning. Your duty to the community and future readers is greater than to Barbara. – David Mulder Jul 13 '14 at 07:28
  • @DavidMulder We don't follow such things here for the sake of SE network. More in this post, since this pops up every now and then http://meta.tex.stackexchange.com/questions/3802/previous-editing-and-etiquette-discussions Also there is no duty involved on TeX-SX, it is all community driven voluntary work. – percusse Jul 13 '14 at 09:10
  • @DavidMulder - It's worth distinguishing between two types of text that ostensibly contains errors. One type reveals quickly that the writer is struggling with English grammar, syntax, and spelling rules -- and will probably appreciate gently applied corrections. The other type is easily seen as being written by someone who knows the rules perfectly well -- and chooses to break them in well-defined circumstances. I really don't think it's productive to try to "correct" the latter type. Instead, just enjoy the result. :-) – Mico Jul 13 '14 at 12:00
  • @Mico: Please take the discussion to the relevant meta posts here and here. – David Mulder Jul 13 '14 at 12:04
  • @Werner -- where is \RenewEnviron defined? (i looked in the "obvious" places -- tex live /latex/base, e-tex, places of that sort -- but only found \renewenvironment. obviously, i missed something. ???) – barbara beeton Jul 13 '14 at 15:08
  • @barbarabeeton: It's from environ. – Werner Jul 13 '14 at 15:33
  • thanks, @Werner -- somehow i missed that one. nice! something new to experiment with. – barbara beeton Jul 13 '14 at 18:07

Here's a LuaLaTeX-based solution for suppressing math display-style environments, i.e., not letting them produce any output.

It employs the comment package, and it sets up (for now) Lua functions to replace all instances of \begin{displaymath}, \begin{equation}, and \begin{align} with \begin{comment}, as well as all instances of \end{displaymath}, \end{equation}, and \end{align} with \end{comment}. Starred versions of these environments are also handled, i.e., equation* and align* environments also get replaced with comment environments.

It should be straightforward to augment the code to process additional display math environments such as gather and multline.

By adding the functions to LuaTeX's "process_input_buffer" callback, the replacements are done at a very early stage: each line of the input file is passed through the functions as it is read, before TeX tokenizes it.

The code is admittedly a bit clumsy for now, because it requires two separate functions for each math environment that should be excluded. I suppose this could be remedied by making use of the LPeg library (bundled with LuaTeX), which provides some pretty fancy pattern-matching methods.

Note that the proposed approach has two limitations. First, the various \end{...} statements (\end{displaymath}, \end{align*}, etc.) must be the only items on their line and must start at the beginning of the line; this is a requirement of the comment package. Second, if your document already makes use of the comment package and its eponymous environment, you will run into trouble if the pre-existing comment portions contain math environments.

The MWE below consists of two files: The main "driver" file and a file that should be called "mathcomment.lua"; the latter contains the lua code and is loaded by the driver file with a \directlua{ require(...) } directive. The following two screenshots show the output produced by the driver file if the instruction \directlua{ require( "mathcomment.lua" ) } is (a) included or (b) commented out.

[Two screenshots: (a) output with \directlua{ require( "mathcomment.lua" ) } included; (b) output with it commented out.]

% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{amsmath,comment,luatexbase}
\directlua{require("mathcomment.lua")} % if commented out, display math stuff is not suppressed
\setlength\textwidth{2in} %% just for this example

\begin{document}
\noindent
aaa
\begin{displaymath}
a^2+b^2=c^2
\end{displaymath}
bbb
\begin{equation}\label{eq:einstein}
E=mc^2
\end{equation} 
ccc
\begin{align}
1+1&=2\\
2+2&=4
\end{align}
ddd
\begin{align*}
0+0&=0\\
a+a&=2a
\end{align*}
ee
\end{document}

-- mathcomment.lua

--displaymath
local function comment_begin_displaymath ( line )
   return string.gsub ( line, 
      "\\begin{displaymath}", "\\begin{comment}" )
end
local function comment_end_displaymath ( line )
   return string.gsub ( line, 
      "\\end{displaymath}",  "\\end{comment}" )
end

--equation, equation*
local function comment_begin_equation ( line )
   return string.gsub ( line, 
      "\\begin{equation%*?}", "\\begin{comment}" )
end
local function comment_end_equation ( line )
   return string.gsub ( line, 
      "\\end{equation%*?}",  "\\end{comment}" )
end

--align, align*
local function comment_begin_align ( line )
   return string.gsub ( line, 
      "\\begin{align%*?}", "\\begin{comment}" )
end
local function comment_end_align ( line )
   return string.gsub ( line, 
      "\\end{align%*?}",  "\\end{comment}" )
end

-- register the functions as callbacks

luatexbase.add_to_callback( "process_input_buffer", 
   comment_begin_displaymath, "comment_begin_displaymath" )
luatexbase.add_to_callback( "process_input_buffer",  
   comment_end_displaymath, "comment_end_displaymath" )

luatexbase.add_to_callback( "process_input_buffer", 
   comment_begin_equation, "comment_begin_equation" )
luatexbase.add_to_callback( "process_input_buffer",  
   comment_end_equation, "comment_end_equation" )

luatexbase.add_to_callback( "process_input_buffer", 
   comment_begin_align, "comment_begin_align" )
luatexbase.add_to_callback( "process_input_buffer",  
   comment_end_align, "comment_end_align" )

Addition by yori: (I think the code below is a useful addition, but I did not want to create a new answer for this, because I just generalized the code above.) The following code is a bit shorter and easier to extend:

-- mathcomment.lua

local function comment_environment_function(env_regex)
   return function(line)
      line = string.gsub(line, "\\begin{" .. env_regex .. "}", "\\begin{comment}")
      line = string.gsub(line, "\\end{" .. env_regex .. "}", "\\end{comment}")
      return line
   end
end

-- register the functions as callbacks

local environments = { "displaymath", "equation%*?", "align%*?" }

for _, env in ipairs(environments) do
   luatexbase.add_to_callback( "process_input_buffer", 
      comment_environment_function(env), 
      "comment_" .. env)
end
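
Regarding the first limitation: the case yori raises in the comments, an indented \end{...} line, can be handled by also stripping leading whitespace when the \end{...} line is rewritten, so that \end{comment} starts in column 1 as the comment package expects. The sketch below is an assumption-laden variant, not part of the answer: it swaps the separate mathcomment.lua file for the luacode package's luacode* environment (which passes backslashes and % signs through verbatim), and it adds gather and multline to the list as suggested above. It still cannot help when other material shares the line with \end{...}.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{amsmath,comment,luatexbase,luacode}
\begin{luacode*}
local function comment_environment_function(env_pattern)
  return function(line)
    -- \begin{...} may appear anywhere on the line, as in the answer
    line = string.gsub(line, "\\begin{" .. env_pattern .. "}", "\\begin{comment}")
    -- \end{...}: also drop any leading whitespace before it
    line = string.gsub(line, "^%s*\\end{" .. env_pattern .. "}", "\\end{comment}")
    return line
  end
end

local environments = { "displaymath", "equation%*?", "align%*?",
                       "gather%*?", "multline%*?" }

for _, env in ipairs(environments) do
  luatexbase.add_to_callback("process_input_buffer",
    comment_environment_function(env), "comment_" .. env)
end
\end{luacode*}

\begin{document}
aaa
  \begin{equation}
    E=mc^2
  \end{equation}
bbb
\end{document}
```

With the callbacks registered, the indented equation above should be swallowed by the comment environment, leaving only the surrounding text.
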
yori
Mico
  • Nice! I wonder if the first limitation can be dealt with by automatically adding a newline character after the \begin{...} commands. I have no experience with LuaTeX and tried just adding \n to the replacement string, but the line splitting has finished when the hook is run, so that the \n is just inserted in the current line. I don't know how to fix that. – yori Jul 13 '14 at 10:16
  • @Yori - Many thanks for posting the alternative Lua code -- much easier to parse and to extend! Regarding your follow-up question: It looks like what's needed to solve that issue is a way to insert an extra, all-blank line either before \begin{...} or after \end{...}. Of course, having the various \end{...} statements on lines by themselves is probably good coding practice and should thus be followed anyway, right? – Mico Jul 13 '14 at 11:11
  • I agree that it is good coding practice. However, I sometimes indent an equation with a few spaces. That's a case where this code breaks. Once you're aware of the problem, it's not a big deal to fix that in the original LaTeX code. I just ran a grammar check with your code and barbara's code, and the output is much less noisy. Great work! – yori Jul 13 '14 at 11:56
  • Is there any way to deal with $ ... $? If the two $ signs are not on the same line, this method does not work. – NonalcoholicBeer Feb 24 '15 at 13:00
  • @ablmf - Have you looked at the answer posted by Barbara Beeton? If your inline math groups contain blank lines, you'll need to change \def$#1${} to \long\def$#1${} in her code. – Mico Feb 24 '15 at 14:05

Have you tried using TeXstudio together with LanguageTool? This combination should work fine for you, since TeXstudio can communicate with LanguageTool's built-in web server. At least under Windows, this works fine for me.

Another option would be to simply copy and paste the text from the PDF into the LanguageTool GUI. Either way, the math is ignored without changing the LaTeX code itself.

kristjan
  • Thanks for the suggestion. I don't use TeXstudio, but may give this a shot. The other option, copying and pasting, is fine for shorter texts, but I'm working on a 700-page book. An advantage of the copy & paste method is that you could paste into Microsoft Word, which may have the best grammar checker available (not sure if this is true; I don't have MS Word). – yori Jul 16 '14 at 08:20