13

I am working with other people's LaTeX files, and sometimes they look very messy. I am looking for a way to delete all comments without changing the code. The catch is that I want to preserve all spaces and leave escaped percent signs (\%) alone. I feel a bit insecure about doing it myself, and I am new to Emacs. Maybe something has already been written? If not, maybe you can help me by pointing out risky situations that I should handle in my elisp code.

P.S. Any suggestions on how to make other people's code more readable would also be much appreciated :).

Guido
  • 30,740
Baranas
  • 877
  • See also the Python script handling several subtleties in http://tex.stackexchange.com/questions/83663/utility-to-strip-comments-from-latex-source – jrouquie Jan 18 '16 at 11:01
  • 1
    Never automatically delete comments! Comments are an integral part of good coding! Moreover, in (La)TeX, even if some answers here want you to believe it, it is impossible to do this in a robust way (without breaking certain perfectly valid constructions)! – Paul Gaborit Dec 06 '22 at 10:17

2 Answers

14

If you do not need to worry about verbatim environments or \verb usage, then

(query-replace-regexp "\\(^\\| *[^\\\\]\\)%.*" "" nil nil)

is probably safe (and it does query replace so you get to say yes or no anyway).

Note that this removes the entire line if the comment was at the start of the line (as leaving a blank line would make a paragraph). However, it does not remove the line end if the comment was not the whole line. This means it has the potential to introduce white space, so it is probably safe on documents but not in macro code.

That is

blah blah
%not this line
blah blah

becomes

blah blah
blah blah

But

abc% a comment here
xyz

becomes

abc
xyz

A more correct translation (which could relatively easily be done) would be to

abcxyz

But that would have a tendency to make the entire file one long line, and you'd have to be careful about leaving a space after command names.
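
To make the caution about command names concrete, here is a small sketch using `replace-regexp-in-string` on plain strings. The regexp `%.*\n` is only an illustrative stand-in for "delete the comment together with its line end", and `\foo` is a made-up command name; neither is part of the command further down.

```elisp
;; Deleting the comment together with its newline joins the two lines.
(replace-regexp-in-string "%.*\n" "" "abc% a comment here\nxyz")
;; => "abcxyz"

;; The same deletion right after a command name fuses it with the next word.
(replace-regexp-in-string "%.*\n" "" "\\foo% a comment here\nbar")
;; => "\\foobar"   (that is, the text \foobar)
```

In the second case TeX would originally have read \foo followed by bar, so the joined text is no longer the same input.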


LaTeX is not a regular language, so if you parse it with regular expressions, some constructs will get messed up. (Regular expressions are so called because they describe regular languages.) The alternative is to write a full LaTeX parser, but that is hard. Consider something like xii.tex; there are some examples of that on this site. Trying to locate the comments in there would be tricky (there are none, but Emacs doesn't know that).

It would be relatively easy to remove a preceding line break when removing the percent, if that's what you want. For \verb and verbatim I'd probably do a pre-pass changing % to [[[PERCENTWASHERE]]] and then change it back again after you have removed the comments.

Perhaps not unlike this (it defines an interactive command you can run with M-x xxx once the definition has been evaluated). It handles % in verbatim and \verb| ... % |; if you use other characters as \verb delimiters, it would need modifying a bit.

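(defun xxx ()
  "Query-delete LaTeX comments; spare \\%, verbatim and \\verb|...|."
  (interactive)
  ;; Pre-pass: hide every % that is not a comment behind a placeholder.
  (goto-char (point-min))
  (replace-regexp "\\\\%" "\\\\@@@@@PERCENT@@@@@")
  (goto-char (point-min))
  (while (re-search-forward "\\\\begin{verbatim}" nil 1)
    (replace-regexp "%" "@@@@@PERCENT@@@@@" nil (point)
                    (save-excursion (re-search-forward "\\\\end{verbatim}" nil 1) (point))))
  (goto-char (point-min))
  (while (re-search-forward "\\\\verb|" nil 1)
    (replace-regexp "%" "@@@@@PERCENT@@@@@" nil (point)
                    (save-excursion (re-search-forward "|" nil 1) (point))))
  ;; Delete each comment together with its line end (answer y/n, or ! for all).
  (goto-char (point-min))
  (query-replace-regexp "%.*\\(\n\\|$\\)" "" nil nil)
  ;; Restore the hidden percent signs.
  (goto-char (point-min))
  (replace-regexp "@@@@@PERCENT@@@@@" "%" nil nil))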

David Carlisle
  • 757,742
  • Thanks for your reply. However I need also to worry about verbatim environments. How can I exclude some environments from cleaning them up? And is it possible in Auctex to wrap up whole text after deletion of comments in line ends? – Baranas Nov 16 '12 at 13:24
  • too long for comment: I'll edit the answer – David Carlisle Nov 16 '12 at 13:35
  • (query-replace-regexp "\\(^\\|[ \t][^\\\\]\\)%.*" "" nil nil) followed by (query-replace-regexp "[^\\\\]%.*" "%" nil nil) would be a way to leave the line-end hiding %'s in place. – Andrew Swann Mar 16 '17 at 07:25
  • In my emacs, this kills "}%" at the end of a line. – Tokkot Dec 08 '20 at 15:51
  • @Tokkot the request was to remove comments, % at the end of a line is a comment – David Carlisle Dec 08 '20 at 16:01
  • @DavidCarlisle agreed. However, when I try this in emacs, it removes "}%" at the end of a line. That is, \textbf{hi}%comment becomes \textbf{hi – Tokkot Dec 09 '20 at 17:18
  • @Tokkot oh that would be wrong then:-) I'll look later this answer is rather old I can't remember what it was doing at all, can't look now. – David Carlisle Dec 09 '20 at 17:31
  • I have tried running xxx but it did not removed any comments, should I make changes on it? – alper Dec 30 '21 at 01:42
  • @alper try now. – David Carlisle Dec 30 '21 at 02:01
  • @Tokkot sorry, late response.... – David Carlisle Dec 30 '21 at 02:02
  • When I run xxx it just highlights commented sections. What should I do remove them all, I press enter but it did not help :-( – alper Dec 30 '21 at 18:42
  • (-1) This is a very bad answer which suggests that we can remove comments from a (La)TeX document without risk!!! ;-) – Paul Gaborit Dec 06 '22 at 10:19
  • @PaulGaborit it explicitly says "it will mess up some constructs" so I don't know why you say it suggests there is no risk. that said you can always remove comments. the risk is that you might misrecognise a verbatim construct as a comment and so remove the wrong thing – David Carlisle Dec 06 '22 at 12:07
1

There are some use cases where the regex query

(query-replace-regexp "\\(^\\| *[^\\\\]\\)%.*" "" nil nil)

proposed by David Carlisle does not work for me.

Note: Below I write "slash" to mean "backslash".

First issue

The regex matches a non-slash followed by a percent, hence:

123%foo
456

becomes:

12
456
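
The loss of the 3 can be reproduced outside a buffer with `replace-regexp-in-string`. The second call is only a sketch of one possible repair (capture the preceding character and put it back with \1); it is not the regexp from the original answer, and it does nothing about the other issues.

```elisp
;; The regexp quoted above, applied to the example.
(replace-regexp-in-string "\\(^\\| *[^\\\\]\\)%.*" "" "123%foo\n456")
;; => "12\n456"   (the 3 is consumed along with the comment)

;; Assumed variant: re-insert the captured character via \1.
(replace-regexp-in-string "\\(^\\|[^\\\\]\\)%.*" "\\1" "123%foo\n456")
;; => "123\n456"
```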

Second issue

The % introducing a comment might be escaped, so it is not a comment, but the escape might be escaped as well, and so it is a comment. In short, we need to make sure that the comment is preceded by zero or an even number of slashes. It may seem convoluted, but consider this:

\\% <- Don't forget the newline 

This comment is skipped by the regex, since the percent is considered escaped, but it is not. In LaTeX, long sequences of slashes are anything but rare, so we need to keep track of their parity.
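
The parity rule can be checked directly by counting the slashes in front of a percent sign. The helper below is a hypothetical sketch (the name `my-latex-comment-start-p` is made up for illustration) and it ignores verbatim material entirely.

```elisp
(require 'cl-lib) ; for cl-evenp

(defun my-latex-comment-start-p (pos)
  "Return non-nil if the % at buffer position POS starts a comment.
Hypothetical helper: it only counts the backslashes right before POS."
  (save-excursion
    (goto-char pos)
    (and (eq (char-after) ?%)
         ;; `skip-chars-backward' returns the (zero or negative) distance
         ;; moved, i.e. minus the number of backslashes before the %.
         (cl-evenp (skip-chars-backward "\\\\")))))

;; Buffer text \\% (two slashes, then %): the % at position 3 is a comment.
(with-temp-buffer
  (insert "\\\\% note")
  (my-latex-comment-start-p 3))   ; => t

;; Buffer text \% (one slash, then %): the % at position 2 is escaped.
(with-temp-buffer
  (insert "\\% note")
  (my-latex-comment-start-p 2))   ; => nil
```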

Third issue

Comments eat the trailing newline. This feature is often used on purpose, for example in macros, where you want a line break for readability but don't want it to be used when substituting the macro. It follows that, for full-line comments, it is better to remove the comment together with its trailing (or leading) newline, that is

123
%foo
456

should be

123
456

For inline comments, it is better to remove the comment text while leaving the % and the newline in place, that is

123%foo

should be

123%

To overcome these issues, I suggest the following command.

(require 'cl-lib) ; for cl-evenp
(require 'subr-x) ; for string-empty-p on older Emacs versions

(defun no-coms ()
  "Remove LaTeX comments from point to the end of the buffer."
  (interactive)
  (while (search-forward-regexp "\\(\n?\\)\\(.*?\\)\\(\\\\*\\)\\(%.*\\)" nil t)
    (when (cl-evenp (length (match-string 3))) ; backslashes should be even
      ;; Are we at bol?
      (if (and (string-empty-p (match-string 2)) (string-empty-p (match-string 3)))
          (replace-match "" nil nil nil 0)    ; if so remove whole match
        (replace-match "%" nil nil nil 4))))) ; else just remove comment text

Evaluate the definition, perhaps adding it to your init file, and, after putting the cursor where you want it to start working, type
M-x no-coms (that is, Alt+X, then no-coms)

Note, I haven't tested this function thoroughly.
Here are some explanations for the curious.

Removing ELisp escapes, the regex simplifies to:

(\n?)(.*?)(\\*)(%.*)

So it matches as subgroups: an optional newline, arbitrary characters non-greedily, zero or more slashes, a percent followed by arbitrary characters.

(while (search-forward-regexp ... keeps searching forward for as long as the regexp finds another match.

(when (cl-evenp (length ... extracts the third subgroup match (the slashes) and proceeds only when the number of slashes is even.

(if (string-empty-p (match-string 2))... checks if there are characters between the optional newline and the percent, that is if the second and third subgroup are both empty, in which case the percent is at the beginning of the line, and we have a full-line comment.

(replace-match "" nil nil nil 0) is the "then" branch of the if. We have a full-line comment, and we replace the entire match (the subgroup zero), which would include the initial newline.

(replace-match "%" nil nil nil 4) is the alternative case. Here the fourth match, the % followed by comment text, is replaced with the percent only.
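
For experimenting without touching a real file, the three rules above (slash parity, full-line comments removed with their newline, inline comments reduced to a bare %) can also be written as a function from string to string. This is a minimal sketch under those assumptions, not a replacement for the command above: the name `my-strip-latex-comments` is made up, a full-line comment is taken to mean a % in the first column, and verbatim material is not handled at all.

```elisp
(require 'cl-lib) ; for cl-evenp

(defun my-strip-latex-comments (text)
  "Return TEXT with LaTeX comments removed (hypothetical sketch).
A % preceded by an odd number of backslashes is kept as text.
Comments starting in column 0 vanish together with their newline;
other comments shrink to a bare % so the line end stays hidden."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (search-forward "%" nil t)
      (let ((pct (1- (point))))            ; position of the % itself
        (when (save-excursion
                (goto-char pct)
                (cl-evenp (skip-chars-backward "\\\\")))
          (if (= pct (line-beginning-position))
              ;; Full-line comment: delete it and its newline.
              (progn
                (delete-region pct (min (point-max) (1+ (line-end-position))))
                (goto-char pct))
            ;; Inline comment: keep the %, drop the text after it.
            (delete-region (point) (line-end-position))))))
    (buffer-string)))

(my-strip-latex-comments "123\n%foo\n456")    ; => "123\n456"
(my-strip-latex-comments "123%foo\n456")      ; => "123%\n456"
(my-strip-latex-comments "50\\% off % note")  ; => "50\\% off %"
```

Unlike a single line-wide regexp, this scans one % at a time, so a real comment that follows an escaped \% on the same line is still found.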

antonio
  • 1,446