21

Technically speaking, this isn't a LaTeX-specific question, but there are so many useful tools in LaTeX, and I'm using it to solve a LaTeX problem, so I thought this would be a good place to ask.

I have a rather large document with lots of math notation. I've used |foo| throughout to indicate the absolute value of foo. I'd like to replace every instance of |foo| with \abs{foo}, so that I can control the notation via an abs macro I define.

Does anybody know of a way to do this? Possibly using regular expressions of some sort? Are there editors with tools built in for this?

cmhughes
  • 3
    It may be possible with the editor you're already using. What is it? – egreg May 24 '13 at 21:50
  • 1
    I'm in TexMaker right now. – Joey Eremondi May 24 '13 at 21:54
  • 2
    Is Edit -> Replace what you're looking for? It has forward/backward search and is also case sensitive. – Count Zero May 24 '13 at 22:00
  • 3
    It seems that the regex search and replace capabilities of Texmaker are not sufficient. With TeXStudio it seems possible with \|([a-z]*)\| in the "Find" field and \abs{\1} in the Replace field (with Regexp checked). – egreg May 24 '13 at 22:11
  • @CountZero I will be truly impressed if you can do what I'm doing without some form of regex. – Joey Eremondi May 24 '13 at 22:42
  • 1
    @jmite: It depends whether you have spaces around the |s. You could run the search to replace once the opening | and once the closing |. But I gather from your comment that this is not the case... My bad! :) ...and +1 – Count Zero May 24 '13 at 22:55
  • I would have spaces had I been smart when I wrote the document :) – Joey Eremondi May 24 '13 at 22:57
  • Do you have occurrences of | which aren't related to absolute values? If not, it could be an easy job for a generic tool like sed. It's still a regexpful solution, but I doubt you can escape from the regular expressions here (bad pun intended). – T. Verron May 24 '13 at 23:16
  • 1
    While this question, in essence, has little (if not nothing) to do with TeX, I think it may be of great help to the community here. However, having generic search-and-replace techniques that promotes this could be helpful, especially considering my interest in Consistent typography. – Werner May 25 '13 at 06:03
  • That was my thought too. It is strictly speaking a regular expression problem, but I thought it was a common enough use case for LaTeX, and a place where refactoring tools aren't quite as common compared to more conventional programming languages. – Joey Eremondi May 25 '13 at 06:13
  • 1
    Very closely related http://tex.stackexchange.com/q/46063/15925 – Andrew Swann May 28 '13 at 14:31
  • Thanks to all who gave answers. I'll try them out and select one soon. – Joey Eremondi May 28 '13 at 20:06

6 Answers

13

Option 1: sed

The stream editing tool, sed, would be a natural first choice, but the problem is that sed can't match non-greedy regular expressions.

We need a non-greedy regular expression here. To clarify why, let's consider

sed -r 's/\|(.*)\|/\\abs{\1}/g' myfile.tex

If we apply this substitution to a file that contains something like

$|a|+|b|\geq|a+b|$

then we'll get

$\abs{a|+|b|\geq|a+b}$

which is clearly not what we want: regular expression matches like this are greedy by default.

To make it non-greedy, we would typically use .*?, but the problem is that sed does not support this type of match. Happily (thanks Hendrik) we can use the following instead:

sed -r 's/\|([^|]*)\|/\\abs{\1}/g' myfile.tex

Once you're comfortable that it does what you want, you can use

sed -ri.bak 's/\|([^|]*)\|/\\abs{\1}/g' myfile.tex

which overwrites myfile.tex in place after first saving a backup copy as myfile.tex.bak.
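The difference between the greedy pattern, the lazy .*? form, and the [^|]* workaround can be checked quickly with Python's re module, which supports all three. This is only an illustrative sketch; the variable names are made up for the demonstration.

```python
import re

line = r"$|a|+|b|\geq|a+b|$"

# Greedy: .* runs on to the last | in the line.
greedy = re.sub(r"\|(.*)\|", r"\\abs{\1}", line)

# Lazy: .*? stops at the first closing |.
lazy = re.sub(r"\|(.*?)\|", r"\\abs{\1}", line)

# Negated class: [^|]* cannot cross a |, so no lazy quantifier is needed.
negated = re.sub(r"\|([^|]*)\|", r"\\abs{\1}", line)

print(greedy)
print(lazy)
print(negated)
```

The first line printed is the unwanted greedy result; the other two are identical, which is why the [^|]* form is a usable substitute in sed.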

Option 2: perl

We could, instead, use a little perl one-liner:

perl -pe 's/\|(.*?)\|/\\abs{\1}/g' myfile.tex

When you're sure that you trust it is working correctly, you can use the following to overwrite myfile.tex

perl -pi -e 's/\|(.*?)\|/\\abs{\1}/g' myfile.tex

You can replace myfile.tex with, for example, *.tex to operate on all the .tex files in the current working directory.
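If neither sed nor perl is at hand, the same backup-then-overwrite workflow can be sketched in Python. The function name convert_file and the constant ABS_PATTERN are hypothetical, chosen for this example; the pattern uses [^|\n]* so that, like the line-oriented tools above, a match never spans a line break.

```python
import re
import shutil

# Hypothetical names for illustration; the regex mirrors the sed version above.
ABS_PATTERN = re.compile(r"\|([^|\n]*)\|")


def convert_file(path):
    """Rewrite |foo| as an abs macro call in one file, keeping path + '.bak'."""
    shutil.copy2(path, path + ".bak")  # make the backup before touching the file
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(ABS_PATTERN.sub(r"\\abs{\1}", text))
```

To process every .tex file in the current directory, one could loop over glob.glob("*.tex") and call convert_file on each path.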

Details of perl's switches are discussed here (for example): http://perldoc.perl.org/perlrun.html#Command-Switches

cmhughes
  • 4
    Did you test your sed code? Note that you need \( instead of (! Moreover, note that it's easy to make the sed version non-greedy: just use [^|]* instead of .*. – Hendrik Vogt May 25 '13 at 06:01
  • @HendrikVogt thanks for the feedback. If you use sed -r .... then you gain access to ( and ) without having to escape them- sorry if I didn't make that clear. I have added your solution, thanks! – cmhughes May 25 '13 at 17:19
  • 2
    I would strongly suggest using sed -ri.bak to have sed automatically create backup files for the in-place edits. Also you might consider editing the last paragraph into the first one regarding the sed-greedy issue. As it stands now, it might be confusing for users who are not used to these kinds of tools. – Daniel May 25 '13 at 17:28
  • @cmhughes: Ah, thanks, I didn't know about the -r option! – Hendrik Vogt May 25 '13 at 19:08
  • @cmhughes Truly outstanding post! Please check my answer inspired by you in which I tried to explain why AWK is the wrong tool for the job. – Predrag Punosevac May 27 '13 at 05:19
  • Non-greedy regexps are good to know. However, one can also search for \|[^|]*\|, i.e. | followed by any number of non-| characters and then by |. This might help if you must use, say, grep without the --perl-regexp switch available. – Jori Mäntysalo May 27 '13 at 11:19
  • On an interesting note, -r doesn't seem to work on OSX with sed. Thankfully I have access to a Linux box, trying it now. – Joey Eremondi May 28 '13 at 20:14
  • @jmite if you don't have access to -r then you'll need to escape the ( and ), so use \( and \) – cmhughes May 28 '13 at 20:30
9
\documentclass{article}

\begin{document}

\def\abs#1{|#1|} % Let us assume that it is your definition

$\abs{c+d}$

\catcode`\|=\active
\def|#1|{\abs{#1}}

$|a|+|b|\geq|a+b|$

\end{document}

We assume that the |'s come in pairs. The first one then acts as the macro, and the second one as the delimiter that ends its argument. It seems, however, that, as with all the other solutions, some manual correction will be necessary.

6

You can also try WinEdt. It has a very well implemented 'search and replace' feature.

In your case, search for (with regex on)

\|\(0*\)\|

and replace with

\\abs\{\0\}

The whole replacement can be done in one step.

You can also choose (with a different regex) to make the replacement ONLY inside math environments.
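The idea of restricting the replacement to math can be sketched outside WinEdt as well. The Python below is a rough illustration, not WinEdt's mechanism: it assumes math is written only as $...$ or $$...$$ on a single line, with no escaped \$ in the text, and the names INLINE_MATH, ABS_PATTERN and convert_math_only are invented for this example.

```python
import re

# Assumption: math appears only between dollar signs on one line.
INLINE_MATH = re.compile(r"\$[^$]+\$")
ABS_PATTERN = re.compile(r"\|([^|]*)\|")


def convert_math_only(line):
    # Rewrite |foo| only inside the dollar-delimited segments of the line.
    def fix_segment(match):
        return ABS_PATTERN.sub(r"\\abs{\1}", match.group(0))

    return INLINE_MATH.sub(fix_segment, line)


print(convert_math_only(r"\verb|x| and $|y|$"))
```

Here the bars belonging to \verb are left alone because they sit outside the dollar signs. Environments such as equation or align are not covered by this sketch.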

karlkoeller
4

You can use Vim to achieve this. Please also have a look here.

In your case, the command would be

%s:\(|\)\(.\{-}\)\(|\):\\abs{\2}:gc

Brief explanation:

%s - says it is the substitute command over the entire file

\( | \) matches the occurrence of | and references it as \1 in the match

\( .\{-} \) matches anything between two instances of | but makes it as small as possible, it references this chunk as \2 which we later use to put it back

\( | \) again matches another occurrence of | and references it as \3 in the match

: signals end of regex

\\abs{\2} is what you want to replace it with, where \2 is the text between the two |

:gc says everywhere in the line (g), and each replacement must be confirmed (c).

You must confirm every replacement because this pattern is not perfect. The reason: if a line contains an odd number of |, the pattern cannot tell which two instances of | enclose the text whose absolute value you are taking. However, a match never spans more than one line, so you are relatively protected.
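Since lines with an odd number of | are the ones most likely to go wrong, it can help to list them before running any substitution. A minimal Python sketch follows; the function name lines_with_unpaired_bars is made up for this example.

```python
def lines_with_unpaired_bars(lines):
    """Return (line_number, text) pairs for lines with an odd count of |."""
    return [
        (number, text.rstrip("\n"))
        for number, text in enumerate(lines, start=1)
        if text.count("|") % 2 == 1
    ]


sample = ["$|a|+|b|$\n", "$a | b$\n", "no bars here\n"]
for number, text in lines_with_unpaired_bars(sample):
    print(number, text)
```

An even count does not prove the bars are absolute values (two "divides" bars on one line would also pass), so this only narrows down where to look.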

ste_kwr
  • 1
    In vim I sometimes find it useful to record a macro for such cases. qa to start recording a macro labeled "a"; /|<CR> to go to the next occurrence of "|"; s\abs{<ESC> to replace "|" with "\abs{"; /|<CR>r} to go to the next "|" and replace it with "}"; then "q" to stop recording. Then I just hit 100@a and the macro is performed 100 times or until no more occurrences of "|" can be found. – Kallus May 27 '13 at 16:00
4

This post is inspired by cmhughes's proposed solutions. His post is one of the most interesting posts on TeX editing that I have ever read. I just spent 2 hours trying to produce a nawk solution, which actually doesn't exist (see below).

Option 3: Gawk

This solution is provided by Hendrik Vogt

gawk '{print gensub(/\|([^|]*)\|/, "\\abs{\\1}", "g", $0)}'

Option 4: Python

import re
import sys

# Raw strings keep the backslashes in the pattern and the replacement literal.
for line in sys.stdin:
    newline = re.sub(r'\|(.*?)\|', r'\\abs{\1}', line)
    sys.stdout.write(newline)

Why is AWK (nawk) the wrong tool for the above problem?

AWK does not support non-greedy quantifiers, which is to be expected since it is sed's cousin. Worse, standard AWK (nawk) does not capture groups: the replacement string of gsub cannot refer back to a parenthesized part of the match. The character class [^|]* lets us emulate non-greedy matching, but without captured groups there is no way to put the matched text back inside \abs{}.

A simple AWK script

NR>0{
gsub(/\|([^|]*)\|/,"\\abs{\1}")
print
}

Applied to file

$|abs|$ so on and so forth
$$|a|+|b|\geq|a+b|$$
who is afraid of wolf $|abs|$

will unfortunately produce

$\abs{}$ so on and so forth
$$\abs{}+\abs{}\geq\abs{}$$
who is afraid of wolf $\abs{}$
  • This doesn't seem to answer the question, just showing how not to solve it. If this is really the intent please simply reduce it to a comment to cmhughes's answer. – Martin Scharrer May 27 '13 at 06:10
  • 3
    Try gensub: awk '{print gensub(/\|([^|]*)\|/, "\\abs{\\1}", "g", $0)}' – Hendrik Vogt May 27 '13 at 10:52
  • @Martin Scharrer You are becoming silly with your opposition to my participation on this portal. This is your third or fourth very negative comment on my perfectly reasonable attempt to join the discussion. The title clearly says that original question can not be solved in NAWK and explains the reason why that can not be done. If you see no value just suspend my account. – Predrag Punosevac May 27 '13 at 11:23
  • 2
    @PredragPunosevac This is not a discussion forum. You surely can discuss alternatives, so long as you provide a solution to the problem. Martin's comment aims to keep the site in line with its stated scope. Your non-answer might be a legitimate question on SuperUser.com or Unix.sx – egreg May 27 '13 at 12:53
  • 1
    @Martin, egreg, Predrag: In fact, there is a long history both here and especially on SO of extended commentary being put in answers because it (i) furthers the question, and (ii) doesn't fit in a comment. Maybe it's best if we move this discussion over to Meta? – Charles Stewart May 27 '13 at 12:55
  • @Charles Stewart I am cool with it. – Predrag Punosevac May 27 '13 at 13:14
  • @Hendrik Vogt I already up voted your comment. Could you please just edit and put gawk instead of awk as some users might be confused? As both of us know gawk!= awk. – Predrag Punosevac May 27 '13 at 13:15
  • @CharlesStewart: Meta is only for questions about the main site. Feel free to use the chat for discussions. – Martin Scharrer May 27 '13 at 13:33
  • 1
    @PredragPunosevac: I'm sorry that you took this personally. My comments are not meant negatively. I'm just doing my job as a moderator. Also these comments are also not targeted against specific people, but in fact are created by moderators as reaction on flags added by other users. For example, this post has been flagged as "not an answer" by two users. This site has specific rules and does not work like a forum. Answer posts are for answers(=solutions) to the question. Discussions are not wanted on this site. Please have another look at the FAQ – Martin Scharrer May 27 '13 at 13:40
  • @Martin Scharrer It is all good Martin! I overreacted as I liked this question so much and I felt I had something to contribute. You and me do have some differences of opinion but I have a great respect for your deep technical knowledge and contribution to the TeX community. Thank you Sir!!! – Predrag Punosevac May 27 '13 at 13:51
  • 1
    @Predrag: Thanks for pointing this out. You're right, it has to be gawk, which my machine uses if I ask for awk. Unfortunately I can't edit my comment. – Hendrik Vogt May 27 '13 at 17:11
  • @MartinScharrer: Just to be clear, the question I was thinking of would be, is this kind of "not directly addressing the question answer" worth having? This kind of discussion is, of course, within the scope of meta. – Charles Stewart May 27 '13 at 18:29
  • @CharlesStewart: Ah, ok, of course that would be right for Meta. – Martin Scharrer May 28 '13 at 05:09
  • @MartinScharrer Martin check out now. It is a real answer! – Predrag Punosevac May 28 '13 at 14:24
1

On Mac OS X I use TextWrangler, an editor which, among its many features, has a Search/Replace facility able to handle regular expressions. For example I can choose:

[Screenshot: TextWrangler search-and-replace dialog]

Click “Next” the first time, and then keep clicking “Replace & Find” to go through your document, validating each match before a replacement is made. This lowers the risk of messing up your document through an incorrect regular expression or unintended replacements. And with unlimited “Undo” you'll always be safe.
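The same match-by-match validation can be imitated in a script by passing a callback to re.sub. The sketch below is an illustration only, unrelated to how TextWrangler works internally; replace_with_confirmation and its confirm parameter are hypothetical names.

```python
import re

ABS_PATTERN = re.compile(r"\|([^|]*)\|")


def replace_with_confirmation(line, confirm):
    # confirm is any function that takes the matched text and returns True
    # to accept the replacement or False to leave the match untouched.
    def decide(match):
        if confirm(match.group(0)):
            return "\\abs{" + match.group(1) + "}"
        return match.group(0)

    return ABS_PATTERN.sub(decide, line)


# Accept every match except |b|, standing in for a user's yes/no answers.
print(replace_with_confirmation("$|a|+|b|$", lambda found: found != "|b|"))
```

For interactive use, confirm could wrap input(), for example a function that prompts with the matched text and returns True when the answer is "y".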

Juan A. Navarro