16

Today, while reviewing our project's source code, I found lots of unnecessary spaces and tabs at the ends of lines, so I decided to delete them with a regular expression.

However, I found that the command sed -i '/\s+$/d' doesn't work. Only after I changed it to sed -ri '/\s+$/d' did it behave as I expected. According to the sed manual, -r enables extended regular expressions.

I'm confused: why are there so many regexp variants, such as the vim, emacs, perl, and sed flavors? Why can't regexps offer a single, uniform syntax?

Alex
  • 1,853
hero2008
  • 637

1 Answer

20

For historical reasons. There's no single definition of "regular expression" syntax. The concept of a regular expression is independent of the concrete syntax used to write one down. People have come up with different ways of saying the same thing, hence the different styles of regex syntax.

However, you'll find that there are mostly two groups of definitions around these days:

  1. POSIX regular expressions, which specify Basic (BRE) and Extended Regular Expressions (ERE). The confusion begins with details such as grouping: Basic Regular Expressions use \( \) to denote a group, while Extended Regular Expressions use ( ).

  2. Perl-based regular expressions. Perl regular expressions define a more consistent syntax, where, for example, a backslash always escapes a non-alphanumeric character. Perl regex syntax is found in many popular programming languages these days, from Java to Ruby.
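
To make the BRE/ERE difference concrete, here is a minimal sketch using sed. The function names and the sample line are made up for illustration. It uses -E, which GNU and BSD sed both accept for extended regexps (the -r in the question is GNU sed's other spelling of the same option), and the POSIX class [[:space:]] rather than the GNU-specific \s. It also uses s/…// to strip the trailing whitespace; the d command in the question would delete every matching line entirely.

```shell
#!/bin/sh
# BRE (sed's default): "one or more" is written as the interval \{1,\}.
strip_trailing_bre() {
    sed 's/[[:space:]]\{1,\}$//'
}

# ERE (-E): + is the "one or more" quantifier.
strip_trailing_ere() {
    sed -E 's/[[:space:]]+$//'
}

# BRE with an unescaped +: here + is a literal plus sign, so a line
# that ends in plain whitespace does not match and is left untouched.
strip_trailing_bre_literal_plus() {
    sed 's/[[:space:]]+$//'
}

# Grouping: BRE writes \( \), ERE writes ( ).
swap_bre() {
    sed 's/\(key\)=\(value\)/\2=\1/'
}
swap_ere() {
    sed -E 's/(key)=(value)/\2=\1/'
}

# Hypothetical sample line ending in two spaces and a tab.
sample=$(printf 'int x;  \t')

# Brackets make any remaining trailing whitespace visible.
for f in strip_trailing_bre strip_trailing_ere strip_trailing_bre_literal_plus; do
    printf '%s: [%s]\n' "$f" "$(printf '%s\n' "$sample" | "$f")"
done

printf 'key=value\n' | swap_bre
printf 'key=value\n' | swap_ere
```

The first two functions should strip the whitespace, while the third leaves the line unchanged, which is consistent with the behavior described in the question when -r is omitted.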

You can check out the Wikipedia article on regex syntax for more info.

MvG
  • 1,489
slhck
  • 228,104