3

I am familiar with regular expressions in other languages and tools (e.g., Perl, Python, Vim), but the Mathematica StringExpression is baffling me.

I'd like to match strings that look like this:

1001.200nc
12345.220nc
987654.215nc

The regular expression I'd use in Python/Perl would look like this: [0-9]+\.2[0-9]{2}nc. What is the equivalent in Mathematica StringExpression?

Related: Can I use a RegularExpression wherever a StringExpression is expected?

jlconlin
  • StringExpressions are converted into RegularExpressions. I personally prefer the RegularExpression form: it is much more universal and makes your code easier to read for non-Mathematica programmers. – Murta Mar 07 '15 at 07:28
  • http://mathematica.stackexchange.com/questions/25677 – Murta Mar 07 '15 at 07:34
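
On the related question: a RegularExpression is itself a string pattern object, so it can generally be passed to string functions such as StringMatchQ or StringCases, and it can also be one piece of a larger StringExpression. A minimal sketch of the mixed form follows; the names mixed and samples are only illustrative and do not come from the thread.

```mathematica
(* Sketch: a RegularExpression used as one part of a StringExpression. *)
(* "mixed" and "samples" are hypothetical names chosen for this example. *)
mixed = DigitCharacter .. ~~ RegularExpression["\\.2[0-9]{2}"] ~~ "nc";

samples = {"1001.200nc", "12345.220nc", "987654.215nc", "1001.300nc"};

StringMatchQ[samples, mixed]
(* {True, True, True, False} *)
```

The last sample fails because the fractional part starts with 3 rather than 2.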

2 Answers

6

For me (and this is a very personal view), StringExpressions in Mathematica are much more transparent than regular expressions. Here are two StringExpressions for your strings:

p1 = NumberString ~~ ".2" ~~ DigitCharacter ~~ DigitCharacter ~~ "nc";
p2 = NumberString ~~ "." ~~ (x : NumberString /; 200 <= ToExpression[x] < 300) ~~ "nc";

teststrings = {"1001.200nc", "12345.220nc", "987654.215nc"};
StringMatchQ[teststrings, p1]
StringMatchQ[teststrings, p2]

(* {True, True, True}
{True, True, True} *)
Fred Simons
  • What confuses me about StringExpressions are the multitude of ~. Are they separators? Do they indicate a pattern? I'm not sure what they mean. – jlconlin Mar 05 '15 at 17:15
  • @Jeremy. The double ~~ is just infix notation for StringExpression, so the items between the ~~ are the arguments of the function StringExpression. That is the technical explanation. A friendlier way is to read it as 'followed by'. So my first pattern consists of a pattern for a number string, followed by the string ".2", followed by a digit, followed by a digit, followed by the string "nc". In the second StringExpression, inside the parentheses you see a pattern for a number string that has to satisfy a condition, just as with normal pattern matching in Mathematica. – Fred Simons Mar 05 '15 at 17:38
  • @Jeremy As Fred says, a~~b~~c is really StringExpression[a, b, c]. – chuy Mar 05 '15 at 17:39
  • Slight variation: p1 = NumberString ~~ ".2" ~~ Repeated[DigitCharacter, {2}] ~~ "nc" (note the {2}: Repeated[p, 2] would allow one or two digits) – Mike Honeychurch Mar 05 '15 at 22:53
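
To make the 'followed by' reading concrete, here is a small sketch. It first checks that the ~~ chain and the explicit StringExpression call are the same expression, then builds a stricter variant that mirrors the regex [0-9]+\.2[0-9]{2}nc more literally. The name exact is only illustrative. The stricter variant rests on two points: Repeated[p, {2}] asks for exactly two repetitions (Repeated[p, 2] would allow up to two), and DigitCharacter .. accepts digits only, whereas NumberString, as far as I know, also accepts a sign and a decimal point.

```mathematica
(* The infix chain and the explicit function call are the same expression. *)
(NumberString ~~ ".2" ~~ DigitCharacter ~~ DigitCharacter ~~ "nc") ===
  StringExpression[NumberString, ".2", DigitCharacter, DigitCharacter, "nc"]
(* True *)

(* "exact" is a hypothetical name: digits only, then ".2", then exactly two digits, then "nc". *)
exact = DigitCharacter .. ~~ ".2" ~~ Repeated[DigitCharacter, {2}] ~~ "nc";

StringMatchQ[{"1001.200nc", "1001.20nc", "-1001.200nc"}, exact]
(* {True, False, False} *)
```

The second string has only one digit after ".2", and the third starts with a sign that DigitCharacter does not accept.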
3

Replace \. with \\. (inside a Mathematica string the backslash itself must be escaped), put the regular expression in quotes "...", and wrap it in RegularExpression:

regex = RegularExpression["[0-9]+\\.2[0-9]{2}nc"]

strings = {"1001.200nc", "12345.220nc", "987654.215nc", "nomatch"};

StringMatchQ[strings, regex]
(* {True, True, True, False} *)

StringCases[strings, regex]
(* {{"1001.200nc"}, {"12345.220nc"}, {"987654.215nc"}, {}} *)
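
One point that may surprise people coming from Python or Perl: StringMatchQ tests the whole string, so no ^ or $ anchors are needed, while StringCases and StringFreeQ look for the pattern anywhere inside the string. A short sketch, where padded is a made-up test string:

```mathematica
regex = RegularExpression["[0-9]+\\.2[0-9]{2}nc"];

(* "padded" is a hypothetical input with extra text around the match. *)
padded = "id 1001.200nc end";

StringMatchQ[padded, regex]   (* whole string must match *)
(* False *)

StringFreeQ[padded, regex]    (* is the pattern absent everywhere? *)
(* False *)

StringCases[padded, regex]    (* extract matching substrings *)
(* {"1001.200nc"} *)
```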
kglr