
Preface: This turned out to be a bad case of brain malfunction on my part. In any case, the operator precedence is surprising, and I hadn't checked the FullForm, which is the one thing we tell every new user to do. Oh my.


This question is for reference. I believe this is a bug. Assume you want to match three kinds of numbers: 123, .123, and either 123. or 123.123. The documentation of StringExpression states:

[image: excerpt from the StringExpression documentation]

Let's try both versions we can use:

n1 = {DigitCharacter .., "." ~~ DigitCharacter .., 
      DigitCharacter .. ~~ "." ~~ DigitCharacter ...};

n2 = DigitCharacter .. | "." ~~ DigitCharacter .. |
     DigitCharacter .. ~~ "." ~~ DigitCharacter ...;

Now note that the generated regular expressions look different:

First[StringPattern`PatternConvert[#]] & /@ {n1, n2}
{
 "(?ms)(?:\\d+|\\.\\d+|\\d+\\.\\d*)", 
 "(?ms)(?:\\d+|\\.)(?:\\d+|\\d+)\\.\\d*"
} 

and in fact, they have different semantics:

StringMatchQ[".1", #] & /@ {n1, n2}
(* {True, False} *)
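To see which of the intended inputs are affected, a quick check over the example numbers from the top of the question may help. This is only a sketch: the test strings are the ones listed above, and the results noted in the comments are what the two regular expressions shown earlier imply.

```wolfram
(* Same definitions as above, repeated so the snippet is self-contained *)
n1 = {DigitCharacter .., "." ~~ DigitCharacter ..,
      DigitCharacter .. ~~ "." ~~ DigitCharacter ...};

n2 = DigitCharacter .. | "." ~~ DigitCharacter .. |
     DigitCharacter .. ~~ "." ~~ DigitCharacter ...;

(* The example inputs from the question *)
tests = {"123", ".123", "123.", "123.123"};

StringMatchQ[#, n1] & /@ tests
(* {True, True, True, True} *)

StringMatchQ[#, n2] & /@ tests
(* {False, False, True, True} *)
```

So with n2, only the inputs that contain digits followed by a "." still match.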

Any suggestions as to why this shouldn't be a bug?

halirutan

1 Answer


Look at the FullForm of n2:

n2 = DigitCharacter .. | "." ~~ DigitCharacter .. | DigitCharacter .. ~~ "." ~~ DigitCharacter ...;
n2 //FullForm

StringExpression[Alternatives[Repeated[DigitCharacter],"."],Alternatives[Repeated[DigitCharacter],Repeated[DigitCharacter]],".",RepeatedNull[DigitCharacter]]

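Note that the head is StringExpression, not Alternatives. The issue is the relative precedence of | and ~~: the FullForm shows that | binds more tightly than ~~, so each | only groups its immediate neighbors and the ~~ operators then join the resulting pieces. The following version works: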

n2 = Alternatives[
    DigitCharacter ..,
    "." ~~ DigitCharacter ..,
    DigitCharacter .. ~~ "." ~~ DigitCharacter ...
];
StringMatchQ[".1", n2]

True
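If you would rather keep the infix notation, putting parentheses around each ~~ sequence should produce the same structure. A minimal sketch, where the name n3 and the test strings are only for illustration and are not part of the original question:

```wolfram
(* Parentheses force each ~~ sequence to be grouped before | is applied *)
n3 = DigitCharacter .. |
     ("." ~~ DigitCharacter ..) |
     (DigitCharacter .. ~~ "." ~~ DigitCharacter ...);

(* The outermost head is now Alternatives rather than StringExpression *)
Head[n3]
(* Alternatives *)

StringMatchQ[#, n3] & /@ {"123", ".1", "123.", "123.123"}
(* {True, True, True, True} *)
```

Checking Head or FullForm after building a pattern like this is a cheap way to confirm that the operators grouped the way you intended.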

Carl Woll
  • Nice! Indeed, I looked at the full-form but it's already 5:25 in the morning and I must have missed this obviously. I just assumed that Alternatives is Flat. Oh my.. Thanks. – halirutan Jan 24 '18 at 04:25
  • In any case, this is a bad trap and makes the documentation that says or very questionable. Who uses an even mildly interesting regex without the ~~ operator? – halirutan Jan 24 '18 at 04:28
  • Forget the Flat comment. It's far too late. – halirutan Jan 24 '18 at 04:33