This is a follow-up question to Mathematica lexer: Symbols and UTF-8, and this one is about numbers. In Mathematica, there are many ways to input numbers. They can contain a notation for the base, precision, accuracy, a scientific form *^12, or a combination of them.
I crafted a simple StringExpression that should catch most cases:
number = {DigitCharacter .., "." ~~ DigitCharacter ..,
DigitCharacter .. ~~ "." ~~ DigitCharacter ...};
baseNumber = {HexadecimalCharacter .., "." ~~ HexadecimalCharacter ..,
HexadecimalCharacter .. ~~ "." ~~ HexadecimalCharacter ...};
base = DigitCharacter .. ~~ "^^";
precision = "`" ~~ RepeatedNull[RepeatedNull["`", 1] ~~ number, 1];
scientific = "*^" ~~ RepeatedNull["+" | "-", 1] ~~ DigitCharacter ..;
final = {number, base ~~ baseNumber} ~~ RepeatedNull[precision, 1] ~~
RepeatedNull[scientific, 1];
testMe[str_String] := StringMatchQ[str, final]
We can test this with the most common forms:
testMe /@ {"123", ".123", "123.123", "16^^aa", "16^^.aa",
"16^^.aa``30*^+10", "16^^0.*^3"}
(* {True, True, True, True, True, True, True} *)
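The list above exercises precision and the scientific form only in combination with a base. As a minimal sketch, the same testMe can be run on decimal inputs that use each feature on its own. The candidate strings are my own picks, not part of the original test, and no expected output is given because it depends on the pattern exactly as defined above.

```mathematica
(* Decimal forms using one feature at a time: trailing dot, machine-precision
   mark, explicit precision, explicit accuracy, scientific form, and a mix *)
cases = {"123.", "123`", "123`30", "123``30", "1.5*^-3", "123`30*^10"};

(* Pair every input with the lexer's verdict so failures are easy to spot *)
Thread[cases -> (testMe /@ cases)]
```

Any rule ending in False would be a candidate answer to Question 1.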
Question 1: Do you find valid Mathematica numbers that return False? There are some restrictions:
- A leading minus sign is not allowed as this will be caught as operator later and is of no concern now.
- I implemented only up to base 16, so funny examples like 32^^ListLinePlot (* 777888725646235421 *) are unfortunately not allowed.
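If the base-16 restriction ever needs to be lifted, one possible sketch is to widen the digit class to all ASCII letters. The names base36Digit, base36Number and testBase36 are hypothetical, and the helper checks only the base prefix and the digits, without the precision and scientific suffixes.

```mathematica
(* Assumption: digits for bases up to 36 are 0-9 plus ASCII letters, either case *)
base36Digit = DigitCharacter |
   Alternatives @@ Join[CharacterRange["a", "z"], CharacterRange["A", "Z"]];

(* Same three shapes as baseNumber: digits, .digits, digits. with optional digits *)
base36Number = {base36Digit .., "." ~~ base36Digit ..,
   base36Digit .. ~~ "." ~~ base36Digit ...};

(* Hypothetical helper: base prefix followed by the widened digit pattern *)
testBase36[str_String] :=
  StringMatchQ[str, DigitCharacter .. ~~ "^^" ~~ base36Number]

testBase36 /@ {"32^^ListLinePlot", "36^^zz.z", "16^^aa"}
```

Like the original pattern, this cannot verify that the digits actually fit the stated base.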
Question 2: Do you find invalid number forms that return True? Restriction:
- While 2^^abc is an invalid number, there is no way for the lexer to know this, because when matching the abc it has no knowledge of the context, namely that you used only base 2.
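One way to hunt for counterexamples to both questions is to compare testMe with the kernel's own parser. The following is only a sketch: kernelNumberQ and disagreements are hypothetical helper names, and the candidate strings are arbitrary. It assumes that a string which parses to a single held Integer or Real is a valid number literal.

```mathematica
(* Hypothetical helper: parse str without evaluating it and check whether
   the result is a single integer or real literal. Syntax errors make
   ToExpression return $Failed, which does not match the pattern. *)
kernelNumberQ[str_String] := MatchQ[
  Quiet @ ToExpression[str, InputForm, HoldComplete],
  HoldComplete[_Integer | _Real]]

(* Keep only the candidates on which lexer pattern and parser disagree *)
disagreements[candidates_List] :=
  Select[candidates, testMe[#] =!= kernelNumberQ[#] &]

disagreements[{"123", "123.", "1`", "1``5", "2^^abc", "16^^ff`20*^-2"}]
```

Some disagreements are expected and harmless under the restrictions stated above: the parser typically folds a leading minus sign into the literal and ignores surrounding whitespace, and it rejects 2^^abc, which the lexer cannot detect.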