75

Suppose I have a string containing the C representation of a floating-point number, for example

s = "1.23e-5"

and I want to convert this to a Mathematica number. How can I do this?

ToExpression[s] gives Plus[-5, Times[1.23`, e]].

J. M.'s missing motivation
Ian Hinder
  • 2
    The only way I know how to do this is ImportString["1.23e-5", "Table"][[1, 1]] which seems like rather a large hack! – Ian Hinder Feb 14 '12 at 13:13
  • 1
    Amazing that the language doesn't include a simple straightforward function to do this! – a06e Apr 03 '16 at 20:10

11 Answers

81

I think probably the cleanest way to do this (at least, if you have only a single string, or are faced with a separate string for each number you wish to convert as a result of some other process) is to use the undocumented function Internal`StringToDouble, i.e.:

s = "1.23e-5";
Internal`StringToDouble[s]

which gives:

0.0000123

However, if you are trying to convert many such numbers at once, the standard, documented methods (Import, Read, etc.), are likely to represent better approaches.

UPDATE: As of at least version 12.3 the proper way to invoke this is:

Internal`StringToMReal["1.23e-5"]
John
Oleksandr R.
  • 16
    Another one from Mr.Undocumented! – rm -rf Feb 14 '12 at 15:32
  • 10
    Always try to avoid ToExpression! – FJRA Feb 14 '12 at 17:37
  • 7
    This seems like functionality that should be available (officially) in Mathematica. Will Wolfram accept feature-requests? – Ian Hinder Feb 23 '12 at 16:06
  • 1
    @FJRA Why should ToExpression be avoided? – George Wolfe Oct 17 '12 at 16:19
  • 10
    @GeorgeWolfe because it might lead you to a code leak. What if there is dangerous code within the string? Or something innocent like a equal sign (=) may set any of your variables. – FJRA Oct 18 '12 at 20:34
  • 3
    Keep in mind that Internal`StringToDouble[] can produce unexpected results: "-1" is parsed as -1.0, but " -1" (extra leading space) is parsed as 1.0 (sign dropped). Also, "1.0000000000000000" is parsed as 1., but if you add one more zero it returns $Failed["Bignum"] and no message is generated. – Gustavo Delfino Jun 30 '20 at 15:09
22
s = "1.23e-5"

# &[Read[#, Number], Close@#]&[ StringToStream@s ]

Which is not as good as what you started with. Note that it is important to close the stream.


Szabolcs says this is difficult to read. That was surely not my intention. You could also write it verbosely like this:

fromC =
    Module[{output, stream},
      stream = StringToStream[#];
      output = Read[stream, Number];
      Close[stream];
      output
    ] &;

fromC[s]
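
A possible extension, offered as a sketch rather than as part of the answer above: the same Read-based approach can report failure when the string is not a number. The name safeFromC is hypothetical, and the sketch assumes that Read gives a non-numeric result for bad input and EndOfFile once the stream is exhausted.

```mathematica
(* safeFromC is a hypothetical name, not a built-in function *)
safeFromC[s_String] :=
  Module[{stream = StringToStream[s], value, rest},
    value = Quiet @ Read[stream, Number];     (* Quiet hides the message issued for bad input *)
    rest = Quiet @ Read[stream, Character];   (* EndOfFile if nothing is left after the number *)
    Close[stream];                            (* always close the stream *)
    If[NumberQ[value] && rest === EndOfFile, value, $Failed]
  ]

safeFromC["1.23e-5"]
safeFromC["foo"]   (* not a number, so the If falls through to $Failed *)
```

Anything left over after the number, including trailing whitespace, is treated as a failure here; applying StringTrim to the input first would relax that.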
Szabolcs
Mr.Wizard
  • Making the Close in the second argument of a function that never uses it is a nice trick. I did some benchmarking and found the more readable version to be about 55% slower. An alternative way of writing it as a composition of functions is fromC = StringToStream /* {Read[#, Number] &, Close} /* Through /* First which is just about 10% slower. In any case it doesn't matter because Internal`StringToDouble is much faster than any of these. – Gustavo Delfino May 22 '20 at 21:41
17

On version 7 Internal`StringToDouble fails on long strings, and fails to recognize exponents:

Internal`StringToDouble["3.1415926535897932385"]

Internal`StringToDouble /@ {"3.14159", "3.14159e-02", "3.14159e+02"}
$Failed["Bignum"]

{3.14159, 3.14159, 3.14159}

This sent me looking for another way to convert numeric strings. Using Trace on ImportString I found another internal function that does what I need: System`Convert`TableDump`ParseTable.

Being an internal function, it is not error tolerant, and if fed bad arguments it will crash the kernel. The syntax is as follows:

System`Convert`TableDump`ParseTable[
  table,
  {{pre, post}, {neg, pos}, dot},
  False
]
table  :   table of strings, depth = 2; need not be rectangular.  
pre    :   List of literal strings to ignore if preceding the digits (only first match tried).  
post   :   List of literal strings to ignore if following the digits (only first match tried).  
neg    :   literal string to interpret as a negative sign (`-`).  
pos    :   literal string to interpret as a positive sign (`+`).  
dot    :   literal string to interpret as decimal point.

(Using True in place of False causes a call to System`Convert`TableDump`TryDate that I do not yet understand.)

Example:

System`Convert`TableDump`ParseTable[
  {{"-£1,234.141592653589793e+007"}, {"0.97¢", "140e2kg"}},
  {{{"£"}, {"kg", "¢"}}, {"-", "+"}, "."},
  False
]

{{-1.2341415926535898*^10}, {0.97, 14000.}}

Mr.Wizard
16

Another solution would be to use SemanticImportString (new in 10).

Borrowing some code from Mr.Wizard so that I can compare my solution to his:

strings =
  ToString @ Row[RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ 
    RandomReal[{0, 10}, 15000];

Needs["GeneralUtilities`"]

Internal`StringToDouble /@ strings // AccurateTiming

System`Convert`TableDump`ParseTable[
  {strings}, {{{}, {}}, {"-", "+"}, "."}, False
] // AccurateTiming

Interpreter["Number"][strings]   // AccurateTiming

SemanticImportString[
     StringJoin[Riffle[strings, ";"]],
     {"Number"}, 
     "List",
     Delimiters -> ";"
] // AccurateTiming

0.00671892

0.00504799

12.980645

0.0426966

As you can see, SemanticImportString is still about an order of magnitude slower than the internal functions, but at least SemanticImport is strict with things that are not numbers, whereas Internal`StringToDouble["foo"] returns 0. rather than failing.

Some of the types in Interpreter will benefit from using SemanticImport internally when called on lists of strings in the future.

As for the current speed of Interpreter, there is only so much you can gain if you want to support things like

Interpreter[
    Restricted["Number", {0, 10, 0.5}],
    NumberPoint -> "baz",
    NumberSigns -> {"foo", "bar"}
]["bar5baz5"]

5.5

Carlo
15

First[ImportString["1.23e-5", "List"]] might be slightly less hack-y than your suggestion in the comments...
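
For several strings at once, the same documented route can be applied in a single call by joining the strings with newlines first. This is only a sketch, and the sample strings are made up for illustration.

```mathematica
(* sample input, chosen for illustration *)
strings = {"1.23e-5", "3.14159e+02", "1E6"};

(* join with newlines so that "List" sees one number per line *)
ImportString[StringJoin[Riffle[strings, "\n"]], "List"]
```

The result should be a flat list with one number per input string, so no First is needed in this case.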

J. M.'s missing motivation
  • What about a string like "2.12e"? You can see in MMA examples where such strings are generated as CForm/FortranForm ScientificForm[2.12, NumberFormat -> (Row[{#1, "e", #3}] &)] – PlatoManiac Feb 14 '12 at 13:47
  • @Plato is that a standard form? It looks like an error. – Mr.Wizard Feb 14 '12 at 13:53
  • @Plato: That's funny; I don't think I've ever seen a superfluous e being added to numbers between $1$ and $10$, I must say. Neither CForm[] nor FortranForm[] do this, and ScientificForm[] will only do that if you mess with options like you have. – J. M.'s missing motivation Feb 14 '12 at 13:54
  • @J.M. You are right! It does not generate numbers that end with such an "e" as I wrote. Actually I got misled by the documentation of ScientificForm. You can also check there the NumberFormat example in the Options section of the documentation for ScientificForm. There they show how to produce Fortran-like forms. Test with a number like "2.12" and see the foolish "e" appear. But it is indeed not a general truth about CForm or FortranForm. – PlatoManiac Feb 14 '12 at 14:07
  • @Plato: Okay, but I think that's a rather contrived example. I don't think I've seen an entity like 2.12e in applications... – J. M.'s missing motivation Feb 14 '12 at 14:19
11

Version 10 introduced Interpreter which would seem suited to this task:

Interpreter[form]
represents an interpreter object that can be applied to a string to try to interpret it as an object of the specified form.

Interpreter["Number"]["1.23e-5"]
0.0000123
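
When the string cannot be interpreted, Interpreter returns a Failure object rather than a number, which makes invalid input easy to detect. A minimal sketch (parseOrFail is a hypothetical name, not a built-in):

```mathematica
(* parseOrFail is a hypothetical wrapper around Interpreter["Number"] *)
parseOrFail[s_String] :=
  With[{result = Interpreter["Number"][s]},
    If[MatchQ[result, _Failure], $Failed, result]
  ]

parseOrFail["1.23e-5"]
parseOrFail["abc"]   (* a Failure object is mapped to $Failed *)
```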

Unfortunately it seems that like many new-in-10 functions this is far from optimized. In fact I would say its performance is nothing short of abysmal for this particular task.

Some string data to test with:

strings =
  ToString @ Row[RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ 
    RandomReal[{0, 10}, 15000];

Timings for Interpreter against StringToDouble and ParseTable (see the other answers):

Needs["GeneralUtilities`"]

Internal`StringToDouble /@ strings // AccurateTiming

System`Convert`TableDump`ParseTable[
  {strings}, {{{}, {}}, {"-", "+"}, "."}, False
] // AccurateTiming

Interpreter["Number"] /@ strings   // AccurateTiming
0.0052075

0.00645107

10.625608

At more than three orders of magnitude slower than the old methods the new function is simply not appropriate for general use. Hopefully it will be improved in a future release.

Mr.Wizard
  • 2
    I really need a useful function that can parse a number or tell me that the input is not a number (and won't execute arbitrary code like ToExpression). I also tried Interpreter and found it to be unusably slow, unfortunately (I can't wait for half a minute for a file to import). I'm reporting the problem now and hoping for a fix ... – Szabolcs Aug 09 '14 at 15:32
  • @Szabolcs I think Read is still your best bet. What is the format or structure of the files you need to import? – Mr.Wizard Aug 09 '14 at 16:42
  • Read is useful when the precise format is known in advance, i.e. you know what type to expect for the next token. Take for example a mixture of strings and numbers. If reading as a number fails, read as a string. This comment wasn't motivated by the need to read a single file type only. – Szabolcs Aug 09 '14 at 16:58
  • @Szabolcs I understand that; this is still applicable, as is StringReplace, depending on the specifics. I'd welcome a Question from you on the subject. – Mr.Wizard Aug 09 '14 at 17:42
  • @Szabolcs I've added an answer to this question that might serve your purpose. – Carlo Aug 21 '14 at 16:32
7

updated based on comment feedback

One more approach, using LibraryLink. Create a C++ file called wolfram_strto.cpp as follows:

#include <cstdlib>
#include "WolframLibrary.h"

EXTERN_C DLLEXPORT int wolfram_strtol(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res) {
  char *string;
  mint base;
  mint result;
  string = MArgument_getUTF8String(Args[0]);
  base = MArgument_getInteger(Args[1]);
  result = strtol(string, NULL, base);   // parse an integer in the given base
  libData->UTF8String_disown(string);    // release the string passed in by the kernel
  MArgument_setInteger(Res, result);
  return LIBRARY_NO_ERROR;
}

EXTERN_C DLLEXPORT int wolfram_strtod(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res) {
  char *string;
  mreal result;
  string = MArgument_getUTF8String(Args[0]);
  result = strtod(string, NULL);         // parse a C-style floating-point number
  libData->UTF8String_disown(string);    // release the string passed in by the kernel
  MArgument_setReal(Res, result);
  return LIBRARY_NO_ERROR;
}

This is a very thin wrapper for the C++ strtol and strtod standard library functions.

Create the library:

Needs["CCompilerDriver`"];
lib = CreateLibrary[{"wolfram_strto.cpp"}, "wolfram_strto"]

Load the two library functions:

strtol = LibraryFunctionLoad[lib, "wolfram_strtol", {"UTF8String", Integer}, Integer];
strtod = LibraryFunctionLoad[lib, "wolfram_strtod", {"UTF8String"}, Real];

Test the basics:

strtol["104", 10]

This should return the integer 104

strtod["10e4"]

This should return the real 100000.

Check some harder cases:

strtod /@ {"3.14159", "3.14159e-02", "3.14159e+02", "1.23e-5", "1E6", "1.734E-003", "2.12e1"}

Try a hex number:

strtol["0x2AF3", 0]

This should return 10995 (i.e. the same as 16^^2AF3).

Measure the elapsed time to convert 15,000 randomly generated reals:

strings = ToString @ Row[ RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ RandomReal[{0, 10}, 15000];
First@AbsoluteTiming[ strtod /@ strings]

Returns in about 0.017 seconds on my machine.

For big numbers, there is another difference:

Internal`StringToDouble["1e4000"]
strtod["1e4000"]

The StringToDouble function gives $Failed["IEEE Exception"] and the strtod function gives DirectedInfinity[1].

In the case of underflow you get, respectively, $Failed["IEEE Underflow"] and 0.

Also, StringToDouble recognizes WL notation (e.g. 6.022*^23) and strtod does not recognize this format.

Arnoud Buzing
  • source code here: https://github.com/arnoudbuzing/wolfram-librarylink-examples/tree/master/01-BasicExamples/02-Strings/StringToNumber – Arnoud Buzing Aug 02 '19 at 22:06
  • 1
    My C compiler's strtod is a two-argument function (no base argument). (+1) – Michael E2 Aug 02 '19 at 23:45
  • It indeed has only two arguments in any standard-conforming C compiler, see cppreference. – Ruslan Aug 03 '19 at 08:28
  • It's correct in the github code (I had the same problem, but I did find some version of it which wanted three arguments...) – Arnoud Buzing Aug 03 '19 at 17:34
  • I guess there is more than one variant here: https://linux.die.net/man/3/strtol – Arnoud Buzing Aug 03 '19 at 17:35
  • ok, I've updated the post to use a (hopefully) more compliant #include <cstdlib> on Windows (together with a switch to the C++ compiler, which is more compliant on Windows) – Arnoud Buzing Aug 03 '19 at 18:30
  • 1
    I have another LibraryLink implementation here: https://mathematica.stackexchange.com/a/118402/12 The biggest problem with StringToDouble is that it cannot indicate that the string does not represent a number (not that it's internal). – Szabolcs Aug 04 '19 at 08:38
6

Maybe one can try the following:

convert[inp_?StringQ] := ToExpression@StringReplace[inp, "e" -> "*10^"];
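
Given the code-injection concern about ToExpression raised in the comments on this page, one cautious variant is to let only strings that look like C floats reach ToExpression. This is a sketch, not a vetted parser: safeConvert and cFloatPattern are hypothetical names, and the regular expression is my own assumption about what counts as a C-style float (no hex floats, inf, or nan).

```mathematica
(* hypothetical pattern: optional sign, digits with optional fraction, optional exponent *)
cFloatPattern = RegularExpression["[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?"];

(* StringMatchQ must match the whole string, so extra code cannot slip through *)
safeConvert[inp_String] /; StringMatchQ[inp, cFloatPattern] :=
  ToExpression @ StringReplace[inp, {"e" | "E" -> "*^"}];
safeConvert[_] := $Failed;

safeConvert["1.23e-5"]   (* 0.0000123 *)
safeConvert["2+2"]       (* $Failed, never evaluated *)
```

Strings without a decimal point, such as "1e5", come back as exact integers; wrap the result in N if a machine real is required.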
PlatoManiac
  • Still, this is not fully correct if a number like 2.12 is represented as "2.12e" rather than the expected "2.12e1". MMA does so, as I mentioned in the above comment on the answer given by @J.M – PlatoManiac Feb 14 '12 at 13:51
  • 8
    It works, but let me give one comment: whenever you use ToExpression on data read from a file, you make it possible to inject code into a program even inadvertently (one can never tell what sort of erroneous input the program might get by mistake). I generally try not to use ToExpression for just reading in data (as opposed to converting code) – Szabolcs Feb 14 '12 at 13:55
  • @Szabolcs thanks for explaining the issue with ToExpression. Your implementation is pretty cool. I did not know about the function StringToStream thanks for introducing... – PlatoManiac Feb 14 '12 at 14:01
  • You can actually replace "*10^" with "*^", which is Mathematica's syntax for floating-point exponents. E.g. InputForm[N[5^-9]] will give you 5.12*^-7 as the output. – Ruslan Aug 03 '19 at 08:24
2

Here is a Mathematica function that accepts a string and returns either a number or a string containing an error message.

ConvertScientificNumberStringToNumber[string_String] := Block[
   {regexSciNum, regexNumOnly, regexNumEOnly},
   regexSciNum = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+)((e|E)((\\+|-)?\\d+)?)? *$";
   regexNumOnly = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+) *$";
   regexNumEOnly = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+)(e|E) *$";
   If[! StringMatchQ[string, RegularExpression[regexSciNum]],
     Return["String is not a valid Scientific Format Number"];
   ];
   If[ StringMatchQ[string, RegularExpression[regexNumOnly]],
     Return[ToExpression[string]];
   ];
   If[ StringMatchQ[string, RegularExpression[regexNumEOnly]],
     (* If nothing appears after e|E then We need to strip everything after e|E *)
     Return[ToExpression[StringReplace[string, RegularExpression["(e|E)(.+)?$"] -> ""]]]
   ,
     Return[ ToExpression[StringReplace[string, RegularExpression["(e|E)"] -> "*^"]]]
   ];
   Return["Error we should not reach this point in the function."];
];
Steven Siew
1

This works for me with large data (1E6 points) in Ver 8.0.1:

test = Import["scope_29_1.csv", "Data"];
test2 = ToExpression[Drop[test, 2]];

"Data" makes Mathematica convert 1.734E-003 into 0.001734, but the values are kept as strings because the first two lines contain names. Drop removes those first non-numerical lines.

Leo
0
ToExpression@StringReplace[s, "e" -> "*10^"]
J. M.'s missing motivation
JL AP