How do you convert a string containing a number in C scientific notation to a Mathematica number?

Question

Suppose I have a string containing the C-representation of a floating point number; for example

s = "1.23e-5"

and I want to convert this to a Mathematica number. How can I do this?

ToExpression[s] gives Plus[-5, Times[1.23`, e]].

The only way I know how to do this is ImportString["1.23e-5", "Table"][[1, 1]] which seems like rather a large hack! — Ian Hinder, Feb 14 '12 at 13:13
Amazing that the language doesn't include a simple straightforward function to do this! — a06e, Apr 03 '16 at 20:10

score 81 · Accepted Answer · edited Nov 08 '22 at 13:44

I think probably the cleanest way to do this (at least, if you have only a single string, or are faced with a separate string for each number you wish to convert as a result of some other process) is to use the undocumented function Internal`StringToDouble, i.e.:

s = "1.23e-5";
Internal`StringToDouble[s]

which gives:

0.0000123

However, if you are trying to convert many such numbers at once, the standard, documented methods (Import, Read, etc.) are likely to be better approaches.

UPDATE: As of at least version 12.3 the proper way to invoke this is:

Internal`StringToMReal["1.23e-5"]

answered Feb 14 '12 at 15:11 by Oleksandr R.; last edited by John

Another one from Mr.Undocumented! – rm -rf Feb 14 '12 at 15:32
Always try to avoid ToExpression! – FJRA Feb 14 '12 at 17:37
This seems like functionality that should be available (officially) in Mathematica. Will Wolfram accept feature requests? – Ian Hinder Feb 23 '12 at 16:06
@FJRA Why should ToExpression be avoided? – George Wolfe Oct 17 '12 at 16:19
@GeorgeWolfe Because it can lead to code injection. What if there is dangerous code within the string? Even something as innocent as an equals sign (=) may set one of your variables. – FJRA Oct 18 '12 at 20:34
Keep in mind that Internal`StringToDouble[] can produce unexpected results: "-1" is parsed as -1.0 but " -1" (extra leading space) is parsed as 1.0 (sign dropped). Also, "1.0000000000000000" is parsed as 1. but if you add one more zero it returns $Failed["Bignum"] and no message is generated. – Gustavo Delfino Jun 30 '20 at 15:09
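The quirks mentioned in the last comment can be worked around by cleaning and checking the string before it reaches the internal function. The following is only a sketch: cNumberPattern and safeStringToDouble are hypothetical names, the regular expression is one assumed reading of "C floating-point literal", and Internal`StringToDouble remains undocumented, so its behavior may differ between versions.

```wolfram
(* Hypothetical helper names; the pattern is an assumed definition of a C-style number *)
cNumberPattern = RegularExpression["[+-]?(\\d+\\.?\\d*|\\.\\d+)([eE][+-]?\\d+)?"];

safeStringToDouble[s_String] := With[{t = StringTrim[s]},
  If[StringMatchQ[t, cNumberPattern],
    Internal`StringToDouble[t], (* undocumented; may still return $Failed["Bignum"] *)
    $Failed                     (* reject anything that does not look like a number *)
  ]
]

safeStringToDouble[" -1"]  (* leading space is trimmed before conversion *)
safeStringToDouble["foo"]  (* $Failed, because the pattern does not match *)
```

Trimming first avoids handing a leading space to the internal function, and the pattern check means non-numeric input is reported rather than silently converted.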

score 22 · Answer 2 · edited Feb 14 '12 at 14:26

s = "1.23e-5"

# &[Read[#, Number], Close@#]&[ StringToStream@s ]

Which is not as good as what you started with. Note that it is important to close the stream.

Szabolcs says this is difficult to read. That was surely not my intention. You could also write it verbosely like this:

fromC =
    Module[{output, stream},
      stream = StringToStream[#];
      output = Read[stream, Number];
      Close[stream];
      output
    ] &;

fromC[s]
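When one string holds several whitespace-separated numbers, the same stream idea can be extended with ReadList, which is documented. This is a minimal sketch; readNumbers is a hypothetical name and the sample string is made up for illustration.

```wolfram
(* Hypothetical helper: read every number from a single string *)
readNumbers[s_String] := Module[{stream = StringToStream[s], result},
  result = ReadList[stream, Number]; (* reads numbers until the stream is exhausted *)
  Close[stream];                     (* the stream must still be closed explicitly *)
  result
]

readNumbers["1.23e-5 4.5e3 -7e2"]
```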

answered Feb 14 '12 at 13:18 by Mr.Wizard; edited Feb 14 '12 at 14:26 by Szabolcs

Making the Close the second argument of a function that never uses it is a nice trick. I did some benchmarking and found the more readable version to be about 55% slower. An alternative way of writing it as a composition of functions is fromC = StringToStream /* {Read[#, Number] &, Close} /* Through /* First which is just about 10% slower. In any case it doesn't matter because Internal`StringToDouble is much faster than any of these. – Gustavo Delfino May 22 '20 at 21:41

score 17 · Answer 3 · edited Apr 13 '17 at 12:55

On version 7 Internal`StringToDouble fails on long strings, and fails to recognize exponents:

Internal`StringToDouble["3.1415926535897932385"]

Internal`StringToDouble /@ {"3.14159", "3.14159e-02", "3.14159e+02"}

$Failed["Bignum"]
{3.14159, 3.14159, 3.14159}

This sent me looking for another way to convert numeric strings. Using Trace on ImportString I found another internal function that does what I need: System`Convert`TableDump`ParseTable.

Being an internal function, it is not error tolerant, and if fed bad arguments it will crash the kernel. The syntax is as follows:

System`Convert`TableDump`ParseTable[
  table,
  {{pre, post}, {neg, pos}, dot},
  False
]

table  :   table of strings, depth = 2; need not be rectangular.
pre    :   list of literal strings to ignore if preceding the digits (only first match tried).
post   :   list of literal strings to ignore if following the digits (only first match tried).
neg    :   literal string to interpret as a negative sign (-).
pos    :   literal string to interpret as a positive sign (+).
dot    :   literal string to interpret as decimal point.

(Using True in place of False causes a call to System`Convert`TableDump`TryDate that I do not yet understand.)

Example:

System`Convert`TableDump`ParseTable[
  {{"-£1,234.141592653589793e+007"}, {"0.97¢", "140e2kg"}},
  {{{"£"}, {"kg", "¢"}}, {"-", "+"}, "."},
  False
]

{{-1.2341415926535898*^10}, {0.97, 14000.}}

Nice work! I'm sure this function will be useful. Regarding Internal`StringToDouble on 7: exponents are recognised if you first use StringReplace[nums, "e"|"E" -> "*^"]. — Oleksandr R., Aug 21 '12 at 23:54
ParseTable is great. Very fast and handles integers as well as reals. I've used it a bunch of times when I know my input is clean, thanks a bunch! — ssch, Dec 07 '13 at 14:30
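The version 7 workaround from the first comment above can be written out as follows. This is only a sketch of that suggestion: nums is an illustrative name, and since Internal`StringToDouble is undocumented the result may vary between versions.

```wolfram
(* Rewrite C-style exponents into Mathematica's *^ notation before converting *)
nums = {"3.14159", "3.14159e-02", "3.14159e+02"};
converted = Internal`StringToDouble /@ StringReplace[nums, "e" | "E" -> "*^"]
```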

score 16 · Answer 4 · answered Aug 21 '14 at 16:15

Another solution would be to use SemanticImportString (new in 10).

Borrowing some code from Mr.Wizard so that I can compare my solution to his:

strings =
  ToString @ Row[RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ 
    RandomReal[{0, 10}, 15000];

Needs["GeneralUtilities`"]

Internal`StringToDouble /@ strings // AccurateTiming

System`Convert`TableDump`ParseTable[
  {strings}, {{{}, {}}, {"-", "+"}, "."}, False
] // AccurateTiming

Interpreter["Number"][strings]   // AccurateTiming

SemanticImportString[
     StringJoin[Riffle[strings, ";"]],
     {"Number"}, 
     "List",
     Delimiters -> ";"
] // AccurateTiming

0.00671892

0.00504799

12.980645

0.0426966

As you can see, SemanticImportString is still roughly an order of magnitude slower than the internal functions, but at least it is strict with things that are not numbers, whereas Internal`StringToDouble["foo"] returns 0.

Some of the types in Interpreter will benefit from using SemanticImport internally when called on lists of strings in the future.

As for the current speed of Interpreter, there is only so much you can gain if you want to support things like

Interpreter[
    Restricted["Number", {0, 10, 0.5}],
    NumberPoint -> "baz",
    NumberSigns -> {"foo", "bar"}
]["bar5baz5"]

5.5

Thank you Carlo, this is much appreciated! – Szabolcs Aug 21 '14 at 17:13

score 15 · Answer 5 · answered Feb 14 '12 at 13:19

First[ImportString["1.23e-5", "List"]] might be slightly less hack-y than your suggestion in the comments...

answered Feb 14 '12 at 13:19 by J. M.'s missing motivation

What about a string like "2.12e"? You can see in MMA examples where such strings are generated as CForm/FortranForm ScientificForm[2.12, NumberFormat -> (Row[{#1, "e", #3}] &)] – PlatoManiac Feb 14 '12 at 13:47
@Plato is that a standard form? It looks like an error. – Mr.Wizard Feb 14 '12 at 13:53
@Plato: That's funny; I don't think I've ever seen a superfluous e being added to numbers between $1$ and $10$, I must say. Neither CForm[] nor FortranForm[] do this, and ScientificForm[] will only do that if you mess with options like you have. – J. M.'s missing motivation Feb 14 '12 at 13:54
@J.M. You are right! It does not generate numbers that end with such an "e" as I wrote. Actually I got misled by the documentation of ScientificForm. You can also check the NumberFormat example in the Options section of the documentation for ScientificForm, where they show how to produce Fortran-like forms. Test with a number like "2.12" and see the foolish "e" appear. But it is indeed not a general truth about CForm or FortranForm. – PlatoManiac Feb 14 '12 at 14:07
@Plato: Okay, but I think that's a rather contrived example. I don't think I've seen an entity like 2.12e in applications... – J. M.'s missing motivation Feb 14 '12 at 14:19

score 11 · Answer 6 · answered Aug 08 '14 at 02:25

Version 10 introduced Interpreter which would seem suited to this task:

Interpreter[form]
represents an interpreter object that can be applied to a string to try to interpret it as an object of the specified form.

Interpreter["Number"]["1.23e-5"]

0.0000123
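One property worth noting is that Interpreter reports input it cannot parse as a Failure object instead of returning a number. The sketch below assumes version 10 or later; parseNumber and the Missing["NotANumber"] tag are hypothetical choices made for illustration.

```wolfram
(* Hypothetical wrapper: return the number, or a Missing marker when parsing fails *)
parseNumber[s_String] := With[{r = Interpreter["Number"][s]},
  If[Head[r] === Failure, Missing["NotANumber"], r]
]

parseNumber["1.23e-5"]
parseNumber["foo"]      (* Missing["NotANumber"] *)
```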

Unfortunately it seems that like many new-in-10 functions this is far from optimized. In fact I would say its performance is nothing short of abysmal for this particular task.

Some string data to test with:

strings =
  ToString @ Row[RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ 
    RandomReal[{0, 10}, 15000];

Timings for Interpreter against StringToDouble and ParseTable (see the other answers):

Needs["GeneralUtilities`"]

Internal`StringToDouble /@ strings // AccurateTiming

System`Convert`TableDump`ParseTable[
  {strings}, {{{}, {}}, {"-", "+"}, "."}, False
] // AccurateTiming

Interpreter["Number"] /@ strings   // AccurateTiming

0.0052075
0.00645107
10.625608

At more than three orders of magnitude slower than the old methods the new function is simply not appropriate for general use. Hopefully it will be improved in a future release.

answered Aug 08 '14 at 02:25 by Mr.Wizard


I really need a useful function that can parse a number or tell me that the input is not a number (and won't execute arbitrary code like ToExpression). I also tried Interpreter and found it to be unusably slow, unfortunately (I can't wait for half a minute for a file to import). I'm reporting the problem now and hoping for a fix ... – Szabolcs Aug 09 '14 at 15:32
@Szabolcs I think Read is still your best bet. What is the format or structure of the files you need to import? – Mr.Wizard Aug 09 '14 at 16:42
Read is useful when the precise format is known in advance, i.e. you know what type to expect for the next token. Take for example a mixture of strings and numbers. If reading as a number fails, read as a string. This comment wasn't motivated by the need to read a single file type only. – Szabolcs Aug 09 '14 at 16:58
@Szabolcs I understand that; this is still applicable, as is StringReplace, depending on the specifics. I'd welcome a Question from you on the subject. – Mr.Wizard Aug 09 '14 at 17:42
@Szabolcs I've added an answer to this question that might serve your purpose. – Carlo Aug 21 '14 at 16:32

Arnoud Buzing · Answer 7 · 2019-08-03T18:27:47.867

updated based on comment feedback

One more approach, using LibraryLink. Create a C++ file called wolfram_strto.cpp as follows:

#include <cstdlib>
#include "WolframLibrary.h"

EXTERN_C DLLEXPORT int wolfram_strtol(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res) {
  // Args[0]: string to parse; Args[1]: numeric base (0 lets strtol detect a 0x prefix)
  char *string = MArgument_getUTF8String(Args[0]);
  mint base = MArgument_getInteger(Args[1]);
  mint result = strtol(string, NULL, (int) base);
  // release the string that the kernel passed in
  libData->UTF8String_disown(string);
  MArgument_setInteger(Res, result);
  return LIBRARY_NO_ERROR;
}

EXTERN_C DLLEXPORT int wolfram_strtod(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res) {
  // Args[0]: string to parse as a double; strtod takes no base argument
  char *string = MArgument_getUTF8String(Args[0]);
  mreal result = strtod(string, NULL);
  // release the string that the kernel passed in
  libData->UTF8String_disown(string);
  MArgument_setReal(Res, result);
  return LIBRARY_NO_ERROR;
}

This is a very thin wrapper for the C++ strtol and strtod standard library functions.

Create the library:

Needs["CCompilerDriver`"];
lib = CreateLibrary[{"wolfram_strto.cpp"}, "wolfram_strto"]

Load the two library functions:

strtol = LibraryFunctionLoad[lib, "wolfram_strtol", {"UTF8String", Integer}, Integer];
strtod = LibraryFunctionLoad[lib, "wolfram_strtod", {"UTF8String"}, Real];

Test the basics:

strtol["104", 10]

This should return the integer 104

strtod["10e4"]

This should return the real 100000.

Check some harder cases:

strtod /@ {"3.14159", "3.14159e-02", "3.14159e+02", "1.23e-5", "1E6", "1.734E-003", "2.12e1"}

Try a hex number:

strtol["0x2AF3", 0]

This should return 10995 (i.e. the same as 16^^2AF3)

Measure the elapsed time to convert 15,000 randomly generated reals:

strings = ToString @ Row[ RandomChoice /@ {{"-", ""}, {#}, {"e"}, {"-", ""}, Range@12}] & /@ RandomReal[{0, 10}, 15000]
First@AbsoluteTiming[ strtod /@ strings]

Returns in about 0.017 seconds on my machine.

For big numbers, there is another difference:

Internal`StringToDouble["1e4000"]
strtod["1e4000"]

The StringToDouble function gives $Failed["IEEE Exception"] and the strtod function gives DirectedInfinity[1].

In the case of underflow you get, respectively, $Failed["IEEE Underflow"] and 0.

Also, StringToDouble recognizes WL notation (e.g. 6.022*^23) and strtod does not recognize this format.

source code here: https://github.com/arnoudbuzing/wolfram-librarylink-examples/tree/master/01-BasicExamples/02-Strings/StringToNumber — Arnoud Buzing, Aug 02 '19 at 22:06
My C compiler's strtod is a two-argument function (no base argument). (+1) — Michael E2, Aug 02 '19 at 23:45
It indeed has only two arguments in any standard-conforming C compiler, see cppreference. — Ruslan, Aug 03 '19 at 08:28
It's correct in the github code (I had the same problem, but I did find some version of it which wanted three arguments...) — Arnoud Buzing, Aug 03 '19 at 17:34
I guess there is more than one variant here: https://linux.die.net/man/3/strtol — Arnoud Buzing, Aug 03 '19 at 17:35
ok, I've updated the post to use a (hopefully) more compliant #include <cstdlib> on Windows (together with a switch to the C++ compiler, which is more compliant on Windows) — Arnoud Buzing, Aug 03 '19 at 18:30
I have another LibraryLink implementation here: https://mathematica.stackexchange.com/a/118402/12 The biggest problem with StringToDouble is that it cannot indicate that the string does not represent a number (not that it's internal). — Szabolcs, Aug 04 '19 at 08:38

PlatoManiac · Answer 8 · 2012-02-14T13:48:22.277

Maybe one can try the following:

convert[inp_?StringQ] := ToExpression@StringReplace[inp, "e" -> "*10^"];

answered Feb 14 '12 at 13:21 by PlatoManiac; edited Feb 14 '12 at 13:48

This is still not fully correct if a number like 2.12 is represented as "2.12e" rather than the expected "2.12e1". MMA does so, as I mentioned in my comment on the answer given by @J.M – PlatoManiac Feb 14 '12 at 13:51

It works, but let me give one comment: whenever you use ToExpression on data read from a file, you make it possible to inject code into a program even inadvertently (one can never tell what sort of erroneous input the program might get by mistake). I generally try not to use ToExpression for just reading in data (as opposed to converting code) – Szabolcs Feb 14 '12 at 13:55
@Szabolcs thanks for explaining the issue with ToExpression. Your implementation is pretty cool. I did not know about the function StringToStream thanks for introducing... – PlatoManiac Feb 14 '12 at 14:01
You can actually replace "*10^" with "*^", which would be the Mathematica's syntax for floating-point exponents. E.g. InputForm[N[5^-9]] will give you 5.12*^-7 as the output. – Ruslan Aug 03 '19 at 08:24
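Putting the last suggestion together with the answer gives the variant below, which also accepts an uppercase E. It is only a sketch for input that is already known to be numeric, since ToExpression evaluates whatever the string contains; convertTrusted is a hypothetical name.

```wolfram
(* Only for trusted input: ToExpression evaluates arbitrary code in the string *)
convertTrusted[s_String] := ToExpression[StringReplace[s, "e" | "E" -> "*^"]]

convertTrusted["1.23e-5"]
convertTrusted["1.734E-003"]
```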

score 2 · Answer 9 · answered Jul 16 '18 at 09:37

Here is a Mathematica function that accepts a string and returns either a number or a string containing an error message.

ConvertScientificNumberStringToNumber[string_String] := Block[
   {regexSciNum, regexNumOnly, regexNumEOnly},
   regexSciNum = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+)((e|E)((\\+|-)?\\d+)?)? *$";
   regexNumOnly = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+) *$";
   regexNumEOnly = "^ *(\\+|-)?(\\d+(\\.\\d+)?|\\.\\d+)(e|E) *$";
   If[! StringMatchQ[string, RegularExpression[regexSciNum]],
     Return["String is not a valid Scientific Format Number"];
   ];
   If[ StringMatchQ[string, RegularExpression[regexNumOnly]],
     Return[ToExpression[string]];
   ];
   If[ StringMatchQ[string, RegularExpression[regexNumEOnly]],
     (* If nothing appears after e|E then We need to strip everything after e|E *)
     Return[ToExpression[StringReplace[string, RegularExpression["(e|E)(.+)?$"] -> ""]]]
   ,
     Return[ ToExpression[StringReplace[string, RegularExpression["(e|E)"] -> "*^"]]]
   ];
   Return["Error we should not reach this point in the function."];
];

score 1 · Answer 10 · answered Nov 04 '14 at 08:28

This works for me with large data (1E6 points) in version 8.0.1:

test = Import["scope_29_1.csv", "Data"];
test2 = ToExpression[Drop[test, 2]];

"Data" makes Mathematica convert 1.734E-003 into 0.001734, but the values are kept as strings because the first two lines contain names. Drop removes those leading non-numerical lines.

score 0 · Answer 11 · edited Sep 25 '20 at 00:41

ToExpression@StringReplace[s, "e" -> "*10^"]

answered Sep 24 '20 at 22:48 by JL AP; edited Sep 25 '20 at 00:41 by J. M.'s missing motivation
