
I have some whitespace-separated matrix data that I read with Import[..., "Table"]. The data contains a mix of strings and numbers (the strings are the row and column labels).

I noticed this weird behaviour:

ImportString["123c", "Table"]

(* ==> {{123}} *)

Mathematica ate the letter c!! Why?

It doesn't eat the other letters I tried:

ImportString["123a", "Table"]
(* {{"123a"}} *)

ImportString["123e", "Table"]
(* {{"123e"}} *)

What is the explanation and what is a good workaround?


Update:

It seems that this happens even if the label is quoted in the file:

ImportString["\"24c\"", "CSV"]

(* ==> {{24}} *)
Szabolcs
  • It's either a bug or it's interpreting 123c as some non-string datatype; I just can't figure out which. – Szabolcs Jul 18 '14 at 16:50
  • This is because it is SMILES :). Try StringFormat["123c"]. – Nasser Jul 18 '14 at 17:01
  • Also notice what it says in help if the format specification is not given: "attempts to determine the format of the string from its contents." the keyword is attempt. I do not think this is a good way to make functions. This is all fuzzy type programming. What does "Attempt" actually mean? How do I grade this attempt? A grade? B grade? 85% attempt? A function should be clear. It should tell exactly what is the input and what is the output. – Nasser Jul 18 '14 at 17:17
  • @Nasser The reason why I consider this very bad is that I didn't even notice the incorrect import until I started doing some consistency checks on the data. Only then did I discover that this led to incorrect results (I had both labels "24" and "24c", and they needed to be distinguishable). Most people wouldn't suspect that importing a simple CSV or similar file will change their data. It's completely unexpected, thus difficult to discover, and can easily lead to ruined work. "CurrencyTokens" -> None should be the default. – Szabolcs Jul 18 '14 at 18:40

1 Answer


Mathematica is interpreting c as a currency marker. This is controlled by the "CurrencyTokens" import option for "Table".

The default setting for "CurrencyTokens" is

{{"$", "£", "¥", "€"}, {"c", "¢", "p", "F"}}

so this also happens with the letters p or F.
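One way to check whether a particular input is affected is to import it twice, once with the default tokens and once with "CurrencyTokens" -> None, and compare the results. The helper currencyAffected below is a hypothetical name (not a built-in function). It is a sketch that assumes both imports produce tables of the same shape, so that the flattened fields line up one to one.

```mathematica
(* Hypothetical helper, not a built-in: import the same text with the default
   "CurrencyTokens" and with "CurrencyTokens" -> None, then return the
   {withTokens, withoutTokens} pairs of fields that differ.
   Assumes both imports have the same shape. *)
currencyAffected[text_String, format_String] :=
 Module[{withTokens, withoutTokens},
  withTokens = Flatten[ImportString[text, format]];
  withoutTokens = Flatten[ImportString[text, format, "CurrencyTokens" -> None]];
  Select[Transpose[{withTokens, withoutTokens}], #[[1]] =!= #[[2]] &]
 ]

currencyAffected["123c 123p 123F 123a", "Table"]
```

Based on the explanation above, the 123c, 123p and 123F fields would be expected to show up in the result, while 123a would not.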

The workaround is to turn off currency-token parsing explicitly:

ImportString["123c", "Table", "CurrencyTokens" -> None]
(* {{"123c"}} *)
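For the original use case (a matrix with row and column labels), the same option goes into Import or ImportString. The sketch below uses a small hypothetical inline sample instead of a file, and assumes the first row holds the column labels and the first column holds the row labels; the names text, data, rowNames, colNames and matrix are only illustrative.

```mathematica
(* Hypothetical sample: first row = column labels, first column = row labels;
   "x" in the top-left corner is a placeholder. *)
text = "x 24 24c\n24 1 2\n24c 3 4";

data = ImportString[text, "Table", "CurrencyTokens" -> None];

(* Numeric-looking labels such as 24 still come back as numbers,
   so convert all labels to strings before comparing them. *)
colNames = ToString /@ Rest[First[data]];
rowNames = ToString /@ Rest[data[[All, 1]]];
matrix = Rest /@ Rest[data];

(* Consistency check: the labels must still be distinguishable after import. *)
DuplicateFreeQ[rowNames] && DuplicateFreeQ[colNames]
```

With the default "CurrencyTokens" setting, the 24c label would be expected to come back as 24, so the same check would then return False.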

Notice: The same applies to the "CSV", "TSV" and "List" import formats.
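Applied to the quoted label from the question's update and to the other formats named in the note, the option would be passed the same way. This is a sketch; the TSV and List inputs are made-up samples.

```mathematica
(* Disable currency-token parsing for each of the affected import formats. *)
ImportString["\"24c\"", "CSV", "CurrencyTokens" -> None]
ImportString["24c\t123p", "TSV", "CurrencyTokens" -> None]
ImportString["123c\n123F", "List", "CurrencyTokens" -> None]
```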

Szabolcs