6

Bug introduced in 8 or earlier and persisting through 11.2.0


I have to to use r = Import["foo.txt", "Table", "FieldSeparators" -> {","}] since my data is Table which is comma separated and it contains a mixture of strings and numerical data. Here is an example of file, of two lines:

0,"a","0"

1,"c","xyz"

The problem is that

r = Import["foo.txt", "Table", "FieldSeparators" -> {","}]

will convert the field "0" to numeric 0 when imported. This causes me problems since that field can sometimes actually be a string (much longer than shown).

If I use

r = Import["foo.txt", "Table", "FieldSeparators" -> {","}, "Numeric" -> False]

Then all the fields are imported as String.

I looked and can't find an option to tell Import to please please just leave my data "is as" and just import it as it is on the file.

I could import everything as strings, and then convert those fields that I know should really be numerical (in this example, the fields in the first column), back to numerical after importing the data using a post-processing step, as last resource (using ToExpression applied to each field in each record which is meant to be numeric).

But thought to ask if there is an option I overlooked. For reference

 ref/format/Table

Version 11.1

Nasser
  • 143,286
  • 11
  • 154
  • 359

1 Answers1

3

Diagnosis

According to the Documentation, the "Table" format (as well as "CSV" and "TSV")

  • Stores numerical and textual information, formatted as a table.

and for separating textual information from numerical in these data formats we have

  • Import and Export option:
    "TextDelimiters"Automatic string or list of strings used to delimit non-numeric fields

Based on this I consider the following behavior as a bug:

ImportString["0000123,\"0000123\"", "CSV", "TextDelimiters" -> "\""]
{{123, 123}}

Below is what I would expect to get from Import according to the Docs and the common sense:

{{123, "0000123"}}

Tracing the evaluation shows that Import removes the substrings specified via "TextDelimiters" before converting the fields which match (an analog of) NumberString into numbers. The correct implementation would be to remove the delimiters after converting numeric fields into numbers.

The Documentation statement

  • Double-quote characters delimiting text fields are not imported by default.

is in accord with my interpretation: with the option "TextDelimiters" -> "\"" the textual field above should be imported as "0000123" (i.e. without the double-quotes inside of the string).

Current behavior of "TextDelimiters" -> None is correct – it imports all non-NumberString fields "as is":

ImportString["0000123,\"0000123\", Sin[x] ,'another field'", "CSV", 
 "TextDelimiters" -> None]    
{{123, "\"0000123\"", " Sin[x] ", "'another field'"}}
ImportString["0000123,\"0000123\", Sin[x] ,'another field'", "CSV", 
 "TextDelimiters" -> None, "Numeric" -> False]
{{"0000123", "\"0000123\"", " Sin[x] ", "'another field'"}}

But the case of "TextDelimiters" -> " " demonstrates another bug:

ImportString["0000123,\"0000123\", Sin[x] ,'another field'", "CSV", 
 "TextDelimiters" -> " "]
{{"0000123,\"0000123\", Sin[x] ,'another field'"}}

Below is what I would expect to obtain instead:

{{123, "\"0000123\"", "Sin[x]", "'another field'"}} 

A workaround

A workaround is to use "TextDelimiters" -> None with postprocessing:

ImportString["0,\"a\",\"0\"
1,\"c\",\"xyz\"", "CSV", "TextDelimiters" -> None]
{{0, "\"a\"", "\"0\""}, {1, "\"c\"", "\"xyz\""}}
% /. field_String :> 
  StringReplace[field, StartOfString ~~ "\"" ~~ s___ ~~ "\"" ~~ EndOfString :> s]
{{0, "a", "0"}, {1, "c", "xyz"}}

The suggested approach works correctly in the general case:

ImportString["0000123,\"0000123\",Sin[x],'another field'", "CSV", 
  "TextDelimiters" -> None] /. 
 field_String :> 
  StringReplace[field, StartOfString ~~ "\"" ~~ s___ ~~ "\"" ~~ EndOfString :> s]
{{123, "0000123", "Sin[x]", "'another field'"}}

P.S. Another shortcoming of the CSV importer which may be considered as a bug both in "TextDelimiters" and "LineSeparators":

Alexey Popkov
  • 61,809
  • 7
  • 149
  • 368
  • I am sorry, but how do I use ImportString to import the data from the file? – Nasser Mar 23 '17 at 03:38
  • Just use Import with the same options. – Alexey Popkov Mar 23 '17 at 03:39
  • What about ​\\​ character in the string? it seems to disappear when you make an expression out of it... – Stitch Mar 23 '17 at 04:09
  • @Stitch, in Alexey's example, the backslash is intended to escape the " character (thus, "\""). A literal backslash in a string would look like "\\". – J. M.'s missing motivation Mar 23 '17 at 04:12
  • @J.M. Do you agree that the described behavior contradicts the Docs? Is it a bug? – Alexey Popkov Mar 23 '17 at 04:14
  • @J.M. I mean exactly the "\", that is literally a single backslash in the file that gets imported as "\", but then after expression, it is eliminated from the string. Try ToExpression["\"ab \\ c\""], the result you get is simply "ab c" – Stitch Mar 23 '17 at 04:16
  • @Alexey, before that... have you seen the result of ImportString["0000123,\"0000123\"", "CSV", "TextDelimiters" -> "\"\""]? – J. M.'s missing motivation Mar 23 '17 at 04:20
  • @J.M. Thanks, now I see your point. We have to specify two delimiters as one string: ImportString["0000123,\"0000123'", "CSV", "TextDelimiters" -> "\"'"]. But it looks like another bug as well as the excessive \" in the imported textual fields. – Alexey Popkov Mar 23 '17 at 04:27
  • @J.M. No, it just becomes equivalent to "TextDelimiters" -> None for this case what also looks as a bug. – Alexey Popkov Mar 23 '17 at 04:29
  • Yes, it looks to be the same. It is unfortunate that the docs do not give a demonstration of how to properly set "TextDelimiters", so it is hard to tell what is and what isn't buggy. – J. M.'s missing motivation Mar 23 '17 at 04:32
  • @J.M. There is the statement ""TextDelimiters"-> "\"" puts double quotes around each text field when exporting to CSV." And also "With "TextDelimiters"->{"<",">"}, Export wraps text fields in angular brackets." – Alexey Popkov Mar 23 '17 at 04:33
  • Right, the behavior for Export[]/ExportString[] seems self-evident; it's the importing that's confusing. I'm now reminded of something messy I needed to deal with years ago... – J. M.'s missing motivation Mar 23 '17 at 04:40
  • @Stitch I updated the answer with better post-processing method which will work correctly in the general case. – Alexey Popkov Mar 23 '17 at 06:42