
Given

ds = Dataset[{"a b", "c-d"}]

splitting on multiple delimiters with StringSplit fails inside a Dataset query (a 10.1 regression?):

ds[All, StringSplit[#, {" ", "-"}] &]


though splitting on a single delimiter works:

ds[All, StringSplit[#, " "] &] // Normal    

{{"a", "b"}, {"c-d"}}

The plain, non-Dataset version of the multi-delimiter split also works, of course (returning the same list-of-lists form as above):

ds // Normal // Map[StringSplit[#, {" ", "-"}] &]
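Written out with its result, the plain-list computation looks like the sketch below; `data` and `result` are only illustrative names for the raw list that ds wraps and the split output, not names from the original post.

```mathematica
(* the raw strings held by ds; "data" is just an illustrative name *)
data = {"a b", "c-d"};

(* with a list of delimiters, StringSplit splits on any of them *)
result = Map[StringSplit[#, {" ", "-"}] &, data]
(* {{"a", "b"}, {"c", "d"}} *)
```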
alancalvitti

2 Answers


This issue is due to the same type-inferencing problem described here.

Using printSignatures from the referenced answer, we can see that the type inferencer will only accept a single string as the second argument, not a list:

printSignatures[StringSplit]
  (*
    {Vector[Atom[String], n_]}
    {Atom[String]}
    {Atom[String], Atom[String]}
    {Vector[Atom[String], n_], Atom[String]}
  *)

None of these signatures accepts a list of strings as the second argument, so the multi-delimiter form is rejected.

The referenced answer shows how to dodge the type-inferencer. We can use similar work-arounds here: either by using Query directly on the raw data...

ds // Normal // Query[Dataset, StringSplit[#, {" ", "-"}] &]

dataset screenshot

... or by disguising the StringSplit operator:

ds[All, StringSplit&[][#, {" ", "-"}] &]

dataset screenshot

Notice how the second work-around loses useful type information in this case, causing the dataset visualization to fall back to a cruder form. We can restore the missing type information by inserting a terminal Dataset ascending operator into the query:

ds[Dataset, StringSplit&[][#, {" ", "-"}] &]

dataset screenshot

This last operation causes the proper type information to be deduced from the final output data (using TypeSystem`DeduceType), restoring the proper visualization.
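If the Query-on-raw-data work-around is needed repeatedly, it can be wrapped in a small function. This is a minimal sketch assuming the Dataset holds a flat list of strings; `splitRaw` is a hypothetical name (not a built-in), and it simply mirrors the `ds // Normal // Query[Dataset, ...]` work-around shown earlier.

```mathematica
(* Hypothetical helper: apply StringSplit to the raw data of a Dataset
   via Query, bypassing the Dataset type inferencer, then rebuild a
   Dataset from the actual output with the ascending Dataset operator *)
splitRaw[d_Dataset, delims_] :=
  Query[Dataset, StringSplit[#, delims] &] @ Normal[d]

ds = Dataset[{"a b", "c-d"}];
out = splitRaw[ds, {" ", "-"}];
Normal[out]
(* {{"a", "b"}, {"c", "d"}} *)
```

Passing the delimiters as an argument keeps the helper usable for other delimiter lists without re-typing the Query.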

WReach
This is fixed in 10.2. P.S. Because StringSplit works on lists of strings, ds[StringSplit[#, " "] &] will work, too, as will just ds[StringSplit] (if you're splitting on whitespace, that is). – Taliesin Beynon Jul 02 '15 at 07:31

This seems to be a problem specific to your version. I'm running 10.0.1.0 on Mac OS and it works just fine.


elbOlita