41

Update

This was a bug in the V10.0 documentation: the functionality had not been implemented yet, and V10.1 corrected the documentation by removing the passage. It's a pity, because this is a very useful operation, common in other languages such as R. I miss data.frame-like notation in Mathematica.



In the new guide Computation With Structured Datasets, we find this passage on how to change a Dataset:

[image: screenshot of the documentation passage]

But if we create a Dataset like:

ds=Dataset[{<|"a"->1,"b"->"x"|>,<|"a"->2,"b"->"y"|>,<|"a"->6,"b"->"z"|>}];

and then evaluate:

ds[[1, 1]] = 2

or, closer to my real use case:

ds[[All, "a"]] = Accumulate@Normal@ds[[All, "a"]]

we get these errors:

"Part specification ds[[1,1]] is longer than depth of object"

"Part specification ds[[All,1]] is longer than depth of object. "

Is this a bug?

Assignment with Set does not work on a Dataset, contrary to what the documentation states.

This post on Wolfram Community

Murta
  • Unfortunately not in V10.0.1 yet... – Murta Sep 17 '14 at 01:42
  • StringReplace[%,"V10.0.1"-> "V10.0.2"] – Murta Dec 11 '14 at 01:35
  • StringReplace[%%,"V10.0.1"-> "V10.1.0"] – Murta Mar 30 '15 at 21:01
  • This is no longer documented to work as of 10.1.0. As Tali mentions in his answer below, the inclusion of this comment in the original documentation was erroneous. – Stefan R Jun 08 '15 at 15:07
  • @StefanR I know about that. But this would be a nice way to handle data, and should be considered in the future. In R, it's a very natural way to do data frame manipulations. – Murta Jun 08 '15 at 18:41
  • @Murta I edited bug information to conform with the standard header here: http://meta.mathematica.stackexchange.com/questions/1610/standard-header-for-bugs-tagged-posts-for-easy-searching. I removed [tag:version-10] because that also is the policy adopted by the community here: http://meta.mathematica.stackexchange.com/questions/1361/how-should-we-tag-longstanding-bugs-that-have-been-fixed. I certainly didn't mean to change the intention of your post. Perhaps, though, you could find a way to include the header and adjust the tags? – Michael E2 Aug 08 '15 at 21:41
  • I have checked the online documentation and experimented with Dataset L-value assignment in the Wolfram Cloud. It seems nothing has changed since this post. Taliesin made an interesting comment about the different representations of a Dataset in the logical and physical data models. I wonder whether there has been any progress on this? Have you seen my LinkedIn post on data modeling of AtomicDB using atomic information resource units that are self-referenced and uniquely identified in a 4D space? – Athanassios Aug 22 '16 at 16:25

5 Answers

35

I'm the developer of Dataset.

Yes, this is a gross documentation oversight. We planned this functionality but had to push it back to a point release. Somehow no-one caught this piece of legacy documentation.

I've just filed a bug on the documentation problem; it's easy to fix.

As for when L-value assignment will be available, I'm hoping for 10.0.1 or 10.0.2, which are due in the next month or two. It gets complicated, because you might well want to write things like:

dataset[ Select[#age > 30&] , "salary"] *= 2

That's certainly a powerful kind of operation, but also hard to implement. Even part-like assignments can get complicated when you are assigning multidimensional datasets to each other.
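Until such assignment exists, roughly the same effect can be obtained without mutation by mapping a function over the rows and assigning the result back. The sketch below assumes a hypothetical dataset whose rows are associations with "name", "age" and "salary" keys (names and values are made up for illustration). It is not the planned L-value syntax, only an approximation of its effect.

```mathematica
(* Hypothetical data: the keys and values are assumptions for illustration *)
dataset = Dataset[{
    <|"name" -> "Ann", "age" -> 25, "salary" -> 1000|>,
    <|"name" -> "Bob", "age" -> 40, "salary" -> 2000|>}];

(* Double the salary in every row with age > 30 and leave the other rows as they are.
   The query returns a new Dataset, so the result is assigned back to the symbol. *)
dataset = dataset[All, If[#age > 30, <|#, "salary" -> 2 #salary|>, #] &];

Normal[dataset]
```

Only the symbol is rebound here; the original Dataset expression is not modified in place.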

Thanks for trying the functionality, though!

J. M.'s missing motivation
Taliesin Beynon
  • Thanks for the clarification. I'll wait for it; it's a very useful operation, and I'm happy that I won't need to wait for V11. While I have the opportunity: have you seen this post in Wolfram Community about Dataset memory consumption? Are there plans for efficient tabular data in V10? – Murta Jul 10 '14 at 19:31
  • @Murta Yes, moving to column-oriented will make things much better. But before I could do that I had to lay the groundwork in the form of a type system that could represent the "logical shape", even if the "physical layout" is different. And of course Leonid is working on making this whole process scale to out-of-core computation against data that lives on disk. – Taliesin Beynon Jul 10 '14 at 20:26
  • @TaliesinBeynon your example is quite funny! – faysou Nov 17 '14 at 08:46
  • @Taliesin, it seems there is an alternative approach to column-oriented data modeling, but I have seen very few attempts to build a fully operational database system based on it. I am talking about associative data modeling, the way it has been implemented in Qlikview (in-memory, file-based database) and in AtomicDB (full DBMS). I also cannot find any open-source implementation of this, so I decided to build one on top of graph and object-relational databases that support references to persistent objects! – Athanassios Aug 22 '16 at 16:33
  • I'm wondering what the status of the problem is. It looks like it's not in V11.1. – xslittlegrass Apr 01 '17 at 16:47
  • @xslittlegrass Which problem? Mutable updating of Datasets? I've implemented the kernel functionality that is required for it, but I don't have immediate plans to do it for Dataset. However, see the answer I just posted. – Taliesin Beynon Apr 05 '17 at 15:55
  • @TaliesinBeynon Thanks for the updates, and thanks for listening to us! By the way, do you have any comments on this question on Dataset performance: Dataset is 20X slow when adding column heads – xslittlegrass Apr 05 '17 at 16:04
  • @xslittlegrass Mr. Wizard's answer is correct. With a lot of optimization work we could make Dataset opportunistically store tables in column-oriented form. Indeed that's always been the plan. But that will take months to implement properly and currently my priorities lie with neural networks. – Taliesin Beynon Apr 05 '17 at 16:59
21

I have implemented the underlying kernel functionality that is needed to make this possible. However it is not yet implemented on the Dataset side. I don't think this will happen in the immediate future owing to other priorities.

Here is a stop-gap that implements a simple version of mutable updating; it is, of course, not production-grade. Anyone who wants to is welcome to modify this answer to extend its functionality, add error handling, etc.

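(* Dataset is Protected, so unprotect it before registering the handler *)
Unprotect[Dataset];
(* Route assignments to parts of a symbol whose value is a Dataset to our handler *)
Language`SetMutationHandler[Dataset, DatasetMutationHandler];

(* The handler must receive the whole assignment unevaluated *)
SetAttributes[DatasetMutationHandler, HoldAllComplete];
DatasetMutationHandler[Set[sym_Symbol[[args___]], newvalue_]] := Block[{tmp},
    tmp = Normal[sym];  (* unwrap to ordinary lists and associations *)
    tmp[[args]] = If[Dataset`ValidDatasetQ[newvalue], Normal[newvalue], newvalue];
    sym = Dataset[tmp];  (* rewrap and store back in the original symbol *)
];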

You can use it as follows:

In[51]:= d = Dataset[{1, 2, 3}];
d[[2 ;; 3]] = 99;
Normal[d]

Out[53]= {1, 99, 99}
Taliesin Beynon
  • I see that this function was added in 10.4. Is it already reliable and usable there? (I do not have a use for it at this moment, just asking for the future.) – Szabolcs Apr 05 '17 at 18:35
  • @Szabolcs Yes, it is. It's used in production to implement CloudExpression. You may notice, however, that Language`HasMutationHandlerQ returns the opposite of the correct answer, but it's not a very important function. – Taliesin Beynon Apr 06 '17 at 00:25
19

In lieu of Set, the Query syntax offers various ways to update selected elements of a dataset. For example, we can change the value of the field "a" in the first row like this:

ds[{1 -> (<| #, "a" -> 999|> &)}]

dataset screenshot

or like this:

ds[{1 -> Query[{"a" -> (999 &)}]}]

dataset screenshot

Multiple fields can be updated simultaneously:

ds[{1 -> (<| #, "a" -> 999, "b" -> "ZZZ" |> &)}]

dataset screenshot

We can update selected rows, in this case the field "b" in rows with even "a":

ds[All, If[EvenQ[#a], <| #, "b" -> "!!!!"|>, #] &]

dataset screenshot

The accumulation use case can be accomplished like this:

With[{a = ds[Accumulate, "a"]}
, ds @ MapIndexed[<| #, "a" -> a[[First@#2]] |> &]
]

dataset screenshot

or like this:

Module[{acc = 0}, ds[All, {"a" -> (acc += # &)}]]

dataset screenshot

Note that none of these operations destructively alters the dataset, so each should be written as ds = ds[...] if the change is meant to persist. Presumably Set will eventually perform destructive updates in those restricted circumstances in which Mathematica tolerates mutation.
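For instance, to make the even-row update from earlier in this answer persist, the result can be assigned back to ds. This is a minimal sketch that reuses the ds from the question; nothing here mutates the Dataset in place, the symbol is simply rebound to the new value.

```mathematica
ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}];

(* The query builds a new Dataset; ds keeps its old value until it is reassigned *)
ds = ds[All, If[EvenQ[#a], <|#, "b" -> "!!!!"|>, #] &];

Normal[ds]
```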

WReach
  • Examples such as these are sure to increase the fun factor for the WRI employees working to compile the Query language into SQL ;) – WReach Jul 27 '14 at 00:11
  • Nice examples. +1. – Murta Jul 27 '14 at 00:15
  • @WReach, are WRI employees working to compile the Query language into SQL? Will this be brought into DatabaseLink? – ArgentoSapiens Nov 06 '14 at 16:24
  • @ArgentoSapiens I have no current information about this. My glib comment was based upon the fact that pre-release versions of the V10 documentation contained extensive references to such capability. Those references were withdrawn very late, just before the official V10 release. I speculate that the functionality under discussion in the question would (or did) prove to be challenging to support across multiple back-end technologies. – WReach Nov 06 '14 at 16:55
16

I don't know what its efficiency impact is, but a workaround is to convert the Dataset to a list of Associations with Normal, make the update on that, and then convert the result back to a Dataset.

ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}]

ds = Module[{temp = Normal[ds]},
            temp[[All, "a"]] = Accumulate[temp[[All, "a"]]];
            temp // Dataset]

Dataset updating

Silvia
1

It looks like in Mathematica 13 there is still no easy way to modify values in a Dataset.

kilasuelika