Error changing Dataset using Part

Question

Update

It was a bug in the V10.0 documentation: this functionality was not implemented yet, and V10.1 changed the documentation to remove it. It's a pity, because it's a very useful operation, common in other languages such as R. I miss data.frame-like notation in Mathematica.

The new guide Computation With Structured Datasets includes this passage on how to change a Dataset:

[Screenshot of the relevant passage from the guide]

But if we create a Dataset like:

ds=Dataset[{<|"a"->1,"b"->"x"|>,<|"a"->2,"b"->"y"|>,<|"a"->6,"b"->"z"|>}];

And then evaluate:

ds[[1, 1]] = 2

Or, closer to my real use case:

ds[[All, "a"]] = Accumulate@Normal@ds[[All, "a"]]

We get an error:

"Part specification ds[[1,1]] is longer than depth of object."

"Part specification ds[[All,1]] is longer than depth of object."

Is this a bug?

Assignment with Set does not work on a Dataset the way the documentation says it should.

This post on Wolfram Community

This is no longer documented to work as of 10.1.0. As Tali mentions in his answer below, the inclusion of this comment in the original documentation was erroneous. — Stefan R, Jun 08 '15 at 15:07
@StefanR I know about that. But this would be a nice way to handle data, and should be considered in the future. In R, it's a very natural way to do data frame manipulations. — Murta, Jun 08 '15 at 18:41
@Murta I edited bug information to conform with the standard header here: http://meta.mathematica.stackexchange.com/questions/1610/standard-header-for-bugs-tagged-posts-for-easy-searching. I removed [tag:version-10] because that also is the policy adopted by the community here: http://meta.mathematica.stackexchange.com/questions/1361/how-should-we-tag-longstanding-bugs-that-have-been-fixed. I certainly didn't mean to change the intention of your post. Perhaps, though, you could find a way to include the header and adjust the tags? — Michael E2, Aug 08 '15 at 21:41
I have checked the online documentation and experimented with Dataset L-value assignment in the Wolfram Cloud. It seems nothing has changed since this post. Taliesin made an interesting comment about the different representations of Dataset in the logical and physical data models. I wonder if there has been any progress on this? Have you seen my LinkedIn post on data modeling of AtomicDB using atomic information resource units that are self-referenced and uniquely identified in a 4D space? — Athanassios, Aug 22 '16 at 16:25

score 35 · Answer 1 · edited Apr 15 '17 at 15:49

I'm the developer of Dataset.

Yes, this is a gross documentation oversight. We planned this functionality but had to push it back to a point release. Somehow no-one caught this piece of legacy documentation.

I've just filed a bug on the documentation problem; it's easy to fix.

As for when L-value assignment will be available, I'm hoping 10.0.1 or 10.0.2, which are in the next month or two. It gets complicated, because you might well want to write things like:

dataset[ Select[#age > 30&] , "salary"] *= 2

That's certainly a powerful kind of operation, but also hard to implement. Even part-like assignments can get complicated when you are assigning multidimensional datasets to each other.
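Until something like that exists, a similar effect can be approximated without mutation by rebuilding the rows and reassigning the symbol. The following is only a sketch: the sample rows and the "name" field are made up for illustration, and the "age" and "salary" fields simply follow the hypothetical example above.

```wolfram
(* Hypothetical sample data; only "age" and "salary" come from the example above *)
dataset = Dataset[{
    <|"name" -> "Ann", "age" -> 25, "salary" -> 1000|>,
    <|"name" -> "Bob", "age" -> 40, "salary" -> 2000|>}];

(* Rebuild every row, doubling "salary" only where age > 30, then reassign.
   Nothing is modified in place: a new Dataset replaces the old one. *)
dataset = dataset[All, If[#age > 30, <|#, "salary" -> 2 #salary|>, #] &];

Normal[dataset]
(* {<|"name" -> "Ann", "age" -> 25, "salary" -> 1000|>,
    <|"name" -> "Bob", "age" -> 40, "salary" -> 4000|>} *)
```

This touches every row, so it is presumably less efficient than a true in-place update would be on a large dataset.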

Thanks for trying the functionality, though!

answered Jul 10 '14 at 18:43 by Taliesin Beynon; edited Apr 15 '17 at 15:49 by J. M.'s missing motivation

Thanks for your clarification. I'll wait for it; it's a very useful operation and I'm happy that I won't need to wait for V11. While I have the opportunity: have you seen this post in Wolfram Community about Dataset memory consumption? Are there plans for efficient tabular data in V10? – Murta Jul 10 '14 at 19:31
@Murta Yes, moving to column-oriented will make things much better. But before I could do that I had to lay the groundwork in the form of a type system that could represent the "logical shape", even if the "physical layout" is different. And of course Leonid is working on making this whole process scale to out-of-core computation against data that lives on disk. – Taliesin Beynon Jul 10 '14 at 20:26
@TaliesinBeynon your example is quite funny! – faysou Nov 17 '14 at 08:46
@Taliesin, it seems there is an alternative approach to column-oriented data modeling, but I have seen very few attempts to build a fully operational database system based on it. I am talking about associative data modeling, the way it has been implemented in Qlikview (in-memory, file-based database) and in AtomicDB (full DBMS). I also cannot find any open-source implementation of this, so I decided to build one on top of graph and object-relational databases that support references to persistent objects! – Athanassios Aug 22 '16 at 16:33
I'm wondering what the status of this problem is. It looks like it's not in V11.1. – xslittlegrass Apr 01 '17 at 16:47
@xslittlegrass Which problem? Mutable updating of Datasets? I've implemented the kernel functionality that is required for it, but I don't have immediate plans to do it for Dataset. However, see the answer I just posted. – Taliesin Beynon Apr 05 '17 at 15:55
@TaliesinBeynon Thanks for the updates, and thanks for listening to us! By the way, do you have any comments on this question on Dataset performance: Dataset is 20X slow when adding column heads – xslittlegrass Apr 05 '17 at 16:04
@xslittlegrass Mr. Wizard's answer is correct. With a lot of optimization work we could make Dataset opportunistically store tables in column-oriented form. Indeed that's always been the plan. But that will take months to implement properly and currently my priorities lie with neural networks. – Taliesin Beynon Apr 05 '17 at 16:59

score 21 · Answer 2 · answered Apr 05 '17 at 15:58

I have implemented the underlying kernel functionality that is needed to make this possible. However, it is not yet implemented on the Dataset side. I don't think this will happen in the immediate future owing to other priorities.

Here is a stop-gap that implements a simple version of mutable updating. This is of course not production-grade. I'm happy for anyone who wants to modify this answer to extend its functionality, add error handling, etc.

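(* Register a handler for assignments to symbols that hold a Dataset *)
Unprotect[Dataset];
Language`SetMutationHandler[Dataset, DatasetMutationHandler];

(* Hold the whole assignment so the handler receives it unevaluated *)
SetAttributes[DatasetMutationHandler, HoldAllComplete];
DatasetMutationHandler[Set[sym_Symbol[[args___]], newvalue_]] := Block[{tmp},
    tmp = Normal[sym];  (* unwrap to ordinary lists and associations *)
    tmp[[args]] = If[Dataset`ValidDatasetQ[newvalue], Normal[newvalue], newvalue];
    sym = Dataset[tmp];  (* re-wrap and store back in the original symbol *)
];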

You can use it as follows:

In[51]:= d = Dataset[{1, 2, 3}];
d[[2 ;; 3]] = 99;
Normal[d]

Out[53]= {1, 99, 99}

I see that this function was added in 10.4. Is it already reliable and usable there? (I do not have a use for it at this moment, just asking for the future.) — Szabolcs, Apr 05 '17 at 18:35
@Szabolcs Yes, it is. It's used in production to implement CloudExpression. You may notice, however, that Language`HasMutationHandlerQ returns the opposite of the correct answer, but it's not a very important function. — Taliesin Beynon, Apr 06 '17 at 00:25

score 19 · Answer 3 · answered Jul 27 '14 at 00:10

In lieu of Set, the Query syntax offers various ways to selectively update elements of a dataset. For example, we can change the value of the field "a" in the first row like this:

ds[{1 -> (<| #, "a" -> 999|> &)}]

or like this:

ds[{1 -> Query[{"a" -> (999 &)}]}]

Multiple fields can be updated simultaneously:

ds[{1 -> (<| #, "a" -> 999, "b" -> "ZZZ" |> &)}]

We can update selected rows, in this case field "b" in rows where "a" is even:

ds[All, If[EvenQ[#a], <| #, "b" -> "!!!!"|>, #] &]

The accumulation use case can be accomplished like this:

With[{a = ds[Accumulate, "a"]}
, ds @ MapIndexed[<| #, "a" -> a[[First@#2]] |> &]
]

or like this:

Module[{acc = 0}, ds[All, {"a" -> (acc += # &)}]]

Note that none of these operations destructively alters the dataset: each returns a new dataset and leaves ds unchanged, so they should all be written as ds = ds[...] if the change is meant to persist. Presumably Set will eventually perform destructive updates in those restricted circumstances in which Mathematica tolerates mutation.
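For instance, to make the even-row update persist, the result can be assigned back to ds. A minimal sketch, reusing the ds defined in the question:

```wolfram
(* The dataset from the question *)
ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}];

(* The query returns a new Dataset; assigning it back to ds replaces the old value *)
ds = ds[All, If[EvenQ[#a], <|#, "b" -> "!!!!"|>, #] &];

Normal[ds]
(* {<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "!!!!"|>, <|"a" -> 6, "b" -> "!!!!"|>} *)
```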

answered Jul 27 '14 at 00:10 by WReach

Examples such as these are sure to increase the fun factor for the WRI employees working to compile the Query language into SQL ;) – WReach Jul 27 '14 at 00:11
Nice examples. +1. – Murta Jul 27 '14 at 00:15
@WReach, are WRI employees working to compile the Query language into SQL? Will this be brought into DatabaseLink? – ArgentoSapiens Nov 06 '14 at 16:24
@ArgentoSapiens I have no current information about this. My glib comment was based upon the fact that pre-release versions of the V10 documentation contained extensive references to such capability. Those references were withdrawn very late, just before the official V10 release. I speculate that the functionality under discussion in the question would (or did) prove to be challenging to support across multiple back-end technologies. – WReach Nov 06 '14 at 16:55

score 16 · Answer 4 · answered Jul 10 '14 at 10:45

Though I don't know what its efficiency impact is, a workaround could be to convert the Dataset to a list of Associations with Normal, make the update on that list, and then convert it back to a Dataset.

ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}]

ds = Module[{temp = Normal[ds]},
            temp[[All, "a"]] = Accumulate[temp[[All, "a"]]];
            temp // Dataset]
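The same pattern should also cover a single cell, as in the ds[[1, 1]] = 2 attempt from the question. The following is a minimal sketch that addresses the column by key rather than by position; setCell is a hypothetical helper name chosen for illustration, not a built-in function.

```wolfram
ds = Dataset[{<|"a" -> 1, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>}];

(* setCell is a hypothetical helper, not a built-in: it returns a new Dataset
   with one cell replaced and leaves its argument unchanged *)
setCell[data_Dataset, row_Integer, key_String, value_] :=
  Module[{temp = Normal[data]},
    temp[[row, key]] = value;  (* ordinary Part assignment on a list of associations *)
    Dataset[temp]]

ds = setCell[ds, 1, "a", 2];

Normal[ds]
(* {<|"a" -> 2, "b" -> "x"|>, <|"a" -> 2, "b" -> "y"|>, <|"a" -> 6, "b" -> "z"|>} *)
```

Like the accumulation example, this copies the whole dataset for each update, so it is probably only reasonable for small data or occasional edits.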

score 1 · Answer 5 · answered Mar 01 '22 at 04:56

It looks like in Mathematica 13 there is still no easy way to modify values in a Dataset.

answered Mar 01 '22 at 04:56 by kilasuelika

Could you please add a short illustrative example using v13? Thanks. – Syed Mar 01 '22 at 05:08
