
Suppose I want to construct an association of associations, such as a list of people with attributes:

peopleFacts=<| alice-> <|age->29,shoeSize->7|>, bob-> <|age->27,sex->male|> |>

However, I want to grow and update this organically by adding facts as I learn them.

peopleFacts[["steve","hairColor"]] = "red";
peopleFacts[["bob","age"]] = "22";
peopleFacts[["steve","major"]] = "physics";

It's possible to accomplish this awkwardly by either (a) filling the database with blank entries or (b) laboriously checking at each level of association to see if an entry is blank before filling it in (except the last level, where AssociateTo helps you). But I think there must be a more elegant way. Here is what I've tried.

This method breaks because it tosses out the second key:

 In[]:= peopleFacts[["steve","hairColor"]] = "red";
        peopleFacts

Out[]:= <|steve -> red, alice-> <|age->29,shoeSize->7|>, bob-> <|age->27,sex->male|> |>

This method drops existing data:

 In[]:= peopleFacts

Out[]:= <| alice-> <|age->29,shoeSize->7|>, bob-> <|age->27,sex->male|> |>

 In[]:= AssociateTo[peopleFacts, alice-> <|"sport"->"baseball"|>];
        peopleFacts

Out[]:= <| alice-> <|sport->baseball|>, bob-> <|age->27,sex->male|> |>

This method just doesn't evaluate:

 In[]:= AssociateTo[peopleFacts[["chris"]], "favoriteFood" -> "sushi"]

Out[]:= AssociateTo[peopleFacts[["chris"]], "favoriteFood" -> "sushi"]

EDIT: Here is a way-too-awkward method adapted from this answer by SuTron.

 In[]:= peopleFacts

Out[]:= <| alice-> <|age->29,shoeSize->7|>, bob-> <|age->27,sex->male|> |>

 In[]:= Module[{temp = peopleFacts["alice"]},
          AssociateTo[temp, "sport"->"baseball"];
          AssociateTo[peopleFacts, "alice" -> temp];
        ];
        peopleFacts

Out[]:= <| alice-> <|age->29,shoeSize->7,sport->baseball|>, bob-> <|age->27,sex->male|> |>
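For the two-level case, the temporary variable can be avoided. The following is a sketch of my own, assuming string keys throughout. It uses Lookup with an empty-association default, so the same one-line pattern works whether or not the outer key already exists. That also covers the "chris" case that failed above:

```mathematica
(* sketch: string keys assumed throughout *)
peopleFacts = <|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>,
   "bob" -> <|"age" -> 27, "sex" -> "male"|>|>;

(* existing outer key: the inner association is extended, not replaced *)
AssociateTo[peopleFacts,
  "alice" -> Append[Lookup[peopleFacts, "alice", <||>], "sport" -> "baseball"]];

(* missing outer key: Lookup falls back to <||>, so "chris" is created *)
AssociateTo[peopleFacts,
  "chris" -> Append[Lookup[peopleFacts, "chris", <||>], "favoriteFood" -> "sushi"]];

peopleFacts
```

This still only reaches one level down; deeper paths need some form of recursion.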

It's not hard to imagine defining a custom update function like

  NestedAssociateTo[peopleFacts,{"steve","haircolor","red"}]

that would handle this all for you, but I'd much rather have a nice native Mathematica solution that is optimized, and that I don't have to maintain or worry about.
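For comparison, such a custom function need not be long. Below is a minimal sketch of the NestedAssociateTo idea, assuming the {key1, ..., keyN, value} argument shape shown above. setIn is a hypothetical helper name. As a design choice, any intermediate value that is not an association is silently replaced by a fresh one.

```mathematica
ClearAll[NestedAssociateTo, setIn];

(* setIn (hypothetical helper): returns a new association with v placed at the key path *)
setIn[a_Association, {k_}, v_] := Append[a, k -> v];
setIn[a_Association, {k_, rest__}, v_] :=
  Append[a, k -> setIn[
     With[{sub = Lookup[a, k, <||>]},
      (* start a fresh level if the key is missing or holds a non-association *)
      If[AssociationQ[sub], sub, <||>]],
     {rest}, v]];

(* HoldFirst lets the function reassign the variable passed as the first argument *)
SetAttributes[NestedAssociateTo, HoldFirst];
NestedAssociateTo[sym_Symbol, {path__, value_}] := (sym = setIn[sym, {path}, value]);

(* usage *)
peopleFacts = <|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>,
   "bob" -> <|"age" -> 27, "sex" -> "male"|>|>;
NestedAssociateTo[peopleFacts, {"steve", "hairColor", "red"}];
NestedAssociateTo[peopleFacts, {"bob", "age", 22}];
NestedAssociateTo[peopleFacts, {"steve", "major", "physics"}];
peopleFacts
```

Each call rebuilds only the associations along the path with Append and then assigns the result back to the variable. This is why HoldFirst is needed.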

Jess Riedel

4 Answers


Initial data:

peopleFacts = <|
    alice -> <|age -> 29, shoeSize -> 7|>, 
    bob -> <|age -> 27, sex -> male,  hair -> <|Color -> RGBColor[1, 0, 0]|>
    |>
|>;

Here is a version of RecurAssocMerge reduced to a single definition.

MergeNested = If[MatchQ[#, {__Association}], Merge[#, #0], Last[#]] &

MergeNested @ {peopleFacts, <|bob -> <|hair -> <|length -> 120|>|>|>}
 <|
   alice -> <|
     age -> 29, 
     shoeSize -> 7|>, 
   bob -> <|
     age -> 27, 
     sex -> male,  
     hair -> <|Color -> RGBColor[1, 0, 0], length -> 120|>
   |>
 |>

Special case of 2-level deep association

Merge[{
   peopleFacts,
   <|bob -> <|hairColor -> 1|>|>
 },
 Association
]
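The Association combiner merges only one level below the top, which is why this is a special case. A quick check using the data above shows that a third-level update replaces bob's entire hair association, while MergeNested keeps both entries. MergeNested is repeated so the snippet runs on its own:

```mathematica
peopleFacts = <|
   alice -> <|age -> 29, shoeSize -> 7|>,
   bob -> <|age -> 27, sex -> male, hair -> <|Color -> RGBColor[1, 0, 0]|>|>
   |>;
MergeNested = If[MatchQ[#, {__Association}], Merge[#, #0], Last[#]] &;

update = <|bob -> <|hair -> <|length -> 120|>|>|>;

(* one level of merging: the later hair association wins outright *)
Merge[{peopleFacts, update}, Association][bob][hair]
(* <|length -> 120|> *)

(* recursive merging: both hair entries survive *)
MergeNested[{peopleFacts, update}][bob][hair]
(* <|Color -> RGBColor[1, 0, 0], length -> 120|> *)
```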

"Tidy" approach: the same recursive merge written as separate definitions:

RecurAssocMerge[a : {__Association}] := Merge[a, RecurAssocMerge];
RecurAssocMerge[a_] := Last[a];
  • adding a key to a deeply nested association:

    RecurAssocMerge[
      {peopleFacts, <|bob -> <|hair -> <|length -> 120|>|>|>}
     ]
    
     <|alice -> <|age -> 29, shoeSize -> 7|>, 
       bob -> <|age -> 27, sex -> male, hair -> <|
            Color -> RGBColor[1, 0, 0], length -> 120 |>
       |>
     |>
    
  • entirely new tree

    RecurAssocMerge[
       {peopleFacts, <|kuba -> <|hair -> <|length -> 120|>|>|>}
     ]
    
     <|
        alice -> <|age -> 29, shoeSize -> 7|>, 
        bob -> <|age -> 27, sex -> male, hair -> <|Color -> RGBColor[1, 0, 0]|>
        |>, 
        kuba -> <|hair -> <|length -> 120|>|>
    |>
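Since Merge accepts a list of any length, several pending updates can be applied in one call. Later associations win whenever they set the same leaf. This sketch replays the three updates from the question, using symbol keys as in this answer (the value strings are illustrative):

```mathematica
RecurAssocMerge[a : {__Association}] := Merge[a, RecurAssocMerge];
RecurAssocMerge[a_] := Last[a];

peopleFacts = <|alice -> <|age -> 29, shoeSize -> 7|>,
   bob -> <|age -> 27, sex -> male|>|>;

(* all updates in one list; order matters only when two of them set the same leaf *)
updated = RecurAssocMerge[{
    peopleFacts,
    <|steve -> <|hairColor -> "red"|>|>,
    <|bob -> <|age -> 22|>|>,
    <|steve -> <|major -> "physics"|>|>
   }];
updated
```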
    

Section added by Jess Riedel:

Specialize to single new entry

RecurAssocMerge defined above is a general method for merging nested Associations. We can define an abbreviation for the special case when we are adding only a single new entry.

RecurAssocMerge[ini_Association, path_List, value_] := RecurAssocMerge[{
   ini, Fold[<|#2 -> #|> &, value, Reverse@path]
}]

Then we can just do

RecurAssocMerge[peopleFacts, {bob, hair, length}, 120]
 <|alice -> <|age -> 29, shoeSize -> 7|>, 
   bob -> <|age -> 27, sex -> male, hair -> <|
            Color -> RGBColor[1, 0, 0], length -> 120 |>
     |>
   |>
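The Fold call is what turns the key path into a single-entry nested association for RecurAssocMerge to merge. Evaluated on its own for the same path, it builds the structure from the innermost key outwards:

```mathematica
(* Reverse so that the last key of the path ends up innermost *)
Fold[<|#2 -> #|> &, 120, Reverse@{bob, hair, length}]
(* <|bob -> <|hair -> <|length -> 120|>|>|> *)
```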

Notes

None of these functions modify peopleFacts in place. To update it, assign the result back, e.g. peopleFacts = RecurAssocMerge[peopleFacts, {bob, hair, length}, 120].
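If this reassignment is needed often, a small wrapper with the HoldFirst attribute can do it for you. This is a sketch. recurAssocMergeTo is a hypothetical name, and the RecurAssocMerge definitions from this answer are repeated so the snippet is self-contained:

```mathematica
ClearAll[RecurAssocMerge, recurAssocMergeTo];
RecurAssocMerge[a : {__Association}] := Merge[a, RecurAssocMerge];
RecurAssocMerge[a_] := Last[a];
RecurAssocMerge[ini_Association, path_List, value_] :=
  RecurAssocMerge[{ini, Fold[<|#2 -> #|> &, value, Reverse@path]}];

(* hypothetical helper: HoldFirst passes the variable itself, so it can be reassigned *)
SetAttributes[recurAssocMergeTo, HoldFirst];
recurAssocMergeTo[sym_Symbol, path_List, value_] :=
  (sym = RecurAssocMerge[sym, path, value]);

peopleFacts = <|alice -> <|age -> 29, shoeSize -> 7|>,
   bob -> <|age -> 27, sex -> male, hair -> <|Color -> RGBColor[1, 0, 0]|>|>|>;
recurAssocMergeTo[peopleFacts, {bob, hair, length}, 120];
peopleFacts[bob][hair]
(* <|Color -> RGBColor[1, 0, 0], length -> 120|> *)
```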

Kuba
  • how to modify your code if it is not known per se on what level of Association to Merge? (I mean, if bob is nested deeper?) – garej Dec 20 '15 at 11:11
  • Merge is a good solution. Would be good if it had HoldFirst as the structure could become big. – Edmund Dec 20 '15 at 12:00
  • @Kuba, can you please modify your answer to show the general case where the Associations are nested to deeper levels? I don't understand your comment to @garej about using GroupBy. – Jess Riedel Dec 20 '15 at 15:20
  • Suppose initially peopleFacts = <|alice -> <|age -> 29|>, bob -> <|music -> <|rock -> rollingStones|>|>|> and I want to add <|bob -> <|music -> <|rap -> kanye|>|>|> to get peopleFacts = <|alice -> <|age -> 29|>, bob -> <|music -> <|rock -> rollingStones, rap -> kanye|>|>|>. This is subtle, because the decision to append rather than replace at the level "music" depends on the fact that the Value associated with Key "music" is of type Association. – Jess Riedel Dec 20 '15 at 16:04
  • @Kuba Very elegant. Your solution is extremely general since, as you note in your heading, this can merge two arbitrarily nested Associations. My initial question about adding individual elements is just a special case. I find this so useful I suggest you pose a new Mathematica.SE question "How to recursively merge nested Associations?" and answer it yourself. If you don't, I'll do it for you and cite you. Many thanks! – Jess Riedel Dec 20 '15 at 16:28
  • @JessRiedel, what if bob is not in the upper level? is it interesting for you? Say, sample3 = <|people -> <|bob -> <|age -> <|male -> 23, female -> 20|>, sex -> male|>|>|> and to add f[{sample3, <|bob -> <|age -> 10|>|>}] – garej Dec 20 '15 at 16:31
  • @garej In general that's an ambiguous situation, since bob could appear at several places in the tree, and at different levels. One would have to construct a function that recursively searches the entire tree, whereas my initial request was when the correct location is fully specified (so that the evaluation is efficient even when the tree is very large). It would be an interesting problem, though. – Jess Riedel Dec 20 '15 at 16:35
  • @JessRiedel, I see. Anyway, You've posted very nice question +1. – garej Dec 20 '15 at 17:34
  • @Kuba I have suggested an edit that tidies up your answer to be more useful to future readers. (I don't think keeping track of what's an edit is important now that the problem is solved; people can always check the revision history.) Please accept it if you think it's appropriate. – Jess Riedel Dec 20 '15 at 18:09
  • @JessRiedel Accepted after corrections. There were some undefined functions in the last section. But I've changed it anyway to something more compact. Also, there were string keys in the last example while everything is done on symbols, so I dropped them to be consistent. I'd call the function MergeNestedAssociation but that is not important. And finally, thanks for the edit! :) – Kuba Dec 20 '15 at 19:14
  • @JessRiedel Please take a look at the edit on top of the answer, you may find it entertaining ;) – Kuba Dec 21 '15 at 10:34
  • Very nice extensions. – Mr.Wizard Dec 22 '15 at 18:31
  • @Mr.Wizard Thanks :) Have a nice holiday, I hope you are well, I noticed you were relatively inactive lately. Best :) – Kuba Dec 24 '15 at 11:41
  • @Kuba Thank you! Merry Christmas. :-) I am well, but as noted taking a break from posting. – Mr.Wizard Dec 24 '15 at 18:11
  • Very useful, succinct solution to what would seem to be a common idiom (perhaps justifying a system implementation to accommodate the OP's initial tendency: peopleFacts[["bob","hair","Length"]] = "red"). Related to 2249, and implements the advocated use of Part to "creatively set" values in Lists/Associations (or with an operator form peopleFacts @= CreativeSet[{"bob", "hair", "Length"}, "red"] where CreativeSet[pos_,value_] := Function[a,RecurAssocMerge[a, pos,value]]) – Ronald Monson Jan 20 '16 at 07:52

Update

Created an upsert function to update/insert new keys and values into a nested association structure. It automatically inserts nested associations where they do not exist, and the result does not need to be assigned back to the original association. It updates existing keys when they are found.

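ClearAll[upsert]
(* HoldFirst: the association variable itself is passed, so it can be modified in place *)
Attributes[upsert] = {HoldFirst};
upsert[dat_?AssociationQ, key_, value__] :=
 If[Length[{value}] == 1,
  (* last key on the path: insert or overwrite the value *)
  dat[key] = value,
  (
   (* create a missing intermediate level as an empty association *)
   If[! KeyExistsQ[dat, key], dat[key] = <||>];
   (* descend one level with the remaining keys *)
   upsert[dat[key], First@{value}, Sequence @@ Rest@{value}]
  )
  ]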

upsert can be used with as many nested levels as needed.

peopleFacts = <|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>, 
   "bob" -> <|"age" -> 27, "sex" -> "male"|>|>;

Insert "steve" and association "haircolor" key/value.

upsert[peopleFacts, "steve", "haircolor", "Red"];
peopleFacts

(* <|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>, 
 "bob" -> <|"age" -> 27, "sex" -> "male"|>, 
 "steve" -> <|"haircolor" -> "Red"|>|> *)

Insert "tim", association "music" key/value, and nested association "rock" key/value.

upsert[peopleFacts, "tim", "music", "rock", "jimmy"];
peopleFacts

(* <|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>, 
 "bob" -> <|"age" -> 27, "sex" -> "male"|>, 
 "steve" -> <|"haircolor" -> "Red"|>, 
 "tim" -> <|"music" -> <|"rock" -> "jimmy"|>|>|> *)

Update the "age" of "alice".

upsert[peopleFacts, "alice", "age", 25];
peopleFacts

(* <|"alice" -> <|"age" -> 25, "shoeSize" -> 7|>, 
 "bob" -> <|"age" -> 27, "sex" -> "male"|>, 
 "steve" -> <|"haircolor" -> "Red"|>, 
 "tim" -> <|"music" -> <|"rock" -> "jimmy"|>|>|> *)
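A behaviour worth knowing: upsert only descends through values that are already associations (that is what the dat_?AssociationQ test checks). If a key on the path holds a plain value, no definition matches. The call is then returned unevaluated and the data is left untouched. A small check of this, using "alice" "age" (which holds a number), with upsert repeated so the snippet runs on its own:

```mathematica
ClearAll[upsert]
Attributes[upsert] = {HoldFirst};
upsert[dat_?AssociationQ, key_, value__] :=
 If[Length[{value}] == 1,
  dat[key] = value,
  (
   If[! KeyExistsQ[dat, key], dat[key] = <||>];
   upsert[dat[key], First@{value}, Sequence @@ Rest@{value}]
  )
 ]

peopleFacts = <|"alice" -> <|"age" -> 25, "shoeSize" -> 7|>|>;

(* "age" holds 25, not an association, so the innermost call matches no definition *)
upsert[peopleFacts, "alice", "age", "years", 30]

(* peopleFacts is unchanged *)
peopleFacts
```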

Original Post

Each time there is a new key whose value is an association, you must initialise it as an (empty) association. After that you can use the feature of Association that creates a key when a value is assigned to a non-existent key.

peopleFacts = <|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>, "bob" -> <|"age" -> 27, "sex" -> "male"|>|>;

peopleFacts["steve"] = <||>;
peopleFacts
(* <|alice -> <|age -> 29, shoeSize -> 7|>, 
 bob -> <|age -> 27, sex -> male|>, steve -> <||>|> *)

peopleFacts["steve"]["hairColor"] = "Red";
peopleFacts
(* <|alice -> <|age -> 29, shoeSize -> 7|>, 
 bob -> <|age -> 27, sex -> male|>, steve -> <|hairColor -> Red|>|> *)

peopleFacts["bob"]["age"] = 22;
peopleFacts
(* <|alice -> <|age -> 29, shoeSize -> 7|>, 
 bob -> <|age -> 22, sex -> male|>, steve -> <|hairColor -> Red|>|> *)

peopleFacts["steve"]["major"] = "Physics";
peopleFacts
(* <|alice -> <|age -> 29, shoeSize -> 7|>, 
 bob -> <|age -> 22, sex -> male|>, 
 steve -> <|hairColor -> Red, major -> Physics|>|> *)
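To avoid overwriting facts that are already stored, the initialisation can be guarded with KeyExistsQ so that it only happens for a new key. A minimal two-level sketch:

```mathematica
peopleFacts = <|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>,
   "bob" -> <|"age" -> 27, "sex" -> "male"|>|>;

(* "steve" is new, so an empty association is created first *)
If[! KeyExistsQ[peopleFacts, "steve"], peopleFacts["steve"] = <||>];
peopleFacts["steve"]["hairColor"] = "Red";

(* "bob" already exists, so this guard does nothing and "sex" survives *)
If[! KeyExistsQ[peopleFacts, "bob"], peopleFacts["bob"] = <||>];
peopleFacts["bob"]["age"] = 22;

peopleFacts
```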

Hope this helps.

Edmund
  • I'm afraid this doesn't help much. Using your method, I have to check whether I'm writing over existing data, right? If I already have a bunch of facts about steve, then initializing peopleFacts[steve] = <||> erases them all. – Jess Riedel Dec 20 '15 at 02:29
  • When you say "Each time there is a new key that has an association as its value you must initialise it as an association", are you just unaware of a more elegant method, or do you have some reason to be sure none exists? – Jess Riedel Dec 20 '15 at 02:31
  • @JessRiedel You can use KeyExistsQ to check. – Edmund Dec 20 '15 at 02:36
  • I know, but that becomes very cumbersome if you're adding elements to a database that's several layers nested. – Jess Riedel Dec 20 '15 at 02:40
  • @JessRiedel I've updated and added a small upsert function that handles the nesting and key creation. – Edmund Dec 20 '15 at 14:09
  • Thanks! Could you explain more what HoldFirst accomplishes in this situation? – Jess Riedel Dec 20 '15 at 14:12
  • HoldFirst passes the first parameter by reference. This prevents a copy of it being created (which would eat up memory for large items). It also allows the association passed in to be directly updated (no need to assign back to the original association). – Edmund Dec 20 '15 at 14:15
  • @Edmund, can you try the list sample3 = <|people -> <| bob -> <|age -> <|male -> 23, female -> 20|>, sex -> male|>|>|> with the following command upsert[sample3, "bob", "age", "10"]. – garej Dec 20 '15 at 16:25
  • bob is not equal to "bob". One is a string and the other is a symbol. Also, your path states that you want a new key at the top level of "bob". – Edmund Dec 20 '15 at 16:30
  • @Edmund, that was just an artefact of copy-paste; {bob, age, 10} is the case. Nevermind =) – garej Dec 20 '15 at 18:18
  • I noticed that you used dat_?AssociationQ instead of dat_Association. From my experiments with a List first argument this seems to be necessary. Can someone explain why? – John McGee Dec 23 '15 at 14:08
  • @JohnMcGee It is because dat is being passed by HoldFirst so it is a pointer to a position in memory not a copy of the actual object. Therefore the memory pointer in dat must be inspected to discover what it is pointing to. This is what AssociationQ does. It takes the pointer and inspects the Head of what it points to. – Edmund Dec 23 '15 at 14:35
  • @Edmund - Thanks! – John McGee Dec 23 '15 at 15:34

Try this; we use it on a daily basis:

Nest[Merge,f,n]

Applied to your starting data, slightly modified (string keys instead of symbols):

peopleFacts = <|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>, 
   "bob" -> <|"age" -> 27, "sex" -> "male"|>|>;

And new facts:

newFacts = <|
  "steve" -> <|"hairColor" -> "red", "major" -> "physics"|>,
  "bob" -> <|"age" -> 22|>|>

For update semantics (new values replace existing ones): (1) put newFacts last in the list, (2) use Last as the combining function, e.g.:

{peopleFacts, newFacts} // Nest[Merge, Last, 2]

<|"alice" -> <|"age" -> 29, "shoeSize" -> 7|>, "bob" -> <|"age" -> 22, "sex" -> "male"|>, "steve" -> <|"hairColor" -> "red", "major" -> "physics"|>|>

Keep in mind that Merge does not (yet) have the full complement of join options that JoinAcross has (e.g. inner/outer/left/right), and it will not impute missing keys, so KeyIntersection is often required.

I'll also admit that for some types of ragged hierarchies there is currently no easy way to merge all relevant branches automatically (something like an All level spec). In other words, knowledge of the schema is required.
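To make the depth dependence concrete: Nest[Merge, Last, n] simply wraps Merge around Last n times, so n has to match the depth at which the leaves sit. In this sketch (illustrative data mirroring the hair/length example from another answer), n = 2 replaces the whole hair association while n = 3 merges into it. Conversely, a larger n applied to data whose leaves sit higher up would hand non-association values to Merge, which I would expect to fail.

```mathematica
(* Nest builds nested operator forms of Merge *)
Nest[Merge, Last, 2] === Merge[Merge[Last]]
(* True *)

oldFacts = <|"bob" -> <|"hair" -> <|"color" -> "red"|>|>|>;
newHair = <|"bob" -> <|"hair" -> <|"length" -> 120|>|>|>;

(* n = 2: the two hair associations are combined with Last, so "color" is lost *)
{oldFacts, newHair} // Nest[Merge, Last, 2]
(* <|"bob" -> <|"hair" -> <|"length" -> 120|>|>|> *)

(* n = 3: one more Merge level, so both hair entries are kept *)
{oldFacts, newHair} // Nest[Merge, Last, 3]
(* <|"bob" -> <|"hair" -> <|"color" -> "red", "length" -> 120|>|>|> *)
```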

alancalvitti
  • @Kuba, chat is frozen - just added a recursive approach based on your answer. Let me know if you find it useful. This is the 3rd example of query recursion that I'd like to present @ WTC. – alancalvitti Aug 31 '17 at 21:26
  • Can't focus too much on this topic atm. Does it do anything more than my MergeNested? Or is there any reason to prefer one over another? Performance? Flexibility? Because readability goes for MergeNested I guess. – Kuba Sep 01 '17 at 11:56
  • I think the way you defined MergeNested it only works on 2 levels - if so the comparison should be made to the general recursive version. – alancalvitti Sep 01 '17 at 16:27
  • I'm not sure I understand. The example is already 3 level deep: <|bob -> <|hair -> <|length -> 120|>|>|>. Could you elaborate? – Kuba Sep 01 '17 at 18:12
  • Would MergeNested work on arbitrary-depth Associations? If so, what's the point of RecurAssocMerge – alancalvitti Sep 01 '17 at 21:03
  • It should and both should work the same, the former is one line pure function and the latter is a set of downvalues in a more readable form. The last section was added by Jess so it may be confusing. – Kuba Sep 01 '17 at 21:31
  • Ok I see the recursion set up by #0. I tested MergeNested on a deeply nested Association and it works - but beware if it's not wrapped up in a Query it fails when composed inside a Dataset, eg ds[All,Values/*MergeNested] – alancalvitti Sep 01 '17 at 22:58

This recursive Query is inspired by Kuba's answer:

nestedMerge[f_] := 
  Query[Merge[Identity] /* 
    Query[All, {MatchQ[{__Association}], Identity} /* 
      Replace[{{True, data_} :> 
         nestedMerge[f][data], {False, data_} :> Query[f][data]} ]]];

Test on peopleFacts; note that, unlike in the OP, the appended data is written as nested Associations:

{peopleFacts, <|steve -> <|hairColor -> red|>|>, <|
    bob -> <|age -> 22|>|>, <|steve -> <|major -> physics|>|>} // 
  nestedMerge[Last] // Dataset


alancalvitti