8

Say I have two datasets: one is a detailed list of events, and one is a coarse list of events. I want to add a column to the detailed list from the coarse list (for the purposes of sorting).

For example: I have datasetA, a list many events:

datasetA = {<|"Id" -> 997, "Event" -> "Flight to Houston"|>,
<|"Id" -> 1003, "Event" -> "Woke Up"|>,
<|"Id" -> 1007, "Event" -> "Ate Breakfast"|>,
<|"Id" -> 1009, "Event" -> "Filed Taxes"|>,
<|"Id" -> 1015, "Event" -> "Flight to Chicago"|>,
<|"Id" -> 1020, "Event" -> "Ate Dinner"|>,
<|"Id" -> 1032, "Event" -> "Flight to Maryland"|>} // Dataset

and datasetB, a list of flights (which also appear in datasetA), and a location:

datasetB = {<|"Id" -> 997, "Event" -> "Flight to Houston", 
"Location" -> "Houston"|>,
<|"Id" -> 1015, "Event" -> "Flight to Chicago", 
"Location" -> "Chicaco"|>,
<|"Id" -> 1032, "Event" -> "Flight to Maryland", 
"Location" -> "Maryland"|>} // Dataset

I want a "Location" column in datasetA from the "Location" column in datasetB using the "Id" column (note that the Flight events have the same Ids in both sets).

If the "Id" of event N in datasetA is greater than or equal to the "Id" of flight event M but less than flight event M+1, I want the value of "Location" to be "Location" from datasetB for flight event M. I want to obtain the modified version of datasetA:

datasetC = {<|"Id" -> 997, "Event" -> "Flight to Houston", 
"Location" -> "Houston"|>,
<|"Id" -> 1003, "Event" -> "Woke Up", "Location" -> "Houston"|>,
<|"Id" -> 1007, "Event" -> "Ate Breakfast", 
"Location" -> "Houston"|>,
<|"Id" -> 1009, "Event" -> "Filed Taxes", "Location" -> "Houston"|>,
<|"Id" -> 1015, "Event" -> "Flight to Chicago", 
"Location" -> "Chicaco"|>,
<|"Id" -> 1020, "Event" -> "Ate Dinner", "Location" -> "Chicaco"|>,
<|"Id" -> 1032, "Event" -> "Flight to Maryland", 
"Location" -> "Maryland"|>} // Dataset

My apologies for the overlap with existing queries. I think this thread looks helpful; I am still learning mapping functions, especially with datasets.

KBL
  • 643
  • 3
  • 10
  • You can start by using JoinAcross to get the locations from set B into set A at the corresponding event IDs (you'll need a left join rather than an inner join). From there you'll then need to pad out the empty spaces. – Sjoerd Smit Jul 31 '19 at 18:40
  • I'm aware the JoinAcross option, but for the filtering and selections I want to do with the resultant dataset, the Location column needs to be filled. – KBL Jul 31 '19 at 18:50

2 Answers2

6

Here's a function that finds the last location according to datasetB:

g[assoc_] := datasetB[Select[#Id <= assoc["Id"]&] /* Last, "Location"]

Using this function, you can augment datasetA to include location information with:

datasetA[All, Association[#, "Location" -> g[#]]&]

enter image description here

Carl Woll
  • 130,679
  • 6
  • 243
  • 355
  • Thanks! This is exactly what I was looking for. Out of curiosity, how easy would it be to adapt this to look at the next destination (or two destinations ago)? I am working on my intuition for these sorts of problems. If the answer is "not easy", that is also useful. – KBL Jul 31 '19 at 20:28
4

Here's another method I hinted at in my earlier comment, but didn't have time for to post yet. It consists of 2 steps: first I use JoinAcross to get the known locations across and then I use FoldList to update the missing locations:

Dataset @ FoldList[
  If[ MissingQ[#2["Location"]],
      Append[#2, "Location" -> #Location],
      #2
  ]&,
  SortBy[#Id &] @ JoinAcross[Normal[datasetA], Normal[datasetB], Key["Id"], "Left"]
]
Sjoerd Smit
  • 23,370
  • 46
  • 75