<EDIT> After some internet research I found the database field that deals with this kind of problem: temporal databases. My question therefore becomes how to reconstruct a temporal database in Mathematica, so I've retitled it accordingly. Note that TemporalData doesn't do this directly, although it might play a role in a possible solution. </EDIT>
<EDIT2> I've come across "Indexing temporal data using existing B+ trees", and it appears to be a very interesting way of modeling this dataset. I think it is simple enough that I could probably hack it together in Mathematica :)</EDIT2>
I am trying to reconstruct the history of a bug tracking system, and what I have access to is an event log that records a snapshot of the bug state each time an attribute changed. I think you could encounter this style of problem in many other kinds of systems, or in event data collected from almost anywhere.
Needs["GeneralUtilities`"]
ds = Import["https://gist.githubusercontent.com/lburton/fb01b7c693c4e294dd37/raw/a74bf793c2e4c184488aa7ee71c39d325de21922/gistfile1.json","JSON"] // ToAssociations // Dataset

This is of course obfuscated data. I wish to ask questions like "count the number of bugs that were open each day" so I can draw a graph of open bugs over time.
Since this is an event log, there are holes in the data. I can't just ask for the open bugs on a particular date; I need to somehow reconstruct the state of all bugs for any given day.
I've solved this one way in Mathematica, but as I worked on the problem I had the uncomfortable feeling that I was probably re-implementing a data structure someone had already labored on to perfection. However, I couldn't find anything in Mathematica that seemed to specifically solve the problem (maybe SparseArray?). I ended up using Dataset and lots of filtering and munging.
My question is therefore: how would you solve it? Here's the approach I've used.
Firstly, group the event data by the unique identifier for the bug. Then, apply two functions to the grouped event data:
eventData = ds[GroupBy[Key["bug_id"]], FindEvents /* EventsToDateRange];
FindEvents searches through the events for each bug id looking for "start" and "stop" events of interest. In my case I have a hard-coded interest in when the bug became "open" and when it became "closed". An odd number of results implies the bug has not yet transitioned to a "closed" state, so it is considered open as of today's date.
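FindEvents[rows_] := Module[{events},
(* Reap groups sown values by tag, so re-sort to restore chronological order *)
events = SortBy[Catenate[Last@Reap[Fold[MatchEvents, <| |>, SortBy[rows, Key["timestamp"]]]]], Key["timestamp"]];
(* An unmatched "start" means the bug is still open: close the interval at the current time *)
If[OddQ[Length[events]], AppendTo[events, <| "timestamp" -> AbsoluteTime[] |>]];
events];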
The MatchEvents function that I'm Fold-ing over detects "start" and "stop" events and Sows them (note that I don't actually use the tags)
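MatchEvents[previous_, current_] := Module[{foundStart, foundStop},
(* "start": this event is Open and the previous one (if any) was not *)
foundStart =
(current["state"] === "Open") &&
(Length[previous] == 0 || previous["state"] =!= "Open");
If[foundStart, Sow[current, start]];
(* "stop": this event is not Open but the previous one was *)
foundStop =
(current["state"] =!= "Open") && (previous["state"] === "Open");
If[foundStop, Sow[current, stop]];
(* Return the current event so Fold passes it on as the next "previous" *)
current];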
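One thing worth keeping in mind about the tags: even if they are never read, Reap still groups the sown values by tag. The toy sketch below (the integers are stand-ins for events in timestamp order, not real data) shows that flattening the result puts all the "start" values before all the "stop" values, so a bug that is reopened needs its events re-sorted by timestamp before they are paired up.

```mathematica
(* Toy example: the integers stand in for events in timestamp order *)
result = Reap[Sow[1, start]; Sow[2, stop]; Sow[3, start]; Sow[4, stop]];
grouped = Last[result]       (* one sublist per tag, in order of first use *)
flat = Catenate[grouped]     (* all "start" values, then all "stop" values *)
```

Sowing everything without a tag, or sorting the catenated list by timestamp, are two ways to keep the start/stop events interleaved.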
Finally, now that we have "start" and "stop" events, I can interpolate a date range across them.
EventsToDateRange[events_] :=
Apply[DateRange] /@ Partition[( Take[Normal@DateList[#timestamp], 3] & /@ events), 2];
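To make the pairing step concrete, here is a minimal sketch on made-up date lists (assumed values, not taken from the gist). It shows how Partition pairs consecutive timestamps into {start, stop} intervals and how DateRange expands each pair into individual days.

```mathematica
(* Made-up timestamps: open, close, reopen, close *)
stamps = {{2014, 9, 1}, {2014, 9, 3}, {2014, 9, 10}, {2014, 9, 11}};

(* Consecutive elements become {start, stop} pairs *)
pairs = Partition[stamps, 2];

(* One list of days per interval; both endpoints are included *)
ranges = DateRange @@@ pairs;
Length /@ ranges
```

Two details follow from this: Partition silently drops a trailing unpaired element, which is why FindEvents appends a closing timestamp when the count is odd, and because DateRange includes its end date, a bug is still counted as open on the day it was closed.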
Now we have, for each bug_id, a single date entry in a list for each day that bug was marked as "open". All I do then is Flatten, Normal, Tally, and plot 'em:
Flatten[eventData // Values // Normal, 2] // Tally // DateListPlot
This produces the following plot. Note that in my zeal to generate real-ish fake data and remove proprietary clutter from my code, this graph came out slightly wrong: it should drop down to zero. I have not yet investigated what broke.

A generalized solution would, given a date (or date range), return the state of all bugs known to exist on that date (i.e. omitting bugs not yet created). Basic predicates and functions could then be applied to the result (state = open, length).
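As a rough sketch of what such an "as of" query might look like (not a tuned implementation): for each bug, keep only the events at or before the requested date and take the most recent one. The name stateAsOf and the toy log are hypothetical, and the sketch assumes every timestamp is something AbsoluteTime can interpret.

```mathematica
(* Hypothetical toy log; the keys mirror those used in the question *)
log = {
  <|"bug_id" -> 1, "timestamp" -> {2014, 9, 1}, "state" -> "Open"|>,
  <|"bug_id" -> 1, "timestamp" -> {2014, 9, 5}, "state" -> "Closed"|>,
  <|"bug_id" -> 2, "timestamp" -> {2014, 9, 3}, "state" -> "Open"|>
};

(* stateAsOf is an illustrative name, not a built-in.
   Bugs with no event at or before `date` are omitted. *)
stateAsOf[events_List, date_] := Module[{t = AbsoluteTime[date], known},
  (* Discard events that had not happened yet *)
  known = Select[events, AbsoluteTime[#timestamp] <= t &];
  (* Latest surviving event per bug *)
  Last /@ GroupBy[SortBy[known, AbsoluteTime[#timestamp] &], #["bug_id"] &]
];

(* Predicate applied to a snapshot: how many bugs were open on a given day? *)
openCount[events_List, date_] :=
  Count[stateAsOf[events, date], a_ /; a["state"] === "Open"];

openCount[log, {2014, 9, 4}]
```

This rescans the whole log for every date, which is fine for a small dataset but is exactly the cost an index such as the B+ tree scheme mentioned above would avoid.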
I have some other solutions in mind, but I'd much rather spend time analyzing the data I've been asked to analyze, hence my appeal for ideas here. Pursuing the implementation of a generalized solution when your business doesn't need one is a trap for young players that I'm trying to avoid, since I'm now anything but young ;)




TemporalData – kglr Sep 30 '14 at 00:03
TemporalData using the option MissingDataMethod. – kglr Sep 30 '14 at 17:35
Interval and related operations with time series data. It not only represents interval sets, it also automatically normalizes the representation by sorting and resolving unions/intersections. It's not an immediate answer to your Q but worth looking into. (I asked a related Q here: http://mathematica.stackexchange.com/questions/61019/simplify-2d-polyhedral-regions-like-interval) – alancalvitti Oct 10 '14 at 17:36