There are similar questions about searching lists, but this one is focused on finding the subset of dates, in DateList format, that fall within a specified date range taken from the original key list. The actual list of events is large: a table of roughly 300,000 x 60 covering 18 months of activity. The dates are indexed, but this may not be the best approach for finding the subset of items. I have generated a sample set in the same format using

sampledata = Table[DateString[DatePlus[DateList[][[1 ;; 3]],
  RandomInteger[547]], {"Month",",", "Day", ",", "Year"}], {i, 2000}];

I am attempting to use the keys from PositionIndex as the list of dates to search:

dateindex = PositionIndex[DateList[{#, {"Month", "Day", "Year"}}][[1 ;; 3]] & /@ sampledata];
dates = Keys[dateindex]

However, the search method may be too slow to embed in a Manipulate loop, where the date limits are selected dynamically from the keys in order to display the other data associated with those dates.

upperlimit = {2015, 1, 30};
lowerlimit = {2015, 1, 15};

datekeys = Flatten[Position[(DateDifference[#, upperlimit ] > Quantity[0, "Days"] 
&& DateDifference[#, lowerlimit] < Quantity[0, "Days"]) & /@ dates, True]]

selecteddates = Sort[Flatten[dateindex[dates[[#]]] & /@ datekeys]]

This gives the indices into the original data list:

sampledata[[selecteddates]] 

I chose this method to generate the date index outside of the Manipulate call since the other functions are fast. Any suggestions on how to speed up the process?

ex-kiwi

2 Answers

Try working with DateObjects instead of strings from DateString.

sampleDates = 
  DateObject[(FromDigits /@ StringSplit[#, ","])[[{3, 1, 2}]]] & /@ sampledata;

{upperlimit, lowerlimit} = 
  DateObject/@ {{2017, 1, 30}, {2017, 1, 1}};

With everything as DateObjects the matching does not need to translate from a string into a date for every comparison. Now operators like Between can be used directly.

(* Union removes the repeated positions that arise when a date occurs more than once *)
selecteddates = 
  Union[Flatten[Position[sampleDates, #] & /@ 
    Select[sampleDates, Between[{lowerlimit, upperlimit}]]]];

sampledata[[selecteddates]]
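
As a possible alternative, here is a minimal sketch that converts each date to a number once and then does the range test with plain numeric comparisons. It swaps AbsoluteTime in for DateObject, and it assumes sampledata holds "MM,DD,YYYY" strings as in the question. The names toDateList, times and rowsBetween are hypothetical helpers introduced only for illustration; whether this is fast enough for the full table has not been measured here.

```wolfram
(* Parse "MM,DD,YYYY" into {year, month, day}; format assumed from the question *)
toDateList[s_String] := (FromDigits /@ StringSplit[s, ","])[[{3, 1, 2}]];

(* Convert every date once, outside Manipulate, to seconds since 1 Jan 1900 *)
times = AbsoluteTime /@ toDateList /@ sampledata;

(* Hypothetical helper: row positions whose date lies in [lo, hi], limits inclusive *)
rowsBetween[lo_List, hi_List] :=
  With[{tlo = AbsoluteTime[lo], thi = AbsoluteTime[hi]},
   Pick[Range[Length[times]], (tlo <= # <= thi) & /@ times]];

sampledata[[rowsBetween[{2017, 1, 1}, {2017, 1, 30}]]]
```

Because times is computed only once, each change of the limits inside Manipulate would need just the comparison pass, and each matching row position is returned exactly once.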

If you want to view the dates in that particular format, use the DateFormat option of DateObject.
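
For display only, one option that avoids relying on a particular option setting is to format with DateString at the point of output. This is a small sketch using DateString in place of the option mentioned above; the element list is the same one the question used to build sampledata, and the variable d is just an illustrative name.

```wolfram
(* An example date; any DateObject from sampleDates would do *)
d = DateObject[{2017, 1, 15}];

(* Render it as month,day,year using the element spec from the question *)
DateString[d, {"Month", ",", "Day", ",", "Year"}]
```

This keeps the stored values as DateObjects for comparison and only produces strings where they are shown.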

Hope this helps.

Edmund
  • I tried Parallelize@Select to speed it up, but it doesn't help because the Select is not run in parallel. – Edmund Dec 02 '15 at 09:25

Adapting Edmund's answer to Dataset, given sampleDates:

ds = Dataset[sampleDates][
 Select[Between[DateObject /@ {{2017, 1, 1}, {2017, 1, 30}}]]]

The result can then easily be post-processed, e.g. into a makeshift calendar:

ds[GroupBy[DateValue[#, "Week"] &] /* KeySort, Sort]

Who knows why Jan 1 is in week 52.

alancalvitti