6

I have millions of strings representing dates in the form: "05-Mar-2004 10:15:00". I would like to convert these to absolute times. For example:

AbsoluteTime[{"05-Mar-2004 10:15:00", {"Day", "-", "MonthNameShort", 
"-", "Year", " ", "Hour24", ":", "Minute", ":", "Second"}}]

However, this is pretty slow. How can I speed it up by a factor of 10 or more?

J. M.'s missing motivation
mdeceglie

2 Answers

6

If the date format is sufficiently rigid, you might try string patterns or regular expressions.

AbsoluteTime[{"05-Mar-2004 10:15:00", {"Day", "-", 
    "MonthNameShort", "-", "Year", " ", "Hour24", ":", "Minute", ":", 
    "Second"}}] // RepeatedTiming
(* {0.00043, 3287470500} *)

This is about 10 times faster:

months = <|"Jan" -> 1, "Feb" -> 2, "Mar" -> 3, "Apr" -> 4, "May" -> 5,
   "Jun" -> 6, "Jul" -> 7, "Aug" -> 8, "Sep" -> 9, "Oct" -> 10, 
  "Nov" -> 11, "Dec" -> 12|>

AbsoluteTime@
  First@StringCases["05-Mar-2004 10:15:00", 
    day : DigitCharacter .. ~~ "-" ~~ mon : LetterCharacter .. ~~ 
      "-" ~~ year : DigitCharacter .. ~~ Whitespace ~~ 
      hour : DigitCharacter .. ~~ ":" ~~ min : DigitCharacter .. ~~ 
      ":" ~~ sec : DigitCharacter .. :> {ToExpression@year, 
      Lookup[months, mon], ToExpression@day, ToExpression@hour, 
      ToExpression@min, ToExpression@sec}] // RepeatedTiming
(* {0.000037, 3287470500} *)

If you use RegularExpression instead of a string pattern, it may be even faster; I have not tried it. Note that ToExpression evaluates whatever code the string contains, so it is not a safe way to convert strings to integers when you have no control over the input.
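As a rough, untimed sketch of the RegularExpression variant mentioned above: the function name parseDateRegex is made up for illustration, and FromDigits is swapped in for ToExpression because it only reads digits and never evaluates code. Whether this is actually faster than the string pattern would have to be measured on your data.

```wolfram
(* Month abbreviations -> month numbers, as in the answer above *)
months = <|"Jan" -> 1, "Feb" -> 2, "Mar" -> 3, "Apr" -> 4, "May" -> 5,
   "Jun" -> 6, "Jul" -> 7, "Aug" -> 8, "Sep" -> 9, "Oct" -> 10,
   "Nov" -> 11, "Dec" -> 12|>;

(* parseDateRegex is a hypothetical helper name, not part of the original answer. *)
(* "$1" ... "$6" are replaced by the captured groups before the right-hand side evaluates. *)
parseDateRegex[s_String] := AbsoluteTime@First@StringCases[s,
    RegularExpression["(\\d+)-([A-Za-z]+)-(\\d+)\\s+(\\d+):(\\d+):(\\d+)"] :>
     {FromDigits["$3"], Lookup[months, "$2"], FromDigits["$1"],
      FromDigits["$4"], FromDigits["$5"], FromDigits["$6"]}]

parseDateRegex["05-Mar-2004 10:15:00"]
```

For the sample string this should agree with the AbsoluteTime result shown above.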

Szabolcs
1

If the digits and month characters always appear at exactly the same positions, you can use StringTake to extract them and a single lookup table for all the conversions:

a = Association@Join[
    StringPadLeft[ToString[#], 2, "0"] -> # & /@ Range[0, 99],
    {"Jan" -> 1, "Feb" -> 2, "Mar" -> 3, "Apr" -> 4, "May" -> 5, "Jun" -> 6,
     "Jul" -> 7, "Aug" -> 8, "Sep" -> 9, "Oct" -> 10, "Nov" -> 11, "Dec" -> 12}];

f[{day_, mon_, y1_, y2_, hour_, min_, sec_}] := 
 AbsoluteTime[{100 y1 + y2, mon, day, hour, min, sec}]

RepeatedTiming@f[Lookup[a, StringTake["05-Mar-2004 10:15:00",
    {{1, 2}, {4, 6}, {8, 9}, {10, 11}, {13, 14}, {16, 17}, {19, 20}}]]]
(* {0.0000103, 3287470500} *)
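
Since the question is about millions of strings, here is a minimal sketch of applying the same fixed-position lookup to a whole list. The names positions, parseFixed, and dates are made up for illustration, and the approach assumes every string has exactly the layout of the example; a string that deviates will produce Missing[...] entries rather than a valid time.

```wolfram
(* Lookup table: two-digit strings "00".."99" plus month abbreviations *)
a = Association@Join[
    StringPadLeft[ToString[#], 2, "0"] -> # & /@ Range[0, 99],
    {"Jan" -> 1, "Feb" -> 2, "Mar" -> 3, "Apr" -> 4, "May" -> 5, "Jun" -> 6,
     "Jul" -> 7, "Aug" -> 8, "Sep" -> 9, "Oct" -> 10, "Nov" -> 11, "Dec" -> 12}];

f[{day_, mon_, y1_, y2_, hour_, min_, sec_}] :=
 AbsoluteTime[{100 y1 + y2, mon, day, hour, min, sec}]

(* Character ranges: day, month, century, year, hour, minute, second *)
positions = {{1, 2}, {4, 6}, {8, 9}, {10, 11}, {13, 14}, {16, 17}, {19, 20}};

(* parseFixed is a hypothetical wrapper around the steps shown above *)
parseFixed[s_String] := f[Lookup[a, StringTake[s, positions]]]

(* dates stands in for the real list of millions of strings *)
dates = {"05-Mar-2004 10:15:00", "01-Jan-1900 00:00:00"};
parseFixed /@ dates
```

Mapping one string at a time keeps the sketch simple; batching the StringTake and Lookup calls over the whole list might reduce overhead further, but that is untested here.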
Simon Woods