I have a set of data files whose names contain information that I would like to extract. What I have is:

SetDirectory["c:\\data"]
files = FileNames["201406*.dpt"];
Take[files, 3]
{"20140605_SampleName-C-vert_Polarizer0Deg.dpt", 
 "20140605_SampleName-C-vert_Polarizer90Deg-Temp100K.dpt", 
 "20140606_SampleName-C-vert_Polarizer0Deg-Temp10K.dpt"}

I would like to do something like

analysis /@ files

that parses each name the way a scanf format such as "%d_%s_Polarizer%dDeg-Temp%dK.dpt" would, reading date, samplestr, angle and temp, and returns {date, angle, temp}, as in

{{{2014, 6, 5, 0, 0, 0.}, 0, Null},
 {{2014, 6, 5, 0, 0, 0.}, 90, 100},
 {{2014, 6, 6, 0, 0, 0.}, 0, 10}, ...
m_goldberg
rhermans

3 Answers

I'm posting this variant in the hope that it will be a little more educational. Otherwise it doesn't add anything over Kuba's version.

Generally, parsing can be done using StringCases. You'll need to build up a string expression that describes the pattern of the file name, much the same way you'd write "%d_%s_Polarizer%dDeg-Temp%dK.dpt" when working with scanf. Except here %d is called a NumberString, %s is __, etc. (Note: scanf wouldn't actually work here because it doesn't know that it has to stop reading the %s as soon as it encounters _. scanf doesn't do pattern matching.)

So let's build up the pattern:

StringCases["20140605_SampleName-C-vert_Polarizer90Deg-Temp100K.dpt",
   date : NumberString ~~
   "_" ~~ name__ ~~
   "_Polarizer" ~~ angle : NumberString ~~
   "Deg-Temp" ~~ temp : NumberString ~~ 
   "K.dpt"   :>   {DateList[date], name, FromDigits[angle], FromDigits[temp]}
]

(* ==> {{{2014, 6, 5, 0, 0, 0.}, "SampleName-C-vert", 90, 100}} *)

Since it's just a sequence of atomic constructs, it should be fairly self-explanatory.

This basic pattern doesn't account for file names that lack the -Temp part. Fortunately the fix is easy: make that part of the pattern optional, i.e. allow it to be Repeated zero or one times.

StringCases["20140605_SampleName-C-vert_Polarizer90Deg.dpt",
   date : NumberString ~~
   "_" ~~ name__ ~~
   "_Polarizer" ~~ angle : NumberString ~~
   "Deg" ~~ Repeated["-Temp" ~~ temp : NumberString ~~ "K", {0, 1}] ~~
   ".dpt"   :>   {DateList[date], name, angle, temp}
]

(* ==> {{{2014, 6, 5, 0, 0, 0.}, "SampleName-C-vert", "90", ""}} *)
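
To get numbers, and Null for a missing temperature as the question asks, the captured strings can be converted on the right-hand side. This is only a minimal sketch: toNumber and parseName are hypothetical names introduced here, and it relies on the unmatched optional part yielding "" as in the output above.

```mathematica
(* toNumber is a hypothetical helper: the empty string left by the
   unmatched optional part becomes Null, digit strings become integers *)
toNumber[""] = Null;
toNumber[s_String] := FromDigits[s];

(* parseName is a hypothetical wrapper around the pattern above;
   Flatten[..., 1] unwraps the single match and leaves {} if nothing matches *)
parseName[file_String] := Flatten[
   StringCases[file,
    date : NumberString ~~ "_" ~~ name__ ~~ "_Polarizer" ~~
      angle : NumberString ~~ "Deg" ~~
      Repeated["-Temp" ~~ temp : NumberString ~~ "K", {0, 1}] ~~ ".dpt" :>
     {DateList[date], toNumber[angle], toNumber[temp]}], 1];

parseName /@ {"20140605_SampleName-C-vert_Polarizer0Deg.dpt",
   "20140605_SampleName-C-vert_Polarizer90Deg-Temp100K.dpt"}
```

Each result should then have the {date, angle, temp} shape from the question, with Null where the file name has no -Temp part. Mapping parseName over files instead of the literal list applies it to the whole directory.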

If you are already familiar with regular expressions, or if you do not like the wordiness of Mathematica's pattern language, you can use RegularExpression to implement the same thing:

StringCases["20140605_SampleName-C-vert_Polarizer90Deg-Temp100K.dpt", 
  RegularExpression["([0-9]*)_(.*?)_Polarizer([0-9]*)Deg(-Temp([0-9]*)K)?\\.dpt"] :> 
  {"$1", "$2", "$3", "$5"}
]
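
The regex version returns every capture as a string. A possible way to convert them, sketched under the assumption that a group which takes no part in the match is substituted as "" (parseRegex is a hypothetical name, not something defined above):

```mathematica
(* parseRegex is a hypothetical wrapper around the regex above.
   With :>, the "$n" strings are filled in before the right-hand side
   is evaluated, so they can be passed directly to other functions. *)
parseRegex[file_String] := Flatten[
   StringCases[file,
    RegularExpression[
      "([0-9]*)_(.*?)_Polarizer([0-9]*)Deg(-Temp([0-9]*)K)?\\.dpt"] :>
     {DateList["$1"],
      FromDigits["$3"],
      (* assumption: "$5" is "" when the optional -Temp group is absent *)
      If["$5" === "", Null, FromDigits["$5"]]}], 1];

parseRegex /@ {"20140605_SampleName-C-vert_Polarizer0Deg.dpt",
   "20140606_SampleName-C-vert_Polarizer0Deg-Temp10K.dpt"}
```

The sample name in "$2" is dropped here because the question only asks for the date, angle and temperature.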
rhermans
Szabolcs
list = {"20140605_SampleName-C-vert_Polarizer0Deg.dpt", 
  "20140605_SampleName-C-vert_Polarizer90Deg-Temp100K.dpt", 
  "20140606_SampleName-C-vert_Polarizer0Deg-Temp10K.dpt"}

There are many ways. This one is not bulletproof but I think it should work with your data:

parse[string_] := ToExpression[{DateList@#, ##2}] & @@ Flatten[
 StringCases[string, x : NumberString ~~ # :> x] /. {}->"Null" & /@ {"_", "Deg", "K"}]


parse /@ list
{{{2014, 6, 5, 0, 0, 0.}, 0, Null}, 
 {{2014, 6, 5, 0, 0, 0.}, 90, 100},
 {{2014, 6, 6, 0, 0, 0.}, 0, 10}}
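
For readers who find the one-liner dense, it can be unrolled into intermediate steps. This is a sketch for illustration only: the names string, step1, step2 and step3 are hypothetical, and the last line converts the pieces one by one rather than calling ToExpression on the whole list as parse does.

```mathematica
string = "20140605_SampleName-C-vert_Polarizer0Deg.dpt";

(* step 1: for each delimiter, collect the number directly in front of it *)
step1 = StringCases[string, x : NumberString ~~ # :> x] & /@ {"_", "Deg", "K"}
(* ==> {{"20140605"}, {"0"}, {}} *)

(* step 2: a delimiter with no number in front of it gave {};
   replace that with the string "Null" *)
step2 = step1 /. {} -> "Null"
(* ==> {{"20140605"}, {"0"}, "Null"} *)

(* step 3: flatten to a plain list of three strings *)
step3 = Flatten[step2]
(* ==> {"20140605", "0", "Null"} *)

(* step 4: the first string becomes a date list,
   the remaining strings become expressions *)
{DateList[First[step3]], Sequence @@ (ToExpression /@ Rest[step3])}
```

This also shows why the approach is not bulletproof: any other number followed by "_", "Deg" or "K" in the file name would be picked up as well.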
Kuba
  • I'm still trying to digest and understand how it works, but it does the job nicely. thanks! – rhermans Jun 10 '14 at 11:52
  • @rhermans No problem ;) start with StringCases[string, x : NumberString ~~ # :> x] & /@ {"_", "Deg", "K"} and add the rest. In case of any troubles feel free to ask. – Kuba Jun 10 '14 at 11:54
list = {"20140605_SampleName-C-vert_Polarizer0Deg.dpt", 
  "20140605_SampleName-C-vert_Polarizer90Deg-Temp100K.dpt", 
  "20140606_SampleName-C-vert_Polarizer0Deg-Temp10K.dpt"};

Another way, using RegularExpression:

parse[str_]:=Module[{time,deg,temp},
    time=StringCases[str,RegularExpression["(\\d{4})(\\d{2})(\\d{2})"]:>{"$1","$2","$3",0,0,0}]//First;
    deg =StringCases[str,RegularExpression["(\\d+)Deg"]:> "$1"]/.{}->{Null}//First;
    temp=StringCases[str,RegularExpression["(\\d+)K"]:> "$1"]/.{}->{Null}//First;
    ToExpression/@{time,deg,temp}
]
parse/@list         

{{{2014,6,5,0,0,0},0,Null},{{2014,6,5,0,0,0},90,100},{{2014,6,6,0,0,0},0,10}}

And in a more condensed way:

parse2[str_String] :=
    StringCases[str,
      RegularExpression["(\\d{4})(\\d{2})(\\d{2}).+?(\\d+)Deg.*?((\\d+)K)?\\.dpt"] :>
        {Join[FromDigits /@ {"$1", "$2", "$3"}, {0, 0, 0}],
         FromDigits@"$4",
         "$6" /. "" -> Null /. n_String :> FromDigits@n}
    ]
parse2 /@ list
Murta