5

My question is similar to previous questions (see here or here), although the issue is perhaps somewhat more complicated:

I have a list with strings like this:

list = {{"text1, text2, 2003, text3"},{"text1, 1994, text2"},{"text1, text2, text3 2014, text4"}}

I now want to extract the part of each string that contains the 4-digit number (in this case a year) and is delimited by commas, so that the result is:

{{"2003"},{"1994"},{"text3 2014"}}

I have tried this one:

StringCases[#,", " ~~ w : (___ ~~ Repeated[DigitCharacter, {4}]) ~~ ", " :> w] & /@ list

but this always extracts the part starting from the first comma in each string.

Many thanks for your suggestions.

M.A.

4 Answers

5
Map[StringTrim @* Select[StringContainsQ @ Repeated[DigitCharacter, {4}]]] @
 StringSplit[Flatten @ list, ","]
{{"2003"}, {"1994"}, {"text3 2014"}}

You can replace Select[...] with Cases[_?(StringContainsQ@Repeated[DigitCharacter, {4}])] to get the same result.
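
Written out in full, the Cases variant might look like the sketch below. It assumes the list from the question with all three strings quoted, and it relies on the operator forms of Cases and StringContainsQ.

```mathematica
(* list as in the question, with every entry quoted *)
list = {{"text1, text2, 2003, text3"}, {"text1, 1994, text2"},
   {"text1, text2, text3 2014, text4"}};

(* split each string at commas, keep the fields that contain four
   consecutive digits, then trim the surrounding whitespace *)
Map[StringTrim @* Cases[_?(StringContainsQ @ Repeated[DigitCharacter, {4}])]] @
 StringSplit[Flatten @ list, ","]
(* {{"2003"}, {"1994"}, {"text3 2014"}} *)
```

The parentheses around the test are needed because PatternTest (`?`) binds more tightly than `@`.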

kglr
4
StringCases[#, RegularExpression["([^,]+[\\d]{4})"]:> StringTrim["$1"]]&/@list//Catenate

(* {{2003}, {1994}, {text3 2014}} *)

where:

list = {{"text1, text2, 2003, text3"},{"text1, 1994, text2"},{"text1, text2, text3 2014, text4"}}
user1066
4

The proposal in the original question can be made to work by replacing ___ with Except[","]...:

StringCases[", "~~w:(Except[","]...~~Repeated[DigitCharacter, {4}])~~", " :> w] /@ list

(* {{{"2003"}}, {{"1994"}}, {{"text3 2014"}}} *)
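
To illustrate why the substitution matters, here is a small comparison on a single string. This is only a sketch; `s` is a name introduced here for illustration and is simply the first string from the question.

```mathematica
(* s is a hypothetical test string: the first entry of the question's list *)
s = "text1, text2, 2003, text3";

(* ___ also matches commas, so the match begins right after the first ", " *)
StringCases[s, ", " ~~ w : (___ ~~ Repeated[DigitCharacter, {4}]) ~~ ", " :> w]
(* {"text2, 2003"} *)

(* Except[","]... cannot cross a comma, so only the field holding the year matches *)
StringCases[s, ", " ~~ w : (Except[","]... ~~ Repeated[DigitCharacter, {4}]) ~~ ", " :> w]
(* {"2003"} *)
```

Both patterns still require ", " on each side of the match, so a year in the first or last field of a string would not be picked up by either.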

WReach
1
list = {{"text1, text2, 2003, text3"}, {"text1, 1994, text2"},
   {"text1,text2,text3 2014,text4"}}

patt = WordCharacter ... ~~ Whitespace ~~ 
   Repeated[DigitCharacter, {4}];

list // Map[StringTrim@*First@*StringCases[patt]]

{{"2003"}, {"1994"}, {"text3 2014"}}

Syed