
Consider the string

str = "ABCaDEFbcABCdefDEF"

I'd like to extract every substring that lies between the markers "ABC" and "DEF"; that is, I want to pull out

(* {a, def} *)

If I could just get the substrings

(* {ABCaDEF, ABCdefDEF} *)

then I could easily remove the unwanted capital letters with StringTrim.

StringCases looks like my best bet for this, but I'm not sure what pattern to use. Blank (_) isn't right because it matches exactly one character, so it only gives me one of the desired substrings:

StringCases[str, "ABC" ~~ _ ~~ "DEF"]
(* {ABCaDEF} *)

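On the other hand, BlankSequence (__) gives me too much, because it matches as many characters as possible and runs from the first "ABC" to the last "DEF":

StringCases[str, "ABC" ~~ __ ~~ "DEF"] (* __ is two underscores, i.e. BlankSequence *)
(* {ABCaDEFbcABCdefDEF} *)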

What kind of rule can I use to get what I need?

Added: One thing I've considered is reversing the pattern, i.e. looking for what lies between "DEF" and "ABC":

StringCases[str, "DEF" ~~ __ ~~ "ABC"]
(* {DEFbcABC} *)

followed by StringSplit

StringSplit[str, %]
(* {ABCa, defDEF} *)

and finally StringTrim to get

StringTrim[StringTrim[%, "ABC"], "DEF"]
(* {a, def} *)
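As a small tweak (my own suggestion, not part of the approach above), the two nested StringTrim calls can probably be collapsed into one by passing an Alternatives pattern, since StringTrim removes a match at the start and at the end of each string. A minimal sketch, using the literal list from the StringSplit step in place of %:

```mathematica
(* trim "ABC" or "DEF" from both ends of each string in a single call *)
StringTrim[{"ABCa", "defDEF"}, "ABC" | "DEF"]
(* {"a", "def"} *)
```

This only removes one marker at each end of each string, which is all this intermediate result needs.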

Are there any better ways of going about this?

Edit 2: My attempt does not work for longer strings, however.

strNew = "ABCaDEFbcABCdefDEFghijABCklmnoDEF";

StringCases[strNew, "DEF" ~~ __ ~~ "ABC"]
(* {DEFbcABCdefDEFghijABC} *)

StringSplit[strNew, %]
(* {ABCa, klmnoDEF} *)

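The result doesn't include the substring containing def: the greedy "DEF" ~~ __ ~~ "ABC" match runs from the first "DEF" to the last "ABC", so everything in between is used as the separator. It looks like Marco's suggestion to use Shortest is the clear winner.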

user170231

1 Answer

StringCases["ABCaDEFbcABCdefDEF", "ABC" ~~ Shortest[a__] ~~ "DEF" :> a]

(* Out: {"a", "def"} *)
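As a check against the longer string from Edit 2 in the question, the same Shortest pattern should return all three enclosed substrings. The last expression is an alternative sketch using a non-greedy regular expression (.+?) instead of Shortest; that is a swapped-in technique, not part of this answer.

```mathematica
strNew = "ABCaDEFbcABCdefDEFghijABCklmnoDEF";

(* Shortest makes a__ match as few characters as possible,
   so each match stops at the first "DEF" after an "ABC" *)
StringCases[strNew, "ABC" ~~ Shortest[a__] ~~ "DEF" :> a]
(* {"a", "def", "klmno"} *)

(* alternative: non-greedy regex; "$1" refers to the captured group *)
StringCases[strNew, RegularExpression["ABC(.+?)DEF"] -> "$1"]
(* {"a", "def", "klmno"} *)
```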
Mr.Wizard
MarcoB
  • I don't think I've ever used Shortest before, so I'll need to check the documentation to fully grasp it, but this seems to be exactly what I need. Thanks! – user170231 May 19 '16 at 19:43
  • @user170231 With regard to using Shortest and Longest in string patterns, you should read this lengthy but vital post by WReach: (108399) – Mr.Wizard May 19 '16 at 19:58