
Say I'd like to pick substrings from some string such that these substrings have the pattern @@@q@@@, where @ is used here as a stand-in for a 'wildcard' character (i.e. any character) and q is just an example of a specific character that can be specified as desired.

A rather lame solution might be as follows:

substringList = StringTake[testString, # + {-3, 3} & /@ StringPosition[testString, "q"]];

This, of course, runs into trouble when q appears near one of the two ends of testString. What's the correct way to do this?

CA30
  • Have you seen StringCases? The tutorials at the bottom of the StringCases documentation (specifically, the ones on String Patterns) will also come in useful. – Aky Apr 04 '14 at 17:46
  • @Aky I'm actually looking through that now; however, it's not clear how to specify wildcard sequences flanking some character of interest. For example, the pattern "a" ~~ x_ ~~ "c" is workable, but how do we ask for something like my example ~x~~? – CA30 Apr 04 '14 at 17:47
  • @CA30 ~ joins patterns only. p.s. is each wildcard different? – Kuba Apr 04 '14 at 17:50
  • @Kuba Yes, here each 'wildcard' can be different. It just means "I don't care what's here, return it." But it does have to mean that 'some' character is there. We can't run off the ends of the string! – CA30 Apr 04 '14 at 17:51
  • Try something like StringCases["acaqaddqccxjq", RegularExpression["...q..."], Overlaps -> True], which returns {"acaqadd", "addqccx"} – Aky Apr 04 '14 at 17:54
  • @Aky Hmm, seems to work, what's going on with this expression? – CA30 Apr 04 '14 at 17:56
  • It's just using a regular expression (regex). The . is a single-character wildcard (equivalent to your @). If you aren't familiar with regular expressions, you can read up on them. (They're a computer science-y tool, not specific to Mathematica.) – Aky Apr 04 '14 at 18:00
  • @Aky Thanks, still pretty cool. What is Overlaps doing? – CA30 Apr 04 '14 at 18:08
  • It's in the StringCases documentation. If you leave out Overlaps -> True (which is the same as specifying Overlaps -> False), the second substring is not returned, because its first three characters ("add") overlap with the last three characters of the first matched substring. – Aky Apr 04 '14 at 18:13
  • @Aky Got it, thanks. – CA30 Apr 04 '14 at 18:14
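
The effect of Overlaps described in the comments above can be checked on Aky's own test string. This is only a minimal sketch: the variable name str is introduced here for illustration, the second result is the one Aky quotes, and the first follows from his explanation of the default behaviour.

```mathematica
str = "acaqaddqccxjq";

(* Default, Overlaps -> False: the search resumes after the end of the
   previous match, so "addqccx" (which reuses "add") is skipped. *)
StringCases[str, RegularExpression["...q..."]]
(* {"acaqadd"} *)

(* Overlaps -> True: a match may start inside a previous match. *)
StringCases[str, RegularExpression["...q..."], Overlaps -> True]
(* {"acaqadd", "addqccx"} *)
```

In both cases the final "q" is not returned, because fewer than three characters follow it.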

1 Answer

Please see: How do I perform string matching and replacements?

Using StringExpression:

string = ExampleData[{"Text", "DeclarationOfIndependence"}];

StringCases[string, _ ~~ _ ~~ _ ~~ "q" ~~ _ ~~ _ ~~ _]
{"d equal", " requir", "d equal", "linquis", "or quar", " acquie"}

Or:

StringCases[string, # ~~ "q" ~~ #] & @ Repeated[_, {3}]
{"d equal", " requir", "d equal", "linquis", "or quar", " acquie"}

Using RegularExpression as already shown by Aky:

StringCases[string, RegularExpression["...q..."]]
{"d equal", " requir", "d equal", "linquis", "or quar", " acquie"}

Or, as I just (re)discovered using StringPattern`PatternConvert, as noted in the comments below:

StringCases[string, RegularExpression[".{3}q.{3}"]]
{"d equal", " requir", "d equal", "linquis", "or quar", " acquie"}
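
As a sketch of how this addresses the end-of-string problem raised in the question: a pattern that demands three characters on each side of "q" simply cannot match where the string runs out, so no bounds checking is needed. The short test string and the names testString, n and positions below are hypothetical, chosen so that "q" sits near both ends. Overlaps -> True is added on the assumption that occurrences of "q" closer than seven characters apart should all be reported. The second half shows one possible way to repair the StringPosition approach from the question by discarding out-of-range positions; it is not necessarily how one would prefer to write it.

```mathematica
testString = "abqcdefqghiq";  (* hypothetical: "q" at positions 3, 8 and 12 *)

(* Pattern-based: only the middle "q" has three characters on each side. *)
StringCases[testString,
 Repeated[_, {3}] ~~ "q" ~~ Repeated[_, {3}], Overlaps -> True]
(* {"defqghi"} *)

(* Position-based: keep only positions with room for three characters
   before and after, then take the seven-character window. *)
n = StringLength[testString];
positions = Select[First /@ StringPosition[testString, "q"], 4 <= # <= n - 3 &];
StringTake[testString, {# - 3, # + 3}] & /@ positions
(* {"defqghi"} *)
```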
Mr.Wizard
  • Wasn't aware that Repeated worked with string patterns too.. +1. – Aky Apr 04 '14 at 20:11
  • @Aky That's the beauty of StringExpression: you can use nearly all of the native Mathematica pattern operators in application to strings. Regular expressions are often shorter for those well accustomed to them, but since I am not, it usually takes me less time to write the StringExpression equivalent. Mathematica actually converts the StringExpression to a regular expression before handing off the operation to an optimized string library. You can see what is produced using StringPattern`PatternConvert. For example: (continued) – Mr.Wizard Apr 04 '14 at 20:18
  • First @ StringPattern`PatternConvert[Repeated[_, {3}] ~~ "q" ~~ Repeated[_, {3}]] yields "(?ms).{3}q.{3}". You can use this to learn how to write RegularExpression patterns. (?ms) sets the multi-line and dot-matches-newline flags; .{3} is the concise form of Repeated[_, {3}] in RE language. – Mr.Wizard Apr 04 '14 at 20:19
  • cool beans! I had no idea about all this. – Aky Apr 04 '14 at 20:22
  • @Aky As always, I'm glad I could help. :-) – Mr.Wizard Apr 04 '14 at 20:23
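
One practical consequence of the (?ms) prefix mentioned in the comments, offered as a hedged sketch rather than a definitive statement: as far as I understand the RegularExpression documentation, a bare . does not match a newline by default, whereas _ in a StringExpression matches any character. If that holds, the hand-written regex "...q..." and the StringExpression form can disagree when a match would span a line break. The test string multiline is hypothetical, and the outputs of the last three calls are deliberately not shown; they are worth evaluating to compare.

```mathematica
(* Conversion quoted in the comment above. *)
First @ StringPattern`PatternConvert[Repeated[_, {3}] ~~ "q" ~~ Repeated[_, {3}]]
(* "(?ms).{3}q.{3}" *)

multiline = "ab\nqcde";  (* hypothetical: a newline among the three leading characters *)

(* StringExpression form: _ is converted with the dot-matches-newline flag set. *)
StringCases[multiline, Repeated[_, {3}] ~~ "q" ~~ Repeated[_, {3}]]

(* Hand-written regex without flags: . is assumed not to match "\n". *)
StringCases[multiline, RegularExpression["...q..."]]

(* Adding (?s) by hand should restore agreement with the StringExpression form. *)
StringCases[multiline, RegularExpression["(?s)...q..."]]
```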