Please consider the situation where I give you a string whose characters are drawn from a fixed set of symbols, for example:
alphabet = {0,1,2};
numElements = 10^3;
bigString = StringJoin[Map[ToString, RandomChoice[alphabet, numElements]]];
Given two strings, string1 and string2, I'd like to count the number of instances of string2 that occur within a lower-bound and upper-bound "distance" of string1. By "distance" I mean the number of characters in the gap region between string1 and string2: counting from the character immediately after the last character of string1 to the character immediately before the first character of string2 if string1 occurs before string2, and vice versa if string2 occurs before string1. There may be multiple instances of both string1 and string2, so to avoid overcounting, each instance of string2 should count as at most one "hit" (if it lies within the lower- and upper-bound cutoff distance of at least one instance of string1).
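The gap rule can be written compactly. Assume each match is identified by its 1-based, inclusive character span: an instance ℓ of string1 spans [a_ℓ, b_ℓ], an instance t of string2 spans [a_t, b_t], L and T are the sets of all such instances, and d_min, d_max stand for lowerboundDistance and upperboundDistance (these symbols are introduced here only for illustration):

```latex
g(\ell, t) =
\begin{cases}
  a_t - b_\ell - 1, & b_\ell < a_t \quad (\texttt{string1} \text{ occurs first}),\\
  a_\ell - b_t - 1, & b_t < a_\ell \quad (\texttt{string2} \text{ occurs first}),
\end{cases}
\qquad
N = \bigl|\{\, t \in T : \exists\, \ell \in L,\ d_{\min} \le g(\ell, t) \le d_{\max} \,\}\bigr|
```

For "11110001221", string1 spans [1, 4] and string2 spans [8, 11], so g = 8 - 4 - 1 = 3. Because N counts elements of T, each instance of string2 contributes at most 1 no matter how many instances of string1 are in range. Pairs whose spans overlap have no gap under this reading and are assumed never to count.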
Is there a built-in function, or an easy way to do this?
As Leonid Shifrin requests, let's construct a small example:
string1 = "1111";
string2 = "1221";
lowerboundDistance = 3;
upperboundDistance = 10;
bigString = "000000111100012210111100000122100000001111000000000001221";
Now, in the above string there are three instances of string2, so the output count can be at most 3. From left to right, here are all the instances where string1 and string2 are separated by a gap of at least 3 and at most 10 characters:
[1] "11110001221"
[2] "1111000001221"
[3] "122100000001111"
Notice, however, that instances [2] and [3] involve the same instance of string2, so together they increase the count by only 1. The final count is therefore 2.
To clarify a particular point, note that:
string1 = "1111";
string2 = "1221";
lowerboundDistance = 3;
upperboundDistance = 10;
bigString = "11110001221001221";
This should give an output count of 2, since "11110001221" (a gap of 3) and "11110001221001221" (abstracted as "1111.........1221", a gap of 9) each pair the single instance of string1 with a different instance of string2 within the lower- and upper-bound gap specifications.
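A minimal hand-rolled sketch of the rules above (not a built-in; the name countTargetsNearLandmarks is hypothetical). It finds every match with StringPosition, computes the gap for each string1/string2 pair, and counts an instance of string2 once if at least one instance of string1 lies within the bounds. It assumes overlapping matches should be kept, that lowerboundDistance is non-negative, and that a pair whose spans overlap never counts.

```mathematica
(* Hypothetical helper: count occurrences of target (string2) lying within
   lo..hi gap characters of at least one occurrence of landmark (string1),
   on either side. Each target occurrence is counted at most once. *)
countTargetsNearLandmarks[landmark_String, target_String,
   lo_Integer, hi_Integer, text_String] :=
 Module[{lmSpans, tgSpans, gap, nearQ},
  (* {start, end} character spans of every match, overlapping ones included *)
  lmSpans = StringPosition[text, landmark, Overlaps -> True];
  tgSpans = StringPosition[text, target, Overlaps -> True];
  (* characters strictly between the two spans; negative when they overlap *)
  gap[{s1_, e1_}, {s2_, e2_}] := If[e1 < s2, s2 - e1 - 1, s1 - e2 - 1];
  (* a target is a hit if any landmark span lies within the allowed gap *)
  nearQ[t_] := AnyTrue[lmSpans, lo <= gap[#, t] <= hi &];
  Count[nearQ /@ tgSpans, True]
 ]

countTargetsNearLandmarks["1111", "1221", 3, 10,
 "000000111100012210111100000122100000001111000000000001221"]
(* 2 *)

countTargetsNearLandmarks["1111", "1221", 3, 10, "11110001221001221"]
(* 2 *)
```

The pairwise check costs O(|L|·|T|), which should be fine for a string of 10^3 characters; for much longer strings, sorting the landmark start positions and searching only the window around each target would likely scale better. Because only target occurrences are counted, swapping string1 and string2 can change the result.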
… string1 and string2 to do with your example code? How does bigString come into play? You never mention it in your text description of the problem. – m_goldberg Apr 14 '14 at 17:59
… string2 is special in the sense that if the values of string1 and string2 are exchanged, the resulting value of the count may be different? – m_goldberg Apr 14 '14 at 18:13
… string2 is special in the sense you suggest. Ultimately string2 will be the less common of the two substring types. The idea is to get a handle on the number of pairs {string1, string2} that are possible in bigString. – S22 Apr 14 '14 at 18:16
… string2 is the target, the item of interest, while string1 is a landmark that only serves to define the gap. Is this correct? – m_goldberg Apr 14 '14 at 18:21
… string1 is the landmark for string2. When we're done, there will be some "count" (hopefully) for the number of instances of string2 within the gap size of any landmark. – S22 Apr 14 '14 at 18:23