
Is there a way that NotebookFind can be used to match string pattern expressions rather than just strings?

The documentation for NotebookFind states that only a string, box expression or complete cell can be used as the search term so my question is really whether or not pattern matching can be achieved through writing some additional code that wraps or replaces NotebookFind.

One obvious strategy would be to convert the notebook to a text representation using NotebookGet and then perform the pattern matching search on the text representation, but this is not ideal for my intended application because I would like any match that is found to be highlighted (by selecting it) much like NotebookFind already does.
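As a rough illustration of this text-based strategy (a minimal sketch, not a finished tool), the helper below collects every string leaf of a notebook expression and keeps those containing a match for a string pattern. The name `matchingStrings` is hypothetical, `nb` is assumed to be a `NotebookObject`, and the result will also include style names and box-level strings, since nothing here filters them out.

```mathematica
(* Hypothetical helper: string leaves of a notebook expression that
   contain at least one match for the string pattern pat *)
matchingStrings[nbexpr_, pat_] :=
  Select[Cases[nbexpr, _String, Infinity], ! StringFreeQ[#, pat] &]

(* Assumes a front end is running and nb is a NotebookObject *)
matchingStrings[NotebookGet[nb], "foo" ~~ DigitCharacter]
```

This finds the matching text but says nothing about where it sits in the displayed notebook, which is why it cannot select the match by itself.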

Eventually I would like to build a replacement for Mathematica's built-in Search and Replace functionality. Two key enhancements that I hope to provide are:

  1. the ability to search and replace across all open notebooks in the front-end or all notebooks in a selected directory (which is not too difficult to accomplish) and

  2. the ability to search and replace using string pattern expressions.

I realize that Workbench already offers these features. My goal is to enable users who prefer the notebook interface (rather than the .m editor promoted by Workbench) to continue developing complex multi-notebook packages from within the front-end.

Edit:

Celtschk proposes a strategy below in the comments that may provide a partial solution. One of the issues that is still not clear however is how to deal with surrounding context in a pattern match when returning to NotebookFind.

Perhaps the following example will help clarify the potential problem. Without digressing into the theory of formal grammars, let's say that we want our string pattern language to be powerful enough to express not just wildcard patterns but also surrounding context. Imagine in particular that we want to find each occurrence of the string pattern "foo?" in some notebook that is enclosed by a pair of parentheses (not necessarily immediately surrounding the "foo?" pattern). We can do that easily using standard Mathematica string pattern expressions by operating on the string representation of the notebook.

Let's now assume that there is one occurrence of "foo1" and two occurrences of "foo2" in the notebook, and that the second occurrence of "foo2" is not surrounded at any distance by a pair of parentheses. How would we then exclude that second "foo2" from being found when we return to NotebookFind to search for "foo1" and "foo2"?

Of course we could have matched the entire string plus surrounding context (which in this case would include the surrounding pair of parentheses) when searching the string representation for parentheses-enclosed instances of "foo?" -- but this is not really what we want, and in certain instances could be quite inconvenient in a tool designed to assist the user in refactoring a large body of Mathematica code.
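The ambiguity can be reproduced on a plain string. This is only a sketch: `text` stands in for the string representation of a notebook, and the regular expression is one possible way to write "foo? enclosed by parentheses" (it assumes the parentheses are not nested).

```mathematica
(* Stand-in for the notebook's text: foo1 and the first foo2 sit inside
   parentheses, the second foo2 does not *)
text = "(x foo1 y) (a foo2 b) foo2";

(* "foo" plus one character, somewhere inside a pair of parentheses;
   only the captured part is returned *)
inParens = RegularExpression["\\([^()]*?(foo.)[^()]*?\\)"];

StringCases[text, inParens :> "$1"]
(* {"foo1", "foo2"} *)

(* A literal search, which is what NotebookFind offers for strings,
   cannot tell the two occurrences of foo2 apart *)
StringCount[text, "foo2"]
(* 2 *)
```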

Alexey Popkov
StackExchanger
  • Well, one possible trick might be to first do the search on the text representation as you suggested, then determine the exact string matched by the search and finally use NotebookFind with that string. – celtschk May 12 '12 at 12:18
  • You might consider my strategy of using the Workbench together with the front-end editing, which I described in http://mathematica.stackexchange.com/q/3941/137 (3rd answer). You would then do the string-pattern search on the autogenerated .m file in WB and then go to the corresponding place in the .nb file. – magma May 13 '12 at 10:18
  • @celtschk: Your suggestion to map between the formatted notebook and the plain text representations is the initial approach I plan to pursue. One issue that will have to be addressed is how best to treat certain parts of a search term containing pattern variables as "literal" for purposes of pattern matching. Any thoughts here on the right way to introduce a quoting mechanism? – StackExchanger May 14 '12 at 02:47
  • @magma: Indeed I read your post before posing this question and initially hoped that what you propose would be sufficient to cover my needs. My code however often requires a fair amount of refactoring over a rather large code base so unless "string-pattern searching the autogenerated .m file and going to the corresponding place in the .nb file" can be fully automated from within the front-end, this strategy will not be all that convenient for my purposes. Also the .m files may not contain all of the code in the corresponding .nb files (usually just the contents of the package proper). – StackExchanger May 14 '12 at 03:05
  • I'm not sure if you didn't correctly understand my comment, or I didn't correctly understand your answer. Anyways, what I meant was the following procedure: (1) Use the string expression to search for the pattern in the text representation. (2) Extract the text which is matched by this search (note: This text does not contain any patterns). (3) Now search that (non-pattern) text in the original, non-stringified notebook using NotebookFind. Note that patterns only occur in the first step (textual search on plain text representation). – celtschk May 14 '12 at 08:53
  • @celtschk: I'm starting to question whether or not this strategy can work. Let's say the string pattern finds a match in the string representation of the notebook and call the string matched StringA. The same StringA may appear elsewhere in the notebook in a context that does not produce a match against our search pattern. Running NotebookFind with StringA will find both occurrences, which defeats the purpose of using the pattern in the first place. – StackExchanger May 18 '12 at 08:11
  • @magma: I noticed in the list of new features for Workbench 2.0 that "Search notebook documents with the Mathematica pattern search utility" is now possible. This capability would make your strategy more attractive. Have you used this feature and if so can you briefly describe how to access it? Thanks. – StackExchanger May 19 '12 at 02:41
  • @StackExchanger no, I never used this feature. As I explained I basically use MMA for notebook editing and thus also searching – magma May 19 '12 at 13:14
  • @StackExchanger, I still don't understand the issue with celtschk's approach. What would be an example of a string pattern that will fit one string but won't fit another string which is the same one but on a different context? Would you give a short example? – Rojo May 20 '12 at 03:09
  • @magma: I'll probably create a separate question regarding this feature claimed for Workbench 2.0. I'd like to use the .nb file as you suggest in Workbench for editing purposes but if certain operations require working with the .m file and these changes cannot be automatically propagated back to the .nb file (which I don't believe that they can) then I think this presents a fundamental obstacle to combining the auto-generated package paradigm with the tools made available by the Workbench environment. – StackExchanger May 20 '12 at 06:56
  • @StackExchanger I do not really see what kind of operations might require working on the .m file that cannot be done (better) on the .nb file. My philosophy is that whatever you want to do with a package, you do it on the .nb file – magma May 20 '12 at 23:15
  • @magma: I'm in complete agreement with you here. My issue is that the Workbench philosophy encourages use of the .m file for code development and the .nb for testing/experimentation. If Workbench requires that search & replace for a string pattern be done in the .m file, how easy is it to propagate these changes back to the .nb file which auto-generated that .m file? I would prefer Wolfram to provide the desired functionality from Eclipse directly inside the frontend. – StackExchanger May 21 '12 at 03:32
  • @Rojo: I've edited the OP to include an example of the kind of problem that can arise in the proposed strategy of using the results of searching the string representation of the notebook for the string pattern to then search the notebook using NotebookFind. – StackExchanger May 21 '12 at 03:38
  • I disagree with your conclusion in the described example of: "Of course we could have matched the entire string plus surrounding context ... but this is not really what we want,". If you search for something like "(*foo*)", you will get a match to the actual pattern, not just the foo element. If you actually wanted to find the position of only foo in "(*foo*)", then that's a secondary search in the returned result. – jVincent Jun 01 '12 at 09:15
  • @jVincent: The issue I'm still trying to resolve here is how to use NotebookFind to perform the "secondary search in the returned result" that you describe. (Re-reading the OP will help explain why using NotebookFind is important for the intended application.) If I do the secondary search in the returned result as you suggest still using the string representation of the notebook and then search for those results back in the notebook using NotebookFind then it is clear that the resulting set of matches may not be the desired ones. – StackExchanger Jun 08 '12 at 09:07

1 Answer


OK, so this is going to be a long one. This is definitely not a general-purpose implementation, but it shows the general idea that one could use.

So you basically want to be able to type something like NotebookFind["(.*?foo\d.*?)"], which would match, for example, "(something something foo4 dark side)". However, you only want it to highlight foo4, and not the rest. The way to do this is to first search through the notebook and establish that the pattern matches the entire string, then search only for the particular realized subexpression "foo4" and work out which of the potentially many search results for foo4 coincides with the match for the entire pattern.

For the purpose of this implementation I'll assume that you have a RegularExpression pattern in which the part you want to highlight is the first captured subpattern (which means you enclose it in parentheses in the search string). So the above pattern would be: RegularExpression["[(].*?(foo[\\d]+).*?[)]"]. We then:

  • search through the strings in the notebook expression for cases where this pattern matches,
  • extract the subexpression matched,
  • count how many times the subexpression matches without the full pattern matching,
  • call NotebookFind[] enough times to land on the correct match.
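The counting step in the list above can be sketched on a single flat string. This is a simplified illustration and not the code that follows: `skipCounts` is a hypothetical name, the text is a plain string rather than a notebook expression, and it assumes the captured group is the first occurrence of its literal text inside each full match and that NotebookFind visits occurrences in the same order as they appear in the text.

```mathematica
(* Hypothetical helper: for each full-pattern match, return the captured
   text together with the number of literal searches needed to reach it *)
skipCounts[text_String, regex_RegularExpression] :=
 Module[{spans, hits},
  (* start/end of each full match, in the same order as StringCases *)
  spans = StringPosition[text, regex, Overlaps -> False];
  (* text captured by the first parenthesized group of each match *)
  hits = StringCases[text, regex :> "$1"];
  MapThread[
   Function[{span, hit},
    (* one more than the number of literal occurrences that start
       before the full match begins *)
    {hit, 1 + Count[
       StringPosition[text, hit, Overlaps -> False][[All, 1]],
       start_ /; start < First[span]]}],
   {spans, hits}]]

skipCounts["foo2 (a foo2 b) foo2 (foo4)",
 RegularExpression["\\([^()]*?(foo\\d+)[^()]*?\\)"]]
(* {{"foo2", 2}, {"foo4", 1}} *)
```

Here the second literal hit for "foo2" is the one inside parentheses, so a literal search would have to be advanced twice from the top to land on it.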

So here goes for the actual code. It doesn't work for matching notebook level expressions and only searches through strings.

This function just creates a pattern for the actual substitution based on the search pattern.

StringPatternWrapper[stringpattern_]:=  
 (a_String/;StringMatchQ[a,stringpattern]):>StringCases[a,stringpattern:>"$1"]

This function finds the positions and cases of the matched pattern. The pattern provided to this function should first be passed through StringPatternWrapper[].

 findPostionAndExactMatch[nbexp_,pattern_]:=
     {Position[nbexp,pattern[[1]],∞],
      Cases[nbexp,pattern,∞]}//ridiculousFormatingFunction

where ridiculousFormatingFunction is a messy function for reformatting the output.

 ridiculousFormatingFunction[list_] := 
    Map[(a\[Function]Map[{a[[1]],#}&,a[[2]]]),Transpose[list]]//
    (Flatten[Table[{#[[1,1]],#[[1,2]],n},{n,1,#[[2]]}]&/@Tally[Flatten[#,1]],1])&

And then a function for finding all the matches to the matched subexpression:

findAllExactMatches[nbexp_,exact_] := 
  Flatten[Map[Table[#, {Length@StringCases[nbexp[[Sequence@@#]],exact]}]&,
  Position[nbexp,a_String/;StringMatchQ[a,exact],∞]],1]

Because some strings might contain more than one match, we need to adjust the counts:

repeatNumberForMatch[match_,nbexp_] := 
First@Position[
    findAllExactMatches[nbexp,RegularExpression[".*?"<>match[[2]]<>".*?"]],match[[1]]
][[match[[3]]]]

And finally we have a nice little function which returns a list of all the matches, which expressions they appear in, and how many times you need to skip when using NotebookFind.

 matchTable[nbexp_,pattern_] := Prepend[#,repeatNumberForMatch[#,nbexp]]&/@
findPostionAndExactMatch[nbexp,pattern]

Here is an output example from matchTable using the example notebook provided below:

  matchTable[NotebookGet[nb], 
   StringPatternWrapper[
    RegularExpression[".*?" <> "[(].*?(foo[\\d]+).*?[)]" <> ".*?"]]] //
   Prepend[#, {"Repeat find number", "Indices", "Exact Match", 
     "Number inside string"}] & // Grid

[Image: output from matchTable]

Here is a short usage example:

notebookFindN[nb_,find_,n_]:=
    (SelectionMove[nb,Before,Notebook];Do[NotebookFind[nb,find,Next],{n}])

clickerUI[nb_,pattern_]:=
    Button[#[[3]],notebookFindN[nb,#[[3]],#[[1]]]]&/@matchTable[NotebookGet[nb],pattern]

And some test code and a test notebook:

 nb = {
     Cell["This is a direct match for the realised sub-pattern foo1 but not the full", "Text"], 
     Cell["This is another match identical to the realised one foo2, and still not the full, however this one needs to be skiped when using NotebookFind", "Text"],
     Cell["And finally ( we have a full match for foo2 the pattern ) (foo2) <- That's another one, and so is that -> (foo4)", "Text"],
     Cell["Some times a single string can have more then one entry foo2 foo2, so we need to count how many and which ones we are looking for, which makes the code slightly messy.", "Text"],
     Cell["And finally ( we have one last full match for foo2 ) enclosed in parenthesis", "Text"]
     } // CreateDocument;

 clickerUI[nb, 
   StringPatternWrapper@
   RegularExpression[".*?" <> "[(].*?(foo[\\d]+).*?[)]" <>".*?"]
 ] // Row

Hope this can be of some help. Personally I'd like to have code that could search equally well through strings and notebook-level expressions; however, I think this requires a better structuring of the method.

Mr.Wizard
jVincent
  • You provide a very elegant solution that addresses precisely the issue that I raised in the edit to the OP. I also don't see why your "skipping over matches outside of pattern context" could not be extended in principle to search for notebook level expressions as well. Sorry for the long delay in responding but I wanted to find some time in my busy schedule to study your solution carefully and test it on my own sample notebooks before accepting the answer. Thank you for all of the effort that you put in to this response. – StackExchanger Sep 15 '12 at 07:40