5

In Working With String Patterns, the Wolfram documentation gives a simple Grep function:

Grep[file_, patt_] := 
 With[{data = Import[file, "Lines"]}, 
  Pick[Transpose[{Range[Length[data]], data}], 
   StringFreeQ[data, patt], False]]

However, the actual grep command is far more sophisticated than this. For example, running grep -nr -C 2 <pattern> searches for <pattern> recursively through a directory and shows each match with its line number and 2 lines of context on either side. Wolfram should in principle be able to do this (and perhaps far better, for example using datasets?).

Concrete Question: How can one use Wolfram to create a grep function that at least reproduces the functionality of grep -nr -C N <pattern>? (A solution that simply wraps the actual grep command is fine.)

Alexey Popkov
George

2 Answers

12

Mathematica allows text searching using regular expressions (based on the PCRE library). It would take some work to re-implement the whole grep functionality within Mathematica, but for your concrete example

grep -nr -C 2 <pattern>

it is as easy as follows:

ClearAll[Grep]
Grep[files_List, patt_, c_Integer: 0, style : {__} : {Red, Bold}] := 
  Monitor[Do[Grep[files[[i]], patt, c, style], {i, Length[files]}], 
   ProgressIndicator[i, {1, Length[files]}]];
Grep::noopen = "Can't open \"``\".";
Grep[file_, patt_, c_Integer: 0, style : {__} : {Red, Bold}] := 
  Module[{lines, pos}, 
   Quiet[Check[lines = ReadList[file, "String"], 
     Return[Message[Grep::noopen, file], Module]], {ReadList::noopen, ReadList::stream}];
   pos = Flatten@Position[StringContainsQ[lines, patt], True];
   If[pos =!= {}, 
    Echo@Grid[Prepend[{#, 
          Column@StringReplace[lines[[Span[Max[# - c, 1], UpTo[# + c]]]], 
            str : patt :> 
             "\!\(\*StyleBox[\"" <> str <> "\"," <> 
              StringRiffle[ToString /@ style, ", "] <> "]\)"]} & /@ pos, 
        {file, SpanFromLeft}], Dividers -> All, Alignment -> Left]];];

where

  • file or files is a file name/path or a list of them

  • patt is a literal string, StringExpression or RegularExpression pattern to search for

  • c is the number of lines of leading and trailing context to show around each match

  • style is a List of styling directives applied to the matching text (here I use the great solution by halirutan from this answer); if you don't want any styling, pass {{}} as this argument

To obtain a complete listing of the files in a directory and all its subdirectories at every level, one can use FileNames, as in Select[FileNames[All, dir, Infinity], Not@*DirectoryQ]. A very enlightening discussion of how to use it to obtain only specific file paths can be found here.
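As a minimal sketch of this listing step, the helper below wraps the same FileNames call and optionally restricts it to certain name patterns. The name listFiles and its forms argument are hypothetical and not part of the code above.

```mathematica
(* Hypothetical helper: list the regular files below dir, at all levels.
   forms is handed to FileNames unchanged: All, a name pattern given
   as a string, or a list of such patterns. *)
ClearAll[listFiles]
listFiles[dir_String, forms_ : All] :=
  Select[FileNames[forms, dir, Infinity], Not@*DirectoryQ]
```

With this definition, listFiles[dir] reproduces the expression above, while listFiles[dir, "*.txt"] or listFiles[dir, {"*.m", "*.wl"}] keeps only files whose names match. The Select step is still needed because a directory name can match the pattern too.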

Examples

Find lines containing the word "Welfare" and display them with 1 surrounding line of leading and trailing context:

Grep[FindFile@"ExampleData/USConstitution.txt",
 WordBoundary ~~ "Welfare" ~~ WordBoundary, 1]

screenshot1

Search for the word "eye" in all files in a directory and all its subdirectories:

dir = FileNameJoin[{$InstallationDirectory,
    "Documentation/English/System/ExamplePages"}];

files = Select[FileNames[All, dir, Infinity], Not@*DirectoryQ];

Grep[files, WordBoundary ~~ "eye" ~~ WordBoundary]

(* during evaluation it displays ProgressIndicator *)

screenshot2
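The Grep above prints its results as a Grid. If the matches are needed as data (for instance to put them into a Dataset, as the question suggests), a variant along the following lines might work. This is only a sketch: grepData is a hypothetical name, it reads the file with ReadString instead of ReadList, and it drops the styling.

```mathematica
(* Hypothetical helper (not part of the answer's code): same matching
   logic as Grep, but returns data instead of printing a Grid. *)
ClearAll[grepData]
grepData[file_String, patt_, c_Integer : 0] :=
  Module[{text, lines, pos},
   text = Quiet@ReadString[file];
   (* unreadable file: report no matches rather than failing *)
   If[! StringQ[text], Return[{}, Module]];
   lines = StringSplit[text, {"\r\n", "\n"}];
   (* line numbers of all lines containing the pattern *)
   pos = Flatten@Position[StringContainsQ[lines, patt], True];
   (* one association per match, with numbered context lines *)
   Table[
    With[{lo = Max[p - c, 1], hi = Min[p + c, Length[lines]]},
     <|"File" -> file, "Line" -> p,
      "Context" -> Thread[Range[lo, hi] -> lines[[lo ;; hi]]]|>],
    {p, pos}]]
```

For example, Dataset[grepData[FindFile["ExampleData/USConstitution.txt"], "Welfare", 1]] should give one row per matching line, and Join @@ (grepData[#, patt, c] & /@ files) collects the matches from a list of files.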

Alexey Popkov
7

Update

As recommended in the comments by @b3m2a1, you can also use RunProcess as a simpler way to execute grep. You need to supply the command as a list consisting of the program followed by its arguments (here bash, -c, and the whole grep command line as a single string) and set the ProcessDirectory option. To do a recursive search for NotebookDirectory in notebook files, enter the following:

cmd = "grep -RH \"NotebookDirectory\" --include=\"Int*.nb\" *";
RunProcess[{"bash", "-c", cmd}, "StandardOutput", 
 ProcessDirectory -> NotebookDirectory[]]
(* "Absorption/BakedSlider/InterphaseMassTransfer_slider.nb:   \
  RowBox[{\"NotebookDirectory\", \"[\", \"]\"}], \"]\"}], \";\"}], 

Absorption/BakedSlider/InterphaseMassTransfer_slider.nb:" *)
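For the grep -nr -C N <pattern> case from the question, the same idea can be sketched without going through bash by passing the arguments to grep directly. The wrapper name systemGrep is hypothetical, and the sketch assumes that a grep supporting -r and -C (such as GNU or BSD grep) is on the system PATH.

```mathematica
(* Hypothetical wrapper: assumes a grep executable that supports -r and -C
   (e.g. GNU or BSD grep) is on the system PATH. *)
ClearAll[systemGrep]
systemGrep[pattern_String, dir_String, n_Integer : 0] :=
  Module[{res},
   (* arguments are passed as a list, so no shell quoting is involved;
      -e keeps a pattern that starts with a dash from being read as an option *)
   res = RunProcess[
     {"grep", "-nr", "-C", ToString[n], "-e", pattern, "."},
     ProcessDirectory -> dir];
   (* RunProcess does not return an association if grep could not be started *)
   If[! AssociationQ[res], Return[$Failed, Module]];
   (* grep exits with 0 if lines matched, 1 if none did, >1 on error *)
   Switch[res["ExitCode"],
    0, StringSplit[res["StandardOutput"], "\n"],
    1, {},
    _, Failure["GrepError",
      <|"MessageTemplate" -> "grep failed: `1`",
        "MessageParameters" -> {res["StandardError"]}|>]]]
```

A call such as systemGrep["NotebookDirectory", NotebookDirectory[], 2] would then return grep's output lines as a list of strings. Note that here the pattern is interpreted by grep, not as a Wolfram string pattern.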

Original Answer

Here is an example calling the system grep (using Cygwin and putting bash.exe in my path on Windows). Remember to escape special characters. The following does a recursive directory search on "NotebookDirectory" including Mathematica notebooks matching the pattern "Int*.nb".

SetDirectory[NotebookDirectory[]];
file = CreateFile[];
Run["grep -RH \"NotebookDirectory\" --include=\"Int*.nb\" * >>" <> file];
FilePrint[file]
DeleteFile[file];

(*Absorption/BakedSlider/InterphaseMassTransfer_slider.nb:     RowBox[{"NotebookDirectory", "[", "]"}], "]"}], ";"}], 
Absorption/BakedSlider/InterphaseMassTransfer_slider.nb:     RowBox[{"NotebookDirectory", "[", "]"}], "]"}], ";"}], 
Absorption/BakedSlider/InterphaseMassTransfer_sliderb.nb:     RowBox[{"NotebookDirectory", "[", "]"}], "]"}], ";"}], 
Absorption/BakedSlider/InterphaseMassTransfer_sliderb.nb:     *)
Tim Laska