I have a deeply nested directory containing tens of thousands of Wolfram files (mostly notebooks (.nb), but also scripts (.wls), packages (.m, .wl), and .mx and .wxf data), with many subfolders each holding a few hundred notebooks on average. Each notebook contains 10-100 pages. I want to search the text in these files and within relevant notebook cell types, e.g. Code, Text, Title, Subtitle, Item, Program, ExternalLanguage, etc.
I'd like to have a live search interface to quickly search through the contents of all files and visually show the matches highlighted within their context.
Is there an existing project, or are there best practices, for doing this?
The TextSearch-related Wolfram symbols are old and seem slow/weak. Can Mathematica even do this? Mathematica is obviously needed to preview the content of matches within notebooks, but what other tools could be used to build such a live deep search?
[...] TextSearch tools? They are neither particularly old (introduced in version 10.2), nor do they seem too slow after the initial index creation, from my quick testing. (To be fair, I did not test it on such huge numbers of notebooks.) – Lukas Lang Jun 12 '21 at 19:08
[...] grep as a pre-filter (fast) and then use Mathematica tools as the secondary step (slow). You could also use grep to build a token list for cell types and use that as an index. A similar thing could be done to get simple patterns like the Cell["*", "Section" which you could use to build a tag list of sections/subsections in notebooks, which you can then filter more thoroughly on demand by actually opening and preprocessing the content – b3m2a1 Jun 14 '21 at 23:30
[...] grep and then only doing the expensive filter on the stuff you know you want (and making an index of the e.g. "Section" cells) will cause you much less grief than trying to build one massive index – b3m2a1 Jun 14 '21 at 23:31
[...] TextSearch tools don't accomplish? You mention too slow, but I just used TextSearch to search a directory containing 93 .nb files totaling 300MB, full of large graphics. It returns in less than 1s (Ubuntu 20.04.2 / MMA 12.2.0) without indexing. Indexing the .nb files took 54s and then search returns in ~1ms. You also mention too weak, but it seems to do everything you asked for in your post, with the possible exception of filtering by cell type (I'm not sure if that's possible). – bRost03 Jun 24 '21 at 15:27
[...] InputForm text files containing various types of cells using a technique from this thread, then search in the text files using a third-party tool (or even Mathematica). – Alexey Popkov Jul 24 '21 at 00:20
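The grep-style pre-filter and the heading index suggested in the comments could be sketched in Python roughly as follows. This is only an illustration, not anyone's actual setup: the names `prefilter`, `heading_index`, `WOLFRAM_SUFFIXES` and `STRING_CELL` are hypothetical, and the sketch assumes the notebooks are stored as plain text. The regular expression only catches cells whose content is a plain string (the `Cell["*", "Section"` shape), so text held in `TextData` or `BoxData` wrappers, or written with character escapes, would be missed and still needs the slower Mathematica pass.

```python
import re
from pathlib import Path

# Extensions treated as searchable Wolfram text files (an assumption for this sketch).
WOLFRAM_SUFFIXES = {".nb", ".wl", ".wls", ".m"}

# Heuristic: a cell whose content is a plain string followed by a style name,
# e.g. Cell["Fourier notes", "Section"]. Nested TextData/BoxData cells do not match.
STRING_CELL = re.compile(r'Cell\[\s*"((?:[^"\\]|\\.)*)"\s*,\s*"(\w+)"', re.DOTALL)


def prefilter(root, query):
    """Return files under root whose raw text contains query (case-insensitive).

    This is the cheap, grep-like first pass; only the returned files need the
    expensive second pass inside Mathematica.
    """
    needle = query.lower()
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in WOLFRAM_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            if needle in text.lower():
                hits.append(path)
    return hits


def heading_index(path, styles=("Title", "Section", "Subsection")):
    """Return (style, text) pairs for plain-string cells with one of the given styles."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [
        (match.group(2), match.group(1))
        for match in STRING_CELL.finditer(text)
        if match.group(2) in styles
    ]
```

Calling `prefilter("path/to/notebooks", "fft")` would give the candidate files, and `heading_index(candidate)` a small per-notebook list of headings that can be cached and filtered before anything is opened in Mathematica. Reading every file on each query will not scale to tens of thousands of notebooks by itself; a real tool would more likely shell out to grep or ripgrep and cache the heading index on disk.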