I have a deeply nested directory containing tens of thousands of Wolfram files (mostly notebooks (.nb), but also scripts (.wls), packages (.m, .wl), and .mx and .wxf data), with many subfolders each holding a few hundred notebooks on average. Each notebook contains 10-100 pages. I want to search for text in the files and within relevant notebook cell types, e.g. Code, Text, Title, Subtitle, Item, Program, ExternalLanguage, etc.

I'd like to have a live search interface to quickly search through the contents of all files and visually show the matches highlighted within their context.

Is there an existing project or best practices for doing this?

The TextSearch-related Wolfram symbols are old and seem slow/weak; can MMA even do this? MMA is obviously needed to preview the content of matches within notebooks, but what other tools would be used to build such a live deep search?

M.R.
  • 1
    Excellent question. Can anything outside MMA search sets of notebooks, for a "specific variable" for instance? This is a tool/question, I'd like answered too. – prog9910 Jun 11 '21 at 05:00
  • 1
    What exactly is the issue with the TextSearch tools? They are neither particularly old (introduced in version 10.2), nor do they seem too slow after the initial index creation, from my quick testing. (To be fair, I did not test it on such huge numbers of notebooks.) – Lukas Lang Jun 12 '21 at 19:08
  • TextSearch works for text files but not too well on many large notebooks with lots of graphics. Have you tried with a few long notebooks? I'm thinking about preprocessing all of them and exporting the cell contents to txt files. – M.R. Jun 14 '21 at 23:24
  • 1
    If you only want to find cell types, you can always grep as a pre-filter (fast) and then use Mathematica tools as the secondary step (slow). You could also use grep to build a token list for cell types and use that as an index. A similar thing could be done to get simple patterns like the Cell["*", "Section" which you could use to build a tag list of sections/subsections in notebooks, on which you can then run a more sophisticated on-demand filter by actually opening and preprocessing the content. – b3m2a1 Jun 14 '21 at 23:30
  • My guess is that doing this incrementally, where you first filter against some fast-to-retrieve stuff with grep and then only do the expensive filter on the stuff you know you want (and make an index of e.g. the "Section" cells), will cause you much less grief than trying to build one massive index. – b3m2a1 Jun 14 '21 at 23:31
  • You didn't say what OS you are using. My go-to tool on macOS is HoudahSpot, but I use it primarily for symbol name searches. It is suitably responsive for my dynamic searches, but mine are not complex, and my file counts are only in the thousands. Also, it does not use a clean regex syntax, but an awkward, less powerful, one derived from Apple's Spotlight capabilities. And, it relies on the underlying data set generated by Spotlight. It's not obvious that it would work for you, but it could be worth looking at. – G. Shults Jun 16 '21 at 22:13
  • @G.Shults I'm on macOS but would love a cross-platform solution since I use Linux a lot too. – M.R. Jun 23 '21 at 22:07
  • Can you be a little more specific in exactly what you're looking for that the TextSearch tools don't accomplish? You mention too slow, but I just used TextSearch to search a directory containing 93 .nb files totaling 300MB, full of large graphics. It returns in less than 1s (Ubuntu 20.04.2 / MMA 12.2.0) without indexing. Indexing the .nb files took 54s and then search returns in ~1ms. You also mention too weak but it seems to do everything you asked for in your post, with the possible exception of filtering by cell type (I'm not sure if that's possible). – bRost03 Jun 24 '21 at 15:27
  • @bRost03 I'll try to add some more specifics to the question tonight. I'm looking for something that works like Command + F, but which instead 1) searches all nbs recursively in a dir, 2) updates in real time as you type, and 3) shows a tabular view of the highlighted matches in place, as they would look within their notebooks (e.g. within 2D formatting and such), as if you'd used Edit > Find > FindNext and taken a crop. – M.R. Jun 24 '21 at 19:46
  • @M.R. That's a tall order! I would be very surprised if that could be implemented by a user. That said, there's some real MMA wizards on here so maybe. I'd definitely be interested in such a tool. – bRost03 Jun 24 '21 at 19:51
  • 1
    @bRost03 Right? But that's what is needed. For something like (2) check out https://resources.wolframcloud.com/FunctionRepository/resources/DynamicStringSearch – M.R. Jun 24 '21 at 19:52
  • Probably you can first convert your notebooks into interpreted InputForm text files containing various types of cells using a technique from this thread, then search in the text files using a third-party tool (or even Mathematica). – Alexey Popkov Jul 24 '21 at 00:20
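
The grep-style pre-filter suggested in the comments above can be sketched outside Mathematica, since .nb files are plain text. The following is only a rough illustration in Python, not an existing tool: the function names `index_cells` and `search` and the regular expression are assumptions made for this example. It only catches cells whose content is a plain string, such as Cell["My heading", "Section"], and it ignores TextData/BoxData cells and the line continuations the front end inserts into long strings.

```python
import re
from pathlib import Path

# Assumption: we only target simple string-content cells such as
#   Cell["My heading", "Section", ...]
# Cells wrapping TextData[...] or BoxData[...] are not matched by this pattern.
CELL_RE = re.compile(r'Cell\[\s*"((?:[^"\\]|\\.)*)",\s*"([A-Za-z]+)"', re.DOTALL)


def index_cells(root, styles=("Title", "Section", "Subsection")):
    """Build {notebook path: [(style, text), ...]} for all .nb files under root."""
    index = {}
    for path in Path(root).rglob("*.nb"):
        # Notebooks are text files; replace undecodable bytes instead of failing.
        raw = path.read_text(encoding="utf-8", errors="replace")
        hits = [(style, text)
                for text, style in CELL_RE.findall(raw)
                if style in styles]
        if hits:
            index[str(path)] = hits
    return index


def search(index, query):
    """Case-insensitive substring search over the indexed cell texts."""
    q = query.lower()
    return [(path, style, text)
            for path, hits in index.items()
            for style, text in hits
            if q in text.lower()]
```

The index is cheap to build and could be kept in memory for as-you-type filtering; only the notebooks that survive this first pass would then need to be opened in Mathematica for the expensive, properly rendered preview.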

1 Answer

There is an MMA Notebook Indexer Tool at the Wolfram Library Archive. It only works on Mac OS X 10.4 from 2005. Otherwise, printing to PDF and using PDF search tools is another possibility.

There does seem to be an issue with MMA exporting to PDF, though. No problem: I just print and choose Save as PDF. Time is certainly a factor. However, I would encourage others to answer the root of the question. Printing to PDF is certainly best done when the notebook is created.

prog9910
  • 1
    "Save as PDF myself" for thousands of notebooks, each of 10-100 pages? Please, could you estimate the time needed for the "solution"? – Acus Jun 11 '21 at 05:41
  • Unless there is a way to maintain metadata with your MMA notebooks, how else can it be done? We need Wolfram Research to take the lead on this, not leave it to the user to find answers. – prog9910 Jun 11 '21 at 15:22
  • The application is too old, doesn't install anymore... – user5601 Jun 12 '21 at 05:33
  • @prog9910 Wow, 16 years old, nice archaeology! In the link WRI said they would open-source it, but I can't tell if they ever did... Do you know of any newer such projects that work on recent operating systems like macOS 11+? – M.R. Jun 24 '21 at 19:49