
My aim is to extract the contents of Input cells as plain text. Ideally, I would like to select one or more Input cells and press a button that extracts the text from those cells as a string and assigns it to a variable. However, this turns out to be harder than it seems (to me at least).

One approach, suggested in a chat conversation, is to copy the cell as plain text with something like this:

Button["Set x to selection",
  FrontEndTokenExecute[SelectedNotebook[], "CopySpecial", "PlainText"];
  x = (NotebookGet@ClipboardNotebook[])[[1, 1, 1]]
  ]

This seems to work. For example, if I type the following into an Input cell, select the cell, and press the button

a + b == c;
d = "x";

then x is set to the string "a+b==c;\nd=\"x\";", which is what I want. As far as I can tell, the only downside of this method is that it overwrites the clipboard.

To preserve the clipboard, I tried using NotebookRead to copy the contents instead, but I only got that to work after changing the display form of the selected cell to "TextForm" first, e.g.:

Button["Set x to selection 2",
 Module[{tmpnb, nb},
  nb = SelectedNotebook[];
  tmpnb = CreateDocument[WindowSelected -> False];
  NotebookWrite[tmpnb, NotebookRead[nb], All]; 
  FrontEndTokenExecute[tmpnb, "SelectionDisplayAs", "TextForm"];
  SelectionMove[tmpnb, All, CellContents];
  x = ToString[NotebookRead[tmpnb]];
  NotebookClose[tmpnb]]] 

Of course the downside of this approach is that it creates a temporary notebook every time I press the button.

Question

Both methods do the job, but both rely on the front end manipulating the contents of the selected cell before the string can be extracted. This made me wonder whether there is an easier way to get the same result.


Keywords:

String, Text, Cell, Export, ExportString, ToString, InputForm

Asked by Heike; edited by Kuba.
  • Congrats with your first question by the way. I should also break the ice some day. – Leonid Shifrin Feb 04 '12 at 21:34
  • @LeonidShifrin Thanks. At least you have a good excuse for not having posted a question yet. You already know everything about Mathematica (or so it seems to me at least). – Heike Feb 04 '12 at 22:11
  • Thanks, but I think I know a very tiny fraction of things there are to know about it. Actually, one thing which is quite amazing (to me anyways) about Mathematica is how much new stuff you can learn even after years of experience. I learn a lot here on SO / SE, which is a big motivation for me to hang around. – Leonid Shifrin Feb 04 '12 at 22:16
  • @LeonidShifrin I agree. I've learned more about Mathematica in the 7 months I've been on SO and now here on SE than in all the years before that. – Heike Feb 04 '12 at 22:22
  • Yep - I can probably say the same. – Leonid Shifrin Feb 04 '12 at 22:23

2 Answers


Assuming nb is your notebook object, this will do what you want without touching the clipboard:

First[FrontEndExecute[
  FrontEnd`ExportPacket[NotebookSelection[nb], "InputText"]]]

Some notes about this solution:

  • It preserves evaluation semantics precisely, regardless of typesetting.
  • It does not dirty the clipboard.
  • If you prefer the appearance to the evaluation semantics, you can use "PlainText" instead (for example, grids then copy as tabular-looking text rather than as lists).
  • I tested this in 8.0.1; it might not work in earlier versions.
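Folded back into the button from the question, the ExportPacket call might look like the sketch below. The helper name selectionToInputText is hypothetical, and the sketch assumes, as the question's own buttons do, that SelectedNotebook[] is the notebook holding the selected Input cells when the button is clicked.

```mathematica
(* Hypothetical helper: export the current selection of nb as InputText,
   using the same FrontEnd`ExportPacket call as above *)
selectionToInputText[nb_NotebookObject] :=
 First[FrontEndExecute[
   FrontEnd`ExportPacket[NotebookSelection[nb], "InputText"]]]

(* The question's button, rewritten so that the clipboard is never touched *)
Button["Set x to selection", x = selectionToInputText[SelectedNotebook[]]]
```

Replacing "InputText" with "PlainText" in the helper should give the appearance rather than the evaluation semantics, per the notes above.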

This FE packet supports only a limited number of formats. Besides "InputText" and "PlainText", the public formats include "GIF", "PPM", "EnhancedMetafile" (Windows), "PICT" (Mac), "PostScript", "RTF", "PDF", and "SVG".

I should add that the first argument of ExportPacket can also be any Notebook, Cell, or box expression. It can also be a NotebookObject, in which case the entire notebook is converted rather than just the selection.
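Because the first argument can be a Cell expression, a cell built in the kernel can be converted without selecting anything. A minimal sketch, with a hypothetical helper name, assuming a front end is running when it is called:

```mathematica
(* Hypothetical helper: convert a Cell expression directly (no selection involved) *)
cellToInputText[cell_Cell] :=
 First[FrontEndExecute[FrontEnd`ExportPacket[cell, "InputText"]]]

(* Usage in a notebook session:
   cellToInputText[Cell[BoxData[RowBox[{"a", "+", "b"}]], "Input"]] *)
```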

When the selection does not contain a full cell, a workaround is to pass the result of NotebookRead wrapped in BoxData, e.g.:

First[FrontEndExecute[ FrontEnd`ExportPacket[BoxData @ NotebookRead[nb], "PPM"]]]
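The example above requests "PPM", an image format. For the text extraction asked about in the question, the same BoxData wrapping can presumably be combined with "InputText". The helper name below is hypothetical, and the sketch has not been checked against every kind of partial selection.

```mathematica
(* Hypothetical helper for a selection inside a cell: NotebookRead returns
   bare boxes there, so wrap them in BoxData before exporting as text *)
partialSelectionToInputText[nb_NotebookObject] :=
 First[FrontEndExecute[
   FrontEnd`ExportPacket[BoxData @ NotebookRead[nb], "InputText"]]]
```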
Answered by John Fultz; edited by Kuba.
  • @JohnFultz It seems that it converts \n to \r\n, while copying via the context menu does not. Any tips on how to prevent this? E.g.: InputForm@ First[FrontEndExecute[ FrontEnd`ExportPacket[ BoxData@RowBox[{"{", " ", RowBox[{"a", "\n", ",", " ", "b"}], "\n", "}"}], "PlainText"]]] – Kuba May 25 '18 at 07:46
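One simple workaround for the \r\n conversion, assuming only the line endings differ, is to normalize them after the export. normalizeNewlines is a hypothetical helper name, not part of the ExportPacket interface:

```mathematica
(* Hypothetical helper: turn Windows-style "\r\n" line endings back into "\n" *)
normalizeNewlines[s_String] := StringReplace[s, "\r\n" -> "\n"]

(* Intended use: normalizeNewlines @ First[FrontEndExecute[
     FrontEnd`ExportPacket[boxes, "PlainText"]]] *)
normalizeNewlines["{a\r\n, b\r\n}"]
```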

NOTE: This answer is provided for illustrative purposes only, since it shows some techniques of working with boxed data. While it illustrates how one could emulate the correct behavior in some cases, this code should NOT be used in practice (as a solution for this particular problem), because doing so may be both fragile and dangerous. Please read the discussion comments below the answer to get a more complete picture.

This is what usually works for me:

Button["Set x to selection", 
   x = 
    StringJoin@
        Cases[
          NotebookRead[SelectedNotebook[]] /. Cell[BoxData[data_], ___] :> data, 
          _String, 
          Infinity
        ]
 ]

In particular, I used similar code in my syntax highlighter generator, so I have tested it on many examples (although the generator's code is perhaps not the best place to look if one wants to stay sane).
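To see what the first version does, it can be tried on box expressions built with MakeBoxes instead of read from a notebook, as suggested in the comments below. The name joinStrings is hypothetical; it is just the Cases/StringJoin core of the button. The second call also reproduces the SqrtBox problem raised in the comments.

```mathematica
(* Hypothetical helper: the core of the first version, which joins every
   string atom of a box expression in depth-first order *)
joinStrings[boxes_] := StringJoin @ Cases[boxes, _String, Infinity]

joinStrings[MakeBoxes[a + b == c]]  (* "a+b==c" *)

(* MakeBoxes[Sqrt[a]] is SqrtBox["a"], so the square root is silently lost *)
joinStrings[MakeBoxes[Sqrt[a]]]     (* "a" *)
```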

EDIT

Here is code that should also work for non-text boxes (such as SqrtBox), but it is pretty ugly and may also be fragile, to the point that the clipboard solution seems much better.

Button["Set x to selection",
  x = 
      With[{tag = StringJoin[ ToString /@ {Unique["tag"], Unique[]}]},
        StringReplace[
            StringJoin@
               Cases[
                  NotebookRead[SelectedNotebook[]] /. Cell[BoxData[data_], ___] :>
                   (ToExpression[
                      (data /. "\n" :> tag),
                      StandardForm,
                      Function[dt, MakeBoxes[InputForm[dt]], HoldAll]
                    ] /. InterpretationBox[StyleBox[code_String, ___], ___] :> code
                   ),
                  _String, {0, Infinity}
               ], 
            tag :> "\n"
        ]
      ]
 ]

For cases where you only have code, you should be able to use the first version, though.
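The key step of the second version is the ToExpression/MakeBoxes round trip mentioned in the comments: the boxes are parsed, the unevaluated result is re-typeset in InputForm, and the InputForm string is pulled out of the resulting InterpretationBox. Applied by hand to a single SqrtBox (the variable names are illustrative), this should give the string "Sqrt[a]" rather than "a":

```mathematica
(* Re-typeset a held expression in InputForm, as the second button does *)
toInputFormBoxes = Function[dt, MakeBoxes[InputForm[dt]], HoldAll];

(* ToExpression parses the boxes and passes the unevaluated result on *)
ifBoxes = ToExpression[SqrtBox["a"], StandardForm, toInputFormBoxes];

(* Extract the InputForm string with the same rule the button uses *)
ifBoxes /. InterpretationBox[StyleBox[code_String, ___], ___] :> code
```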

Answered by Leonid Shifrin.
  • I may not understand the question, but this turns $\sqrt{a}$ into $a$. Is this desired? It seems dangerous. – Mr.Wizard Feb 04 '12 at 23:10
  • @Spartacus Yes, you are right. I was mostly using this for code cells, so I did not run into this problem. I know how to avoid it, but that involves a ToExpression-MakeBoxes cycle (along these lines: ToExpression[data, StandardForm, Function[dt, MakeBoxes[InputForm[dt]], HoldAll]]), and I lose the newlines (they get converted to Nulls). This is surely surmountable, but I've got to go now. Will correct tomorrow, if no one does that first. – Leonid Shifrin Feb 04 '12 at 23:39
  • @Spartacus I should watch myself - the number of typos I make increased dramatically in the recent days, some of which are very stupid and not something I'd normally make. – Leonid Shifrin Feb 04 '12 at 23:50
  • I am confident I still outrank you in the field of stupid mistakes. ;-) – Mr.Wizard Feb 04 '12 at 23:53
  • @Spartacus lol :) – Leonid Shifrin Feb 04 '12 at 23:54
  • Thanks, @Leonid. It looks like the safest option is to just stick to the clipboard solution. – Heike Feb 06 '12 at 09:32
  • @Heike Looks like it. It also probably depends on the context - if you only use this for code (no fancy typesetting), then my first option should work fine. Thanks for the accept! – Leonid Shifrin Feb 06 '12 at 11:55
  • The clipboard code is very careful to deal with all kinds of special cases like TagBox, InterpretationBox, all of the various script boxes, etc. I applaud your inventiveness, but I find this shotgun approach very scary, and would never personally recommend its use. The answer I separately posted uses the same mechanism used by the clipboard. – John Fultz Feb 06 '12 at 17:05
  • @John Thanks - I also mentioned that the second version is ugly and fragile. Do you think it is better to remove it from the post altogether? If you think that the entire answer is dangerous, I will gladly remove that as well. – Leonid Shifrin Feb 06 '12 at 17:13
  • @Heike Could you please check-mark John's answer? It is much better than mine, which I may wish to delete in some future. – Leonid Shifrin Feb 06 '12 at 17:14
  • @LeonidShifrin Of course. I was contemplating that myself already. Glad you don't mind. – Heike Feb 06 '12 at 17:23
  • @LeonidShifrin The answer is enlightening in that you show useful principles in working with boxes. The danger is that it appears to work often enough that someone may come to rely on it more than they should. I think that the answer, when read with the discussion comments, is educational. That plus my conservative nature about deleting non-spammy, non-stupid answers leads me to not wish for the answer to be deleted...just understood in the proper context. For the same reason, I didn't down-vote it. – John Fultz Feb 06 '12 at 18:52
  • @John Thanks, points taken. I will leave it then, perhaps edit to make it more clear that it should not really be used, and is only given as an emulation, for illustration purposes only. – Leonid Shifrin Feb 06 '12 at 19:29
  • I sort of kind of think I get the general idea of what this is doing, but I sure would appreciate a breakdown of exactly how it works -- if anyone wise enough in the Deep Magic here has time to explain. – ibeatty Aug 20 '15 at 13:50
  • @ibeatty Try the first (shorter) version first, on some sample box expression (can use MakeBoxes on any expression to create it), to see how it works. The longer one simply takes into account certain more complex cases like styled text, multiple input lines, etc. Keep in mind that this is not the recommended way to extract such strings, see the answer of John Fultz for the recommended way. – Leonid Shifrin Aug 20 '15 at 14:25