I can't figure out which string pattern to use to remove markers from a string.

This is the input.

lst = {{1, 2, "this is a test", 4}, {Pi, 20, xy, 10}};
buf = ToString@TeXForm@lst

which gives

\left(
\begin{array}{cccc}
     1 & 2 & \text{this is a test} & 4 \\
     \pi  & 20 & \text{xy} & 10 \\
    \end{array}
\right)

I need to find every place where the pattern \text{.....} shows up and replace it with just what is inside the braces, i.e. strip out the \text{ and the matching closing }, for each such instance in the input.

So the above should become

\left(
\begin{array}{cccc}
     1 & 2 & this is a test & 4 \\
     \pi  & 20 & xy & 10 \\
    \end{array}
\right)

I tried many things, including RegularExpression.

One attempt:

StringReplace[buf, "\\text{" ~~ x___ ~~ "}" .. :> x]

But this has a problem: it does not stop at the first closing }, but goes all the way to the last } in the string, ending up with

\left(
\begin{array}{cccc}
 1 & 2 & this is a test} & 4 \\
 \pi  & 20 & \text{xy} & 10 \\
\end{array
\right)

Notice that it went all the way to the end and removed the } after \end{array.

I did not know how to tell it to stop at the first } it sees after \text{, and that is what I am struggling with. I know I wrote x___, but I needed a named pattern like that so I can pick out the x.

Any idea how to do this? A solution using either a string pattern or RegularExpression will work.

Nasser

2 Answers

With both StringPattern and RegularExpression the problem is greediness: wildcards will try to match as much as possible. With StringPattern this can be fixed using Shortest:

StringReplace[buf, "\\text{" ~~ Shortest[x___] ~~ "}" :> x]
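To see the greediness directly, here is a minimal sketch on a short hypothetical string `s` (not from the question) that contains two \text{...} groups. The names `s`, `greedy` and `lazy` are only illustrative; the comments give the results I get from tracing the two patterns by hand.

```mathematica
(* hypothetical test string with two \text{...} groups *)
s = "\\text{a} and \\text{b}";

(* greedy: x___ grabs as much as it can, so the match runs to the LAST "}" *)
greedy = StringReplace[s, "\\text{" ~~ x___ ~~ "}" :> x]
(* gives: a} and \text{b *)

(* Shortest: x___ grabs as little as it can, so each match ends at the FIRST "}" *)
lazy = StringReplace[s, "\\text{" ~~ Shortest[x___] ~~ "}" :> x]
(* gives: a and b *)
```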

With a regular expression, a quantifier can be made lazy (non-greedy) by appending ? (e.g. {(.*?)}), but if you go that way, you can actually write a safer regular expression using a negated character class:

StringReplace[buf, RegularExpression["\\\\text{([^}]*)}"] :> "$1"]

Which gives the same result.
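As a quick check, here is a sketch that runs both regex variants mentioned above, the lazy quantifier (.*?) and the negated character class ([^}]*), on the question's input and compares them. The names `lazy` and `negated` are only illustrative. On this input neither group contains a }, so both should produce the same string.

```mathematica
lst = {{1, 2, "this is a test", 4}, {Pi, 20, xy, 10}};
buf = ToString@TeXForm@lst;

(* lazy quantifier: .*? takes as few characters as possible, so it stops at the first "}" *)
lazy = StringReplace[buf, RegularExpression["\\\\text{(.*?)}"] :> "$1"];

(* negated character class: [^}]* can never consume a "}", so it cannot run past one *)
negated = StringReplace[buf, RegularExpression["\\\\text{([^}]*)}"] :> "$1"];

lazy === negated
```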

Both of these have one issue, though: they are not entirely safe. If the text itself contains a }, they will stop there. Consider:

lst = {"abc", "x}y", "123"};
buf = ToString@TeXForm@lst

This gives:

\{\text{abc},\text{x$\}$y},123\}

And using either solution will turn it into:

\{abc,x$\$y},123\}

I think that to fix this, only a regular expression approach is viable: one that knows exactly which characters (or escape sequences) are allowed within the {...}:

StringReplace[buf, RegularExpression["\\\\text{((?:\\\\.|[^\\\\}])*)}"] :> "$1"]

Which gives

\{abc,x$\}$y,123\}

as expected.
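For reference, here is the same regex with its parts annotated, applied to the TeX string from the example typed in literally (so the result does not depend on what TeXForm produces in a given version). The names `buf` and `robust` are only illustrative.

```mathematica
(* the TeX string from the example; backslashes are doubled inside a Mathematica string *)
buf = "\\{\\text{abc},\\text{x$\\}$y},123\\}";

(* the regex as PCRE sees it:  \\text{((?:\\.|[^\\}])*)}
     \\text{       literal "\text{"
     (?: ... )*    zero or more of:
       \\.         a backslash plus any one character, so "\}" is consumed as a unit
       [^\\}]      any character that is neither "\" nor "}"
     }             the unescaped "}" that closes the group *)
robust = RegularExpression["\\\\text{((?:\\\\.|[^\\\\}])*)}"];

StringReplace[buf, robust :> "$1"]
(* gives: \{abc,x$\}$y,123\} *)
```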

Martin Ender

A form without RegularExpression that I believe works on Martin's second example:

lst = {"abc", "x}y", "123"};
buf = ToString @ TeXForm @ lst;

StringReplace[buf, "\\text{" ~~ Shortest[a___] ~~ b : Except["\\"] ~~ "}" :> a <> b]
"\\{abc,x$\\}$y,123\\}"
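A side-by-side sketch of the plain Shortest pattern and this variant on the same literal string (the names `buf`, `naive` and `wizard` are only illustrative). The extra `b : Except["\\"]` requires the character just before the closing } to not be a backslash, so the escaped \} no longer ends the match.

```mathematica
buf = "\\{\\text{abc},\\text{x$\\}$y},123\\}";

(* plain Shortest: the match ends at the escaped "\}" inside the group *)
naive = StringReplace[buf, "\\text{" ~~ Shortest[a___] ~~ "}" :> a];

(* variant: the last character before "}" must not be "\" *)
wizard = StringReplace[buf,
   "\\text{" ~~ Shortest[a___] ~~ b : Except["\\"] ~~ "}" :> a <> b];

{naive, wizard}
(* gives: {\{abc,x$\$y},123\}, \{abc,x$\}$y,123\}} *)
```

As far as I can tell from tracing it, `b` must be a real character, so an empty \text{} group is not matched on its own, and the match may then run on into a following group.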
Mr.Wizard