A problem about StringReplace


I ran into the following problem, which is possibly a bug:

str11 = "<strong style=\"font-size:20px\" style=\"color:#dfa57c\" style=\"font-size:20px\" style=\"color:#dfa57c\" class=\"new\">";

StringCases[str11,
 Repeated[Shortest["style=\"" ~~ ___], {2, 10}] ~~ "\""]

(* {style="font-size:20px" style="color:#dfa57c",style="font-size:20px" style="color:#dfa57c" } *)

The problem with the code above is that it returns two results. How can I get just the single longest repeated match, something like the effect of Longest?

That is, I want a result like the following, but without fixing the repeat count at 4:

StringCases[str11, Repeated[Shortest["style=\"" ~~ ___], {4}] ~~ "\""]

(* {style="font-size:20px" style="color:#dfa57c" style="font-size:20px" style="color:#dfa57c" } *)

StringReplace[str11,
 x : Repeated[Shortest["style=\"" ~~ ___], {2, 20}] ~~ "\""]

(* <strong ="font-size:20px" style="color:#dfa57c ="font-size:20px" style="color:#dfa57c clas s="new"> *)

StringReplace[str11,
 p : Repeated[Shortest["style=\"" ~~ ___], {4}] ~~ "\"" :>
  "style=" <>
   StringReplace[
    StringJoin[
     Riffle[
      StringSplit[StringReplace[p, "style=" -> ""], {"style=", " "}],
      ";", {2, -1, 2}]], "\";\"" -> "; "] <> "\""]

(* <strong style="font-size:20px; color:#dfa57c; font-size:20px; color:#dfa57c;" class="new"> *)

Question: Can the explicit repeat count be removed? Is there a simpler solution? My current attempt computes the count first:

htmlStringTrim[x_] := (count = StringCount[x, "style"];
  StringReplace[x,
   p : (Repeated[Shortest["style=\"" ~~ ___], {count}]) ~~ "\"" :>
    StringJoin[
      Riffle[StringSplit[p, {"\" style=\""}], "; ", {2, -1, 2}]] <>
     "\""])

htmlStringTrim[str11]

(* <strong style="font-size:20px; color:#dfa57c; font-size:20px; color:#dfa57c; " class="new" > *)


Thanks to Mr.Wizard's answer, here is a version that no longer needs the count:

htmlStringTrimNew[x_] := 
  StringReplace[x, 
   p : (("style=\"" ~~ Except["\""] .. ~~ "\"" ~~ 
     Whitespace | "") ..) :> (
 "style=\"" <> 
  StringJoin[
   StringInsert[StringReplace[#, "\"" :> ""], ";", -2] & /@ 
    StringSplit[p, {"style=\""}], "\""])];
HyperGroups
  • I think this should answer your question, unless I missed the point, which is possible here: using the StringCase function and Shortest option – Kuba Apr 26 '15 at 13:28
  • HyperGroups do you agree with Kuba that your question is answered in the linked Q&A? – Mr.Wizard May 13 '15 at 09:13
  • @Mr.Wizard I don't agree. The most confusing part of this question is how to match the repeated pattern. The linked problem is about matching a pattern from left to right, something like Map[StringCases[#, Shortest["|uniprotkb:" ~~ aa__ ~~ "(gene name)"] -> aa, Overlaps -> True] &, test1] – HyperGroups May 27 '15 at 09:42
  • @HyperGroups could you reduce the question to the unique problem. It is so large now it discourages from reading, imo. – Kuba May 27 '15 at 10:28
  • @Kuba how about now – HyperGroups May 27 '15 at 11:09
  • Okay. I don't know a better solution to the Repeated problem but I'll think about it. Incidentally htmlStringTrim does not give me the output that you show; instead I get: "<strong style=\"font-size:20px; color:#dfa57c\" style=\"font-size:20px; color:#dfa57c; \" class=\"new\">" – Mr.Wizard May 27 '15 at 13:09
  • @Mr.Wizard Hi, I found the cause: StackExchange introduced a newline into str11, which affects the function. I've removed it now. – HyperGroups May 27 '15 at 13:28
  • I posted an answer relating to the pattern (which also handles the newline). Please check it. Are you interested in alternative implementation of htmlStringTrim? – Mr.Wizard May 27 '15 at 13:32
  • @Mr.Wizard Yes, you're welcome to. I just added one that is not so concise. – HyperGroups May 27 '15 at 13:57
  • Is performance or clarity more important for this code? – Mr.Wizard May 27 '15 at 14:03
  • @Mr.Wizard Performance is not so important for dealing with this bug, I think, since a string will only contain a limited number of style attributes, given that these are inline CSS styles. – HyperGroups May 27 '15 at 14:06

1 Answer

I think this matches what you want it to:

str11 = "<strong style=\"font-size:20px\" style=\"color:#dfa57c\" 
  style=\"font-size:20px\" style=\"color:#dfa57c\" class=\"new\">";

StringCases[str11, ("style=\"" ~~ Except["\""] .. ~~ "\"" ~~ Whitespace | "") ..]
{"style=\"font-size:20px\" style=\"color:#dfa57c\" 
 style=\"font-size:20px\" style=\"color:#dfa57c\" "}
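
Because `..` is greedy in string patterns, the pattern above grabs the whole run of adjacent style attributes in one match, so no repeat count is needed. As a rough sketch of how that match could be turned into a single merged attribute: the helper name `mergeStyles` below is hypothetical (it does not appear in the thread), and the sketch assumes the attribute values contain no double quotes.

```mathematica
(* Hypothetical helper, not from the thread: merge a run of adjacent
   style="..." attributes into one style attribute.
   Assumes attribute values never contain a double quote. *)
mergeStyles[s_String] := StringReplace[s,
  p : (("style=\"" ~~ (Except["\""] ..) ~~ "\"" ~~ (Whitespace | "")) ..) :>
   "style=\"" <>
    StringJoin[
     Riffle[
      (* extract each value between the quotes of the matched run *)
      StringCases[p, "style=\"" ~~ v : (Except["\""] ..) ~~ "\"" :> v],
      "; "]] <>
    (* the match consumed any trailing whitespace, so put one space back *)
    "\" "]

mergeStyles[str11]
(* "<strong style=\"font-size:20px; color:#dfa57c; font-size:20px; color:#dfa57c\" class=\"new\">" *)
```

Note that this sketch always emits one space after the merged attribute, even if the original run was followed directly by `>`; that seemed an acceptable trade-off for keeping the pattern identical to the one above.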
Mr.Wizard