I am looking for a Regex to match comments in XML documents:

<root>
<!-- 
    match this 
-->
<but>not this</but>
<!--
     and also this
-->
</root>

I've tried <!--[^(-->)]*-->, which only matches single line comments, and <!--[\s\S\n]*--> which matches non-commented nodes as well.
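To make the second failure concrete, here is a minimal sketch in Python. Using Python's `re` module is an assumption, since the question does not name a regex tool, and the variable names are illustrative only. The greedy `*` runs to the last `-->` in the input, so the single match swallows the `<but>` node as well.

```python
import re

# Sample document from the question.
XML = """<root>
<!-- 
    match this 
-->
<but>not this</but>
<!--
     and also this
-->
</root>"""

# Greedy attempt from the question: [\s\S\n]* consumes as much as it can,
# then backtracks only as far as the LAST "-->" in the input.
greedy_matches = re.findall(r"<!--[\s\S\n]*-->", XML)

print(len(greedy_matches))                         # one match, not two
print("<but>not this</but>" in greedy_matches[0])  # the node is inside the match
```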

1 Answer

The regex you're looking for would be:

<!--[\s\S\n]*?-->

Explanation:

 <!--               All comments must begin with this
     [\s\S\n]       Any character (. doesn't allow newlines)
             *      0 or more of the previous thing ([\s\S\n])
              ?     As few of the previous thing as possible while still matching
               -->  All comments must end with this
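As a quick check of the explanation above, the following sketch runs the pattern against the sample document from the question. It assumes Python's `re` module (other regex tools may behave differently), and `find_comments` is a hypothetical helper name, not something from the original answer.

```python
import re

# Sample document from the question.
XML = """<root>
<!-- 
    match this 
-->
<but>not this</but>
<!--
     and also this
-->
</root>"""

# Lazy quantifier (*?): each match stops at the first "-->" after its "<!--".
COMMENT_RE = re.compile(r"<!--[\s\S\n]*?-->")


def find_comments(text):
    """Return every substring of text matched by COMMENT_RE."""
    return COMMENT_RE.findall(text)


for comment in find_comments(XML):
    print(repr(comment))
```

With the lazy quantifier, the two comments come back as separate matches and the `<but>` node is not part of either one.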

If you have a comment inside a comment this will have issues though:

<!-- Documentation
This program documents itself using comments of the type <!-- -->
-->

Here the match stops at the first -->, so it covers the first two lines and leaves the final --> on the last line unmatched.
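The nested case can be reproduced with a short sketch, again assuming Python's `re` module. The names `NESTED`, `matched`, and `leftover` are illustrative.

```python
import re

# The nested-comment example from the answer.
NESTED = """<!-- Documentation
This program documents itself using comments of the type <!-- -->
-->"""

# The lazy match ends at the first "-->", which belongs to the inner comment.
match = re.search(r"<!--[\s\S]*?-->", NESTED)
matched = match.group(0)
leftover = NESTED[match.end():]

print(repr(matched))   # ends with the inner "<!-- -->"
print(repr(leftover))  # the outer closing "-->" is left over
```

A single regex of this form has no way to pair each `<!--` with its own `-->`, so nested input needs a different approach, such as a real XML parser.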

Martin
  • 103
timotree
  • 1,118
  • This exact expression didn't work for me, but this did: . Giving you the green check box for pointing out the "?" though, and the thorough explanation. – Dan Solovay Dec 05 '16 at 19:23
  • @DanSolovay I'll edit it to include your discovery of that Visual Studio quirk. – timotree Dec 05 '16 at 19:25
  • This will not work if the comment doesn't start at the beginning of a line, e.g. <div> <!-- comment -->. The \n is useless in the character class: [\s\S] stands for any character, including newline. – Toto May 18 '20 at 10:36
  • @Toto yes... I wrote this answer a while ago and returning to it I am perplexed by it working for 6 people even though it uses ^ and $. Regarding [\s\S\n], see the above comment by the OP saying that adding the \n was necessary for their regex tool. – timotree May 18 '20 at 17:16
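The point about \n raised in the comments can be checked with a small sketch. It assumes Python's `re` module, so it says nothing about the tool mentioned in the comments where adding \n was reportedly necessary. The sample string and variable names are made up for illustration.

```python
import re

WITH_NEWLINE = re.compile(r"<!--[\s\S\n]*?-->")
WITHOUT_NEWLINE = re.compile(r"<!--[\s\S]*?-->")

# One comment in the middle of a line and one spanning several lines.
SAMPLE = "<div> <!-- inline --></div>\n<!--\n  multi-line\n-->"

with_nl = WITH_NEWLINE.findall(SAMPLE)
without_nl = WITHOUT_NEWLINE.findall(SAMPLE)

# In Python, \s already covers "\n", so both patterns give the same matches.
print(with_nl == without_nl)
print(without_nl)
```

Because neither pattern is anchored with ^ or $, the comment that starts mid-line is found as well.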