I would like to remove comments starting with # from a file. I have tried the simpler approaches described in "How can I remove all comments from a file?", but I have a few additional rules:
- A # does not start a comment if it occurs as part of a quoted string.
- Strings can be quoted by single quotes (') or double quotes (").
- Double-quoted strings can contain quotes if preceded by a backslash (\"); backslashes are quoted as \\.
- All quotes in the input are matched. However, this is not required for quotes that are part of a string's content (in other words, "'", "\"" and '"' are valid strings).
- Quoted strings can't contain newline characters.
- Comments can contain any characters, including any number of #, ', " and \.
- Any # outside of quotes starts a comment (as Stéphane Chazelas pointed out, code for most shells follows more complex rules: think about Bash's $#, which does not start a comment).
For example, the following input:
# comment only
# comments are allowed to contain quotes "' and # number signs
# comments are allowed to contain pairs 'of' "quotes"
some text # with an explanation
some "quoted text # not a comment" # comment
'# not a comment' and '# not a comment either' # comment
"# not a comment containing 'quotes\"" # another comment
shall be converted into the following output:
some text
some "quoted text # not a comment"
'# not a comment' and '# not a comment either'
"# not a comment containing 'quotes\""
I would like to accomplish this with popular Unix command line tools like awk, grep and sed on modern Debian/Ubuntu systems. I'm not strictly limited to features described by POSIX although a POSIX-compliant solution would be preferred.
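To make the rules above concrete, here is a minimal sketch of how I read them, written as a character-by-character scan in awk. It is only an illustration, not a vetted solution: the function name strip_comments is hypothetical, and two behaviours are assumptions taken from the expected output rather than from the rules themselves (lines that contained nothing but a comment are dropped, and whitespace directly before a removed comment is trimmed).

```shell
# Hypothetical helper: strip "#" comments while respecting quoted strings.
# Reads the files given as arguments, or standard input if there are none.
strip_comments() {
  awk '
  {
    out = ""      # text kept from the current line
    quote = ""    # quote character of the string we are in, "" if none
    found = 0     # set to 1 once an unquoted # has been seen
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (quote == "") {
        # Outside any string: an unquoted # starts the comment.
        if (c == "#") { found = 1; break }
        # "\047" is the octal escape for a single quote.
        if (c == "\"" || c == "\047") quote = c
      } else if (quote == "\"" && c == "\\") {
        # Backslash escapes are honoured only inside double quotes:
        # keep the backslash and copy the next character blindly.
        out = out c
        c = substr($0, ++i, 1)
      } else if (c == quote) {
        quote = ""    # closing quote of the current string
      }
      out = out c
    }
    if (found) {
      # Assumption: trim blanks left in front of the removed comment
      # and drop the line if nothing else remains.
      sub(/[ \t]+$/, "", out)
      if (out == "") next
    }
    print out
  }' "$@"
}
```

It could be called as strip_comments input.txt > output.txt, where both file names are placeholders. Lines without a comment, including empty lines, are passed through unchanged.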