So it's high time this question had an answer. Though I intuitively worked out how to do this correctly in pretty much every case some time ago, I only very recently managed to ground that understanding in the text of the standard. It's actually stated there fairly simply - I just stupidly overlooked it many times, I guess.
The relevant portions of the text are all found under the heading...
Editing Commands in sed:
The argument text shall consist of one or more lines. Each embedded \newline in the text shall be preceded by a \backslash. Other backslashes in text shall be removed, and the following character shall be treated literally.
The r and w command verbs, and the w flag to the s command, take an optional rfile (or wfile) parameter, separated from the command verb letter or flag by one or more <blank>s; implementations may allow zero separation as an extension.
Command verbs other than {, a, b, c, i, r, t, w, :, and # can be followed by a ;semicolon, optional <blank>s, and another command verb. However, when the s command verb is used with the w flag, following it with another command in this manner produces undefined results.
And last, in...
Operands:
- script - A string to be used as the script of editing commands. The application shall not present a script that violates the restrictions of a text file except that the final character need not be a \newline.
So, when you take it all together, it makes sense that any command which is optionally followed by an arbitrary parameter without a predefined delimiter (as opposed to s d sub d repl d flag for example) should delimit at an unescaped \newline.
It is arguable that the ; is a predefined delimiter, but in that case using the ; for any of the [aic] commands would necessitate that a separate parser be included in the implementation specifically for those three commands - separate, that is, from the parser used for [:brw], for example. Or else the implementation would have to require that ; also be backslash escaped within the text parameter, and it only grows more complicated from there on.
If I were writing a sed which I desired to be both compliant and efficient, then I expect I would not write such a separate parser - except that maybe [aic] should generate a syntax error if not immediately followed by a \newline. But that is a simple tokenization problem - the end delimiter case is generally the more problematic one. I would just write it so that:
sed -e w\ file\\ -e one -e '...;and more commands'
...and...
sed -e a\\ -e appended\\ -e text -e '...;and more commands'
...would behave very similarly, in that the first would create and write to a file named:
file
one
...and the second would append a block of text to the current line on output like...
appended
text
...because both would share the same parsing code for the parameter.
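As a rough illustration of the text parameter ending at an unescaped \newline (here, the end of an -e argument), this minimal sketch appends one line of text and then runs a further command from a separate -e. The input line and the variable name out are made up for the example, and only a single line of appended text is used to keep it small:

```shell
# 'a\' ends one -e argument, the text to append sits in the next one,
# and the end of that argument acts as the delimiting newline, so the
# final -e is parsed as an ordinary command again.
out=$(printf 'line\n' | sed -e 'a\' -e 'appended text' -e 's/line/LINE/')
printf '%s\n' "$out"
```

The s command edits the pattern space, while the appended text is queued and written after it at the end of the cycle.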
And regarding the { ... } and $! issue - well, I was way off there. A single command preceded by an address is not a function; it is just an addressed command. Almost all commands - including a { function definition } - are specified to accept /one/ or /one/,/two/ addresses, with the exception of a #comment and a :label definition. And an address can be either a line number or a regular expression and can be negated with !. So all of...
$!d
/address/s/ub/stitution/
5!y/d/c/
...can be followed by a ; and more commands according to the standard, but if more commands are required for a single address, and that address should not be reevaluated following the execution of each command, then a { function } should be used like:
/address/{ s//replace addressed pattern/
s/do other conditional/substitutions/
s/in the same context/without/
s/reevaluating/address/
}
...where { cannot be followed on the same line by a closing }, and a closing } cannot occur except at the start of a line. But if a contained command need not otherwise be followed by a \newline, then it need not be within the function either. So all of the above s///ubstitutions - and even the closing } brace - can be portably followed by ; semicolons and further commands.
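A small sketch of the difference, using made-up input and variable names: an addressed command chained with ; works on its own, but repeating the address re-tests it against the already edited line, whereas the braces evaluate it once.

```shell
# An addressed command followed by ; and another command.
last=$(printf '1\n2\n3\n' | sed -e '$!d;s/3/last/')

# Address evaluated once: both substitutions run on lines matching /foo/.
# The empty regex in s//baz/ reuses the last regex, here /foo/.
braced=$(printf 'a foo\nb bar\n' |
  sed -e '/foo/{ s//baz/' -e 's/$/ (matched)/' -e '}')

# Address repeated: after the first edit the line no longer contains foo,
# so the second command never fires.
repeated=$(printf 'a foo\nb bar\n' |
  sed -e '/foo/s//baz/;/foo/s/$/ (matched)/')

printf '%s\n' "$last" "$braced" "$repeated"
```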
I keep talking about \newline delimiters but the question is instead about -expression statements, I know. But the two are really one and the same, and the key relation is that a script can be either a literal command-line argument or a file with either of -[ef], and that both are interpreted as text files (which are specified to end in a \newline) but neither need actually end in a \newline. From this I can reasonably (I hope) infer that a \0NUL delimited argument implies an ending \newline, and as all invocation arguments get (at least) a \0NUL delimiter anyway, either should work fine.
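To make that equivalence concrete, here is a hedged sketch that feeds the same two commands to sed once as a script file with no final \newline and once as separate -e arguments. The file comes from mktemp, which is widely available but is an assumption here, and the variable names are only illustrative:

```shell
# Script file whose last line is deliberately not newline-terminated.
script=$(mktemp)
printf 's/a/b/\ns/b/c/' > "$script"

from_file=$(printf 'a\n' | sed -f "$script")
from_args=$(printf 'a\n' | sed -e 's/a/b/' -e 's/b/c/')
rm -f "$script"

printf '%s\n' "$from_file" "$from_args"
```

Both invocations run the same two substitutions in order, so the two results should match.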
In fact, in practice, in every case but one where the standard specifies a \backslash escaped newline should be required, I have portably found...
sed -e ... -e '...\' -e '...'
...to work just as well. And in every case - again, in practice - where a non-escaped \newline should be required...
sed -e '...' -e '...'
...has worked for me, too. The one exception I mention above is...
sed -e 's/.../...\' -e '.../'
...which does not work for any implementation in any of my tests. I'm fairly sure that falls back to the text file requirement and the fact that s/// comes with a delimiter and so there is no reason a single statement should span \0NUL delimited arguments.
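For that one case, the escaped \newline can simply stay inside a single argument, which is the form the standard describes for splitting a line with s. A minimal sketch with made-up input:

```shell
# The backslash-newline sits inside one quoted argument, so the whole
# s command, closing delimiter included, reaches sed in a single piece.
out=$(printf 'a-b\n' | sed -e 's/-/\
/')
printf '%s\n' "$out"
```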
So, in conclusion, here is a short rundown of portable ways to write several kinds of sed commands:
For any of [aic]:
...commands;[aic]\
text embedded newline\
delimiting newline
...more;commands...
...or...
sed -e '...commands;[aic]\' -e 'text embedded newline\' -e 'delimiting newline' -e '.;.;.'
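A runnable instance of the first form, using c and a single multi-line script argument. The input lines and the replacement text are placeholders chosen for the example:

```shell
# The first text line ends in a backslash (embedded newline); the second
# ends in a bare newline, which closes the text so s/// is a new command.
out=$(printf 'keep\ndrop\n' | sed -e '/drop/c\
first replacement line\
second replacement line
s/keep/kept/')
printf '%s\n' "$out"
```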
For any of [:rwtb], where the parameter is optional (for all but :) but the delimiting \newline is not. Note that I have never had a reason to try multiple line label parameters as would be used with [:tb], but writing/reading to multiple lines in [rw]file parameters is usually accepted without question by the seds I have tested, so long as the embedded \newline is escaped with a \backslash. Still, the standard does not directly specify that label and [rw]file parameters should be parsed identically to text parameters, and it makes no mention of \newlines regarding the first two except as it delimits them:
...commands;[:trwb] parameter
...more;commands...
...or...
sed -e '[:trwb] parameter' -e '...'
...where the <space> above is optional for [:tb].
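A short sketch of the -e form for labels, branches and w, with each label or file name ending its own argument. The label name a, the sample input, and the mktemp file are assumptions made for the example:

```shell
# Join all input lines with commas; the :a label and the branch back to
# it are each delimited by the end of their -e argument.
joined=$(printf 'a\nb\nc\n' | sed -e ':a' -e '$!N' -e '$!ba' -e 's/\n/,/g')

# w takes everything up to the end of its argument as the file name.
tmp=$(mktemp)
count=$(printf 'a\nb\nc\n' | sed -n -e "/b/w $tmp" -e '$=')
written=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n' "$joined" "$count" "$written"
```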
And last...
...;address[!]{ ...function;commands...
};...more;commands....
...or...
sed -e '...;address[!]{ ...function;commands...' -e '};...more;commands...'
...where any of the aforementioned commands (excepting :) also accepts at least one address, which can be either a /regexp/ or a line number and may be negated with !. If more than one command is necessary for a single evaluation of the address, then { function context } delimiting braces must be used. A function can even contain multiple \newline delimited commands, but each must be delimited within the braces as it would be otherwise.
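As an illustration of the -e form with a negated address, this sketch edits only the lines that do not start with #, then continues with an unconditional command after the closing brace. The input and the edits themselves are invented for the example:

```shell
# The block runs once per line NOT matching /^#/; the closing } starts
# its own argument and is followed by ; and a command for every line.
out=$(printf '#a\nab\n' |
  sed -e '/^#/!{ s/a/A/' -e 's/$/!/' -e '};s/^/|/')
printf '%s\n' "$out"
```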
And that's how to write portable sed scripts.
b;n;:b, you're branching to the label called ";n;:b" in historical and POSIX seds (and GNU sed is not in that regards). – Stéphane Chazelas Aug 05 '14 at 16:49
: part - you drove that home months ago. But I don't fully understand why the second sed command was similarly POSIXified. – mikeserv Aug 05 '14 at 16:54
b;h;n;G;P;D branches to the ;h;n;G;P;D label. What do you mean? – Stéphane Chazelas Aug 05 '14 at 16:56
sed is very unclear to me. I've requested clarifications a few times in the past, but I don't think it was updated as a result. A good test is to try with the heirloom toolchest (Solaris one, derived from the original and which the POSIX spec is largely based on). – Stéphane Chazelas Aug 05 '14 at 16:59
! is not involved at all? I mean what I asked - how does the ! relate to a function and/or breaks required due to the }? Does it at all? hmmm.... any chance you could put something to that effect in an answer? – mikeserv Aug 05 '14 at 16:59
sed learners: simply memorize -e as to be mandatory if, for instance, you want to do multiple replacements using regex, e. g. sed -e 's/foo/bar/' -e 's/foo/baz/g'. Mandatory means, "mandatory unless you want to do something really stupid" (like pipe a couple sed statements in a chain when there is no reason to do so; always gives me the shivers when I see this) – syntaxerror Dec 05 '14 at 16:22
s///ubstitutions are spec'd to accept chaining with a ; . it gets blurry around commands that must be delimited with a newline and how -e can stand in in that case - at least it does for me. ive yet to stumble on a sed that doesnt interpret them pretty interchangeably though. – mikeserv Dec 05 '14 at 16:27
-e then (still better than this stupid piping after all, no? ;)). But ... um ... of course you're right. I confess I always keep forgetting that you can use the ; as a separator even between different s/// regex statements. I'm just not used to it. Heck knows why. – syntaxerror Dec 05 '14 at 16:30
sed simply will not do though - particularly when it comes to line counting. you need fresh input for fresh counts following edits - and you need another sed - or else some algorithm which otherwise handles it. – mikeserv Dec 05 '14 at 16:35
s///ubs are not so clear cut as all that. For example: s///w file should need a newline or -e following it, and you need the same when s///;testing a substitution as well, i suppose. – mikeserv Dec 05 '14 at 16:46
-e-concatenated sed lines and now! That's worlds apart, and has given a readability boost by at least 50 percent. – syntaxerror Dec 05 '14 at 17:01
; before a newline - a newline is fine. Honestly, you could do without the -e and all entirely and just write a file like #!/bin/sed with each command on a newline - or those that don't require such delimiters instead delimited with ;. The ones that do require newlines are usually the ones that take arbitrary input - :label names and commands that refer to them like b or t or closing } curlies for functions, or read and write which take filename args. They all portably need to be followed by \n. – mikeserv Dec 05 '14 at 17:06
a i and c which all accept any kind of input up to next newline. – mikeserv Dec 05 '14 at 17:07
; stuff indeed. OK, to my excuse I just thought it's better to be safe than sorry. ;-) But many thanks, we're getting there. :p BUT OTOH, fiddling with this stuff in a trial-and-error fashion will usually spawn zillions of unrelated, highly-confusing error messages and warnings, so this is why I'd commonly like to avoid that at all costs. – syntaxerror Dec 05 '14 at 17:09
sed in something like an eval chain on a file. – mikeserv Dec 05 '14 at 17:14
sed learners". You definitely are no learner (LOL), more like a pro asking other uber-pros like Stéphane to squeeze out the very last quirks that remain. ;) grin – syntaxerror Dec 05 '14 at 17:15
sed though - but the topics covered in this question ive pretty much come to terms with since i asked it. – mikeserv Dec 05 '14 at 17:21