How is segwit backward compatible (soft fork) when the transaction serialization structure is changed?

Question

I understand that SegWit is backward compatible (soft-fork) from this perspective.

So, for old nodes the following output script (scriptPubKey)

OP_n (where n from 0 to 16) <2-40 bytes>

... is considered as anyone-can-spend and in "concatenation" with the empty input script (scriptSig) it will be always valid.

For new (updated) nodes it will be considered as SegWit, so the special SegWit validation process, as Pieter Wuille described, will start.
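As a rough illustration of the pattern above, the following sketch checks whether a scriptPubKey has the "version opcode plus a single 2-40 byte push" shape that BIP141 calls a witness program. The function name `parse_witness_program` is made up for this example and is not taken from any library.

```python
def parse_witness_program(script_pubkey: bytes):
    """Return (version, program) if the script looks like a witness program, else None."""
    # 1 version opcode + 1 push opcode + 2..40 program bytes = 4..42 bytes in total.
    if not 4 <= len(script_pubkey) <= 42:
        return None
    version_op = script_pubkey[0]
    if version_op == 0x00:              # OP_0
        version = 0
    elif 0x51 <= version_op <= 0x60:    # OP_1 .. OP_16
        version = version_op - 0x50
    else:
        return None
    # The second byte must be a direct push of exactly the remaining bytes.
    if script_pubkey[1] != len(script_pubkey) - 2:
        return None
    return version, script_pubkey[2:]


# A P2WPKH-shaped script: OP_0 followed by a 20-byte push (dummy zero bytes here).
p2wpkh_like = bytes([0x00, 0x14]) + bytes(20)
# A P2PKH-shaped script, which is not a witness program.
p2pkh_like = bytes.fromhex("76a914") + bytes(20) + bytes.fromhex("88ac")

print(parse_witness_program(p2wpkh_like) is not None)
print(parse_witness_program(p2pkh_like) is None)
```

An old node has no such check: it simply runs the script, which leaves a non-zero value on the stack for a non-zero program, hence "anyone-can-spend".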

Observing only this, everything is clear to me.

What is not clear to me is how all this will work when the structure of the transaction has been changed by SegWit.

Namely, according to BIP144, the structure of the transaction has been changed by "expanding" the transaction with two bytes (marker (0x00) and flag (0x01)) after the version field, and also by adding witness data after tx_outs and before locktime.

For new (updated) nodes this is considered as a sign that there is witness data, however, for old nodes this is a transaction with 0 inputs and 1 output which is considered invalid so they will always discard this transaction (especially since there is also witness data that is not clear to them at all).
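To make the "0 inputs and 1 output" reading concrete, here is a minimal sketch of how the bytes after the version field look to the two kinds of parser. The helper names are hypothetical, and the example only handles counts that fit in a single byte (below 0xfd).

```python
import struct


def has_witness_marker(raw_tx: bytes) -> bool:
    """BIP144: marker 0x00 and flag 0x01 directly after the 4-byte version."""
    return raw_tx[4:6] == b"\x00\x01"


def legacy_input_count(raw_tx: bytes) -> int:
    """Read the input count the way a pre-segwit parser would (single-byte counts only)."""
    first = raw_tx[4]
    if first >= 0xFD:
        raise ValueError("multi-byte counts are not handled in this sketch")
    return first


version = struct.pack("<i", 2)
witness_prefix = version + b"\x00\x01" + b"\x01"   # marker, flag, then the real input count
legacy_prefix = version + b"\x01"                  # input count directly after the version

# To a legacy parser the marker byte is an input count of 0,
# and the flag byte that follows would be read as an output count of 1.
print(legacy_input_count(witness_prefix), witness_prefix[5])
print(has_witness_marker(witness_prefix), has_witness_marker(legacy_prefix))
```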

Even if we assume that the old nodes would not relay these transactions to other peers, they would not even accept them in the blocks they receive from the miners, because for them these are completely invalid transactions (valid from the first perspective of inputs and outputs, since they are anyone-can-spend, but invalid from this second, structural perspective).

Can someone explain to me how this works? I misunderstood something.

Yes, I read it, but I still don't understand 100%. I have additional questions. I've posted the questions below in a reply, but may as well post them here so RedGrittyBrick doesn't get too many notices. — dassd, Sep 10 '23 at 18:35

score 2 · Accepted Answer · answered Sep 10 '23 at 17:56

Old nodes never see the new serialisation format.

When an old node asks a new node for data, the new node converts the data into the old serialisation format before sending it.
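The conversion can be sketched roughly as follows, assuming the BIP144 layout (version, marker, flag, inputs, outputs, witness, locktime). This is an illustration with made-up helper names and a dummy transaction, not Bitcoin Core's actual code. It also shows that the txid, which BIP141 defines over the stripped serialization, is unaffected by dropping the witness.

```python
import hashlib
import struct


def read_compact_size(data: bytes, pos: int):
    """Decode a CompactSize integer; return (value, new_position)."""
    first = data[pos]
    if first < 0xFD:
        return first, pos + 1
    if first == 0xFD:
        return struct.unpack_from("<H", data, pos + 1)[0], pos + 3
    if first == 0xFE:
        return struct.unpack_from("<I", data, pos + 1)[0], pos + 5
    return struct.unpack_from("<Q", data, pos + 1)[0], pos + 9


def strip_witness(raw: bytes) -> bytes:
    """Re-serialize a transaction in the old format, dropping marker, flag and witness."""
    if len(raw) < 6 or raw[4:6] != b"\x00\x01":
        return raw                      # already in the old format
    pos = 6
    n_in, pos = read_compact_size(raw, pos)
    for _ in range(n_in):
        pos += 36                       # previous txid (32) + output index (4)
        script_len, pos = read_compact_size(raw, pos)
        pos += script_len + 4           # scriptSig + sequence
    n_out, pos = read_compact_size(raw, pos)
    for _ in range(n_out):
        pos += 8                        # value
        script_len, pos = read_compact_size(raw, pos)
        pos += script_len               # scriptPubKey
    # Everything between the outputs and the final 4 bytes (locktime) is witness data.
    return raw[:4] + raw[6:pos] + raw[-4:]


def txid(raw: bytes) -> str:
    """Double SHA-256 of the stripped serialization, shown in the usual reversed hex."""
    digest = hashlib.sha256(hashlib.sha256(strip_witness(raw)).digest()).digest()
    return digest[::-1].hex()


# Dummy one-input, one-output transaction (all-zero hashes, signature and key).
version = struct.pack("<i", 2)
tx_in = bytes(32) + struct.pack("<I", 0) + b"\x00" + b"\xff\xff\xff\xff"
tx_out = struct.pack("<q", 1000) + b"\x16" + b"\x00\x14" + bytes(20)
body = b"\x01" + tx_in + b"\x01" + tx_out
witness = b"\x02" + b"\x47" + bytes(71) + b"\x21" + bytes(33)
locktime = struct.pack("<I", 0)

legacy_tx = version + body + locktime
witness_tx = version + b"\x00\x01" + body + witness + locktime

print(strip_witness(witness_tx) == legacy_tx)
print(txid(witness_tx) == txid(legacy_tx))
```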

RedGrittyBrick

Okay, but when a node just wants to relay a transaction it received to all its peers, how will it know whether a peer is an old or a new (updated) node? – dassd Sep 10 '23 at 18:05
Part of the communications handshake distinguishes segwit nodes – RedGrittyBrick Sep 10 '23 at 18:23
Okay, I understand they can do that when we're talking about relaying. But what about the transactions in a block received from a miner? How will the old nodes validate it, since they don't understand the new structure? I mean, it can't be the old serialization format without witnesses and markers/flags, because then the old and new nodes would have a different blockchain and this block would have a different hash? – dassd Sep 10 '23 at 18:34
@RedGrittyBrick While that's true, the handshake just lets the other side know that the peer can offer witness. To actually get witness data, a node has to ask for a witness_tx or a witness_block, rather than tx or block (if you ask for those, even from a segwit node, you get the stripped version). – Pieter Wuille Sep 10 '23 at 18:35
@joke The txid of a transaction is independent of whether the witnesses are included or not, and the hash of a block is the hash of its header, which also doesn't include the witnesses. Witness data is committed to in blocks, but just by including the hash of all the witnesses in the coinbase transaction somewhere. So no, you can just drop all the witness data from a witness block, give it to a pre-segwit peer, and it will be valid to them. – Pieter Wuille Sep 10 '23 at 18:36
@PieterWuille So the stripped version (of both the block and the transaction) is always sent, with the fact that during handshaking, the segWit node somehow announces itself (it will set some flag or something) as segWit node and that can also send witness data? – dassd Sep 10 '23 at 18:49
Right. NODE_WITNESS announces that witnesses are available, and if so, the getdata command can be used to ask for transactions, blocks, or for transactions+witness, or for block+witness. Without that flag, only transactions or blocks can be asked for. – Pieter Wuille Sep 10 '23 at 18:53
@PieterWuille A couple of questions. 1) I assume that the NODE_WITNESS flag was introduced with segWit, so it won't "confuse" old nodes? I mean, they will just ignore it and ask for tx/block (not tx+witness/block+witness)? 2) If we ask for tx+witness, we will get the new serialization format from the node (with marker+flag+witness), and if we ask for tx we get the old serialization format? (Of course, all this in the case that the tx is segWit.) 3) What do block+witness and block mean? Is the first a block consisting of all tx in the new ser format, while the second one is all tx in the old ser format? – dassd Sep 10 '23 at 19:03
@joke To go in this much depth, I suggest you just read BIP144 which specifies the protocol/serialization changes. BIP141 specifies the segwit consensus rules, which go along with it. – Pieter Wuille Sep 10 '23 at 19:08
@PieterWuille I did, but I still do not understand it. It's still too technical for me, I'm trying to understand conceptually. That's why I'm trying this way by asking questions here. After I understand things conceptually I think I will then be able to understand the details. – dassd Sep 10 '23 at 19:11
1) Yes, unknown service flags are ignored by nodes that don't know them. 2) Yes, the marker/flag format is used for any transaction that has at least one non-empty witness (old format is still used for transactions without witnesses) 3) Yes, the witness block format is identical to the normal block format, but instead of normal transactions, witness transactions are included. – Pieter Wuille Sep 10 '23 at 19:13

@PieterWuille If old nodes operate with stripped block and new nodes with the, let's call it, "full" blocks, does it mean that old nodes and new nodes will have different blockchain, since their blocks will be different? They will have same blocks in context of ids (hashes) and transactions, but blocks by their structure will be different? – dassd Sep 10 '23 at 19:57

@joke: It is probably clearer to think of it not as different blockchains but as the same blockchain with Segwit data stripped out. Just as we would say a pruning node and a non-pruning node have the same blockchain even though one node has more data than the other. – RedGrittyBrick Sep 10 '23 at 20:07

Clearly they don't have the exact same data, as the pre-segwit nodes lack the witnesses. But apart from that, whether that's "structurally" different will depend on how you define that. – Pieter Wuille Sep 10 '23 at 20:08

@PieterWuille "structurally" in the sense that old nodes will have a blockchain made of blocks composed of transactions with the old serialization format, while new nodes will have a blockchain made of blocks composed of transactions with the new serialization format (that is, if a transaction has a witness input, it will additionally have marker, flag and witness data, otherwise the old serialization format is used too). So blockchains in new and old nodes will differ in this "structural" way? – dassd Sep 10 '23 at 20:20

@joke This is becoming a semantics discussion, I don't think this will serve anything. – Pieter Wuille Sep 10 '23 at 20:31

@PieterWuille Well, it's just important for me to know if old nodes have in their blockchain blocks with the old tx serialization format, while new nodes have in their blockchain blocks with the new serialization format, so that's how their blockchains differ? The other parts of their blockchains are the same. – dassd Sep 10 '23 at 20:46

@joke Yes, exactly. New nodes use the flag/marker serialization (which includes witness data) for transactions that have non-empty witnesses, while old nodes always use the old serialization format. To me however, serialization is just an implementation detail - the question is which data they have, as it can always be converted to another format. And in terms of data, they have exactly the same data, just without witness data. – Pieter Wuille Sep 10 '23 at 20:53

@PieterWuille Yes, I understand that. I just want to understand how their blockchains will differ. Is it as RedGrittyBrick said, that in old nodes the blockchain will be composed of stripped blocks (old tx serialization format), while new nodes have a blockchain composed of blocks with witness data (new tx serialization format)? I am asking because if it's like that, then I understand the way that segwit is backward compatible from this perspective. – dassd Sep 10 '23 at 21:05

@Joke I thought I had already confirmed that! Yes, old nodes only use the old serialization, which does not support witnesses, so yes, what they receive are by definition the blocks without the witnesses. They likely also store those blocks in that format, but they don't have to. – Pieter Wuille Sep 10 '23 at 21:07

@PieterWuille Oh, sorry, did not understand it like that. What do you mean by saying 'but they dont have to'? Does it mean that they can store it in some arbitrary way that suit them, I mean how do Bitcoin core nodes do that? – dassd Sep 10 '23 at 21:48

Yes, that's my point, nodes can use whatever format they want to store the data, the only thing that matters is what data they have available and which protocol they speak. Bitcoin Core stores blocks on disk in the same format as it'd send them on the network, with multiple blocks packed together in files with some metadata before each block, and a separate database that contains an index of all those blocks. – Pieter Wuille Sep 10 '23 at 22:05

@PieterWuille Cool, and the only difference between old and new nodes in the context of block storage, if I understand correctly (and I hope I finally do), is that old nodes will only store blocks with the old ser format, and new nodes will also store blocks with the new ser format (if there is a segwit transaction in them). I am talking about Bitcoin Core. True? – dassd Sep 10 '23 at 22:21

@joke Bitcoin Core will only store blocks once, using for each transaction the appropriate format (witness format if it has a witness, old format otherwise). If a pre-segwit peer asks for a non-witness block or transaction, the data is re-serialized in the old format (which discards witness data) before sending. – Pieter Wuille Sep 10 '23 at 22:25

@PieterWuille Yes, but that one time when the Bitcoin Core node stores the block, if that's 1) old node: it stores a block that consists only of old serialization format transactions (no matter whether a tx is segwit or not; if it's segwit it will be without witness data/flag/marker and will be considered as anyone-can-spend) 2) new node: it stores block with old ser tx format and new ser tx format. If an old peer asks a new node for some block or tx, the new node will re-serialize it in the old format before sending. Right? – dassd Sep 10 '23 at 22:35

If by "with old ser tx format and new ser tx format" you mean that the blocks it stored will use a mix of both formats, yes. It doesn't store everything twice. Otherwise, indeed! – Pieter Wuille Sep 10 '23 at 22:37

@PieterWuille Oh, I didn't express myself well. I just wanna say the following. If we have, for example, block with 10 tx where 3 are segwit and 7 are legacy (P2PKH for example), then the old nodes will store the block as 10 tx all serialized with old format (no witness data, no marker, no flag, 3 segwit tx are considered as anyone-can-spend). On the other hand, the new nodes will store the same block as 3 segwit tx with new ser format (yes witness, yes marker, yes flag, these tx are not anyone-can-spend) and for other 7 tx use old tx ser format. Right? This is only difference in storage? – dassd Sep 10 '23 at 22:46

@joke All correct. – Pieter Wuille Sep 10 '23 at 22:49

@PieterWuille Cool, thanks, everything clear. – dassd Sep 10 '23 at 22:59
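For reference, the request logic discussed in the comments above can be sketched like this. The constant values are the ones given in BIP144 (NODE_WITNESS is service bit 3, and bit 30 of the inventory type marks a witness request); the function `getdata_type` itself is a hypothetical helper for illustration.

```python
NODE_WITNESS = 1 << 3        # service bit announced in the version handshake
MSG_TX = 1                   # getdata type: transaction in the old serialization
MSG_BLOCK = 2                # getdata type: block with old-format transactions
MSG_WITNESS_FLAG = 1 << 30   # OR-ed into the type to ask for witness data


def getdata_type(base_type: int, peer_services: int, want_witness: bool = True) -> int:
    """Pick the inventory type to request from a peer with the given service bits."""
    if want_witness and peer_services & NODE_WITNESS:
        return base_type | MSG_WITNESS_FLAG
    return base_type


segwit_peer = NODE_WITNESS | 1   # also sets service bit 0 (NODE_NETWORK)
old_peer = 1                     # NODE_NETWORK only

print(hex(getdata_type(MSG_TX, segwit_peer)))
print(hex(getdata_type(MSG_BLOCK, old_peer)))
```

An old node never sets the witness flag in its requests, so it only ever receives the stripped form, which matches the behaviour described in the thread.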
