Description
Simplify block structure by removing unnecessary serialization/deserialization
Summary
The current block structure carries several fields as serialized byte arrays that are then deserialized on every access. This introduces unnecessary CPU overhead and code complexity. We should simplify the block structure by using typed fields directly and removing unused fields.
Current Block Structure in Fabric
    message Block {
        BlockHeader header = 1;
        BlockData data = 2;
        BlockMetadata metadata = 3;
    }

    message BlockHeader {
        uint64 number = 1;
        bytes previous_hash = 2;
        bytes data_hash = 3;
    }

    message BlockData {
        repeated bytes data = 1; // each entry is a serialized Envelope
    }

    message Envelope {
        bytes payload = 1; // serialized Payload
        bytes signature = 2;
    }

    message Payload {
        Header header = 1;
        bytes data = 2;
    }

    message Header {
        bytes channel_header = 1; // serialized ChannelHeader
        bytes signature_header = 2; // serialized SignatureHeader
    }

    message ChannelHeader {
        int32 type = 1;
        int32 version = 2;
        google.protobuf.Timestamp timestamp = 3;
        string channel_id = 4;
        string tx_id = 5;
        uint64 epoch = 6;
        bytes extension = 7;
        bytes tls_cert_hash = 8;
    }

    message SignatureHeader {
        bytes creator = 1;
        bytes nonce = 2;
    }
Note how the structure is deeply nested with serialized bytes at every level: Block.data → Envelope.payload → Payload.header.channel_header / Payload.header.signature_header. Each layer requires a separate proto.Unmarshal call to access the typed fields within.
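To make the layering concrete, here is a minimal wire-level sketch that assumes only the field numbers from the schema above. It hand-rolls just enough of the protobuf wire format (length-delimited fields) to run with the Go standard library alone. The helpers appendField, readField, and creatorFromEnvelope are hypothetical names for illustration, not Fabric or protobuf APIs; real code would call proto.Unmarshal on generated types.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendField appends one length-delimited field (wire type 2) in protobuf
// wire format: tag varint, length varint, then the raw value.
func appendField(dst []byte, fieldNum int, val []byte) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], uint64(fieldNum)<<3|2)
	dst = append(dst, buf[:n]...)
	n = binary.PutUvarint(buf[:], uint64(len(val)))
	dst = append(dst, buf[:n]...)
	return append(dst, val...)
}

// readField scans a message made only of length-delimited fields and returns
// the value of the last occurrence of fieldNum (nil if the field is absent).
func readField(msg []byte, fieldNum int) ([]byte, error) {
	var out []byte
	for len(msg) > 0 {
		tag, n := binary.Uvarint(msg)
		if n <= 0 || tag&7 != 2 {
			return nil, errors.New("bad tag or unsupported wire type")
		}
		msg = msg[n:]
		size, n := binary.Uvarint(msg)
		if n <= 0 || uint64(len(msg)-n) < size {
			return nil, errors.New("truncated field")
		}
		if int(tag>>3) == fieldNum {
			out = msg[n : n+int(size)]
		}
		msg = msg[n+int(size):]
	}
	return out, nil
}

// creatorFromEnvelope peels one layer per step:
// Envelope.payload (1) -> Payload.header (1) -> Header.signature_header (2)
// -> SignatureHeader.creator (1).
// With generated code, Payload.header is typed and decoded together with the
// Payload, but the bytes layers still need their own Unmarshal calls.
func creatorFromEnvelope(env []byte) ([]byte, error) {
	cur := env
	for _, fieldNum := range []int{1, 1, 2, 1} {
		next, err := readField(cur, fieldNum)
		if err != nil {
			return nil, err
		}
		cur = next
	}
	return cur, nil
}

func main() {
	sigHdr := appendField(nil, 1, []byte("org1-admin")) // SignatureHeader.creator
	sigHdr = appendField(sigHdr, 2, []byte{0xAA, 0xBB}) // SignatureHeader.nonce
	hdr := appendField(nil, 2, sigHdr)                  // Header.signature_header
	payload := appendField(nil, 1, hdr)                 // Payload.header
	env := appendField(nil, 1, payload)                 // Envelope.payload

	creator, err := creatorFromEnvelope(env)
	if err != nil {
		panic(err)
	}
	fmt.Printf("creator=%s\n", creator)
}
```

Every consumer that wants the creator has to repeat this walk, which is the cost the rest of this issue is about.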
Problem
In the current implementation, various components of the block are stored as serialized []byte within the protobuf message and must be deserialized each time they are accessed. For example:
- Signature Header: The SignatureHeader is embedded as raw bytes inside the payload. Every consumer of this field must unmarshal it before use, even though it could be stored as a typed struct directly.
- Common Header: Similarly, the CommonHeader (channel header + signature header) is serialized into bytes within the Header field of the payload. Accessing channel ID, tx type, or timestamp requires repeated deserialization.
- Envelope payload: The Payload inside each Envelope is carried as bytes, requiring deserialization at each processing stage (validation, commit, indexing, etc.).
This pattern leads to:
- Redundant CPU work: the same fields are deserialized multiple times across the transaction lifecycle (endorsement, ordering, validation, commit).
- Verbose boilerplate: every call site needs error-handling logic around proto.Unmarshal for what are conceptually direct field accesses.
- Increased GC pressure: repeated deserialization creates short-lived objects that add to garbage collection overhead.
Proposal
- Use typed structs instead of byte slices: Replace serialized byte fields (e.g., SignatureHeader, ChannelHeader) with their corresponding typed protobuf message fields in the internal block representation.
- Deserialize once at ingress: Parse the block fully when it is first received (e.g., at the orderer or committer ingress) and pass the fully typed structure through the pipeline.
- Remove unused fields: Audit the block and envelope structures for fields that are never read or are redundant, and remove them from the internal representation.
- Serialize only at egress: Re-serialize to the wire format only when the block needs to be persisted or transmitted over the network.
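The ingress/egress split above could look roughly like the following sketch. It uses encoding/json as a stand-in for protobuf so that it runs with the standard library alone. TypedEnvelope, ParseAtIngress, and SerializeAtEgress are hypothetical names, and the typed structs are trimmed to a few fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Wire-side shapes: nested layers are opaque byte slices, as in the schema above.
type wireEnvelope struct {
	Payload   []byte // serialized wirePayload
	Signature []byte
}
type wirePayload struct {
	Header wireHeader
	Data   []byte
}
type wireHeader struct {
	ChannelHeader   []byte // serialized ChannelHeader
	SignatureHeader []byte // serialized SignatureHeader
}

// Typed shapes used inside the pipeline (trimmed for brevity).
type ChannelHeader struct {
	ChannelID string
	TxID      string
}
type SignatureHeader struct {
	Creator []byte
	Nonce   []byte
}

// TypedEnvelope is a hypothetical internal form with no nested byte layers.
type TypedEnvelope struct {
	ChannelHeader   ChannelHeader
	SignatureHeader SignatureHeader
	Data            []byte
	Signature       []byte
}

// ParseAtIngress performs every decode exactly once and returns the typed form.
func ParseAtIngress(wire []byte) (*TypedEnvelope, error) {
	var env wireEnvelope
	if err := json.Unmarshal(wire, &env); err != nil {
		return nil, fmt.Errorf("envelope: %w", err)
	}
	var p wirePayload
	if err := json.Unmarshal(env.Payload, &p); err != nil {
		return nil, fmt.Errorf("payload: %w", err)
	}
	out := &TypedEnvelope{Data: p.Data, Signature: env.Signature}
	if err := json.Unmarshal(p.Header.ChannelHeader, &out.ChannelHeader); err != nil {
		return nil, fmt.Errorf("channel header: %w", err)
	}
	if err := json.Unmarshal(p.Header.SignatureHeader, &out.SignatureHeader); err != nil {
		return nil, fmt.Errorf("signature header: %w", err)
	}
	return out, nil
}

// SerializeAtEgress rebuilds the nested wire form for persistence or transmission.
func SerializeAtEgress(e *TypedEnvelope) ([]byte, error) {
	ch, err := json.Marshal(e.ChannelHeader)
	if err != nil {
		return nil, err
	}
	sh, err := json.Marshal(e.SignatureHeader)
	if err != nil {
		return nil, err
	}
	hdr := wireHeader{ChannelHeader: ch, SignatureHeader: sh}
	p, err := json.Marshal(wirePayload{Header: hdr, Data: e.Data})
	if err != nil {
		return nil, err
	}
	return json.Marshal(wireEnvelope{Payload: p, Signature: e.Signature})
}

func main() {
	wire, err := SerializeAtEgress(&TypedEnvelope{
		ChannelHeader:   ChannelHeader{ChannelID: "mychannel", TxID: "tx-1"},
		SignatureHeader: SignatureHeader{Creator: []byte("org1-admin"), Nonce: []byte{1, 2, 3}},
	})
	if err != nil {
		panic(err)
	}
	env, err := ParseAtIngress(wire) // the only place that decodes
	if err != nil {
		panic(err)
	}
	// Later stages read typed fields directly, with no per-access error handling.
	fmt.Println(env.ChannelHeader.ChannelID, string(env.SignatureHeader.Creator))
}
```

One design point the sketch glosses over: the envelope signature covers the serialized payload bytes, so a real implementation would likely need to retain the original payload bytes next to the typed form rather than rely on re-serialization being byte-identical.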
Example
Before:
    // Every call site repeats this pattern
    payload := &cb.Payload{}
    err := proto.Unmarshal(envelope.Payload, payload)
    // ...
    header := &cb.SignatureHeader{}
    err = proto.Unmarshal(payload.Header.SignatureHeader, header)
    // ... finally use header.Creator, header.Nonce

After:

    // Direct typed access: deserialized once at ingress
    creator := envelope.Payload.Header.SignatureHeader.Creator
    nonce := envelope.Payload.Header.SignatureHeader.Nonce
Expected Benefits
- Reduced CPU usage from eliminating redundant proto.Unmarshal calls across the transaction pipeline
- Simpler, more readable code with direct field access
- Lower GC pressure from fewer intermediate allocations
- Smaller internal block representation after removing unused fields
Backward-Compatible Evolution via oneof
If we need to introduce a new SignatureHeader type (e.g., one with typed fields or additional metadata), we could potentially leverage protobuf's oneof to do so without breaking backward compatibility:
    // Current Header definition (see above)
    message Header {
        bytes channel_header = 1; // serialized ChannelHeader
        bytes signature_header = 2; // serialized SignatureHeader
    }

    // Proposed Header with oneof for backward-compatible evolution
    // Since Header only has fields 1 and 2, field 3 is safe to use.
    // ⚠️ For messages with more fields, always verify the next unused field number.
    message Header {
        bytes channel_header = 1; // existing: serialized ChannelHeader
        oneof signature_header_type {
            bytes signature_header = 2; // existing: serialized SignatureHeader
            SignatureHeaderV2 signature_header_v2 = 3; // new: typed, uses next available field number
        }
    }
How this could work:
- Existing nodes that only understand the old format would continue reading field 2 (signature_header bytes) and deserializing as before.
- Newer nodes would populate and read field 3 (signature_header_v2) directly, avoiding the serialize/deserialize overhead.
- Since oneof fields are mutually exclusive, only one representation is present on the wire at a time, so there is no storage overhead from carrying both.
- The same oneof pattern could also be applied to channel_header (field 1) in the future if needed.
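The reader behavior described in this list can be simulated at the wire level. The sketch below assumes only the Header field numbers 2 and 3 from the proposed schema and hand-rolls length-delimited protobuf fields with the standard library. oldReader and newReader are hypothetical stand-ins for nodes built from the current and proposed schemas.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendField appends one length-delimited field (wire type 2).
func appendField(dst []byte, fieldNum int, val []byte) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], uint64(fieldNum)<<3|2)
	dst = append(dst, buf[:n]...)
	n = binary.PutUvarint(buf[:], uint64(len(val)))
	dst = append(dst, buf[:n]...)
	return append(dst, val...)
}

// parseFields returns the value of each length-delimited field by field number.
func parseFields(msg []byte) (map[int][]byte, error) {
	fields := make(map[int][]byte)
	for len(msg) > 0 {
		tag, n := binary.Uvarint(msg)
		if n <= 0 || tag&7 != 2 {
			return nil, errors.New("bad tag or unsupported wire type")
		}
		msg = msg[n:]
		size, n := binary.Uvarint(msg)
		if n <= 0 || uint64(len(msg)-n) < size {
			return nil, errors.New("truncated field")
		}
		fields[int(tag>>3)] = msg[n : n+int(size)]
		msg = msg[n+int(size):]
	}
	return fields, nil
}

// oldReader models a node built from the current schema: it knows only
// fields 1 and 2 of Header, so anything in field 3 is invisible to it.
func oldReader(header []byte) ([]byte, error) {
	f, err := parseFields(header)
	if err != nil {
		return nil, err
	}
	return f[2], nil
}

// newReader models a node built from the proposed schema and reports which
// oneof member is set. A real parser keeps whichever member appears last on
// the wire; this sketch assumes at most one is present.
func newReader(header []byte) (string, []byte, error) {
	f, err := parseFields(header)
	if err != nil {
		return "", nil, err
	}
	if v, ok := f[3]; ok {
		return "signature_header_v2", v, nil
	}
	if v, ok := f[2]; ok {
		return "signature_header", v, nil
	}
	return "unset", nil, nil
}

func main() {
	cases := []struct {
		name   string
		header []byte
	}{
		{"old writer", appendField(nil, 2, []byte("legacy-bytes"))},
		{"new writer", appendField(nil, 3, []byte("typed-v2"))},
	}
	for _, c := range cases {
		oldVal, err := oldReader(c.header)
		if err != nil {
			panic(err)
		}
		member, newVal, err := newReader(c.header)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: old reader sees %q, new reader sees %s=%q\n", c.name, oldVal, member, newVal)
	}
}
```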
For all oneof candidates, the new oneof member cannot reuse an occupied field number. If both fields are declared in the same message, Protobuf rejects the descriptor outright ("duplicate field number"). If the number is instead reassigned so that old and new nodes run different schema versions, the result is silent data corruption: the old node misinterprets the serialized embedded message bytes as the original field's data. The safe approach is to always use the next available unused field number.
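The collision case can be reproduced with a few lines of wire-level Go. In this sketch the SignatureHeaderV2 layout (field 1 = msp_id, field 2 = cert_hash) is purely hypothetical, since the issue does not define it. oldNodeView is a hypothetical stand-in for a node built from the current schema.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendField appends one length-delimited field (wire type 2).
func appendField(dst []byte, fieldNum int, val []byte) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], uint64(fieldNum)<<3|2)
	dst = append(dst, buf[:n]...)
	n = binary.PutUvarint(buf[:], uint64(len(val)))
	dst = append(dst, buf[:n]...)
	return append(dst, val...)
}

// parseFields returns the value of each length-delimited field by field number.
func parseFields(msg []byte) (map[int][]byte, error) {
	fields := make(map[int][]byte)
	for len(msg) > 0 {
		tag, n := binary.Uvarint(msg)
		if n <= 0 || tag&7 != 2 {
			return nil, errors.New("bad tag or unsupported wire type")
		}
		msg = msg[n:]
		size, n := binary.Uvarint(msg)
		if n <= 0 || uint64(len(msg)-n) < size {
			return nil, errors.New("truncated field")
		}
		fields[int(tag>>3)] = msg[n : n+int(size)]
		msg = msg[n+int(size):]
	}
	return fields, nil
}

// oldNodeView models a node built from the current schema: it treats Header
// field 2 as a serialized SignatureHeader and reads creator (1) and nonce (2).
func oldNodeView(header []byte) (creator, nonce []byte, err error) {
	h, err := parseFields(header)
	if err != nil {
		return nil, nil, err
	}
	sh, err := parseFields(h[2])
	if err != nil {
		return nil, nil, err
	}
	return sh[1], sh[2], nil
}

func main() {
	// Hypothetical SignatureHeaderV2: field 1 = msp_id, field 2 = cert_hash.
	v2 := appendField(nil, 1, []byte("Org1MSP"))
	v2 = appendField(v2, 2, []byte{0xDE, 0xAD})

	collided := appendField(nil, 2, v2) // unsafe: typed member reuses field 2
	safe := appendField(nil, 3, v2)     // safe: typed member takes unused field 3

	for _, c := range []struct {
		name   string
		header []byte
	}{{"collided", collided}, {"safe", safe}} {
		creator, nonce, err := oldNodeView(c.header)
		if err != nil {
			panic(err)
		}
		// No error in either case: the collision is silent.
		fmt.Printf("%s: creator=%q nonce=%x\n", c.name, creator, nonce)
	}
}
```

In the collided case the old node decodes without error and reports the MSP ID as the creator, because bytes fields and embedded messages share the same wire type. In the safe case it sees an empty signature header.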
Considerations:
- Wire compatibility is confirmed for the old → new direction: Old nodes can produce data using field 2 (bytes) and new nodes correctly identify it via the oneof discriminator. New nodes can also continue writing the old bytes format for full backward compatibility.
- New V2 → old nodes requires gating: When a new node writes using signature_header_v2 (field 3), old nodes see an empty signature_header because the new field is unknown to them and silently ignored. This means a capability flag or channel configuration update is required to gate when the new format is used during rolling upgrades.
- Field number safety varies by message: For Header specifically, field 3 is safe since only fields 1 and 2 exist today. When applying this pattern to other messages (e.g., ChannelHeader, which has fields up to 8), care must be taken to use a truly unused field number.
- Ledger persistence: If blocks are persisted to the ledger in the new format, any binary reading the ledger post-upgrade must handle both variants, since historical blocks remain in the old format. This may require a migration strategy or dual-write period.
This approach is viable but requires a clear upgrade path and capability gating before it can be adopted.
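The gating requirement could be as small as a single decision point on the write path. The sketch below is only an illustration: ChannelCapabilities, its TypedSignatureHeader flag, and ChooseHeaderFormat are hypothetical names, not existing Fabric configuration or APIs.

```go
package main

import "fmt"

// HeaderFormat selects which oneof member a writer populates.
type HeaderFormat int

const (
	LegacyBytes HeaderFormat = iota // populate signature_header (field 2)
	TypedV2                         // populate signature_header_v2 (field 3)
)

// String names the Header field that the format writes.
func (f HeaderFormat) String() string {
	if f == TypedV2 {
		return "signature_header_v2"
	}
	return "signature_header"
}

// ChannelCapabilities is a hypothetical snapshot of channel configuration.
type ChannelCapabilities struct {
	// TypedSignatureHeader is a hypothetical flag that is enabled only once
	// every node on the channel understands field 3.
	TypedSignatureHeader bool
}

// ChooseHeaderFormat keeps writers on the legacy bytes format until the
// capability is enabled, so old nodes never receive a header they cannot read.
func ChooseHeaderFormat(caps ChannelCapabilities) HeaderFormat {
	if caps.TypedSignatureHeader {
		return TypedV2
	}
	return LegacyBytes
}

func main() {
	during := ChannelCapabilities{TypedSignatureHeader: false}
	after := ChannelCapabilities{TypedSignatureHeader: true}
	fmt.Println("during rolling upgrade:", ChooseHeaderFormat(during))
	fmt.Println("after capability enabled:", ChooseHeaderFormat(after))
}
```

Defaulting to the legacy format means a node with missing or stale configuration fails safe.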
Compatibility Matrix
| Scenario | Result |
|---|---|
| Old node writes → New node reads | ✅ Fully compatible — new node reads bytes via oneof |
| New node writes V2 → Old node reads | ⚠️ Old node sees an empty signature_header (field 3 is unknown to it); requires capability gating |
| New node writes old format → Old node reads | ✅ Fully compatible — new node can still produce old wire format |
| Oneof mutual exclusivity | ✅ Setting one field correctly clears the other |
| Field number collision (reusing existing field number for oneof) | ❌ Silent data corruption — old node misinterprets V2 bytes as the original field |
| Safe approach (unused field number for oneof) | ✅ All existing fields preserved, V2 invisible to old nodes |