[refactor] Simplify block structure by removing unnecessary serialization/deserialization #76

@cendhu

Summary

The current block structure carries several fields as serialized byte arrays that are then deserialized on every access. This introduces unnecessary CPU overhead and code complexity. We should simplify the block structure by using typed fields directly and removing unused fields.

Current Block Structure in Fabric

message Block {
  BlockHeader header = 1;
  BlockData data = 2;
  BlockMetadata metadata = 3;
}
message BlockHeader {
  uint64 number = 1;
  bytes previous_hash = 2;
  bytes data_hash = 3;
}
message BlockData {
  repeated bytes data = 1;  // each entry is a serialized Envelope
}
message Envelope {
  bytes payload = 1;    // serialized Payload
  bytes signature = 2;
}
message Payload {
  Header header = 1;
  bytes data = 2;
}
message Header {
  bytes channel_header = 1;    // serialized ChannelHeader
  bytes signature_header = 2;  // serialized SignatureHeader
}
message ChannelHeader {
  int32 type = 1;
  int32 version = 2;
  google.protobuf.Timestamp timestamp = 3;
  string channel_id = 4;
  string tx_id = 5;
  uint64 epoch = 6;
  bytes extension = 7;
  bytes tls_cert_hash = 8;
}
message SignatureHeader {
  bytes creator = 1;
  bytes nonce = 2;
}

Note how the structure is deeply nested, with serialized bytes at nearly every level: Block.data → Envelope.payload → Payload.header.channel_header / Payload.header.signature_header. Each serialized layer requires a separate proto.Unmarshal call to access the typed fields within.

Problem

In the current implementation, various components of the block are stored as serialized []byte within the protobuf message and must be deserialized each time they are accessed. For example:

  • Signature header: The SignatureHeader is embedded as raw bytes in Header.signature_header inside the payload. Every consumer of this field must unmarshal it before use, even though it could be stored as a typed message directly.
  • Channel header: Similarly, the ChannelHeader is serialized into the Header.channel_header bytes field of the payload. Accessing the channel ID, transaction type, or timestamp requires repeated deserialization.
  • Envelope payload: The Payload inside each Envelope is carried as bytes, requiring deserialization at each processing stage (validation, commit, indexing, etc.).

This pattern leads to:

  1. Redundant CPU work — the same fields are deserialized multiple times across the transaction lifecycle (endorsement, ordering, validation, commit).
  2. Verbose boilerplate — every call site needs error-handling logic around proto.Unmarshal for what are conceptually direct field accesses.
  3. Increased GC pressure — repeated deserialization creates short-lived objects that add to garbage collection overhead.

Proposal

  1. Use typed structs instead of byte slices — Replace serialized byte fields (e.g., SignatureHeader, ChannelHeader) with their corresponding typed protobuf message fields in the internal block representation.
  2. Deserialize once at ingress — Parse the block fully when it is first received (e.g., at the orderer or committer ingress) and pass the fully typed structure through the pipeline.
  3. Remove unused fields — Audit the block and envelope structures for fields that are never read or are redundant, and remove them from the internal representation.
  4. Serialize only at egress — Re-serialize to the wire format only when the block needs to be persisted or transmitted over the network.
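
The "deserialize once at ingress" step above could be sketched roughly as follows. This is a minimal illustration rather than Fabric code: Tx, ParseEnvelope, appendField, readFields, and sampleEnvelope are hypothetical names, and a tiny hand-written reader for length-delimited protobuf fields stands in for the generated proto.Unmarshal code so that the example depends only on the standard library. Field numbers follow the message definitions listed earlier.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Tx is a hypothetical typed view of one Envelope, built once at ingress.
type Tx struct {
	Creator, Nonce, Data, Signature []byte
}

// appendField appends one length-delimited (wire type 2) protobuf field.
func appendField(dst []byte, num int, val []byte) []byte {
	dst = binary.AppendUvarint(dst, uint64(num)<<3|2)
	dst = binary.AppendUvarint(dst, uint64(len(val)))
	return append(dst, val...)
}

// readFields splits a message into its length-delimited fields by number.
// Other wire types are rejected to keep the sketch small.
func readFields(msg []byte) (map[int][]byte, error) {
	fields := make(map[int][]byte)
	for len(msg) > 0 {
		tag, n := binary.Uvarint(msg)
		if n <= 0 || tag&7 != 2 {
			return nil, errors.New("unsupported or malformed tag")
		}
		msg = msg[n:]
		size, n := binary.Uvarint(msg)
		if n <= 0 || size > uint64(len(msg)-n) {
			return nil, errors.New("malformed length")
		}
		end := n + int(size)
		fields[int(tag>>3)] = msg[n:end]
		msg = msg[end:]
	}
	return fields, nil
}

// ParseEnvelope walks Envelope -> Payload -> Header -> SignatureHeader once,
// so later pipeline stages can read typed fields without further parsing.
func ParseEnvelope(raw []byte) (*Tx, error) {
	env, err := readFields(raw)
	if err != nil {
		return nil, fmt.Errorf("envelope: %w", err)
	}
	payload, err := readFields(env[1]) // Envelope.payload
	if err != nil {
		return nil, fmt.Errorf("payload: %w", err)
	}
	header, err := readFields(payload[1]) // Payload.header
	if err != nil {
		return nil, fmt.Errorf("header: %w", err)
	}
	sig, err := readFields(header[2]) // Header.signature_header
	if err != nil {
		return nil, fmt.Errorf("signature header: %w", err)
	}
	return &Tx{Creator: sig[1], Nonce: sig[2], Data: payload[2], Signature: env[2]}, nil
}

// sampleEnvelope builds a small nested Envelope for the demonstration.
func sampleEnvelope() []byte {
	sig := appendField(appendField(nil, 1, []byte("org1-admin")), 2, []byte("n0"))
	header := appendField(nil, 2, sig)
	payload := appendField(appendField(nil, 1, header), 2, []byte("tx-data"))
	return appendField(appendField(nil, 1, payload), 2, []byte("sig"))
}

func main() {
	tx, err := ParseEnvelope(sampleEnvelope())
	if err != nil {
		panic(err)
	}
	// Validation, commit, and indexing would all share this one parsed value.
	fmt.Printf("creator=%s nonce=%s\n", tx.Creator, tx.Nonce)
}
```

In this sketch all parsing errors surface in one place, at ingress, which is where the proposal would concentrate the error handling that is currently repeated at every call site.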

Example

Before:

// Every call site repeats this pattern (shown inside a function that returns an error)
payload := &cb.Payload{}
if err := proto.Unmarshal(envelope.Payload, payload); err != nil {
	return err
}
header := &cb.SignatureHeader{}
if err := proto.Unmarshal(payload.Header.SignatureHeader, header); err != nil {
	return err
}
// ... finally use header.Creator, header.Nonce

After:

// Direct typed access — deserialized once at ingress
creator := envelope.Payload.Header.SignatureHeader.Creator
nonce := envelope.Payload.Header.SignatureHeader.Nonce

Expected Benefits

  • Reduced CPU usage from eliminating redundant proto.Unmarshal calls across the transaction pipeline
  • Simpler, more readable code with direct field access
  • Lower GC pressure from fewer intermediate allocations
  • Smaller internal block representation after removing unused fields

Backward-Compatible Evolution via oneof

If we need to introduce a new SignatureHeader type (e.g., one with typed fields or additional metadata), we could potentially leverage protobuf's oneof to do so without breaking backward compatibility:

// Current Header definition (see above)
message Header {
  bytes channel_header = 1;    // serialized ChannelHeader
  bytes signature_header = 2;  // serialized SignatureHeader
}
// Proposed Header with oneof for backward-compatible evolution
// Since Header only has fields 1 and 2, field 3 is safe to use.
// ⚠️  For messages with more fields, always verify the next unused field number.
message Header {
  bytes channel_header = 1;    // existing: serialized ChannelHeader
  oneof signature_header_type {
    bytes signature_header = 2;                // existing: serialized SignatureHeader
    SignatureHeaderV2 signature_header_v2 = 3; // new: typed, uses next available field number
  }
}

How this could work:

  • Existing nodes that only understand the old format would continue reading field 2 (signature_header bytes) and deserializing as before.
  • Newer nodes would populate and read field 3 (signature_header_v2) directly, avoiding the serialize/deserialize overhead.
  • Since oneof fields are mutually exclusive, only one representation is present on the wire at a time — no storage overhead from carrying both.
  • The same oneof pattern could also be applied to channel_header (field 1) in the future if needed.
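
To make the wire-level behavior in the list above concrete, here is a rough simulation using only the standard library. It assumes, purely for illustration, that SignatureHeaderV2 keeps creator as field 1 and nonce as field 2 (the issue does not define its layout), and readCreator, encodeSigHeader, appendField, and readFields are hypothetical helpers rather than protobuf-generated code. Real protobuf parsers resolve a oneof by the last member seen on the wire; the sketch simplifies this by preferring field 3 whenever the reader understands it.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendField appends one length-delimited (wire type 2) protobuf field.
func appendField(dst []byte, num int, val []byte) []byte {
	dst = binary.AppendUvarint(dst, uint64(num)<<3|2)
	dst = binary.AppendUvarint(dst, uint64(len(val)))
	return append(dst, val...)
}

// readFields splits a message into its length-delimited fields by number.
func readFields(msg []byte) (map[int][]byte, error) {
	fields := make(map[int][]byte)
	for len(msg) > 0 {
		tag, n := binary.Uvarint(msg)
		if n <= 0 || tag&7 != 2 {
			return nil, errors.New("unsupported or malformed tag")
		}
		msg = msg[n:]
		size, n := binary.Uvarint(msg)
		if n <= 0 || size > uint64(len(msg)-n) {
			return nil, errors.New("malformed length")
		}
		end := n + int(size)
		fields[int(tag>>3)] = msg[n:end]
		msg = msg[end:]
	}
	return fields, nil
}

// encodeSigHeader serializes creator (field 1) and nonce (field 2).
func encodeSigHeader(creator, nonce string) []byte {
	return appendField(appendField(nil, 1, []byte(creator)), 2, []byte(nonce))
}

// readCreator mimics a node reading Header. An old node knows only field 2;
// a new node also understands field 3 (the assumed signature_header_v2).
// The bool result reports whether the node found a signature header at all.
func readCreator(header []byte, understandsV2 bool) (string, bool, error) {
	fields, err := readFields(header)
	if err != nil {
		return "", false, err
	}
	raw, ok := fields[2]
	if v2, hasV2 := fields[3]; understandsV2 && hasV2 {
		raw, ok = v2, true
	}
	if !ok {
		return "", false, nil // nothing this node recognizes
	}
	sig, err := readFields(raw)
	if err != nil {
		return "", false, err
	}
	return string(sig[1]), true, nil
}

func main() {
	legacy := appendField(nil, 2, encodeSigHeader("org1", "n0")) // old writer
	v2 := appendField(nil, 3, encodeSigHeader("org1", "n0"))     // new writer
	cases := []struct {
		name    string
		header  []byte
		newNode bool
	}{
		{"old writes, old reads", legacy, false},
		{"old writes, new reads", legacy, true},
		{"new writes V2, new reads", v2, true},
		{"new writes V2, old reads", v2, false},
	}
	for _, c := range cases {
		creator, found, err := readCreator(c.header, c.newNode)
		fmt.Printf("%s: creator=%q found=%t err=%v\n", c.name, creator, found, err)
	}
}
```

The last case is the one that needs gating: the old reader reports no error, only a missing signature header.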

⚠️ Critical: New oneof members must use unused field numbers. If the existing message has additional fields beyond the oneof candidates, the new oneof member cannot reuse an occupied field number. Protobuf will either reject the descriptor outright ("duplicate field number") or, when old and new nodes run different schema versions, cause silent data corruption — the old node would misinterpret the serialized embedded message bytes as the original field's data. The safe approach is to always use the next available unused field number.
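
The collision case can be illustrated with a deliberately wrong, hypothetical schema in which signature_header_v2 reuses field number 1 (the number of channel_header). Because a bytes field and an embedded message are both length-delimited on the wire, an old node parses the Header without any error and simply ends up holding V2 bytes where it expects a serialized ChannelHeader. The helper names below are assumptions made for this sketch, and the hand-written reader stands in for generated protobuf code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendField appends one length-delimited (wire type 2) protobuf field.
func appendField(dst []byte, num int, val []byte) []byte {
	dst = binary.AppendUvarint(dst, uint64(num)<<3|2)
	dst = binary.AppendUvarint(dst, uint64(len(val)))
	return append(dst, val...)
}

// readFields splits a message into its length-delimited fields by number.
func readFields(msg []byte) (map[int][]byte, error) {
	fields := make(map[int][]byte)
	for len(msg) > 0 {
		tag, n := binary.Uvarint(msg)
		if n <= 0 || tag&7 != 2 {
			return nil, errors.New("unsupported or malformed tag")
		}
		msg = msg[n:]
		size, n := binary.Uvarint(msg)
		if n <= 0 || size > uint64(len(msg)-n) {
			return nil, errors.New("malformed length")
		}
		end := n + int(size)
		fields[int(tag>>3)] = msg[n:end]
		msg = msg[end:]
	}
	return fields, nil
}

// oldNodeChannelHeader mimics an old node: field 1 of Header is returned as
// opaque channel_header bytes, with no way to tell what a writer put there.
func oldNodeChannelHeader(header []byte) ([]byte, error) {
	fields, err := readFields(header)
	if err != nil {
		return nil, err
	}
	return fields[1], nil
}

func main() {
	// Assumed V2 layout: creator = 1, nonce = 2.
	v2 := appendField(appendField(nil, 1, []byte("org1")), 2, []byte("n0"))

	// Hypothetical bad schema: the V2 message is written under field number 1.
	collided := appendField(nil, 1, v2)

	got, err := oldNodeChannelHeader(collided)
	fmt.Printf("err=%v, channel_header holds V2 bytes: %t\n", err, string(got) == string(v2))
}
```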

Considerations:

  • Wire compatibility is confirmed for the old → new direction: old nodes produce data using field 2 (bytes), and new nodes recognize it by its field number as the signature_header member of the oneof (protobuf has no separate discriminator on the wire). New nodes can also continue writing the old bytes format for full backward compatibility.
  • New V2 → old nodes requires gating: When a new node writes using signature_header_v2 (field 3), old nodes see an empty signature_header — the new field is unknown to them and silently ignored. This means a capability flag or channel configuration update is required to gate when the new format is used during rolling upgrades.
  • Field number safety varies by message: For Header specifically, field 3 is safe since only fields 1 and 2 exist today. When applying this pattern to other messages (e.g., ChannelHeader which has fields up to 8), care must be taken to use a truly unused field number.
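  • Ledger persistence: If blocks are persisted to the ledger in the new format, any binary that reads the ledger after the upgrade must handle both variants, because historical blocks still carry field 2 while newer blocks carry field 3. Binaries that predate the change cannot interpret the new variant at all. This may require a migration strategy or dual-write period.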

This approach is viable but requires a clear upgrade path and capability gating before it can be adopted.
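
One possible shape for the capability gating mentioned above, as a sketch only: ChannelCapabilities, its TypedSignatureHeader flag, signatureHeaderFieldNumber, and WriteHeader are hypothetical names, not existing Fabric APIs, and a hand-written field encoder stands in for generated protobuf code. The writer emits field 3 only once the channel configuration says every node understands it, and otherwise keeps writing field 2.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ChannelCapabilities is a hypothetical stand-in for channel configuration.
type ChannelCapabilities struct {
	TypedSignatureHeader bool // assumed flag, enabled after all nodes upgrade
}

// appendField appends one length-delimited (wire type 2) protobuf field.
func appendField(dst []byte, num int, val []byte) []byte {
	dst = binary.AppendUvarint(dst, uint64(num)<<3|2)
	dst = binary.AppendUvarint(dst, uint64(len(val)))
	return append(dst, val...)
}

// signatureHeaderFieldNumber picks the oneof member to write:
// 2 (legacy bytes) unless the channel has enabled the typed variant, then 3.
func signatureHeaderFieldNumber(caps ChannelCapabilities) int {
	if caps.TypedSignatureHeader {
		return 3
	}
	return 2
}

// WriteHeader serializes a Header, gating the signature header format
// on the channel capabilities.
func WriteHeader(caps ChannelCapabilities, channelHeader, sigHeader []byte) []byte {
	out := appendField(nil, 1, channelHeader)
	return appendField(out, signatureHeaderFieldNumber(caps), sigHeader)
}

func main() {
	for _, caps := range []ChannelCapabilities{
		{TypedSignatureHeader: false},
		{TypedSignatureHeader: true},
	} {
		hdr := WriteHeader(caps, nil, []byte("sig-header"))
		// With an empty channel header, byte 2 is the tag of the second field.
		fmt.Printf("typed=%t -> signature header written as field %d\n",
			caps.TypedSignatureHeader, hdr[2]>>3)
	}
}
```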

Compatibility Matrix

| Scenario | Result |
| --- | --- |
| Old node writes → New node reads | ✅ Fully compatible: new node reads bytes via oneof |
| New node writes V2 → Old node reads | ⚠️ Old node sees an empty signature_header: the V2 field is unknown to it |
| New node writes old format → Old node reads | ✅ Fully compatible: new node can still produce the old wire format |
| Oneof mutual exclusivity | ✅ Setting one field correctly clears the other |
| Field number collision (reusing existing field number for oneof) | ❌ Silent data corruption: old node misinterprets V2 bytes as the original field |
| Safe approach (unused field number for oneof) | ✅ All existing fields preserved, V2 invisible to old nodes |
