Conversation

@gerceboss
@gerceboss gerceboss commented Feb 7, 2026

What was wrong?

Improvement proposed in #1172
Issue #1193

To-Do

  • Serialisation changes and their tests
  • Add or update any documentation related to these changes

@acul71 @seetadev

@seetadev
Contributor

seetadev commented Feb 7, 2026

@gerceboss : Thank you Nilav.

Appreciate the update and the improvements proposed here — this looks like a solid step forward 👍

Inviting @yashksaini-coder, @sumanjeet0012 , and @acul71 to review the changes, especially around the serialization updates and related tests.

Appreciate the clarity on the context from #1172 and #1193. Looking forward to feedback and any suggestions before merge.

@gerceboss gerceboss force-pushed the nilav/multicodec-integration branch from 9ab1c06 to 552d090 Compare February 8, 2026 10:00
@yashksaini-coder
Contributor

@gerceboss Is the PR ready for review, or are you still working on it?

@gerceboss
Author

It's ready for review, @yashksaini-coder. I have left out the serialisation part since it was optional in the discussion, so it can be added in another PR.

Contributor

@yashksaini-coder yashksaini-coder left a comment

Required fixes:

  1. Breaking Changes: Binary format changes not clearly documented
  2. Fragile Parsing: Reverse indexing and hardcoded offsets
  3. Limited Hash Support: Assumes SHA2-256 throughout
  4. Test Coverage: Missing edge cases and backward compatibility tests
  5. Documentation: Needs updates reflecting new capabilities
  6. Error Handling: Some paths lack proper exception handling

Some test cases that still need to be added:

  1. Backward Compatibility

    def test_old_cid_format_decoding():
        """Verify old CIDs still work after changes"""
        # Use pre-generated CIDs from old format
        old_cid = bytes([0x01, 0x55, 0x12, 0x20, ...])
        # Should still verify correctly
  2. Varint Codec Handling

    def test_multibyte_varint_codec():
        """Test codecs with multi-byte varint encoding"""
        # Test codecs > 127 that require multi-byte varint
  3. String Codec Input

    def test_string_codec():
        cid1 = compute_cid_v1(data, "raw")
        cid2 = compute_cid_v1(data, CODEC_RAW)
        assert cid1 == cid2
  4. Invalid Codec Handling

    def test_invalid_codec():
        with pytest.raises(ValueError):
            compute_cid_v1(data, "nonexistent-codec")
        with pytest.raises(ValueError):
            compute_cid_v1(data, 0xFFFFFFFF)


return cid
# CIDv1 format: <version><codec-varint><multihash>
return bytes([CID_V1]) + codec_prefixed + multihash
Contributor

Using add_prefix() to get a properly varint-encoded codec prefix is a great refactoring. However, the codec was previously written as a single byte and is now varint-encoded:

  • This changes the binary format of CIDv1, which is a potential ⚠️ breaking change.

  • Existing CIDs may not be backward compatible.

  • Can you add a migration strategy and version tests?

Author

Compatibility:

  • Codecs < 128 (e.g., raw=0x55, dag-pb=0x70): Single-byte varint encoding
    means legacy and new formats are identical (backward compatible).
  • Codecs >= 128: Multi-byte varint encoding means formats are different (breaking).
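
The two cases above can be checked directly. The sketch below is a standalone, hypothetical varint encoder (unsigned LEB128, the scheme multiformats uses); it is not the add_prefix() implementation from this PR, and encode_varint and the constants are illustrative names only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit stays clear
            return bytes(out)


RAW = 0x55       # raw codec, below 128
DAG_PB = 0x70    # dag-pb codec, below 128
DAG_JSON = 0x0129  # dag-json codec, at or above 128

# Below 128 the varint is the same single byte the legacy layout wrote.
assert encode_varint(RAW) == bytes([RAW])
assert encode_varint(DAG_PB) == bytes([DAG_PB])

# At or above 128 the varint needs more than one byte.
print(encode_varint(DAG_JSON))  # b'\xa9\x02'
```

Any CID whose codec is below 128 therefore has identical bytes under both layouts; only codecs at or above 128 produce a different prefix.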

Migration Strategy:

  1. Legacy CIDs with codec < 128 continue to work without migration.
  2. Legacy CIDs with codec >= 128 need recomputation from original data.
  3. Add a helper such as detect_cid_version() to identify the format.
  4. Add a helper such as migrate_legacy_cid() to convert when possible.

If this sounds good to you, I'll go ahead and implement the necessary functions and tests, @yashksaini-coder.

Author

Documentation: Needs updates reflecting new capabilities
Could you please guide me on this? Where should the docs be updated, and could you share an example?

Contributor

@gerceboss Good catch. Please follow the main discussion post on the multicodec integration, #1172.

I've also put together a small implementation guide; please review it and see whether it helps with the implementation and answers your queries: #1172 (comment)

Keeping others in the loop is important as well.
CC: @acul71

Comment on lines 56 to 70
    ):
        self.public_key = public_key
        self.payload_type = payload_type

        # Normalise payload_type to a Code instance
        if isinstance(payload_type, bytes):
            codec_name = get_codec(payload_type)
            self.payload_type_code = Code.from_string(codec_name)
        elif isinstance(payload_type, str):
            self.payload_type_code = Code.from_string(payload_type)
        else:
            self.payload_type_code = payload_type

Contributor

  • Location: Envelope.__init__() codec normalization
  • Issue: No try-except around get_codec() or Code.from_string(), so their failures are not handled
  • Recommendation:
    try:
        codec_name = get_codec(payload_type)
        self.payload_type_code = Code.from_string(codec_name)
    except Exception as e:
        # Chain the original error so the root cause stays visible
        raise ValueError(f"Invalid codec: {e}") from e

return compute_cid_v1(data, codec)


def parse_cid_codec(cid: bytes) -> str:
Contributor

  • Issue: parse_cid_codec() returns "cidv0" for CIDv0
  • Problem: "cidv0" is not a standard multicodec name
  • Recommendation: Document this behavior, or return None or "dag-pb" instead
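
As a rough illustration of the "dag-pb" option: in the CID specification a binary CIDv0 is a bare sha2-256 multihash (34 bytes starting with 0x12 0x20) whose codec is implicitly dag-pb. The helper names below are hypothetical and are not part of this PR; whether this module's CIDv0 output follows exactly that layout is an assumption.

```python
from typing import Optional

SHA2_256_CODE = 0x12    # multihash code for sha2-256
SHA2_256_LENGTH = 0x20  # 32-byte digest
DAG_PB_NAME = "dag-pb"  # implicit codec of every CIDv0


def is_cid_v0(cid: bytes) -> bool:
    """Return True if cid looks like a binary CIDv0 (bare sha2-256 multihash)."""
    return (
        len(cid) == 2 + SHA2_256_LENGTH
        and cid[0] == SHA2_256_CODE
        and cid[1] == SHA2_256_LENGTH
    )


def codec_name_for_cid_v0(cid: bytes) -> Optional[str]:
    """Return the standard multicodec name for a CIDv0, or None for anything else."""
    return DAG_PB_NAME if is_cid_v0(cid) else None
```

Returning a registered multicodec name keeps callers from special-casing a string that no codec table recognises.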

Author

I see, it's better to use DAG_PB.name 👍

if len(cid) >= 4:
# Return first 4 bytes (version + codec + hash type + hash length)
return cid[:4]
# For CIDv1 produced by this module, the structure is:
Contributor

  • ⚠️ Critical: This assumes SHA2-256 (32-byte digest).
  • The hardcoded offset cid[-33] is fragile for other hash algorithms, and it may break if the codec varint encoding is multi-byte. Add hash algorithm flexibility or explicit validation.
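
One way to avoid reverse indexing is to read the CID front to back, decoding each field as a varint and validating the digest length explicitly. This is only a sketch under the CIDv1 layout <version><codec-varint><multihash>; decode_varint and split_cid_v1 are hypothetical names, not functions from this PR.

```python
from __future__ import annotations


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned varint at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    for i in range(offset, len(buf)):
        byte = buf[i]
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:      # continuation bit clear: last byte
            return value, i + 1
        shift += 7
        if shift > 63:           # refuse unreasonably long varints
            raise ValueError("varint too long")
    raise ValueError("truncated varint")


def split_cid_v1(cid: bytes) -> tuple[int, int, bytes]:
    """Split a binary CIDv1 into (codec, hash_code, digest)."""
    version, pos = decode_varint(cid, 0)
    if version != 1:
        raise ValueError(f"not a CIDv1 (version={version})")
    codec, pos = decode_varint(cid, pos)
    hash_code, pos = decode_varint(cid, pos)
    digest_len, pos = decode_varint(cid, pos)
    digest = cid[pos:]
    # Explicit validation instead of assuming a 32-byte digest
    if len(digest) != digest_len:
        raise ValueError(
            f"digest length mismatch: header says {digest_len}, got {len(digest)}"
        )
    return codec, hash_code, digest
```

Because every offset is derived from the decoded fields, the same code handles single-byte and multi-byte codecs and does not depend on the digest being 32 bytes.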

Author

Currently only sha2-256 is supported in py-libp2p, and I saw that multihash integration is out of scope for this PR (see #1170).
@yashksaini-coder

Contributor

OK, I have seen the referenced discussion post, #1170 (comment).

Contributor

@yashksaini-coder yashksaini-coder left a comment

@gerceboss Kindly check the reviewed code areas and add the missing test cases.

@yashksaini-coder
Contributor

@gerceboss Have you completed the changes to the PR yet?

@gerceboss gerceboss force-pushed the nilav/multicodec-integration branch from b5223d6 to cf0a4a8 Compare February 10, 2026 20:36
@gerceboss
Author

gerceboss commented Feb 10, 2026

Updated with the necessary changes and added the docs as well, @yashksaini-coder. One test was failing locally, so I had to change the command to

    # Run echo example as server via module so imports resolve correctly
    cmd = [sys.executable, "-u", "-m", "examples.echo.echo", "-p", "0"]

in examples/test_echo_thin_waist.py

@gerceboss gerceboss force-pushed the nilav/multicodec-integration branch from cf0a4a8 to e169751 Compare February 10, 2026 20:45
3 participants