add multicodec #1194

gerceboss · 2026-02-07T11:13:05Z

What was wrong?

Improvement proposed in #1172
Issue #1193

To-Do

Serialisation changes and its tests
Add or update documentation if any related to these changes

@acul71 @seetadev

seetadev · 2026-02-07T13:16:13Z

@gerceboss: Thank you, Nilav.

Appreciate the update and the improvements proposed here — this looks like a solid step forward 👍

Inviting @yashksaini-coder, @sumanjeet0012, and @acul71 to review the changes, especially the serialization updates and related tests.

Appreciate the clarity on the context from #1172 and #1193. Looking forward to feedback and any suggestions before merge.

yashksaini-coder · 2026-02-08T13:40:50Z

@gerceboss Is the PR ready for review, or are you still working on it?

gerceboss · 2026-02-08T13:50:10Z

It's ready for review, @yashksaini-coder. I have left out the serialisation part, as it was optional in the discussion, so it can be added in another PR.

yashksaini-coder

Required fixes:

Breaking Changes: Binary format changes not clearly documented
Fragile Parsing: Reverse indexing and hardcoded offsets
Limited Hash Support: Assumes SHA2-256 throughout
Test Coverage: Missing edge cases and backward compatibility tests
Documentation: Needs updates reflecting new capabilities
Error Handling: Some paths lack proper exception handling

Some test cases that need to be added:

Backward Compatibility

def test_old_cid_format_decoding():
    """Verify old CIDs still work after changes"""
    # Use pre-generated CIDs from old format
    old_cid = bytes([0x01, 0x55, 0x12, 0x20, ...])
    # Should still verify correctly

Varint Codec Handling

def test_multibyte_varint_codec():
    """Test codecs with multi-byte varint encoding"""
    # Test codecs > 127 that require multi-byte varint

String Codec Input

def test_string_codec():
    cid1 = compute_cid_v1(data, "raw")
    cid2 = compute_cid_v1(data, CODEC_RAW)
    assert cid1 == cid2

Invalid Codec Handling

def test_invalid_codec():
    with pytest.raises(ValueError):
        compute_cid_v1(data, "nonexistent-codec")
    with pytest.raises(ValueError):
        compute_cid_v1(data, 0xFFFFFFFF)

yashksaini-coder · 2026-02-08T17:43:38Z

libp2p/bitswap/cid.py


-    return cid
+    # CIDv1 format: <version><codec-varint><multihash>
+    return bytes([CID_V1]) + codec_prefixed + multihash


Using add_prefix() to get a properly varint-encoded codec prefix is a great refactoring. However, the codec was previously written as a single byte and is now varint-encoded.

⚠️ Breaking change: this alters the binary format of CIDv1.

Existing CIDs may not be backward compatible.

Can you add a migration strategy and version tests?

Compatibility:

Codecs < 128 (e.g., raw=0x55, dag-pb=0x70): single-byte varint encoding
means the legacy and new formats are identical (backward compatible).

Codecs >= 128: multi-byte varint encoding means the formats differ (breaking).
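The 128 boundary above follows directly from how unsigned varints work. The snippet below is a minimal, self-contained sketch, not the code from this PR: `encode_uvarint` is a hypothetical helper written here for illustration, and the codec values come from the multicodec table.

```python
def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)  # final byte has the high bit clear
            return bytes(out)


RAW = 0x55  # multicodec code for "raw"
DAG_PB = 0x70  # multicodec code for "dag-pb"
DAG_JSON = 0x0129  # multicodec code for "dag-json"

# Below 128 the varint is the byte itself, so old and new prefixes match.
assert encode_uvarint(RAW) == bytes([RAW])
assert encode_uvarint(DAG_PB) == bytes([DAG_PB])

# From 128 upwards the varint needs at least two bytes.
assert encode_uvarint(128) == b"\x80\x01"
assert encode_uvarint(DAG_JSON) == b"\xa9\x02"
```

A test for multi-byte codecs could compare the CID prefix against values like these rather than against a single hardcoded byte.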

Migration Strategy:

Legacy CIDs with codec < 128 continue to work without migration.

Legacy CIDs with codec >= 128 need recomputation from original data.

Use a function such as detect_cid_version() to identify the format.

Use a function such as migrate_legacy_cid() to convert when possible.

If this sounds good to you, I'll go ahead and implement the necessary functions and tests @yashksaini-coder

Documentation: Needs updates reflecting new capabilities
Can you please guide me on this? Where should I update the docs, and is there an example?

@gerceboss Good that you identified this. Follow the main discussion post on the multicodec integration, #1172.

I've also written a small guide implementation. Please review it and see whether it helps with the implementation and answers your questions: #1172 (comment)

Keeping others in the loop is important as well.
CC: @acul71

yashksaini-coder · 2026-02-08T17:57:30Z

libp2p/peer/envelope.py

    ):
        self.public_key = public_key
-        self.payload_type = payload_type
+
+        # Normalise payload_type to a Code instance
+        if isinstance(payload_type, bytes):
+            codec_name = get_codec(payload_type)
+            self.payload_type_code = Code.from_string(codec_name)
+        elif isinstance(payload_type, str):
+            self.payload_type_code = Code.from_string(payload_type)
+        else:
+            self.payload_type_code = payload_type
+


Location: Envelope.__init__() codec normalization

Issue: No try-except for get_codec() or Code.from_string() failures

Recommendation:
try:
    codec_name = get_codec(payload_type)
    self.payload_type_code = Code.from_string(codec_name)
except Exception as e:
    raise ValueError(f"Invalid codec: {e}")
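A slightly narrower variant of the recommendation above is sketched here. It is a self-contained illustration under assumptions: `_KNOWN_CODECS`, `lookup_codec`, and `normalise_payload_type` are hypothetical stand-ins, not the multicodec library calls used in the diff. It catches only the lookup failures it expects and chains the original exception so the traceback is preserved.

```python
from typing import Union

# Hypothetical stand-in table; the PR relies on the multicodec library instead.
_KNOWN_CODECS = {"raw": 0x55, "dag-pb": 0x70}


def lookup_codec(name: str) -> int:
    """Stand-in for a name-to-code lookup; raises KeyError for unknown names."""
    return _KNOWN_CODECS[name]


def normalise_payload_type(payload_type: Union[str, int]) -> int:
    """Return a codec code, raising ValueError for anything unrecognised."""
    if isinstance(payload_type, int):
        if payload_type not in _KNOWN_CODECS.values():
            raise ValueError(f"Invalid codec code: {payload_type:#x}")
        return payload_type
    try:
        return lookup_codec(payload_type)
    except (KeyError, TypeError) as exc:
        # Chain the cause so callers can still see why the lookup failed.
        raise ValueError(f"Invalid codec: {payload_type!r}") from exc
```

Callers then only need to handle ValueError, which matches the `pytest.raises(ValueError)` expectations in the suggested tests.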

yashksaini-coder · 2026-02-08T17:59:01Z

libp2p/bitswap/cid.py

        return compute_cid_v1(data, codec)
+
+
+def parse_cid_codec(cid: bytes) -> str:


Issue: parse_cid_codec() returns "cidv0" for CIDv0

Problem: Not a standard multicodec name

Recommendation: Document or return None or "dag-pb"

I see, it's better to use DAG_PB.name 👍

yashksaini-coder · 2026-02-08T18:10:07Z

libp2p/bitswap/cid.py

-        if len(cid) >= 4:
-            # Return first 4 bytes (version + codec + hash type + hash length)
-            return cid[:4]
+        # For CIDv1 produced by this module, the structure is:


⚠️ Critical: This assumes SHA2-256 (32-byte digest)

The hardcoded offset cid[-33] is fragile for other hash algorithms, and it may break if the codec varint encoding is multi-byte. Add hash algorithm flexibility or explicit validation.

Currently only sha2-256 is supported in py-libp2p, and I saw that multihash integration is out of scope for this PR (see #1170).
@yashksaini-coder

Okay, I have seen the referenced discussion post, #1170 (comment)
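For reference, the explicit validation suggested in this thread could avoid fixed offsets by reading the CID from the front, one varint at a time. This is a minimal sketch under assumptions, not the PR's implementation: `decode_uvarint` and `parse_cid_v1` are hypothetical helpers, and the layout assumed is `<version><codec><hash-code><digest-length><digest>`.

```python
import hashlib
from typing import Tuple


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned varint at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, offset
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def parse_cid_v1(cid: bytes) -> Tuple[int, int, bytes]:
    """Return (codec, hash_code, digest) for a binary CIDv1."""
    version, pos = decode_uvarint(cid)
    if version != 1:
        raise ValueError(f"not a CIDv1: version {version}")
    codec, pos = decode_uvarint(cid, pos)
    hash_code, pos = decode_uvarint(cid, pos)
    digest_len, pos = decode_uvarint(cid, pos)
    digest = cid[pos:]
    # Validate against the declared length instead of assuming 32 bytes.
    if len(digest) != digest_len:
        raise ValueError("digest length does not match multihash header")
    return codec, hash_code, digest


digest = hashlib.sha256(b"hello").digest()
# dag-json (0x0129) encodes as the two-byte varint a9 02.
cid = bytes([0x01, 0xA9, 0x02, 0x12, 0x20]) + digest
assert parse_cid_v1(cid) == (0x0129, 0x12, digest)
```

Because the digest length is read from the multihash header, the same parser handles a 64-byte sha2-512 digest or a multi-byte codec without changing any offsets.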

yashksaini-coder

@gerceboss Kindly check the reviewed code areas, and also add the missing test cases.

yashksaini-coder · 2026-02-10T17:06:33Z

@gerceboss Have you completed the changes to the PR yet?

gerceboss · 2026-02-10T20:40:44Z

Updated with the necessary changes and added the docs as well, @yashksaini-coder. One test was failing locally, so I had to change

    # Run echo example as server via module so imports resolve correctly
    cmd = [sys.executable, "-u", "-m", "examples.echo.echo", "-p", "0"]

in examples/test_echo_thin_waist.py

gerceboss force-pushed the nilav/multicodec-integration branch from 9ab1c06 to 552d090 on February 8, 2026 10:00

yashksaini-coder reviewed Feb 8, 2026

yashksaini-coder suggested changes Feb 8, 2026

gerceboss requested a review from yashksaini-coder February 9, 2026 18:40

Nilav added 3 commits February 11, 2026 02:06

add multicodec

3c5cd0d

fix the hash_length byte

d54c5cf

add migration functions and tests

4817aa5

gerceboss force-pushed the nilav/multicodec-integration branch from b5223d6 to cf0a4a8 on February 10, 2026 20:36

add codec docs for breaking changes

e169751

gerceboss force-pushed the nilav/multicodec-integration branch from cf0a4a8 to e169751 on February 10, 2026 20:45
