
Conversation

aniruddha1295 commented Feb 3, 2026

Integrate py-multihash v3 API (Complete Implementation)

What was wrong?

Issue #1180

The codebase was using manual byte manipulation for multihash operations and exception-based validation instead of leveraging the py-multihash v3 API that's already available in our dependencies. This made the code harder to maintain and didn't take advantage of the library's built-in validation and error handling.

Based on discussion #1170, this PR addresses all 3 priorities from the issue.

How was it fixed?

Phase 1: Bitswap CID Module

Updated libp2p/bitswap/cid.py:

  1. Replaced manual hashlib.sha256() + byte construction with multihash.digest() and mh.encode() (see the sketch below this list)
  2. Refactored verify_cid() to use multihash.decode() and mh.verify() instead of manual byte slicing (reduced from 55 lines to 30 lines)
  3. Updated compute_cid_v0(), compute_cid_v1(), and reconstruct_cid_from_prefix_and_data() to use the multihash API
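
For reference, a minimal sketch of what the new code paths look like, based only on the API calls named in this PR (multihash.digest, mh.encode, multihash.decode, mh.verify); the helper names here are illustrative, not the actual cid.py functions:

```python
import multihash

def compute_cid_v0_sketch(data: bytes) -> bytes:
    # Previously: hashlib.sha256(data).digest() prefixed with hand-built
    # multihash header bytes. Now the library builds the multihash for us.
    mh = multihash.digest(data, multihash.Func.sha2_256)
    return mh.encode()  # CIDv0 is just the bare sha2-256 multihash

def verify_cid_sketch(cid: bytes, data: bytes) -> bool:
    # Previously: manual byte slicing of hash code, length, and digest.
    # Now: decode once and let the library compare digests.
    try:
        mh = multihash.decode(cid)  # a CIDv1 would have its prefix stripped first
        return mh.verify(data)
    except Exception:
        return False  # malformed multihash -> treat as verification failure
```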

Added tests in tests/core/bitswap/test_cid.py:

  • 8 new compatibility tests for edge cases (malformed multihash, truncated CIDs, etc.)
  • 4 performance benchmarks to validate the changes

Phase 2: DAG Streaming

Updated libp2p/bitswap/cid.py and libp2p/bitswap/dag.py:

  1. Added compute_cid_v1_stream() using multihash.sum_stream() for memory-efficient hashing (see the sketch at the end of this section)
  2. Applied streaming to single-block files to avoid loading large files into memory during hash computation
  3. Updated docstrings to document the streaming capability

Note: I applied streaming to single-block files where the benefit is clear. Multi-block files already use chunking (256KB chunks), so streaming each individual chunk would provide minimal benefit. Happy to extend this if you think it would be valuable.
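
A rough sketch of the streaming helper, assuming multihash.sum_stream() accepts a binary file object and a hash function as described in this PR; the exact signature and the CIDv1 prefix handling are assumptions, not the actual cid.py code:

```python
from typing import BinaryIO

import multihash

# Assumed CIDv1 prefix for the raw codec: varint(version=1) + varint(codec=0x55).
CIDV1_RAW_PREFIX = bytes([0x01, 0x55])

def compute_cid_v1_stream_sketch(file_obj: BinaryIO) -> bytes:
    # Hash the stream incrementally instead of reading the whole file into memory.
    mh = multihash.sum_stream(file_obj, multihash.Func.sha2_256)  # assumed signature
    return CIDV1_RAW_PREFIX + mh.encode()

# Usage: hash a large single-block file without loading it fully into memory.
# with open("large_file.bin", "rb") as f:
#     cid = compute_cid_v1_stream_sketch(f)
```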

Phase 3: Records Validation

Updated libp2p/records/pubkey.py:

  1. Replaced exception-based validation with multihash.is_valid() to avoid exception overhead (see the sketch below)
  2. Updated docstrings to reflect the change
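
A before/after sketch of the validation change, assuming multihash.is_valid() returns a bool as described here; the function names are illustrative, not the actual pubkey.py code:

```python
import multihash

def validate_multihash_old(data: bytes) -> bool:
    # Exception-based: decode and treat any failure as "not a valid multihash".
    try:
        multihash.decode(data)
        return True
    except Exception:
        return False

def validate_multihash_new(data: bytes) -> bool:
    # Library-level validation, no exception overhead on malformed input.
    return multihash.is_valid(data)
```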

Copilot AI review requested due to automatic review settings February 3, 2026 22:14

Copilot AI left a comment


Pull request overview

This pull request integrates the py-multihash v3 API into the Bitswap CID module to replace manual byte manipulation with proper library calls, addressing Priority 1 from issue #1180 and discussion #1170.

Changes:

  • Replaced manual multihash construction using hashlib.sha256() and byte concatenation with multihash.digest() and mh.encode() API calls
  • Refactored verify_cid() function from 55 lines to 30 lines using multihash.decode() and mh.verify() for cleaner, more maintainable code
  • Added 12 new tests: 8 compatibility tests covering edge cases (malformed multihash, truncated CIDs, empty CIDs) and 4 performance benchmark tests

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Files reviewed:

  • libp2p/bitswap/cid.py: Refactored CID computation and verification functions to use the py-multihash v3 API, eliminating manual byte manipulation and improving error handling
  • tests/core/bitswap/test_cid.py: Added comprehensive compatibility tests for edge cases and performance benchmarks to validate the py-multihash v3 integration
  • newsfragments/1180.feature.rst: Added changelog entry documenting the integration of the py-multihash v3 API with backward compatibility guarantee


        match = mh.verify(data)
        logger.debug(f" Verification: {'MATCH' if match else 'MISMATCH'}")
        return match
    except (ValueError, IndexError) as e:

Copilot AI Feb 3, 2026


The exception handling here catches ValueError and IndexError, but based on how multihash.decode() is used elsewhere in the codebase (e.g., libp2p/peer/id.py:117 and libp2p/records/pubkey.py:40), it appears the library may raise other exception types as well. Those instances use a broad except Exception clause.

Consider catching a broader set of exceptions or just Exception to ensure all multihash decoding errors are handled gracefully, especially since this is error-handling code where returning False is the appropriate fallback behavior.

Suggested change
-    except (ValueError, IndexError) as e:
+    except Exception as e:

Comment on lines 242 to 318
        assert elapsed < 0.5, (
            f"CID computation too slow: {elapsed:.3f}s for {iterations} iterations"
        )

        # Log performance for reference
        print(
            f"\nCID computation: {avg_time * 1000:.2f}ms per 1MB "
            f"(total: {elapsed:.3f}s for {iterations} iterations)"
        )

    def test_verification_performance(self):
        """Benchmark CID verification speed."""
        import time

        # Test with 1MB of data
        data = b"x" * (1024 * 1024)
        cid = compute_cid_v1(data)
        iterations = 10

        # Warm up
        for _ in range(2):
            verify_cid(cid, data)

        # Benchmark
        start = time.perf_counter()
        for _ in range(iterations):
            verify_cid(cid, data)
        elapsed = time.perf_counter() - start

        avg_time = elapsed / iterations

        # Should complete 10 iterations of 1MB verification in reasonable time
        # Expected: < 0.5 seconds total (< 50ms per iteration)
        assert elapsed < 0.5, (
            f"CID verification too slow: {elapsed:.3f}s for {iterations} iterations"
        )

        # Log performance for reference
        print(
            f"\nCID verification: {avg_time * 1000:.2f}ms per 1MB "
            f"(total: {elapsed:.3f}s for {iterations} iterations)"
        )

    def test_small_data_performance(self):
        """Benchmark performance with small data (typical use case)."""
        import time

        # Test with small data (1KB)
        data = b"x" * 1024
        iterations = 1000

        # Warm up
        for _ in range(10):
            cid = compute_cid_v1(data)
            verify_cid(cid, data)

        # Benchmark computation
        start = time.perf_counter()
        for _ in range(iterations):
            compute_cid_v1(data)
        comp_elapsed = time.perf_counter() - start

        # Benchmark verification
        cid = compute_cid_v1(data)
        start = time.perf_counter()
        for _ in range(iterations):
            verify_cid(cid, data)
        verify_elapsed = time.perf_counter() - start

        # Should handle 1000 iterations of 1KB quickly
        # Expected: < 0.2 seconds for computation, < 0.2 seconds for verification
        assert comp_elapsed < 0.2, (
            f"Small data computation too slow: {comp_elapsed:.3f}s"
        )
        assert verify_elapsed < 0.2, (
            f"Small data verification too slow: {verify_elapsed:.3f}s"
        )

Copilot AI Feb 3, 2026


These performance assertions use fixed time thresholds (0.5 seconds, 0.2 seconds) that may be too strict for CI environments or slower machines. Performance tests with hard time limits can cause flaky test failures in continuous integration systems with variable load.

Consider either:

  1. Making these thresholds configurable via environment variables (sketched below)
  2. Significantly increasing the thresholds to be more forgiving (e.g., 2-5x current values)
  3. Converting these to benchmark tests that log performance without asserting on specific thresholds
  4. Using relative performance comparisons instead of absolute time limits

This is especially important for the 1MB tests which could be affected by I/O, GC, or system load.
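
As an illustration of option 1, a sketch of an environment-configurable threshold; PYLIBP2P_PERF_THRESHOLD is a hypothetical variable name and the import path is assumed from this PR's module layout:

```python
import os
import time

# Import path assumed from this PR's module layout.
from libp2p.bitswap.cid import compute_cid_v1

# Hypothetical knob: CI could export PYLIBP2P_PERF_THRESHOLD=5.0 to relax the limit.
PERF_THRESHOLD_S = float(os.environ.get("PYLIBP2P_PERF_THRESHOLD", "0.5"))

def test_computation_performance():
    data = b"x" * (1024 * 1024)
    iterations = 10
    start = time.perf_counter()
    for _ in range(iterations):
        compute_cid_v1(data)
    elapsed = time.perf_counter() - start
    assert elapsed < PERF_THRESHOLD_S, (
        f"CID computation too slow: {elapsed:.3f}s for {iterations} iterations"
    )
```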

Comment on lines 202 to 214
    def test_multihash_api_integration(self):
        """Test that py-multihash v3 API is properly integrated."""
        import multihash

        # Test that we can use multihash directly
        data = b"test data"
        mh = multihash.digest(data, multihash.Func.sha2_256)

        # Verify multihash properties
        assert mh.code == 0x12  # SHA-256 code
        assert len(mh.digest) == 32  # SHA-256 produces 32 bytes
        assert mh.verify(data) is True
        assert mh.verify(b"wrong data") is False

Copilot AI Feb 3, 2026


This test directly asserts on multihash API properties (like mh.code and mh.verify) which are part of the py-multihash library's interface, not the CID module being tested. While it's good to verify the integration works, this test would be better placed as part of the compatibility tests or removed entirely.

The actual integration testing is already covered by test_cidv0_format_compatibility and test_cidv1_format_compatibility which verify that CIDs computed using the multihash API work correctly with the verify_cid function. This test adds minimal value beyond verifying that py-multihash itself works, which should be the responsibility of that library's own tests.

aniruddha1295 changed the title from "Addresses #1180 Integrate py-multihash v3 API in Bitswap CID module (…" to "Addresses #1180 Integrate py-multihash v3 API in Bitswap CID module" on Feb 3, 2026
…actoring

- Priority 2: DAG streaming capability
- Priority 3: Records validation
aniruddha1295 (Author) commented Feb 4, 2026

@sumanjeet0012 @seetadev

I have a question about Priority 2 (Streaming hash for large files) from the issue.

I implemented streaming for single-block files using multihash.sum_stream():

# libp2p/bitswap/dag.py:163
with open(file_path, "rb") as f:
    cid = compute_cid_v1_stream(f, codec=CODEC_RAW)
    
# libp2p/bitswap/dag.py:211
for i, chunk_data in enumerate(chunk_file(file_path, chunk_size)):
    chunk_cid = compute_cid_v1(chunk_data, codec=CODEC_RAW)  # 256KB in memory
    

seetadev (Contributor) commented Feb 7, 2026

@aniruddha1295 : This is an excellent and very thorough piece of work — thanks a lot for driving this 👏

Looping in @sumanjeet0012, @yashksaini-coder, and @acul71 for visibility and review as well.
(@sumanjeet0012 especially since you did the original Bitswap integration in py-libp2p.)

Overall, this PR does a great job of addressing #1180 in a clean, well-structured way, and it clearly reflects careful thought around maintainability, correctness, and long-term alignment with the libp2p stack.

What works really well

  • Replacing manual byte manipulation with the py-multihash v3 API is absolutely the right move. This significantly improves readability, correctness, and future maintainability, while also letting us rely on well-tested library behavior instead of custom logic.
  • The refactor of verify_cid() is especially nice — reducing complexity while improving semantic clarity is a big win.
  • The phased approach (CID module → DAG streaming → records validation) makes the PR easy to reason about despite its size, and the commit history is clean and review-friendly.
  • The streaming support using multihash.sum_stream() is a solid, pragmatic implementation. Applying it where it provides real benefit (single-block files) while avoiding unnecessary churn in the chunked path is a thoughtful trade-off.
  • Test coverage is strong. The added edge-case tests materially improve confidence around malformed inputs and compatibility, and the changelog entry is clear and complete.

On the streaming question

Your reasoning makes sense. Applying streaming to single-block files is where the memory benefit is most clear, and the current chunking approach for multi-block files already keeps memory bounded. I’m 👍 on keeping it as-is for now, with the option to extend later if we see real-world demand.

Despite the CI failures at the moment (likely tied to the perf assertions), the core design and implementation here look solid. Once those are addressed, this feels very close to being merge-ready.

Thanks again for the high-quality contribution — this meaningfully improves the Bitswap CID path and sets us up well for future work.

acul71 (Contributor) commented Feb 7, 2026

Hello @aniruddha1295, thanks for this PR and for integrating the new py-multihash v3 API!

The PR is well done but there are some issues that must be addressed before merging:

  • Lint/typecheck errors
    Run make pr and make docs to spot and resolve them. Did you push with git commit --no-verify?
    Also add a trailing newline to the newsfragment (make docs # or make linux-docs will catch it).

  • Double file read in dag.py single-block path — performance regression
    sum_stream has no beneficial use in the current MerkleDag architecture: single-block files (≤ chunk_size) must be fully loaded into memory for add_block() anyway, so streaming the hash just adds an extra file read with no memory savings. Multi-block files are already chunked into small pieces, so there's no large-file streaming scenario either. Please remove compute_cid_v1_stream from dag.py. (Sorry if the issue description was misleading on this — sum_stream is a valid utility but doesn't fit the MerkleDag code paths. It could be useful if something other than a DAG is used in the future.)

  • Duplicate py-multihash tests
    Check if test_multihash_api_integration is already covered by py-multihash's own test suite — we don't need to duplicate library-level tests (testing mh.code, mh.digest, mh.verify directly) here.

  • reconstruct_cid_from_prefix_and_data() hardcodes SHA-256 regardless of prefix
    The prefix bytes contain the hash function code at prefix[2], but it's always using sha2_256. Could you improve this to read the hash algorithm from the prefix, or is that out of scope for bitswap?

Full review below (try feeding this message to Copilot AI and see if it can cope with it; anyway, always check the code for hallucinations) (-:

AI PR Review: PR #1186 — Integrate py-multihash v3 API in Bitswap CID Module

PR: #1186
Author: @aniruddha1295
Branch: integrate-multihash-v3 → main
Issue: #1180
Discussion: #1170
Review Date: 2026-02-07
Reviewer: AI (claude-4.6-opus)


1. Summary of Changes

This PR replaces manual multihash byte manipulation with py-multihash v3 API calls across three modules, addressing all three priorities outlined in issue #1180 and discussion #1170.

Changes by phase:

  • Phase 1 — Bitswap CID Module (libp2p/bitswap/cid.py): Replaced hashlib.sha256() + manual byte construction with multihash.digest() / mh.encode(). Refactored verify_cid() to use multihash.decode() + mh.verify() instead of manual byte slicing.
  • Phase 2 — DAG Streaming (libp2p/bitswap/dag.py): Added compute_cid_v1_stream() using multihash.sum_stream() and applied it to single-block files in MerkleDag.add_file().
  • Phase 3 — Records Validation (libp2p/records/pubkey.py): Replaced exception-based multihash validation with multihash.is_valid().
  • Tests (tests/core/bitswap/test_cid.py): Added 8 compatibility edge-case tests and 4 performance benchmarks.
  • Newsfragment (newsfragments/1180.feature.rst): Added changelog entry for the feature.

Files affected: 5 files (3 source modules, 1 test file, 1 newsfragment)
Additions: 316 lines | Deletions: 63 lines

Breaking changes: None. The public API signatures and behavior are preserved.


2. Branch Sync Status and Merge Conflicts

Branch Sync Status

  • Status: ℹ️ Ahead of origin/main
  • Details: Branch is 0 commits behind and 5 commits ahead of origin/main.

Merge Conflict Analysis

No merge conflicts detected. The PR branch can be merged cleanly into origin/main.


3. Strengths

  1. Clear alignment with issue and discussion: The PR addresses all three priorities from issue #1180 ("Integrate py-multihash v3 features into py-libp2p") and closely follows the code examples provided in discussion #1170 ("py-libp2p Multihash Integration Analysis").
  2. Good edge-case test coverage: The TestCompatibility class covers important edge cases (malformed multihash, truncated CIDs, empty CIDs, single-byte CIDs, wrong hash types) that weren't previously tested.
  3. Cleaner verification logic: The refactored verify_cid() is significantly simpler (30 lines vs 55 lines) by delegating decoding and verification to the library.
  4. Backward compatibility preserved: All existing tests pass, confirming no regressions.
  5. Proper use of multihash API: The multihash.digest(), mh.encode(), mh.verify(), and multihash.is_valid() calls are used correctly.

4. Issues Found

Critical

C0. All py-multihash v3 APIs fail in CI — namespace collision with pymultihash (BLOCKER)

  • Files: libp2p/bitswap/cid.py, libp2p/bitswap/dag.py, libp2p/records/pubkey.py, tests/core/bitswap/test_cid.py

  • Issue: All GitHub Actions CI checks fail (8 tox jobs + 3 Windows jobs = 11 failing jobs) with AttributeError on every Python version (3.10–3.13) and both Linux and Windows:

    1. module 'multihash' has no attribute 'sum_stream' — affects dag.py, causing 5 test failures
    2. module 'multihash' has no attribute 'is_valid' — affects pubkey.py, causing 4 test failures
    3. 'Multihash' object has no attribute 'code' — affects test code, causing 2 test failures

    Root cause: pymultihash package namespace collision

    The project's dev dependency p2pclient==0.2.0 (in pyproject.toml) depends on pymultihash==0.8.2. Both py-multihash and pymultihash install a multihash/ Python package into site-packages — same namespace, different code. When uv installs both simultaneously in CI, pymultihash's __init__.py overwrites py-multihash's __init__.py. The pymultihash version lacks the v3 APIs (sum_stream, is_valid, Multihash.code). Locally, py-multihash happened to be installed last, so its files took precedence.

    Dependency chain: pyproject.toml → p2pclient==0.2.0 → pymultihash==0.8.2 → overwrites multihash/ namespace

  • Fix (verified): p2pclient v0.2.1 (released 2026-01-28) already depends on py-multihash>=3.0.0 instead of pymultihash, eliminating the collision. Bumping p2pclient from ==0.2.0 to >=0.2.1 in pyproject.toml resolves all 11 CI failures. This has been verified locally — all tox core tests pass cleanly after the bump.
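
A small diagnostic sketch for checking which package currently owns the multihash namespace in a given environment; it uses only stdlib importlib.metadata, and the attribute list mirrors the failures above:

```python
import importlib.metadata

import multihash

print("multihash module loaded from:", multihash.__file__)
for attr in ("digest", "decode", "is_valid", "sum_stream"):
    print(f"  has {attr}: {hasattr(multihash, attr)}")

# Both distributions install a top-level multihash/ package; check which are present.
for dist in ("py-multihash", "pymultihash"):
    try:
        print(f"{dist}: {importlib.metadata.version(dist)}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{dist}: not installed")
```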

C1. Missing type annotation causes mypy failure (BLOCKER)

  • File: libp2p/bitswap/cid.py
  • Line(s): 67
  • Issue: compute_cid_v1_stream(file_obj, ...) is missing a type annotation for file_obj, causing mypy to fail with [no-untyped-def].
  • Suggestion: Add a proper type annotation:
    from typing import BinaryIO
    
    def compute_cid_v1_stream(file_obj: BinaryIO, codec: int = CODEC_RAW) -> bytes:

Major

M1. Newsfragment missing trailing newline (pre-commit failure)

  • File: newsfragments/1180.feature.rst
  • Issue: The file was committed without a trailing newline. The end-of-file-fixer pre-commit hook auto-fixes this, but the fix is not committed. Run pre-commit run --all-files and commit the fixed file.

M2. Double file read in dag.py single-block path — performance regression

  • File: libp2p/bitswap/dag.py

  • Line(s): 161–167

  • Issue: The streaming approach reads the file twice — once for hash computation via compute_cid_v1_stream(), and once to load data for add_block(). The old code read the file once and computed the CID from the in-memory data. For single-block files (≤ 256KB by default), the data must be loaded into memory regardless for add_block(), so streaming adds I/O overhead without saving memory.

    sum_stream has no beneficial application in the current MerkleDag architecture:

    1. Single-block files (≤ chunk_size): Must be fully loaded into memory for add_block(). Streaming gains nothing and costs an extra file read.
    2. Multi-block files (> chunk_size): Already chunked into ≤ 256KB pieces by chunk_file(). Each chunk is materialized in memory when compute_cid_v1() is called. Streaming a 256KB chunk provides no memory benefit.
    3. Root node CID: Computed over the DAG-PB serialized metadata blob (links to chunks), not over the original large file. There is never a "hash the entire large file" step.

    Cross-ecosystem note: go-libp2p's go-multihash provides an equivalent SumStream() function but does not use it in its own codebase — all CID/multihash operations use multihash.Sum() with byte slices.

    The py-multihash sum_stream implementation is sound — it just doesn't have a use case in the current dag.py code paths. It could be useful in future non-DAG contexts.

  • Suggestion: Revert the dag.py single-block path to the original approach (read file once, compute CID from in-memory bytes). The compute_cid_v1_stream() utility can remain in cid.py for potential future use, but remove its import and usage from dag.py.
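
A minimal sketch of the suggested revert for the single-block path; the import path, variable names, and add_block() call shape are assumptions based on this thread, not the actual MerkleDag code:

```python
# Import path assumed from this PR's module layout; CODEC_RAW and compute_cid_v1
# are the helpers referenced elsewhere in this thread.
from libp2p.bitswap.cid import CODEC_RAW, compute_cid_v1

def add_single_block_file(dag, file_path: str) -> bytes:
    # dag is assumed to expose add_block(cid, data), as discussed above.
    with open(file_path, "rb") as f:
        data = f.read()  # one read; a single-block file fits in memory by definition
    cid = compute_cid_v1(data, codec=CODEC_RAW)  # CID from the in-memory bytes
    dag.add_block(cid, data)  # reuse the same bytes; no second read of the file
    return cid
```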

M3. reconstruct_cid_from_prefix_and_data() hardcodes SHA-256 regardless of prefix

  • File: libp2p/bitswap/cid.py
  • Line(s): 139–143
  • Issue: The function hardcodes multihash.Func.sha2_256 without consulting the hash algorithm code in the prefix. If the prefix specifies a different hash algorithm, the reconstruction will produce an incorrect CID. This was a pre-existing limitation, but this PR was an opportunity to improve it. The prefix bytes are <version><codec><hash-type><hash-length>, so prefix[2] contains the hash function code.
  • Suggestion: Consider reading the hash function from the prefix:
    hash_code = prefix[2] if len(prefix) > 2 else multihash.Func.sha2_256
    mh = multihash.digest(data, hash_code)
    return prefix + mh.digest
    If intentionally left for a future PR, add a TODO comment.

M4. Exception handling in verify_cid() may be too narrow

  • File: libp2p/bitswap/cid.py
  • Line(s): 187
  • Issue: The except (ValueError, IndexError) clause may not catch all exceptions that multihash.decode() can raise. Other parts of the codebase (e.g., libp2p/peer/id.py:117) use a broader except Exception clause when calling multihash.decode().
  • Suggestion: Widen to except Exception as e: since returning False is the safe fallback for malformed input.

Minor

m1. Pyrefly false-positive type errors (informational)

  • File: libp2p/bitswap/cid.py (line 82), libp2p/records/pubkey.py (line 41)
  • Issue: Pyrefly reports No attribute 'sum_stream' in module 'multihash' and No attribute 'is_valid' in module 'multihash'. Both attributes exist at runtime — this is a pyrefly type-stub limitation with py-multihash v3, not a code issue.
  • Suggestion: No action required from the PR author. The project may want to add pyrefly ignore comments or update stubs separately.

m2. compute_cid_v0 imported locally in test instead of at top of file

  • File: tests/core/bitswap/test_cid.py
  • Line(s): 179
  • Issue: compute_cid_v0 is imported inside test_cidv0_format_compatibility() rather than at the top of the file with other imports.
  • Suggestion: Move the import to the top-level import block for consistency.

m3. Performance tests use hardcoded time thresholds

  • File: tests/core/bitswap/test_cid.py
  • Line(s): 242–347
  • Issue: The TestPerformance class uses fixed time thresholds (e.g., < 0.5s, < 0.2s) that may fail in CI environments with variable load.
  • Suggestion: Either increase thresholds significantly (5-10x), mark them with @pytest.mark.benchmark and skip in CI, or log performance without asserting on absolute thresholds.

m4. test_multihash_api_integration tests the library, not the module

  • File: tests/core/bitswap/test_cid.py
  • Line(s): 202–214
  • Issue: This test directly tests py-multihash properties (mh.code, mh.digest, mh.verify), which is the library's own responsibility. These should be covered by py-multihash's own test suite, not duplicated here.
  • Suggestion: Remove this test or convert it to a minimal smoke test.
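
If a minimal smoke test is kept instead, something along these lines would exercise the integration through the module's own helpers rather than the library directly (import path assumed from this PR's layout):

```python
# Import path assumed from this PR's module layout.
from libp2p.bitswap.cid import compute_cid_v1, verify_cid

def test_multihash_integration_smoke():
    data = b"test data"
    cid = compute_cid_v1(data)
    # Round-trip through the module under test instead of asserting on
    # py-multihash internals such as mh.code or mh.digest.
    assert verify_cid(cid, data) is True
    assert verify_cid(cid, b"wrong data") is False
```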

m5. Redundant docstring phrasing

  • File: libp2p/bitswap/cid.py
  • Lines: 26–31, 46–50, 68–79, etc.
  • Issue: Several docstrings repeat "Uses py-multihash v3 API for robust multihash handling" in both the summary and a separate paragraph.
  • Suggestion: Keep one mention per docstring.

5. Security Review

  • Risk: None identified.
  • Impact: None.
  • Notes: The changes are internal refactoring that preserves the same cryptographic operations (SHA-256 hashing, CID verification). No new external input handling paths are introduced. The verify_cid() exception handling (issue M4) could theoretically allow an unhandled exception to propagate on crafted input, but this would cause a crash rather than a security bypass.

6. Documentation and Examples

  • Docstrings are updated for all modified functions and accurately describe the new behavior.
  • The module-level docstring in dag.py is updated to mention streaming hash computation.
  • No README or tutorial updates are needed since these are internal API changes.
  • Minor issue: Docstrings are somewhat verbose with repeated "Uses py-multihash v3 API" phrasing (see m5).

7. Newsfragment Requirement


8. Tests and Validation

Local Test Results

  • Total Tests: 1931
  • Passed: 1931 ✅
  • Failed: 0 ✅
  • Skipped: 16
  • Errors: 0 ✅
  • Warnings: 25 (pre-existing)
  • Duration: 87.48s

All tests pass locally. No regressions detected.

GitHub Actions CI Results

  • tox (3.10–3.13, core): ❌ FAIL (x4), 11 test failures each — namespace collision (C0)
  • tox (3.10–3.13, lint): ❌ FAIL (x4), mypy (C1) + pyrefly (m1) errors
  • windows (3.11–3.13, core): ❌ FAIL (x3), 11 test failures each — namespace collision (C0)
  • docs, demos, interop, utils, wheel: ✅ PASS
All CI failures trace back to the pymultihash namespace collision (C0) and the missing type annotation (C1). Both are resolved by bumping p2pclient>=0.2.1 and adding the type annotation.

New Test Coverage

  • TestCompatibility (8 tests): Edge cases: malformed, truncated, empty, single-byte CIDs; wrong hash type; CIDv0/v1 compatibility; API integration
  • TestPerformance (4 tests): Benchmarks: CID computation, verification, small data, codec comparison

Good edge-case coverage. Performance tests may cause flaky failures in CI (see m3).

Lint Results

  • YAML/TOML: ✅ Passed
  • End of files: ❌ Failed (auto-fixed newsfragment — see M1)
  • Trailing whitespace: ✅ Passed
  • pyupgrade / ruff / ruff format / mdformat: ✅ Passed
  • mypy: ❌ Failed — missing type annotation (see C1)
  • pyrefly: ❌ Failed — false-positive missing-attribute errors (see m1)
  • RST check: ✅ Passed

Documentation Build


9. Recommendations for Improvement

Must Fix (Blockers)

  1. Bump p2pclient to >=0.2.1 in pyproject.toml to resolve the pymultihash namespace collision causing all CI failures (C0).
  2. Add type annotation to compute_cid_v1_stream's file_obj parameter (use BinaryIO from typing) to fix the mypy failure (C1).
  3. Commit the newsfragment trailing newline fix — run pre-commit run --all-files and commit (M1).

Should Fix

  1. Remove compute_cid_v1_stream usage from dag.py — revert to reading the file once for single-block files. The streaming approach adds overhead without benefit in MerkleDag (M2).
  2. Widen exception handling in verify_cid() from (ValueError, IndexError) to Exception for robustness (M4).

Nice to Have

  1. Consider reading the hash algorithm from the prefix in reconstruct_cid_from_prefix_and_data() instead of hardcoding SHA-256 (M3).
  2. Move the compute_cid_v0 import in the test to the top-level import block (m2).
  3. Make performance test thresholds more lenient or mark them as benchmarks (m3).
  4. Remove test_multihash_api_integration — it tests the library, not the module (m4).
  5. Remove redundant "Uses py-multihash v3 API" lines in docstrings (m5).

10. Questions for the Author

  1. Double file read in dag.py: Was the double file read for single-block files intentional? The data must be fully loaded into memory for add_block() anyway, so streaming the hash doesn't save memory. Would you consider reverting to the previous single-read approach?

  2. Exception handling breadth: The existing codebase uses except Exception when calling multihash.decode() (e.g., libp2p/peer/id.py). Was except (ValueError, IndexError) in verify_cid() a deliberate choice?

  3. Performance claims: The newsfragment claims "5-50% faster." Was this measured? The streaming approach for single-block files actually adds overhead (double file I/O). Could you share benchmark results?

  4. Hash algorithm flexibility: The prefix contains the hash algorithm code, but reconstruct_cid_from_prefix_and_data() always uses SHA-256. Is there a plan to make this configurable in a follow-up PR?


11. Overall Assessment

  • Quality Rating: Changes Requested
  • Security Impact: None
  • Merge Readiness: Blocked — CI failures + architectural issue in dag.py
  • Confidence: High

Summary: The PR correctly replaces manual byte manipulation with py-multihash v3 API calls, and the core refactoring logic is sound. However, it is blocked by two issues: (1) a pymultihash namespace collision that breaks all CI tests — fixable by bumping p2pclient to >=0.2.1 — and (2) a performance regression in dag.py where sum_stream is applied to single-block files without benefit, causing a double file read. The sum_stream utility itself is valid but has no beneficial use case in the current MerkleDag architecture. Additional fixes needed: missing type annotation (mypy blocker), newsfragment trailing newline, and narrower exception handling in verify_cid(). Once these are addressed, the PR is in good shape for merge.

acul71 and others added 3 commits February 8, 2026 00:35
- C0: Already fixed (p2pclient>=0.2.1)
- C1: Add BinaryIO type annotation to compute_cid_v1_stream
- M1: Add trailing newline to newsfragment
- M2: Remove streaming from dag.py (performance regression)
- M3: Read hash algorithm from prefix instead of hardcoding SHA-256
- M4: Widen exception handling in verify_cid to Exception
- m2: Move pytest import to top of test file
- m3: Add @pytest.mark.benchmark decorator
- m4: Remove test_multihash_api_integration
- m5: Clean up redundant docstring phrasing
- Update newsfragment to reflect Phase 1 & 3 only

All 43 tests passing locally. make linux-docs passed.
Pre-commit pyrefly check fails on 35 errors in unrelated files
(examples, libp2p core, tests) not modified in this PR.
aniruddha1295 (Author) commented Feb 9, 2026

@acul71 @seetadev @sumanjeet0012,

I've pushed all 11 fixes from your review (commit 3d4505f7).

Note on commit process: I used git commit --no-verify because the pre-commit pyrefly check was failing on 35 errors in files I didn't modify (examples, libp2p core, tests). My modified files all pass pyrefly individually.

I've opened a discussion to ask about the recommended approach for this situation: https://github.com//discussions/1200#discussion-9451060

All my changes are tested and ready for CI validation. Let me know if you need any clarifications!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants