@sneaxhuh commented Dec 2, 2025

Summary

Implements a dynamic programming (DP) cache for CID traversal in `ipfs dag stat` to avoid redundant traversals when multiple DAGs share common subgraphs.

Resolves the TODO in `core/commands/dag/stat.go:17-18`.

Changes

Core Implementation

  • Added a `cidStatCache` structure that memoizes subtree statistics for each CID
  • Replaced the linear traversal with a recursive DP algorithm that:
    • Checks the cache before traversing any node
    • Skips the entire subtree traversal for cached CIDs
    • Computes and caches subtree stats (size + block count) for each node
  • Maintains correct accounting for TotalSize, SharedSize, and deduplication ratios
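A minimal Go sketch of the memoized traversal described above, under stated assumptions. It is not the PR's code: `node`, `subtreeStat`, `statWalker`, and the plain-string CIDs are hypothetical stand-ins for kubo's block and CID types, and SharedSize is assumed to be the sum of per-root sizes minus the unique TotalSize. The cache is consulted before a block is fetched, a block's bytes are added to the unique total only on its first computation, and a cache hit returns a whole subtree's stats without descending into it.

```go
package main

import "fmt"

// node is a hypothetical in-memory stand-in for an IPLD block: its raw size
// in bytes and the CIDs (plain strings here) of the blocks it links to.
type node struct {
	size  uint64
	links []string
}

// subtreeStat is the memoized value: bytes and block count of the subtree
// rooted at a CID.
type subtreeStat struct {
	size      uint64
	numBlocks int
}

// statWalker holds the DP cache plus the running cross-DAG unique total.
type statWalker struct {
	blocks    map[string]node        // hypothetical block store
	cache     map[string]subtreeStat // cidStatCache-style memo table
	totalSize uint64                 // bytes of unique blocks across all roots
}

// stat returns the subtree stats for c, computing them at most once per CID.
func (w *statWalker) stat(c string) (subtreeStat, error) {
	if s, ok := w.cache[c]; ok {
		return s, nil // cache hit: skip the whole subtree
	}
	n, ok := w.blocks[c]
	if !ok {
		return subtreeStat{}, fmt.Errorf("block %s not found", c)
	}
	// First computation of this CID: its bytes count toward the unique total once.
	w.totalSize += n.size
	s := subtreeStat{size: n.size, numBlocks: 1}
	for _, child := range n.links {
		cs, err := w.stat(child)
		if err != nil {
			return subtreeStat{}, err
		}
		s.size += cs.size
		s.numBlocks += cs.numBlocks
	}
	w.cache[c] = s
	return s, nil
}

func main() {
	// Two roots sharing the subgraph rooted at "shared".
	w := &statWalker{
		blocks: map[string]node{
			"rootA":  {size: 10, links: []string{"shared"}},
			"rootB":  {size: 20, links: []string{"shared"}},
			"shared": {size: 100, links: []string{"leaf"}},
			"leaf":   {size: 50},
		},
		cache: map[string]subtreeStat{},
	}
	var redundant uint64 // sum of per-root sizes, shared bytes counted per root
	for _, root := range []string{"rootA", "rootB"} {
		s, err := w.stat(root)
		if err != nil {
			panic(err)
		}
		redundant += s.size
		fmt.Printf("%s: size=%d blocks=%d\n", root, s.size, s.numBlocks)
	}
	fmt.Printf("TotalSize=%d SharedSize=%d\n", w.totalSize, redundant-w.totalSize)
}
```

Running it prints `rootA: size=160 blocks=3`, `rootB: size=170 blocks=3`, then `TotalSize=180 SharedSize=150`: the 150 bytes under `shared` appear in both roots but only once in the total. Because each cached value is the sum of its children's cached values, a block reachable by two different paths inside a single root's DAG (a diamond) would be counted once per path, so this sketch matches a unique-block count only when sharing happens between roots rather than within one.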

Testing

  • Added `TestDagStatCaching` with two test cases:
    • Cache consistency when querying duplicate CIDs
    • Correct deduplication stats with shared subgraphs
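The two properties these tests cover can be illustrated with a small, self-contained Go program. This is a hypothetical sketch, not `TestDagStatCaching` itself: `store`, `memoStat`, and the fetch counter `gets` are illustrative names. The counter on a toy block store shows that a repeated query of the same CID, and the shared part of a second root, are served from the cache instead of being traversed again.

```go
package main

import "fmt"

// store is a hypothetical toy block store that counts fetches, so a check
// can confirm that a cached CID's subtree is not traversed again.
type store struct {
	sizes map[string]uint64
	links map[string][]string
	gets  int
}

// get "fetches" a block; unknown CIDs simply return zero values here.
func (s *store) get(c string) (uint64, []string) {
	s.gets++
	return s.sizes[c], s.links[c]
}

type stat struct {
	size   uint64
	blocks int
}

// memoStat computes subtree stats, consulting the cache before fetching.
func memoStat(s *store, cache map[string]stat, c string) stat {
	if st, ok := cache[c]; ok {
		return st
	}
	size, links := s.get(c)
	st := stat{size: size, blocks: 1}
	for _, l := range links {
		ch := memoStat(s, cache, l)
		st.size += ch.size
		st.blocks += ch.blocks
	}
	cache[c] = st
	return st
}

func main() {
	s := &store{
		sizes: map[string]uint64{"root": 10, "mid": 20, "leaf": 30},
		links: map[string][]string{"root": {"mid"}, "mid": {"leaf"}},
	}
	cache := map[string]stat{}

	// Case 1: querying the same CID twice gives identical stats, and the
	// second query is served entirely from the cache (no new fetches).
	first := memoStat(s, cache, "root")
	fetches := s.gets
	second := memoStat(s, cache, "root")
	fmt.Println(first == second, s.gets == fetches) // true true

	// Case 2: a new root sharing "mid" only fetches its own block.
	s.sizes["root2"] = 5
	s.links["root2"] = []string{"mid"}
	other := memoStat(s, cache, "root2")
	fmt.Println(other.size, s.gets-fetches) // 55 1
}
```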

Performance Impact

Time Complexity:

  • Before: O(N × M), where N is the number of unique CIDs and M is the number of times each CID is reached across the queried DAGs
  • After: O(N), since each unique CID is traversed only once
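To make the complexity comparison concrete, the following Go sketch counts block expansions for k hypothetical roots that all link to one shared chain of L blocks. `buildDAG`, `naiveVisits`, and `memoVisits` are illustrative names, and the "before" model is an assumption: each root is traversed independently, with deduplication only inside that single traversal.

```go
package main

import "fmt"

// buildDAG returns a hypothetical DAG: k roots that all point at one shared
// chain c0 -> c1 -> ... -> c(chain-1).
func buildDAG(k, chain int) (map[string][]string, []string) {
	links := map[string][]string{}
	for i := 0; i < chain-1; i++ {
		links[fmt.Sprintf("c%d", i)] = []string{fmt.Sprintf("c%d", i+1)}
	}
	roots := make([]string, k)
	for i := range roots {
		roots[i] = fmt.Sprintf("r%d", i)
		links[roots[i]] = []string{"c0"}
	}
	return links, roots
}

// naiveVisits traverses each root independently, deduplicating only within
// one traversal, and counts how many blocks get expanded in total.
func naiveVisits(links map[string][]string, roots []string) int {
	visits := 0
	for _, r := range roots {
		seen := map[string]bool{} // fresh per root: shared blocks are re-walked
		var walk func(string)
		walk = func(c string) {
			if seen[c] {
				return
			}
			seen[c] = true
			visits++
			for _, l := range links[c] {
				walk(l)
			}
		}
		walk(r)
	}
	return visits
}

// memoVisits shares one cache across all roots, so each CID is expanded once.
func memoVisits(links map[string][]string, roots []string) int {
	visits := 0
	done := map[string]bool{}
	var walk func(string)
	walk = func(c string) {
		if done[c] {
			return // already expanded for an earlier root
		}
		done[c] = true
		visits++
		for _, l := range links[c] {
			walk(l)
		}
	}
	for _, r := range roots {
		walk(r)
	}
	return visits
}

func main() {
	links, roots := buildDAG(4, 1000)
	fmt.Println(naiveVisits(links, roots), memoVisits(links, roots)) // 4004 1004
}
```

With 4 roots and a 1000-block chain it prints `4004 1004`: k × (L + 1) expansions without a shared cache versus k + L with one.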


Resolves TODO in `core/commands/dag/stat.go:17-18`:
> "cache every cid traversal in a dp cache. if the cid exists in the cache, don't traverse it, and use the cached result to compute the new state"


Signed-off-by: sneax <[email protected]>
@sneaxhuh requested a review from a team as a code owner December 2, 2025 13:56
@gammazero
Contributor

Triage: Will review. This may be useful for pinner and for dag import, possibly by caching different data. Review should assess additional use cases.

@gammazero self-assigned this Dec 2, 2025
@gammazero added the P1 (High: Likely tackled by core team if no one steps up) and kind/feature (A new feature) labels Dec 2, 2025