@dentiny dentiny commented Oct 1, 2025

Which issue does this PR close?

What changes are included in this PR?

Context: I see a large share of CPU time spent on manifest list loading, especially Avro deserialization (see attached PR for details). I want to leverage the object cache to avoid unnecessary I/O and deserialization.

Discussed online with @liurenjie1024 for a bit, see

We lean towards the following approach:

  • Make the object cache a read-through and write-through cache for manifests and manifest lists
  • Later loads try the object cache first; this could be either a read-through cache or a look-aside cache for easier implementation
  • Make the object cache internal and transparent, instead of allowing external applications to access it directly
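
To make the caching behavior in the list above concrete, here is a minimal sketch, written in Python for brevity and not taken from this PR's code. Everything in it is a hypothetical stand-in: `ManifestListCache`, the `storage` dict (in place of object storage / file IO), and the `decode` callable (in place of Avro deserialization). It only illustrates the idea that a write populates the cache with the already-parsed object, so a later read skips both IO and deserialization.

```python
from typing import Callable, Dict, List


class ManifestListCache:
    """Toy cache keyed by file path (hypothetical, for illustration only)."""

    def __init__(self, storage: Dict[str, bytes],
                 decode: Callable[[bytes], List[str]]) -> None:
        self._storage = storage   # assumption: stands in for object storage
        self._decode = decode     # assumption: stands in for Avro deserialization
        self._cache: Dict[str, List[str]] = {}
        self.decode_calls = 0     # counts how often the expensive path runs

    def write(self, path: str, raw: bytes, parsed: List[str]) -> None:
        # Write-through: persist the bytes and cache the parsed object together.
        self._storage[path] = raw
        self._cache[path] = parsed

    def read(self, path: str) -> List[str]:
        # Read-through: serve from the cache; on a miss, load, decode, populate.
        cached = self._cache.get(path)
        if cached is not None:
            return cached
        self.decode_calls += 1
        parsed = self._decode(self._storage[path])
        self._cache[path] = parsed
        return parsed


storage: Dict[str, bytes] = {}
cache = ManifestListCache(storage, decode=lambda raw: raw.decode().split(","))

# A manifest list written through the cache needs no decoding on later reads.
cache.write("snap-1.avro", b"m1,m2", ["m1", "m2"])
written_then_read = cache.read("snap-1.avro")

# A file that bypassed the cache is decoded once, then served from memory.
storage["snap-0.avro"] = b"m0"
first_read = cache.read("snap-0.avro")
second_read = cache.read("snap-0.avro")
```

In this sketch the caller never touches `_cache` directly, which mirrors the third bullet: the cache stays an internal detail behind `read` and `write`.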

I plan to structure and split the series of PRs as follows:

  • Store the manifest list in the object cache, if the cache is enabled
  • Load the manifest list with the object cache considered, which makes object store a part of file IO
  • Replicate the same procedure for manifest files

This PR completes the first part, which also benefits the existing cached scan.

Are these changes tested?

Yes, unit test added.

@dentiny dentiny force-pushed the hjiang/manifest-list-write-through-cache branch from 6a6015d to c995c9c Compare October 1, 2025 00:50
@dentiny dentiny force-pushed the hjiang/manifest-list-write-through-cache branch from c995c9c to 17a85d0 Compare October 1, 2025 00:54