
@github-actions
Contributor

Cherry-picked from #59056

…uce I/O and parsing overhead (#59056)

### What problem does this PR solve?

## Motivation

During Iceberg query planning, FE needs to read and parse the metadata
chain: ManifestList → Manifest → DataFile/DeleteFile. When hot partitions
are queried frequently or many small batch queries are executed, the same
manifest files are read and parsed repeatedly, causing significant I/O
and CPU overhead.

## Solution

This PR introduces a manifest-level cache (`IcebergManifestCache`) in FE
to cache the parsed DataFile/DeleteFile lists per manifest file. The
cache is implemented using Caffeine with weight-based LRU eviction and
TTL support.

### Key Components

- **IcebergManifestCache**: Core cache implementation using Caffeine
  - Weight-based LRU eviction controlled by `iceberg.manifest.cache.capacity-mb`
  - TTL expiration via `iceberg.manifest.cache.ttl-second`
  - Single-flight loading to prevent duplicate parsing of the same manifest

- **ManifestCacheKey**: Cache key consisting of:
  - Manifest file path

- **ManifestCacheValue**: Cached payload containing:
  - List of `DataFile` or `DeleteFile`
  - Estimated memory weight for eviction

- **IcebergManifestCacheLoader**: Helper class to load and populate the
cache using `ManifestFiles.read()`
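
The eviction, TTL, and single-flight behaviour listed above can be sketched with the JDK alone. This is not the PR's code and does not use Caffeine: `WeightedLruCache`, its constructor arguments, and the explicit `nowNanos` clock parameter are hypothetical names chosen for illustration, and the single global lock is a deliberate simplification (Caffeine coordinates loads per key rather than globally).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** JDK-only illustration of weight-based LRU eviction with expire-after-access. */
final class WeightedLruCache<K, V> {
    private static final class Slot<T> {
        final T value;
        final long weight;
        long lastAccessNanos;

        Slot(T value, long weight, long lastAccessNanos) {
            this.value = value;
            this.weight = weight;
            this.lastAccessNanos = lastAccessNanos;
        }
    }

    private final long maxWeight;
    private final long ttlNanos;
    private final ToLongFunction<V> weigher;
    // accessOrder = true keeps the least recently used entry at the head.
    private final LinkedHashMap<K, Slot<V>> entries = new LinkedHashMap<>(16, 0.75f, true);
    private long totalWeight;

    WeightedLruCache(long maxWeight, long ttlNanos, ToLongFunction<V> weigher) {
        this.maxWeight = maxWeight;
        this.ttlNanos = ttlNanos;
        this.weigher = weigher;
    }

    /** Holding the lock while loading means each key is loaded by one caller at a time. */
    synchronized V get(K key, Function<K, V> loader, long nowNanos) {
        Slot<V> slot = entries.get(key); // also marks the entry as most recently used
        if (slot != null && nowNanos - slot.lastAccessNanos >= ttlNanos) {
            entries.remove(key); // expired: not accessed within the TTL
            totalWeight -= slot.weight;
            slot = null;
        }
        if (slot == null) {
            V value = loader.apply(key);
            slot = new Slot<>(value, weigher.applyAsLong(value), nowNanos);
            entries.put(key, slot);
            totalWeight += slot.weight;
            evictEldestWhileOverBudget();
        }
        slot.lastAccessNanos = nowNanos;
        return slot.value;
    }

    private void evictEldestWhileOverBudget() {
        Iterator<Map.Entry<K, Slot<V>>> it = entries.entrySet().iterator();
        // Always keep the newest entry, even if it alone exceeds the budget.
        while (totalWeight > maxWeight && entries.size() > 1 && it.hasNext()) {
            totalWeight -= it.next().getValue().weight;
            it.remove();
        }
    }

    synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }

    synchronized long totalWeight() {
        return totalWeight;
    }
}
```

Passing the clock value explicitly keeps the sketch deterministic; a real cache would read a ticker internally and would avoid blocking unrelated keys during a load.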

### Cache Invalidation Strategy

- Key changes automatically invalidate stale entries
(length/lastModified/sequenceNumber changes)
- TTL prevents stale data when underlying storage doesn't support
precise mtime/etag
- Different snapshots use different manifest paths/keys, ensuring
snapshot-level isolation
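
As a rough illustration of key-based invalidation, the sketch below assumes a key that carries the manifest path together with its length and sequence number, so that a rewritten manifest no longer matches the old entry. The class `VersionedManifestKey` and its fields are hypothetical; the component list above names only the manifest path as part of `ManifestCacheKey`, so treat the extra fields as an assumption drawn from this section's wording.

```java
import java.util.Objects;

/** Hypothetical cache key: any changed field makes the old entry unreachable. */
final class VersionedManifestKey {
    final String path;
    final long length;
    final long sequenceNumber;

    VersionedManifestKey(String path, long length, long sequenceNumber) {
        this.path = path;
        this.length = length;
        this.sequenceNumber = sequenceNumber;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof VersionedManifestKey)) {
            return false;
        }
        VersionedManifestKey that = (VersionedManifestKey) other;
        return length == that.length
                && sequenceNumber == that.sequenceNumber
                && path.equals(that.path);
    }

    @Override
    public int hashCode() {
        return Objects.hash(path, length, sequenceNumber);
    }
}
```

With such a key, a lookup for the same path but a different length simply misses, and the stale entry ages out through LRU eviction or the TTL instead of needing an explicit invalidation call.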

### Iceberg Catalog Properties

| Config | Default | Description |
|--------|---------|-------------|
| `iceberg.manifest.cache.enable` | `true` | Enable/disable manifest cache |
| `iceberg.manifest.cache.capacity-mb` | `1024` | Maximum cache capacity in MB |
| `iceberg.manifest.cache.ttl-second` | `48 * 60 * 60` | Cache entry expiration after access |
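
A minimal sketch of how these properties and their defaults might be resolved from a catalog property map. The class `ManifestCacheConfig` and its method are hypothetical, not the PR's code; only the property names and default values come from the table above.

```java
import java.util.Map;

/** Hypothetical holder for the three manifest-cache catalog properties. */
final class ManifestCacheConfig {
    static final String ENABLE = "iceberg.manifest.cache.enable";
    static final String CAPACITY_MB = "iceberg.manifest.cache.capacity-mb";
    static final String TTL_SECOND = "iceberg.manifest.cache.ttl-second";

    final boolean enabled;
    final long capacityBytes;
    final long ttlSeconds;

    private ManifestCacheConfig(boolean enabled, long capacityBytes, long ttlSeconds) {
        this.enabled = enabled;
        this.capacityBytes = capacityBytes;
        this.ttlSeconds = ttlSeconds;
    }

    /** Missing properties fall back to the defaults listed in the table. */
    static ManifestCacheConfig fromProperties(Map<String, String> props) {
        boolean enabled = Boolean.parseBoolean(props.getOrDefault(ENABLE, "true"));
        long capacityMb = Long.parseLong(props.getOrDefault(CAPACITY_MB, "1024"));
        long ttlSeconds = Long.parseLong(
                props.getOrDefault(TTL_SECOND, String.valueOf(48L * 60 * 60)));
        // Convert MB to bytes so the value can be compared with per-entry weights.
        return new ManifestCacheConfig(enabled, capacityMb * 1024L * 1024L, ttlSeconds);
    }
}
```

With no overrides this yields a 1 GiB weight budget and a TTL of 172800 seconds (48 hours).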

### Integration Point

The cache is integrated in
`IcebergScanNode.planFileScanTaskWithManifestCache()`, which:
1. Loads delete manifests via cache and builds `DeleteFileIndex`
2. Loads data manifests via cache and creates `FileScanTask` for each
data file
3. Falls back to original scan if cache loading fails
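
The three steps above can be sketched as follows. This is a simplified illustration, not the body of `planFileScanTaskWithManifestCache()`: manifests, files, and tasks are plain strings, the delete index is reduced to a list, `ManifestPlanSketch` is a hypothetical name, and treating any `RuntimeException` as the fallback trigger is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

/** Illustrative planning flow: cached path first, original scan on failure. */
final class ManifestPlanSketch {
    static List<String> planWithCache(List<String> deleteManifests,
                                      List<String> dataManifests,
                                      Function<String, List<String>> cachedLoader) {
        // Step 1: load delete manifests through the cache (stand-in for DeleteFileIndex).
        List<String> deleteFiles = new ArrayList<>();
        for (String manifest : deleteManifests) {
            deleteFiles.addAll(cachedLoader.apply(manifest));
        }
        // Step 2: load data manifests through the cache, one task per data file.
        List<String> tasks = new ArrayList<>();
        for (String manifest : dataManifests) {
            for (String dataFile : cachedLoader.apply(manifest)) {
                tasks.add("scan " + dataFile + " with " + deleteFiles.size() + " delete files");
            }
        }
        return tasks;
    }

    static List<String> plan(List<String> deleteManifests,
                             List<String> dataManifests,
                             Function<String, List<String>> cachedLoader,
                             Supplier<List<String>> originalScan) {
        try {
            return planWithCache(deleteManifests, dataManifests, cachedLoader);
        } catch (RuntimeException e) {
            // Step 3: cache loading failed, so use the original scan path instead.
            return originalScan.get();
        }
    }
}
```

Keeping the fallback at the outermost level means a failure in either the delete or the data manifest load discards the partial result instead of mixing cached and uncached plans.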
@github-actions github-actions bot requested a review from yiguolei as a code owner December 26, 2025 09:49
@yiguolei
Contributor

yiguolei commented Jan 5, 2026

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 9.93% (72/725) 🎉
Increment coverage report
Complete coverage report

@morningman morningman closed this Jan 7, 2026
@morningman morningman reopened this Jan 7, 2026
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 7, 2026
@github-actions
Contributor Author

github-actions bot commented Jan 7, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor Author

github-actions bot commented Jan 7, 2026

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit 81330f0 into branch-4.0 Jan 7, 2026
25 of 28 checks passed
@github-actions github-actions bot deleted the auto-pick-59056-branch-4.0 branch January 7, 2026 10:06

Labels

approved Indicates a PR has been approved by one committer. reviewed


5 participants