Upgrade DataFusion to 52.1.0 and Add Liquid Cache Support#20740
Conversation
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 776c709.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
776c709 to
ab6614a
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit ab6614a.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
| liquid_cache: LiquidCacheRef, | ||
| } | ||
|
|
||
| static LIQUID_ONLY: OnceLock<Result<LiquidOnlyRuntime, String>> = OnceLock::new(); |
There was a problem hiding this comment.
Why are we doing it here? Can't it be done in global runtime?
| config = config.with_files_statistics_cache(Some(default_stats)); | ||
| } | ||
| // Add statistics cache if available - use default since CustomStatisticsCache doesn't implement FileStatisticsCache trait | ||
| let default_stats = Arc::new(DefaultFileStatisticsCache::default()); |
There was a problem hiding this comment.
This defeats the purpose of our custom statistics cache
| config.options_mut().execution.parquet.pushdown_filters = false; | ||
| config.options_mut().execution.target_partitions = target_partitions; | ||
| config.options_mut().execution.batch_size = 8192; | ||
| .with_metadata_cache_limit(250 * 1024 * 1024) |
There was a problem hiding this comment.
This should be configurable via settings
| info!("[LiquidCache] Creating Parquet access plans for {} row IDs across {} files", | ||
| row_ids.len(), files_metadata.len()); | ||
| let access_plans = create_access_plans(row_ids, files_metadata.clone()).await?; | ||
| info!("[LiquidCache] ✓ Access plans created, Liquid Cache will optimize data access"); |
There was a problem hiding this comment.
Kindly trim unnecessary logs throughout this PR and keep only the essential debug logs.
| .build().unwrap(); | ||
|
|
||
| log_info!("[LiquidCache] Initializing global Liquid Cache (1GB max)"); | ||
| let liquid_runtime = match LiquidOnlyRuntime::init(1024 * 1024 * 1024) { |
There was a problem hiding this comment.
This should be configurable via settings.
ab6614a to
13e10d5
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 13e10d5.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 17d5398.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
17d5398 to
ab300b4
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit ab300b4.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
…pendency Signed-off-by: Tanvir Alam <tanvralm@amazon.com>
ab300b4 to
76ff645
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 76ff645.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
|
❌ Gradle check result for 76ff645: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit ba9d9c1.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
ba9d9c1 to
698f7c3
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 698f7c3.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
Signed-off-by: Tanvir Alam <tanvralm@amazon.com>
698f7c3 to
579060c
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit 579060c.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
| let list_file_cache = Arc::new(DefaultListFilesCache::default()); | ||
| list_file_cache.put(table_path.prefix(), object_meta); | ||
| let table_scoped_path = datafusion::execution::cache::TableScopedPath { | ||
| table: None, |
There was a problem hiding this comment.
Do we need to define the table also here!
There was a problem hiding this comment.
TableScopedPath is required by DataFusion's cache API signature - the put() method expects this struct type, not a raw path.
| let mut config = SessionConfig::new(); | ||
| config.options_mut().execution.parquet.pushdown_filters = false; | ||
| config.options_mut().execution.target_partitions = target_partitions; | ||
| config.options_mut().execution.target_partitions = 4; |
There was a problem hiding this comment.
let's make sure we keep it same?
| url = { workspace = true } | ||
|
|
||
| # Liquid Cache for byte-level caching | ||
| liquid-cache-datafusion-local = "0.1.12" |
There was a problem hiding this comment.
Do it similar to other packages
There was a problem hiding this comment.
The /parquet/ subdirectory was created because the DatafusionEngine constructs the file path by appending the data format name to the base path. Looking at the earlier context, when using DataFormat.PARQUET, the engine expects files to be in a parquet/ subdirectory.
There was a problem hiding this comment.
Why can't we mock it in the tests the path? I think we should already have some file
There was a problem hiding this comment.
i think only paths have changed - this should be okay right ?
There was a problem hiding this comment.
Yes only paths have changed the files are now inside parquet directory
…tructure Signed-off-by: Tanvir Alam <tanvralm@amazon.com>
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit c496725.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit fbb15e0.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
…ojection Signed-off-by: Tanvir Alam <tanvralm@amazon.com>
fbb15e0 to
d955609
Compare
PR Code Analyzer ❗AI-powered 'Code-Diff-Analyzer' found issues on commit d955609.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
6e5699d
into
opensearch-project:feature/datafusion
Summary
This PR upgrades the DataFusion engine from version 51.0.0 to 52.1.0 and integrates the
liquid-cache-datafusion-localdependency for enhanced caching capabilities.Changes
Dependency Updates
liquid-cache-datafusion-local = "0.1.12"