feat: Add config to enable running Comet in onheap mode #2554

Merged: andygrove merged 9 commits into apache:main from andygrove:require-offheap-mode on Oct 13, 2025

@andygrove (Member) commented Oct 12, 2025

Which issue does this PR close?

Related to #2342

Rationale for this change

Running Comet without Spark's off-heap mode enabled is not recommended for production use, as documented in the tuning guide. We should fall back to Spark by default in on-heap mode unless the user explicitly opts in.
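The fallback rule can be sketched as a small decision function. This is a minimal illustration, not Comet's actual code (the real check lives in the plugin's Scala sources). `spark.memory.offHeap.enabled` is Spark's standard off-heap switch. The opt-in key `spark.comet.exec.onHeap.enabled` and the function names are assumptions, because the PR description does not name the new config.

```python
from typing import Mapping

OFFHEAP_KEY = "spark.memory.offHeap.enabled"  # Spark's off-heap memory switch
ONHEAP_OPT_IN_KEY = "spark.comet.exec.onHeap.enabled"  # assumed name for the new Comet config


def _is_true(conf: Mapping[str, str], key: str) -> bool:
    """Read a Spark-style boolean setting; a missing key counts as false."""
    return str(conf.get(key, "false")).strip().lower() == "true"


def should_use_comet(conf: Mapping[str, str]) -> bool:
    """Hypothetical helper: True if Comet should run, False if it should fall back to Spark."""
    if _is_true(conf, OFFHEAP_KEY):
        # Recommended production setup: Spark off-heap memory is enabled.
        return True
    # On-heap mode: run Comet only if the user explicitly opted in.
    return _is_true(conf, ONHEAP_OPT_IN_KEY)


if __name__ == "__main__":
    print(should_use_comet({OFFHEAP_KEY: "true"}))        # True
    print(should_use_comet({}))                           # False: falls back to Spark
    print(should_use_comet({ONHEAP_OPT_IN_KEY: "true"}))  # True: explicit opt-in
```

Defaulting a missing key to false mirrors the intent stated above: on-heap execution is never chosen silently.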

What changes are included in this PR?

  • New config that allows Comet to run when Spark is in on-heap mode
  • Fall back to Spark in on-heap mode if the new config is not enabled
  • Update the GitHub workflows for Spark SQL tests to enable the new config via an env var
  • Update CometTestBase to enable the new config
  • Update docs
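A test harness could opt in along the lines below. This is a hypothetical sketch. The PR text names neither the environment variable nor the config key, so `ENABLE_COMET_ONHEAP` and `spark.comet.exec.onHeap.enabled` are assumed names. `org.apache.spark.CometPlugin` is assumed to be the plugin class registered through Spark's `spark.plugins` setting. Verify all three against the Comet docs before relying on them.

```python
import os
from typing import Mapping, Optional

# Hypothetical names: neither is given in the PR description.
ONHEAP_ENV_VAR = "ENABLE_COMET_ONHEAP"
ONHEAP_OPT_IN_KEY = "spark.comet.exec.onHeap.enabled"


def build_test_conf(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return Spark settings for a test run, opting in to on-heap Comet if the env var is "true"."""
    env = os.environ if environ is None else environ
    conf = {
        "spark.plugins": "org.apache.spark.CometPlugin",  # assumed Comet plugin class
        "spark.comet.enabled": "true",
    }
    if env.get(ONHEAP_ENV_VAR, "").strip().lower() == "true":
        # Explicit opt-in, so Comet does not fall back to Spark in on-heap mode.
        conf[ONHEAP_OPT_IN_KEY] = "true"
    return conf


if __name__ == "__main__":
    print(build_test_conf({ONHEAP_ENV_VAR: "true"}))
    print(build_test_conf({}))
```

Keeping the opt-in behind an env var means CI jobs enable it explicitly, while an ordinary on-heap session still falls back to Spark.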

How are these changes tested?

@andygrove changed the title from "feat: Add config to enable running Comet in onheap mode for testing purposes [WIP]" to "feat: Add config to enable running Comet in onheap mode for testing purposes [WIP] [iceberg]" on Oct 12, 2025

@codecov-commenter commented Oct 12, 2025

Codecov Report

❌ Patch coverage is 45.45455% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.89%. Comparing base (f09f8af) to head (72a8664).
⚠️ Report is 1150 commits behind head on main.

Files with missing lines | Patch % | Lines
...park/src/main/scala/org/apache/spark/Plugins.scala | 14.28% | 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2554      +/-   ##
============================================
+ Coverage     56.12%   58.89%   +2.77%     
- Complexity      976     1457     +481     
============================================
  Files           119      147      +28     
  Lines         11743    13652    +1909     
  Branches       2251     2371     +120     
============================================
+ Hits           6591     8041    +1450     
- Misses         4012     4386     +374     
- Partials       1140     1225      +85     


@andygrove marked this pull request as ready for review October 13, 2025 14:00
@andygrove changed the title from "feat: Add config to enable running Comet in onheap mode for testing purposes [WIP] [iceberg]" to "feat: Add config to enable running Comet in onheap mode for testing purposes [WIP] [viceberg]" on Oct 13, 2025
@andygrove changed the title from "feat: Add config to enable running Comet in onheap mode for testing purposes [WIP] [viceberg]" to "feat: Add config to enable running Comet in onheap mode]" on Oct 13, 2025
@andygrove requested a review from wForget October 13, 2025 14:02
@andygrove changed the title from "feat: Add config to enable running Comet in onheap mode]" to "feat: Add config to enable running Comet in onheap mode" on Oct 13, 2025

@wForget (Member) left a comment

Thanks, lgtm

@andygrove requested a review from mbutrovich October 13, 2025 14:34

@andygrove (Member, Author)

@Kontinuation @EmilyMatt fyi

@parthchandra (Contributor)

Why do we need a new config for this? Isn't Spark's config sufficient?

@andygrove (Member, Author)

The goal is to require the user to opt-in to enabling Comet if Spark is in on-heap mode, because running Comet in on-heap mode is not recommended for production use.

@parthchandra (Contributor)

Hmm. I feel that we have too many configs already and there may already be confusion about memory settings for Comet. I'm not saying we shouldn't have this config, but we do need to start simplifying things for end users soon.

@andygrove (Member, Author)

I agree. I think you will like the next PR that this enables. Thanks for the review.

@andygrove merged commit c214049 into apache:main Oct 13, 2025
127 of 130 checks passed
@andygrove deleted the require-offheap-mode branch October 13, 2025 19:24
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
4 participants