Skip to content

Conversation

andygrove
Copy link
Member

Which issue does this PR close?

N/A

Rationale for this change

What changes are included in this PR?

  • Add scaladoc documentation
  • Move methods around so for easier code comprehension; hasNext / next / close are in that order now, and helper methods are moved to the bottom

How are these changes tested?

Comment on lines -243 to -261
// The allocator thoughts the exported ArrowArray and ArrowSchema structs are not released,
// so it will report:
// Caused by: java.lang.IllegalStateException: Memory was leaked by query.
// Memory leaked: (516) Allocator(ROOT) 0/516/808/9223372036854775807 (res/actual/peak/limit)
// Suspect this seems a false positive leak, because there is no reported memory leak at JVM
// when profiling. `allocator` reports a leak because it calculates the accumulated number
// of memory allocated for ArrowArray and ArrowSchema. But these exported ones will be
// released in native side later.
// More to clarify it. For ArrowArray and ArrowSchema, Arrow will put a release field into the
// memory region which is a callback function pointer (C function) that could be called to
// release these structs in native code too. Once we wrap their memory addresses at native
// side using FFI ArrowArray and ArrowSchema, and drop them later, the callback function will
// be called to release the memory.
// But at JVM, the allocator doesn't know about this fact so it still keeps the accumulated
// number.
// Tried to manually do `release` and `close` that can make the allocator happy, but it will
// cause JVM runtime failure.

// allocator.close()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment refers to an allocator that no longer exists so it doesn't seem very useful to keep around

* - 3. Executes queries natively using DataFusion's columnar processing
* - 4. Returns results as ColumnarBatch with ownership transferred back to JVM
*
* * This isn't quite true. Comet does not currently implement best practice when passing batches
Copy link
Contributor

@comphead comphead Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    • This isn't quite true.

is it needed??

*
* '''Thread Safety:''' This method is synchronized to prevent race conditions during cleanup.
* Multiple threads calling close() concurrently will be serialized safely. This is important
* because it may be invoked from Spark's task cleanup listener.
Copy link
Contributor

@comphead comphead Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is important because it may be invoked from Spark's task cleanup listener

This sounds somewhat out of context IMO?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this. I wanted to explain that this method is called both from the current class and also from other Spark threads, which is why it is synchronized

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @andygrove

* done).
* Comet's primary execution iterator that bridges JVM (Spark) and native (Rust) execution
* environments. This iterator orchestrates native query execution on Arrow columnar batches while
* managing sophisticated memory ownership semantics across the JNI boundary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sophisticated" isn't adding anything here.

* executing native query as there might be blocking operators such as Sort, Aggregate. The API
* `hasNext` can be used to check if it is the end of this iterator (i.e. the native query is
* done).
* Comet's primary execution iterator that bridges JVM (Spark) and native (Rust) execution
Copy link
Contributor

@mbutrovich mbutrovich Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would simplify this down (along with CometBatchIterator) down to something that just clearly lays out "iterator for passing Arrow batches from JVM to Native" or vice versa. There are a lot of words here that don't really make clear what it's doing (same with #2168)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. i've pushed a more concise version

@codecov-commenter
Copy link

codecov-commenter commented Aug 18, 2025

Codecov Report

❌ Patch coverage is 77.77778% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.08%. Comparing base (f09f8af) to head (7280b79).
⚠️ Report is 389 commits behind head on main.

Files with missing lines Patch % Lines
...ain/scala/org/apache/comet/CometExecIterator.scala 77.77% 3 Missing and 5 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #2169       +/-   ##
=============================================
- Coverage     56.12%   35.08%   -21.05%     
+ Complexity      976      860      -116     
=============================================
  Files           119      143       +24     
  Lines         11743    13198     +1455     
  Branches       2251     2368      +117     
=============================================
- Hits           6591     4630     -1961     
- Misses         4012     7671     +3659     
+ Partials       1140      897      -243     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants