-
Notifications
You must be signed in to change notification settings - Fork 234
chore: Improve documentation for CometExecIterator
#2169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
// The allocator thoughts the exported ArrowArray and ArrowSchema structs are not released, | ||
// so it will report: | ||
// Caused by: java.lang.IllegalStateException: Memory was leaked by query. | ||
// Memory leaked: (516) Allocator(ROOT) 0/516/808/9223372036854775807 (res/actual/peak/limit) | ||
// Suspect this seems a false positive leak, because there is no reported memory leak at JVM | ||
// when profiling. `allocator` reports a leak because it calculates the accumulated number | ||
// of memory allocated for ArrowArray and ArrowSchema. But these exported ones will be | ||
// released in native side later. | ||
// More to clarify it. For ArrowArray and ArrowSchema, Arrow will put a release field into the | ||
// memory region which is a callback function pointer (C function) that could be called to | ||
// release these structs in native code too. Once we wrap their memory addresses at native | ||
// side using FFI ArrowArray and ArrowSchema, and drop them later, the callback function will | ||
// be called to release the memory. | ||
// But at JVM, the allocator doesn't know about this fact so it still keeps the accumulated | ||
// number. | ||
// Tried to manually do `release` and `close` that can make the allocator happy, but it will | ||
// cause JVM runtime failure. | ||
|
||
// allocator.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment refers to an allocator
that no longer exists so it doesn't seem very useful to keep around
* - 3. Executes queries natively using DataFusion's columnar processing | ||
* - 4. Returns results as ColumnarBatch with ownership transferred back to JVM | ||
* | ||
* * This isn't quite true. Comet does not currently implement best practice when passing batches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This isn't quite true.
is it needed??
* | ||
* '''Thread Safety:''' This method is synchronized to prevent race conditions during cleanup. | ||
* Multiple threads calling close() concurrently will be serialized safely. This is important | ||
* because it may be invoked from Spark's task cleanup listener. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is important because it may be invoked from Spark's task cleanup listener
This sounds somewhat out of context IMO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated this. I wanted to explain that this method is called both from the current class and also from other Spark threads, which is why it is synchronized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @andygrove
* done). | ||
* Comet's primary execution iterator that bridges JVM (Spark) and native (Rust) execution | ||
* environments. This iterator orchestrates native query execution on Arrow columnar batches while | ||
* managing sophisticated memory ownership semantics across the JNI boundary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"sophisticated" isn't adding anything here.
* executing native query as there might be blocking operators such as Sort, Aggregate. The API | ||
* `hasNext` can be used to check if it is the end of this iterator (i.e. the native query is | ||
* done). | ||
* Comet's primary execution iterator that bridges JVM (Spark) and native (Rust) execution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would simplify this down (along with CometBatchIterator) down to something that just clearly lays out "iterator for passing Arrow batches from JVM to Native" or vice versa. There are a lot of words here that don't really make clear what it's doing (same with #2168)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. i've pushed a more concise version
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2169 +/- ##
=============================================
- Coverage 56.12% 35.08% -21.05%
+ Complexity 976 860 -116
=============================================
Files 119 143 +24
Lines 11743 13198 +1455
Branches 2251 2368 +117
=============================================
- Hits 6591 4630 -1961
- Misses 4012 7671 +3659
+ Partials 1140 897 -243 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Which issue does this PR close?
N/A
Rationale for this change
What changes are included in this PR?
How are these changes tested?