chore: Add memory reservation debug logging and visualization #2521
Conversation
`native/core/src/execution/jni_api.rs` (outdated)

```rust
    debug_native: jboolean,
    explain_native: jboolean,
    tracing_enabled: jboolean,
```
Rather than adding yet another flag to this API call, I am now using the already-available Spark config map in native code.
+1. The config map should be the preferred method
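For illustration, a minimal sketch of the config-map approach (the helper name and map type here are assumptions for this sketch; the PR itself exposes accessors such as `spark_config.get_bool`, visible later in this diff):

```rust
use std::collections::HashMap;

// Hypothetical helper: read a boolean Spark config from the string map
// that is already passed to native code, instead of threading another
// jboolean through the JNI call signature.
fn get_bool_conf(spark_config: &HashMap<String, String>, key: &str) -> bool {
    spark_config
        .get(key)
        .map(|v| v.eq_ignore_ascii_case("true"))
        .unwrap_or(false)
}

fn main() {
    let mut conf = HashMap::new();
    conf.insert("spark.comet.debug.memory".to_owned(), "true".to_owned());
    assert!(get_bool_conf(&conf, "spark.comet.debug.memory"));
}
```

Adding a new key this way changes no native entry-point signatures, which is the point of the comment above.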
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##              main    #2521      +/-   ##
============================================
+ Coverage     56.12%   58.93%    +2.80%
- Complexity      976     1449      +473
============================================
  Files           119      147       +28
  Lines         11743    13649     +1906
  Branches       2251     2369      +118
============================================
+ Hits           6591     8044     +1453
- Misses         4012     4382      +370
- Partials       1140     1223       +83
```

☔ View full report in Codecov by Sentry.
```rust
impl MemoryPool for LoggingPool {
    fn grow(&self, reservation: &MemoryReservation, additional: usize) {
        println!(
```
Should this println! be info! or trace! instead?
I guess info! would be ok. I pushed that change. If we used trace!, then we would have to set spark.comet.debug.memory=true and also configure trace-level logging for this one file, which seems like overkill for a debug feature.
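To make the discussion concrete, here is a minimal, self-contained sketch of what a logging wrapper over DataFusion's `MemoryPool` trait can look like with `info!` logging (field names and the log format are illustrative assumptions, not necessarily what this PR uses):

```rust
use std::sync::Arc;

use datafusion::common::Result;
use datafusion::execution::memory_pool::{MemoryConsumer, MemoryPool, MemoryReservation};
use log::info;

/// Illustrative wrapper that logs each pool interaction and then
/// delegates to the wrapped pool.
#[derive(Debug)]
pub struct LoggingPool {
    task_attempt_id: u64,
    inner: Arc<dyn MemoryPool>,
}

impl MemoryPool for LoggingPool {
    fn register(&self, consumer: &MemoryConsumer) {
        self.inner.register(consumer)
    }

    fn unregister(&self, consumer: &MemoryConsumer) {
        self.inner.unregister(consumer)
    }

    fn grow(&self, reservation: &MemoryReservation, additional: usize) {
        // Log the consumer name and requested size, then delegate.
        info!(
            "[Task {}] grow consumer={} size={}",
            self.task_attempt_id,
            reservation.consumer().name(),
            additional
        );
        self.inner.grow(reservation, additional)
    }

    fn shrink(&self, reservation: &MemoryReservation, shrink: usize) {
        info!(
            "[Task {}] shrink consumer={} size={}",
            self.task_attempt_id,
            reservation.consumer().name(),
            shrink
        );
        self.inner.shrink(reservation, shrink)
    }

    fn try_grow(&self, reservation: &MemoryReservation, additional: usize) -> Result<()> {
        self.inner.try_grow(reservation, additional)
    }

    fn reserved(&self) -> usize {
        self.inner.reserved()
    }
}
```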
Moving to draft while I work on the Python scripts.
Next, generate a chart from the CSV file for a specific Spark task:
```shell
python3 dev/scripts/plot_memory_usage.py /tmp/mem.csv --task 1234
```
Suggested change:

```diff
-python3 dev/scripts/plot_memory_usage.py /tmp/mem.csv --task 1234
+python3 dev/scripts/plot_memory_usage.py /tmp/mem.csv
```
plot_memory_usage.py does not accept a --task argument.
```python
if __name__ == "__main__":
    ap = argparse.ArgumentParser(description="Generate CSV From memory debug output")
    ap.add_argument("--task", default=None, help="Task ID.")
    ap.add_argument("--file", default=None, help="Spark log containing memory debug output")
```
The file argument seems to be mandatory, not optional.
It is used at https://github.com/apache/datafusion-comet/pull/2521/files#diff-b1b45e935652f7568175f6d7b83ff247fab24d507c782ef1e53392a53410e095R30
| "Guide (https://datafusion.apache.org/comet/user-guide/tracing.html)" | ||
|
|
||
| private val DEBUGGING_GUIDE = "For more information, refer to the Comet Debugging " + | ||
| "Guide (https://datafusion.apache.org/comet/contributor-guide/debugging.html" |
| "Guide (https://datafusion.apache.org/comet/contributor-guide/debugging.html" | |
| "Guide (https://datafusion.apache.org/comet/contributor-guide/debugging.html)" |
| Config | Description | Default Value |
|--------|-------------|---------------|
| spark.comet.convert.json.enabled | When enabled, data from Spark (non-native) JSON v1 and v2 scans will be converted to Arrow format. Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. | false |
| spark.comet.convert.parquet.enabled | When enabled, data from Spark (non-native) Parquet v1 and v2 scans will be converted to Arrow format. Note that to enable native vectorized execution, both this config and 'spark.comet.exec.enabled' need to be enabled. | false |
| spark.comet.debug.enabled | Whether to enable debug mode for Comet. When enabled, Comet will do additional checks for debugging purpose. For example, validating array when importing arrays from JVM at native side. Note that these checks may be expensive in performance and should only be enabled for debugging purpose. | false |
| spark.comet.debug.memory | When enabled, log all native memory pool interactions. For more information, refer to the Comet Debugging Guide (https://datafusion.apache.org/comet/contributor-guide/debugging.html. | false |
Suggested change:

```diff
-| spark.comet.debug.memory | When enabled, log all native memory pool interactions. For more information, refer to the Comet Debugging Guide (https://datafusion.apache.org/comet/contributor-guide/debugging.html. | false |
+| spark.comet.debug.memory | When enabled, log all native memory pool interactions. For more information, refer to the Comet Debugging Guide (https://datafusion.apache.org/comet/contributor-guide/debugging.html). | false |
```
| ap.add_argument("--task", default=None, help="Task ID.") | ||
| ap.add_argument("--file", default=None, help="Spark log containing memory debug output") | ||
| args = ap.parse_args() | ||
| main(args.file, int(args.task)) |
The task is an optional parameter; calling int(None) will fail with a TypeError, so the conversion should probably only happen when args.task is not None.
```python
        size = int(re_match.group(4))

        if alloc.get(consumer) is None:
            alloc[consumer] = size
```
Would it be possible for the first occurrence of a consumer to be a shrink?
```python
        elif method == "shrink":
            alloc[consumer] = alloc[consumer] - size

        print(consumer, ",", alloc[consumer])
```
Suggested change:

```diff
-        print(consumer, ",", alloc[consumer])
+        print(f"{consumer},{alloc[consumer]}")
```
nit: to avoid the extra spaces around each item
```python
    # Pivot the data to have consumers as columns
    pivot_df = df.pivot(index='time', columns='name', values='size')
    pivot_df = pivot_df.fillna(method='ffill').fillna(0)
```
Suggested change:

```diff
-    pivot_df = pivot_df.fillna(method='ffill').fillna(0)
+    pivot_df = pivot_df.ffill().fillna(0)
```
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html - Deprecated since version 2.1.0: Use ffill or bfill instead.
```rust
    let tracing_enabled = spark_config.get_bool(COMET_TRACING_ENABLED);
    let max_temp_directory_size =
        spark_config.get_u64(COMET_MAX_TEMP_DIRECTORY_SIZE, 100 * 1024 * 1024 * 1024);
    let logging_memory_pool = spark_config.get_bool(COMET_DEBUG_MEMORY);
```
Suggested change:

```diff
-    let logging_memory_pool = spark_config.get_bool(COMET_DEBUG_MEMORY);
+    let debug_memory_enabled = spark_config.get_bool(COMET_DEBUG_MEMORY);
```
```rust
    let memory_pool =
        create_memory_pool(&memory_pool_config, task_memory_manager, task_attempt_id);

    let memory_pool = if logging_memory_pool {
```
Suggested change:

```diff
-    let memory_pool = if logging_memory_pool {
+    let memory_pool = if debug_memory_enabled {
```
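For context, a sketch of how the conditional wrapping might read with the suggested rename applied (this assumes the `LoggingPool` shape sketched earlier in this thread; the stand-in for `create_memory_pool` is hypothetical, not the PR's helper):

```rust
use std::sync::Arc;

use datafusion::execution::memory_pool::{GreedyMemoryPool, MemoryPool};

// Hypothetical stand-in for the PR's create_memory_pool helper.
fn create_memory_pool() -> Arc<dyn MemoryPool> {
    Arc::new(GreedyMemoryPool::new(1024 * 1024 * 1024))
}

fn build_pool(debug_memory_enabled: bool, task_attempt_id: u64) -> Arc<dyn MemoryPool> {
    let memory_pool = create_memory_pool();
    if debug_memory_enabled {
        // Wrap the real pool only when spark.comet.debug.memory is set,
        // so the logging overhead stays opt-in. LoggingPool is the struct
        // from the sketch above.
        Arc::new(LoggingPool {
            task_attempt_id,
            inner: memory_pool,
        })
    } else {
        memory_pool
    }
}
```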
Which issue does this PR close?

Closes #.

Rationale for this change

Debugging.

From this, we can make pretty charts to help with comprehension:

(chart screenshots omitted)
What changes are included in this PR?

- New `spark.comet.debug.memory` config
- New `LoggingPool` that is enabled when the new config is set

How are these changes tested?