Commit 04a40ab
[SPARK-53157][CORE] Decouple driver and executor polling intervals
### What changes were proposed in this pull request?
Add a config, `spark.driver.metrics.pollingInterval`, and schedule driver metric polling (the driver heartbeat) at that interval.
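As a rough illustration of what decoupling the two intervals means, the sketch below resolves a driver and an executor interval from a plain map of settings. It does not use Spark's own config classes. `PollingIntervals`, `resolve`, and `pollsIn` are hypothetical names. The fallback to the executor heartbeat interval when the driver key is unset is an assumption, not something stated in this commit message. `spark.executor.heartbeatInterval` is the existing executor setting, which defaults to 10s.

```scala
object PollingIntervals {
  // Config keys: the executor key already exists in Spark; the driver key is the one this change adds.
  val ExecutorKey = "spark.executor.heartbeatInterval"
  val DriverKey = "spark.driver.metrics.pollingInterval"

  // Spark's documented default executor heartbeat interval (10s), in milliseconds.
  val DefaultExecutorMs = 10000L

  // Returns (driverIntervalMs, executorIntervalMs).
  // Assumption: an unset driver interval falls back to the executor heartbeat interval.
  def resolve(conf: Map[String, Long]): (Long, Long) = {
    val executorMs = conf.getOrElse(ExecutorKey, DefaultExecutorMs)
    val driverMs = conf.getOrElse(DriverKey, executorMs)
    (driverMs, executorMs)
  }

  // Number of polls that fire within a window when the first poll happens one interval after start.
  def pollsIn(windowMs: Long, intervalMs: Long): Long = windowMs / intervalMs
}
```

With the driver key set to 1000 ms and the executor key left unset, `resolve` yields a 1 s driver interval alongside the unchanged 10 s executor interval. `pollsIn` gives the kind of arithmetic one might use to check that the number of recorded events matches the configured rate.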
### Why are the changes needed?
Decouple the driver and executor heartbeat intervals. Because memory metrics are only sampled at the reporting interval, we do not have a 100% accurate view of stats at drivers and executors. This is most noticeable at the driver, where we do not have the benefit of the larger sample of metrics that N executors in an application provide.
This change provides a way to increase (or, more generally, change) the rate at which metrics are collected at the driver, to help overcome the sampling problem, without requiring users to also increase the executor heartbeat frequency.
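The sampling problem can be illustrated with a small, self-contained simulation. This is only a sketch under made-up assumptions: `SamplingDemo`, `heapMiB`, and `observedPeak` are hypothetical, and the heap trace is synthetic rather than measured data from this change.

```scala
object SamplingDemo {
  // Synthetic heap trace in MiB: a flat 2048 MiB baseline with a one-second spike to 8192 MiB.
  def heapMiB(tMs: Long): Long =
    if (tMs >= 4500L && tMs < 5500L) 8192L else 2048L

  // Peak seen by a poller that samples at t = 0, intervalMs, 2 * intervalMs, ... up to durationMs.
  def observedPeak(durationMs: Long, intervalMs: Long): Long =
    (0L to durationMs by intervalMs).map(t => heapMiB(t)).max
}
```

Over a 30 s window, a poller with a 10 s interval samples at 0, 10, 20, and 30 s and never sees the spike, so `observedPeak(30000L, 10000L)` reports the 2048 MiB baseline. A poller with a 1 s interval samples at 5 s, inside the spike, and reports 8192 MiB.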
### Does this PR introduce _any_ user-facing change?
Yes, it introduces a new Spark config, `spark.driver.metrics.pollingInterval`.
### How was this patch tested?
Verified that metric collection improved when the sampling rate was increased, and that the number of events matched expectations when the rate was changed.
Methodology for validating that a higher driver heartbeat frequency (a shorter interval) improves memory metric collection:
1. Using a 6gb driver heap, wrote a job to broadcast a table, gradually increasing the size of the table until OOM.
2. Increased driver memory to 10gb, large enough for the same broadcast to succeed.
3. Repeated this job and tracked the peak memory usage that was written to event log.
4. After repeated experiments, observed that the typical (median) peak heap usage tracked was <=5GiB.
5. Added my change, and decreased the heartbeat interval.
6. Re-ran the same jobs with a 10gb heap, and saw that the typical peak memory usage tracked was ~8GiB, more accurately reflecting the increased memory needs.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #51885 from ForVic/vsunderl/driver_polling_interval.
Authored-by: ForVic <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
Parent: 4441fa1
File tree
2 files changed (+9, -1) under core/src/main/scala/org/apache/spark (including internal/config)

[Diff content not preserved. First file: 1 line changed at line 617 (1 addition, 1 deletion). Second file: 8 lines added at new lines 1204-1211 (8 additions, 0 deletions).]