Commit 60bcc71
[SPARK-51095][CORE][SQL] Include caller context for hdfs audit logs for calls from driver
### What changes were proposed in this pull request?
Set the HDFS caller context for calls made from the Spark driver.
### Why are the changes needed?
HDFS audit logs can record a "caller context" for each request. Spark already uses this to tag requests with the YARN application ID, job ID, task ID, etc., but only on executors; on the Spark driver the caller context is left empty. With the introduction of Iceberg, we have seen multiple scenarios in which files in HDFS are accessed from the driver, and because those requests carry no caller context, our ability to forensically analyse issues has diminished. This PR sets the caller context on the driver as well.
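A minimal sketch of how such a caller-context string can be composed, assuming the layout visible in the audit log under "How was this patch tested?" (`SPARK_DRIVER_<appId>` for the driver, `SPARK_TASK_<appId>_JId_<job>_SId_<stage>_<stageAttempt>_TId_<task>_<taskAttempt>` for tasks). `CallerContextSketch.callerContextString` is a hypothetical helper written for illustration, not Spark's internal `CallerContext` class. In a real deployment the resulting string would be handed to Hadoop's `org.apache.hadoop.ipc.CallerContext` (built with `CallerContext.Builder` and installed with `CallerContext.setCurrent`), and the NameNode only writes it to the audit log when `hadoop.caller.context.enabled` is `true`.

```scala
// Hypothetical helper (not Spark's API): builds a caller-context string in the
// layout seen in the HDFS audit log. Fields left as None are simply omitted,
// which is why a driver context stops right after the application ID.
object CallerContextSketch {
  def callerContextString(
      from: String,                          // e.g. "DRIVER" or "TASK"
      appId: Option[String] = None,          // YARN application ID
      jobId: Option[Int] = None,
      stageId: Option[Int] = None,
      stageAttemptId: Option[Int] = None,
      taskId: Option[Long] = None,
      taskAttemptNumber: Option[Int] = None): String =
    "SPARK_" + from +
      appId.map("_" + _).getOrElse("") +
      jobId.map("_JId_" + _).getOrElse("") +
      stageId.map("_SId_" + _).getOrElse("") +
      stageAttemptId.map("_" + _).getOrElse("") +
      taskId.map("_TId_" + _).getOrElse("") +
      taskAttemptNumber.map("_" + _).getOrElse("")
}
```

For example, `callerContextString("DRIVER", Some("application_1739496632907_0005"))` yields `SPARK_DRIVER_application_1739496632907_0005`, matching the driver entries in the sample log; the driver has no job, stage or task to report, so those fields are left out.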
### Does this PR introduce _any_ user-facing change?
Yes. HDFS audit logs will now include the caller context for calls made from the driver.
### How was this patch tested?
This patch was tested manually. After this change, the HDFS audit logs contain a caller context for driver calls as well as for task calls:
```
2025-02-14 02:26:23,249 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/192.168.97.4 cmd=getfileinfo src=/warehouse/sample dst=null perm=null proto=rpc callerContext=SPARK_DRIVER_application_1739496632907_0005
2025-02-14 02:26:23,265 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/192.168.97.4 cmd=listStatus src=/warehouse/sample dst=null perm=null proto=rpc callerContext=SPARK_DRIVER_application_1739496632907_0005
2025-02-14 02:26:25,519 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/192.168.97.5 cmd=open src=/warehouse/sample/part-00000-dd473344-76b1-4179-91ae-d15a8da4a888-c000 dst=null perm=null proto=rpc callerContext=SPARK_TASK_application_1739496632907_0005_JId_0_SId_0_0_TId_0_0
2025-02-14 02:26:26,345 INFO FSNamesystem.audit: allowed=true ugi=root (auth:SIMPLE) ip=/192.168.97.5 cmd=open src=/warehouse/sample/part-00000-dd473344-76b1-4179-91ae-d15a8da4a888-c000 dst=null perm=null proto=rpc callerContext=SPARK_TASK_application_1739496632907_0005_JId_1_SId_1_0_TId_1_0
```
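When investigating an incident, the new driver entries can be separated from executor entries by their caller-context prefix. The sketch below is illustrative only: `AuditLogSketch`, `AuditEntry`, `parse` and `origin` are hypothetical names, and the regexes assume the space-separated `key=value` layout of the lines above, where each field value is a single token without whitespace.

```scala
import scala.util.matching.Regex

// Hypothetical helpers for reading HDFS FSNamesystem.audit lines; the field
// names (cmd, src, callerContext) follow the sample log, not a formal schema.
object AuditLogSketch {
  final case class AuditEntry(cmd: String, src: String, callerContext: Option[String])

  // Each field value is assumed to be one whitespace-free token.
  private val CmdField: Regex = """\bcmd=(\S+)""".r
  private val SrcField: Regex = """\bsrc=(\S+)""".r
  private val ContextField: Regex = """\bcallerContext=(\S+)""".r

  private def field(re: Regex, line: String): Option[String] =
    re.findFirstMatchIn(line).map(_.group(1))

  // Returns None when the line lacks a cmd or src field.
  def parse(line: String): Option[AuditEntry] =
    for {
      cmd <- field(CmdField, line)
      src <- field(SrcField, line)
    } yield AuditEntry(cmd, src, field(ContextField, line))

  // Classifies an entry by the prefix Spark puts on its caller context.
  def origin(entry: AuditEntry): String = entry.callerContext match {
    case Some(ctx) if ctx.startsWith("SPARK_DRIVER_") => "driver"
    case Some(ctx) if ctx.startsWith("SPARK_TASK_")   => "task"
    case Some(_)                                      => "other"
    case None                                         => "none"
  }
}
```

Applied to the four sample lines, `origin` classifies the `getfileinfo` and `listStatus` calls as `driver` and both `open` calls as `task`.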
### Was this patch authored or co-authored using generative AI tooling?
No
Closes apache#49814 from sririshindra/master-SPARK-51095.
Lead-authored-by: Rishi <[email protected]>
Co-authored-by: Rishi <[email protected]>
Signed-off-by: attilapiros <[email protected]>
Parent commit: 48fc0fb
File tree: 2 files changed under `core/src` (13 additions, 0 deletions). The file under `main/scala/org/apache/spark` gains 3 lines (new lines 725-727); the file under `test/scala/org/apache/spark` gains 10 lines (new line 32 and new lines 1464-1472).