Commit 776ffd5
[SPARK-53735][SDP] Hide server-side JVM stack traces by default in spark-pipelines output
### What changes were proposed in this pull request?
Hide server-side JVM stack traces by default in spark-pipelines output
### Why are the changes needed?
Error output for failing pipeline runs can be very verbose and includes a lot of information that is not relevant to the user.
### Does this PR introduce _any_ user-facing change?
No. It only changes the behavior of an unreleased feature.
### How was this patch tested?
- Ran `spark-pipelines run` and verified the output.
- Observed that explicitly setting the `spark.sql.connect.serverStacktrace.enabled` config brings the server-side stack traces back
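
For context, users can still opt back in to server-side stack traces by setting the config explicitly. A minimal sketch (the connection URL is illustrative; the config key is the one named in this PR):

```python
from pyspark.sql import SparkSession

# Connect to a Spark Connect server; the URL below is illustrative.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Opt back in to server-side JVM stack traces in client error messages.
spark.conf.set("spark.sql.connect.serverStacktrace.enabled", "true")
```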
Before:
```
2025-09-26 15:29:54: Failed to resolve flow: 'spark_catalog.default.rental_bike_trips'.
Error: [TABLE_OR_VIEW_NOT_FOUND] The table or view `spark_catalog`.`default`.`rental_bike_trips_raws` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01;
'UnresolvedRelation [spark_catalog, default, rental_bike_trips_raws], [], true
Traceback (most recent call last):
File "/Users/sandy.ryza/oss/python/pyspark/pipelines/cli.py", line 358, in <module>
run(
File "/Users/sandy.ryza/oss/python/pyspark/pipelines/cli.py", line 285, in run
handle_pipeline_events(result_iter)
File "/Users/sandy.ryza/oss/python/pyspark/pipelines/spark_connect_pipeline.py", line 53, in handle_pipeline_events
for result in iter:
File "/Users/sandy.ryza/oss/python/pyspark/sql/connect/client/core.py", line 1169, in execute_command_as_iterator
for response in self._execute_and_fetch_as_iterator(req, observations or {}):
File "/Users/sandy.ryza/oss/python/pyspark/sql/connect/client/core.py", line 1559, in _execute_and_fetch_as_iterator
self._handle_error(error)
File "/Users/sandy.ryza/oss/python/pyspark/sql/connect/client/core.py", line 1833, in _handle_error
self._handle_rpc_error(error)
File "/Users/sandy.ryza/oss/python/pyspark/sql/connect/client/core.py", line 1904, in _handle_rpc_error
raise convert_exception(
pyspark.errors.exceptions.connect.AnalysisException:
Failed to resolve flows in the pipeline.
A flow can fail to resolve because the flow itself contains errors or because it reads
from an upstream flow which failed to resolve.
Flows with errors: spark_catalog.default.rental_bike_trips
Flows that failed due to upstream errors:
To view the exceptions that were raised while resolving these flows, look for flow
failures that precede this log.
JVM stacktrace:
org.apache.spark.sql.pipelines.graph.UnresolvedPipelineException
at org.apache.spark.sql.pipelines.graph.GraphValidations.validateSuccessfulFlowAnalysis(GraphValidations.scala:284)
at org.apache.spark.sql.pipelines.graph.GraphValidations.validateSuccessfulFlowAnalysis$(GraphValidations.scala:247)
at org.apache.spark.sql.pipelines.graph.DataflowGraph.validateSuccessfulFlowAnalysis(DataflowGraph.scala:33)
at org.apache.spark.sql.pipelines.graph.DataflowGraph.$anonfun$validationFailure$1(DataflowGraph.scala:186)
at scala.util.Try$.apply(Try.scala:217)
at org.apache.spark.sql.pipelines.graph.DataflowGraph.validationFailure$lzycompute(DataflowGraph.scala:185)
at org.apache.spark.sql.pipelines.graph.DataflowGraph.validationFailure(DataflowGraph.scala:185)
at org.apache.spark.sql.pipelines.graph.DataflowGraph.validate(DataflowGraph.scala:173)
at org.apache.spark.sql.pipelines.graph.PipelineExecution.resolveGraph(PipelineExecution.scala:109)
at org.apache.spark.sql.pipelines.graph.PipelineExecution.startPipeline(PipelineExecution.scala:48)
at org.apache.spark.sql.pipelines.graph.PipelineExecution.runPipeline(PipelineExecution.scala:63)
at org.apache.spark.sql.connect.pipelines.PipelinesHandler$.startRun(PipelinesHandler.scala:294)
at org.apache.spark.sql.connect.pipelines.PipelinesHandler$.handlePipelinesCommand(PipelinesHandler.scala:93)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.handlePipelineCommand(SparkConnectPlanner.scala:2727)
at org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:2697)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.handleCommand(ExecuteThreadRunner.scala:322)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1(ExecuteThreadRunner.scala:224)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.$anonfun$executeInternal$1$adapted(ExecuteThreadRunner.scala:196)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:349)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:349)
at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:187)
at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:102)
at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:348)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.executeInternal(ExecuteThreadRunner.scala:196)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner.org$apache$spark$sql$connect$execution$ExecuteThreadRunner$$execute(ExecuteThreadRunner.scala:125)
at org.apache.spark.sql.connect.execution.ExecuteThreadRunner$ExecutionThread.run(ExecuteThreadRunner.scala:347)
25/09/26 08:29:54 INFO ShutdownHookManager: Shutdown hook called
```
After:
```
2025-09-26 15:27:33: Failed to resolve flow: 'spark_catalog.default.rental_bike_trips'.
Error: [TABLE_OR_VIEW_NOT_FOUND] The table or view `spark_catalog`.`default`.`rental_bike_trips_raws` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01;
'UnresolvedRelation [spark_catalog, default, rental_bike_trips_raws], [], true
Traceback (most recent call last):
File "/Users/sandy.ryza/oss/python/pyspark/pipelines/cli.py", line 360, in <module>
run(
File "/Users/sandy.ryza/oss/python/pyspark/pipelines/cli.py", line 287, in run
handle_pipeline_events(result_iter)
File "/Users/sandy.ryza/oss/python/pyspark/pipelines/spark_connect_pipeline.py", line 53, in handle_pipeline_events
for result in iter:
File "/Users/sandy.ryza/oss/python/pyspark/sql/connect/client/core.py", line 1169, in execute_command_as_iterator
for response in self._execute_and_fetch_as_iterator(req, observations or {}):
File "/Users/sandy.ryza/oss/python/pyspark/sql/connect/client/core.py", line 1559, in _execute_and_fetch_as_iterator
self._handle_error(error)
File "/Users/sandy.ryza/oss/python/pyspark/sql/connect/client/core.py", line 1833, in _handle_error
self._handle_rpc_error(error)
File "/Users/sandy.ryza/oss/python/pyspark/sql/connect/client/core.py", line 1904, in _handle_rpc_error
raise convert_exception(
pyspark.errors.exceptions.connect.AnalysisException:
Failed to resolve flows in the pipeline.
A flow can fail to resolve because the flow itself contains errors or because it reads
from an upstream flow which failed to resolve.
Flows with errors: spark_catalog.default.rental_bike_trips
Flows that failed due to upstream errors:
To view the exceptions that were raised while resolving these flows, look for flow
failures that precede this log.
25/09/26 08:27:34 INFO ShutdownHookManager: Shutdown hook called
25/09/26 08:27:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/1v/dqhbgmt10vl6v3tdlwvvx90r0000gp/T/localPyFiles-039afc43-9f5c-4a6f-ac7b-2437496ac7de
25/09/26 08:27:34 INFO ShutdownHookManager: Deleting directory /private/var/folders/1v/dqhbgmt10vl6v3tdlwvvx90r0000gp/T/spark-c67d94d5-4110-4268-af67-430b3ae82133
```
### Was this patch authored or co-authored using generative AI tooling?
Closes #52470 from sryza/hide-jvm-stack-trace.
Lead-authored-by: Sandy Ryza <[email protected]>
Co-authored-by: Sandy Ryza <[email protected]>
Signed-off-by: Sandy Ryza <[email protected]>
1 file changed: python/pyspark/pipelines/cli.py (3 additions, 1 deletion)
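
The diff body itself was not captured above. As a purely hypothetical sketch of the kind of change the description implies, assuming the CLI defaults the config off while letting an explicit user setting win (the function name and structure are assumptions, not the actual patch):

```python
def apply_stacktrace_default(spark) -> None:
    # Hypothetical helper: hide server-side JVM stack traces by default,
    # but respect an explicit user-provided setting for this config.
    conf_key = "spark.sql.connect.serverStacktrace.enabled"
    if spark.conf.get(conf_key, None) is None:
        spark.conf.set(conf_key, "false")
```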