Commit ac8ccfa

Merge pull request #201876 from Niharikadutta/nidutta/addLivyErrorDoc
Add doc to explain new error codes introduced for failing jobs in Synapse
2 parents b3a9b94 + 6f584e0 commit ac8ccfa

4 files changed

+82
-0
lines changed

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
---
2+
title: Handle Livy errors on Apache Spark in Synapse
3+
description: Learn how to handle and interpret job failures on Apache Spark in Synapse Analytics.
4+
author: Niharikadutta
5+
ms.service: synapse-analytics
6+
ms.topic: overview
7+
ms.subservice: spark
8+
ms.date: 08/29/2022
9+
ms.author: nidutta
10+
---
11+
12+
# Interpret error codes in Synapse Analytics
13+
14+
Many factors can cause a Spark application to fail in Azure Synapse Analytics. For instance, the failure can be due to a system error or a user-related error. Previously, all errors corresponding to failing jobs on
15+
Synapse Analytics were surfaced with a generic error code, *LIVY_JOB_STATE_DEAD*. This error code gave no further insight into why the job had failed. Identifying the root cause and finding a resolution required significant effort, including digging into the driver, executor, Spark event, and Livy logs.
16+
17+
:::image type="content" source="./media/apache-spark-error-classification/apache-spark-old-error-view.png" alt-text="Screenshot of Apache Spark error code without detailed message." lightbox="./media/apache-spark-error-classification/apache-spark-old-error-view.png" border="true":::
18+
19+
We have introduced a more precise list of error codes that replaces the previous generic message. The new message describes the cause of failure. Whenever a job fails on Azure Synapse Analytics, the error handling feature parses and checks the logs on the backend to identify the root cause. It then displays a message to the user on the monitoring pane along with the steps to resolve the issue.
20+
21+
:::image type="content" source="./media/apache-spark-error-classification/apache-spark-new-error-view.png" alt-text="Screenshot of Apache Spark error code with detailed message." lightbox="./media/apache-spark-error-classification/apache-spark-new-error-view.png" border="true":::
22+
23+
## Enable error classification in Synapse
24+
25+
The error classification feature can be enabled or disabled by setting the following Spark configuration to `true` or `false` at the job or pool level:
26+
27+
`livy.rsc.synapse.error-classification.enabled`
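
As a minimal sketch of toggling this setting at the job level, the helper below adds the key to a Spark configuration dictionary, such as the `conf` map of a job submission payload. Only the configuration key comes from this article; the helper name `with_error_classification` and the example `spark.executor.instances` entry are assumptions made for illustration, and how the dictionary is submitted depends on your tooling.

```python
# Configuration key named in this article.
ERROR_CLASSIFICATION_KEY = "livy.rsc.synapse.error-classification.enabled"


def with_error_classification(conf, enabled=True):
    """Return a copy of a Spark conf dict with error classification set.

    Hypothetical helper: `conf` is any mapping of Spark settings,
    and the original mapping is left unchanged.
    """
    updated = dict(conf or {})
    # Spark configuration values are passed as strings.
    updated[ERROR_CLASSIFICATION_KEY] = "true" if enabled else "false"
    return updated


# Example: enable the feature alongside an existing (illustrative) setting.
job_conf = with_error_classification({"spark.executor.instances": "2"})
print(job_conf)
```

Passing `enabled=False` produces the value `"false"`, which disables the feature for that job.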
28+
29+
The following section lists some error types that are currently supported. We are continuously refining and adding more to these error codes by improving our model.
30+
31+
## Error code categories
32+
33+
Each error code falls under one of the following four buckets:
34+
35+
1. **User** - Indicating a user error
36+
2. **System** - Indicating a system error
37+
3. **Ambiguous** - Could be either user or system error
38+
4. **Unknown** - No classification yet, most probably because the error type isn't included in the model
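
The example codes in this article follow a `Spark_<Category>_...` naming pattern, so tooling can read the bucket from the code itself. The sketch below assumes that pattern holds for every classified code, which this article doesn't guarantee; the function name `error_category` is hypothetical.

```python
# The four buckets listed above.
KNOWN_CATEGORIES = {"User", "System", "Ambiguous", "Unknown"}


def error_category(error_code):
    """Return the bucket of a classified error code, or None.

    Assumes codes look like "Spark_<Category>_<details>", a pattern
    inferred from the examples in this article.
    """
    parts = error_code.split("_")
    if len(parts) >= 2 and parts[0] == "Spark" and parts[1] in KNOWN_CATEGORIES:
        return parts[1]
    # Anything else (for example, the legacy generic code) is unclassified.
    return None


print(error_category("Spark_User_TypeError_TypeNotIterable"))
print(error_category("LIVY_JOB_STATE_DEAD"))
```

Returning `None` for unrecognized codes keeps the legacy `LIVY_JOB_STATE_DEAD` code distinct from the **Unknown** bucket.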
39+
40+
## Error code examples for each classification type
41+
42+
### Spark_User_TypeError_TypeNotIterable
43+
44+
In Python, the error `TypeError: argument of type 'insert type' is not iterable` occurs when a membership operator (`in`, `not in`) is applied to a non-iterable object, such as an integer or `None`, rather than to an iterable such as a list, tuple, or dictionary. Possible solutions:
45+
46+
* Check that the object on the right side of the operator is an iterable that can contain the value.
47+
* If you want to compare one value with another, use a comparison operator (such as `==`) instead of the membership operator.
48+
* If the object being searched is `None`, it can't be iterated. Add a null check or assign a default value before using the membership operator.
49+
* Check that the value has the type you expect and that the type supports membership tests.
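
The solutions above can be sketched in plain Python. The helper name `safe_contains` is hypothetical, and falling back to an equality comparison for non-iterable objects is one possible design choice, not a prescribed fix.

```python
def safe_contains(container, value):
    """Membership test that tolerates None and non-iterable containers."""
    if container is None:
        # Null check: None can't be searched, so treat it as empty.
        return False
    try:
        return value in container
    except TypeError:
        # The membership test isn't supported here (for example,
        # container is an int), so compare the two values directly.
        return container == value


# Reproduce the error: the right operand is an int, not an iterable.
try:
    3 in 5
except TypeError as err:
    print(f"TypeError: {err}")

print(safe_contains([1, 2, 3], 2))
print(safe_contains(None, 2))
print(safe_contains(5, 5))
```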
50+
51+
### Spark_System_ABFS_OperationFailed
52+
53+
An operation with ADLS Gen2 has failed.
54+
55+
This error typically occurs because of a permissions issue.
56+
57+
Ensure that the identity running the Spark job has the "Storage Blob Data Contributor" RBAC role on every ADLS Gen2 storage account that the job is expected to read from or write to.
58+
Check the logs for this Spark application. In Synapse Studio, select the **Monitor** tab in the left pane. In the **Activities** section, select **Apache Spark Applications** and find your Spark job in the list. Then inspect the logs in the **Logs** tab at the bottom of the page for entries that mention the ADLS Gen2 storage account experiencing this issue.
59+
60+
### Spark_Ambiguous_ClassLoader_NoClassDefFound
61+
62+
A class required by the code could not be found when the script was run.
63+
64+
Please refer to the following pages for package management documentation:
65+
66+
For Notebook scenarios: [Apache Spark manage packages for interactive jobs](./apache-spark-manage-scala-packages.md)
67+
68+
For Spark batch scenarios (see section 6): [Apache Spark manage packages for batch jobs](./apache-spark-job-definitions.md#create-an-apache-spark-job-definition-for-apache-sparkscala)
69+
70+
Ensure that all the code dependencies are included in the JARs that Synapse runs. If you don't or can't include third-party JARs with your own code, ensure that all dependencies are included in the workspace packages for the Spark pool you're executing code on, or that they're included in the "Reference files" listing for the Spark batch submission. See the documentation linked above for more information.
71+
72+
### Spark_Unknown_Unknown_java.lang.Exception
73+
74+
An unknown failure that the model wasn't able to classify.
75+
76+
77+
If this feature is enabled, the error codes (both those listed above and others), along with troubleshooting instructions for resolving the issue, appear in the application error pane in Synapse Studio.
78+
79+
> [!NOTE]
80+
> If you built any tooling around Synapse job monitoring that detects a failing job by filtering on the `LIVY_JOB_STATE_DEAD` error code, that tooling will no longer work, because the returned error codes are now different, as described above. Modify your scripts to use this feature, or disable the feature if you don't need it.
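
As a minimal sketch of updating such tooling, the check below treats both the legacy code and the new classified codes as failures. It assumes that every classified code starts with `Spark_`, as the examples in this article do, and that your monitoring source returns an empty value when a job hasn't failed. The function name `is_failed_job` is hypothetical.

```python
LEGACY_FAILURE_CODE = "LIVY_JOB_STATE_DEAD"


def is_failed_job(error_code):
    """Return True if error_code signals a failed job under either scheme."""
    if not error_code:
        # Assumption: no error code means the job didn't fail.
        return False
    # Legacy generic code, or a classified code such as
    # "Spark_System_ABFS_OperationFailed" (prefix inferred from examples).
    return error_code == LEGACY_FAILURE_CODE or error_code.startswith("Spark_")


print(is_failed_job("LIVY_JOB_STATE_DEAD"))
print(is_failed_job("Spark_System_ABFS_OperationFailed"))
print(is_failed_job(None))
```

Accepting both forms keeps the check working whether the feature is enabled or disabled.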

articles/synapse-analytics/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -206,6 +206,8 @@ items:
206206
href: troubleshoot/workspaces-created-by-sdk.md
207207
- name: Troubleshoot reading UTF-8 text
208208
href: troubleshoot/reading-utf8-text.md
209+
- name: Troubleshoot Spark job failures
210+
href: spark/apache-spark-handle-livy-error.md
209211
- name: Move workspace to new region
210212
href: how-to-move-workspace-from-one-region-to-another.md
211213
- name: Move workspace to another tenant
