Commit 319a36c

Merge pull request #265049 from ynpandey/patch-46
Update how-to-submit-spark-jobs.md
2 parents: 7f12a4a + 56f5d75


articles/machine-learning/how-to-submit-spark-jobs.md

Lines changed: 1 addition & 31 deletions
@@ -721,37 +721,7 @@ To troubleshoot a Spark job, you can access the logs generated for that job in A
 1. Access the Spark job logs inside the **driver** and **library manager** folders
 
 > [!NOTE]
-> To troubleshoot Spark jobs created during interactive data wrangling in a notebook session, select **Job details** near the top right corner of the notebook UI. A Spark job from an interactive notebook session is created under the experiment name **notebook-runs**.
-
-## Improving serverless Spark session start-up time while using session-level Conda packages
-A serverless Spark session [*cold start* with session-level Conda packages](./apache-spark-azure-ml-concepts.md#inactivity-periods-and-tear-down-mechanism) may take 10 to 15 minutes. You can improve the Spark session *cold start* time by setting the configuration variable `spark.hadoop.aml.enable_cache` to true. Declaring this configuration variable is optional. To verify that it was set successfully, check the status of the latest job in the experiment `cachejobmamangement`; a successful job indicates that the cache was created. A session *cold start* with session-level Conda packages typically takes 10 to 15 minutes the first time the session starts, but subsequent *cold starts* typically take three to five minutes.
-
-# [CLI](#tab/cli)
-[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
-
-Use the `conf` property in the standalone Spark job, or in the Spark component YAML specification file, to define the configuration variable `spark.hadoop.aml.enable_cache`.
-
-```yaml
-conf:
-  spark.hadoop.aml.enable_cache: True
-```
-
-# [Python SDK](#tab/sdk)
-[!INCLUDE [sdk v2](includes/machine-learning-sdk-v2.md)]
-
-Use the `conf` parameter of the `azure.ai.ml.spark` function to define the configuration variable `spark.hadoop.aml.enable_cache`.
-
-```python
-conf={"spark.hadoop.aml.enable_cache": "true"},
-```
-
-# [Studio UI](#tab/ui)
-
-Define the configuration variable `spark.hadoop.aml.enable_cache` in the **Configure session** user interface, under **Configuration settings**. Set the value of this variable to `true`.
-
-:::image type="content" source="./media/how-to-submit-spark-jobs/spark-session-enable-cache.png" lightbox="./media/how-to-submit-spark-jobs/spark-session-enable-cache.png" alt-text="Expandable diagram that shows Spark session configuration tag to enable cache.":::
-
----
+> To troubleshoot Spark jobs created during interactive data wrangling in a notebook session, select **Job details** near the top right corner of the notebook UI. A Spark job from an interactive notebook session is created under the experiment name **notebook-runs**.
 
 ## Next steps
 
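For context on what the removed CLI tab configured: its snippet showed only the `conf` fragment. A rough sketch of where that fragment sits in a complete standalone Spark job YAML specification might look like the following; everything other than the `spark.hadoop.aml.enable_cache` line (schema URL, file names, resource values) is an illustrative placeholder rather than part of the removed text.

```yaml
# Sketch of a standalone Spark job YAML (CLI v2).
# Only the spark.hadoop.aml.enable_cache line comes from the removed snippet;
# the entry script, resource values, and other settings are placeholders.
$schema: http://azureml/sdk-2-0/SparkJob.json
type: spark
code: ./src                       # folder containing the entry script
entry:
  file: wrangle.py                # hypothetical entry point
conf:
  spark.driver.cores: 1
  spark.driver.memory: 2g
  spark.executor.cores: 2
  spark.executor.memory: 2g
  spark.executor.instances: 2
  spark.hadoop.aml.enable_cache: True   # the setting the removed section described
resources:
  instance_type: standard_e4s_v3
  runtime_version: "3.3"
```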
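Likewise, the removed Python SDK tab showed the `conf` argument in isolation. A minimal sketch of a full `azure.ai.ml.spark` call carrying that argument, assuming a hypothetical entry script and serverless Spark compute, might be:

```python
from azure.ai.ml import MLClient, spark
from azure.identity import DefaultAzureCredential

# Connect to the workspace; assumes a local config.json is present.
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

spark_job = spark(
    display_name="wrangle-with-cache",   # hypothetical job name
    code="./src",                        # folder containing the entry script
    entry={"file": "wrangle.py"},        # hypothetical entry point
    driver_cores=1,
    driver_memory="2g",
    executor_cores=2,
    executor_memory="2g",
    executor_instances=2,
    resources={"instance_type": "Standard_E4S_V3", "runtime_version": "3.3"},
    # The only argument taken from the removed snippet:
    conf={"spark.hadoop.aml.enable_cache": "true"},
)

ml_client.jobs.create_or_update(spark_job)
```

The `conf` dictionary is passed through to Spark, so the cache flag rides alongside any other Spark configuration the job sets.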
