Commit bfbeb8d

Merge pull request #225967 from ynpandey/patch-10
Updated quickstart-spark-jobs.md
2 parents c91c8ca + 6116982 commit bfbeb8d

1 file changed, +17 -14 lines


articles/machine-learning/quickstart-spark-jobs.md

Lines changed: 17 additions & 14 deletions
@@ -30,12 +30,6 @@ In this quickstart guide, you'll learn how to submit a Spark job using Azure Mac
 - [Create an Azure Machine Learning compute instance](./concept-compute-instance.md#create).
 - [Install Azure Machine Learning CLI](./how-to-configure-cli.md?tabs=public).
 
-> [!TIP]
-> You can submit a Spark job from:
-> - [terminal of an Azure Machine Learning compute instance](./how-to-access-terminal.md#access-a-terminal).
-> - terminal of [Visual Studio Code connected to an Azure Machine Learning compute instance](./how-to-set-up-vs-code-remote.md?tabs=studio).
-> - your local computer that has [the Azure Machine Learning CLI](./how-to-configure-cli.md?tabs=public) installed.
-
 # [Python SDK](#tab/sdk)
 [!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]
 - An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin.
@@ -44,12 +38,6 @@ In this quickstart guide, you'll learn how to submit a Spark job using Azure Mac
 - [Configure your development environment](./how-to-configure-environment.md), or [create an Azure Machine Learning compute instance](./concept-compute-instance.md#create).
 - [Install Azure Machine Learning SDK for Python](/python/api/overview/azure/ai-ml-readme).
 
-> [!TIP]
-> You can submit a Spark job from:
-> - an Azure Machine Learning Notebook connected to an Azure Machine Learning compute instance.
-> - [Visual Studio Code connected to an Azure Machine Learning compute instance](./how-to-set-up-vs-code-remote.md?tabs=studio).
-> - your local computer that has [the Azure Machine Learning SDK for Python](/python/api/overview/azure/ai-ml-readme) installed.
-
 # [Studio UI](#tab/studio-ui)
 - An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin.
 - An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md).
@@ -126,14 +114,22 @@ df.to_csv(args.wrangled_data, index_col="PassengerId")
 ```
 
 > [!NOTE]
-> This Python code sample uses `pyspark.pandas`, which is only supported by Spark runtime version 3.2.
+> - This Python code sample uses `pyspark.pandas`, which is only supported by Spark runtime version 3.2.
+> - Please ensure that `titanic.py` file is uploaded to a folder named `src`. The `src` folder should be located in the same directory where you have created the Python script/notebook or the YAML specification file defining the standalone Spark job.
 
 The above script takes two arguments `--titanic_data` and `--wrangled_data`, which pass the path of input data and output folder respectively. The script uses `titanic.csv` file, which can be [found here](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/spark/data/titanic.csv). This file should be uploaded to the Azure Data Lake Storage (ADLS) Gen 2 storage account.
 
 ## Submit a standalone Spark job
 
 # [CLI](#tab/cli)
 [!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
+
+> [!TIP]
+> You can submit a Spark job from:
+> - [terminal of an Azure Machine Learning compute instance](./how-to-access-terminal.md#access-a-terminal).
+> - terminal of [Visual Studio Code connected to an Azure Machine Learning compute instance](./how-to-set-up-vs-code-remote.md?tabs=studio).
+> - your local computer that has [the Azure Machine Learning CLI](./how-to-configure-cli.md?tabs=public) installed.
+
 This example YAML specification shows a standalone Spark job. It uses an Azure Machine Learning Managed (Automatic) Spark compute, user identity passthrough, and input/output data URI in format `abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>`:
 
 ```yaml
@@ -192,6 +188,13 @@ az ml job create --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <SUBS
 
 # [Python SDK](#tab/sdk)
 [!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]
+
+> [!TIP]
+> You can submit a Spark job from:
+> - an Azure Machine Learning Notebook connected to an Azure Machine Learning compute instance.
+> - [Visual Studio Code connected to an Azure Machine Learning compute instance](./how-to-set-up-vs-code-remote.md?tabs=studio).
+> - your local computer that has [the Azure Machine Learning SDK for Python](/python/api/overview/azure/ai-ml-readme) installed.
+
 This Python code snippet shows the creation of a standalone Spark job, with an Azure Machine Learning Managed (Automatic) Spark compute, user identity passthrough, and input/output data URI in format `abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>`:
 
 ```python
@@ -319,4 +322,4 @@ First, upload the parameterized Python code `titanic.py` to the Azure Blob stora
 - [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
 - [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
 - [Code samples for Spark jobs using Azure Machine Learning CLI](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/spark)
-- [Code samples for Spark jobs using Azure Machine Learning Python SDK](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/spark)
+- [Code samples for Spark jobs using Azure Machine Learning Python SDK](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/spark)
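
The NOTE added in this commit asks readers to place `titanic.py` in a `src` folder located in the same directory as the Python script/notebook or YAML specification file that submits the job, and the unchanged context explains that the script takes `--titanic_data` (input data path) and `--wrangled_data` (output folder path). The sketch below is not part of the article or of the Azure Machine Learning SDK: `find_entry_script` and `parse_titanic_args` are hypothetical helpers, and reading the two arguments with `argparse` is an assumption about how such a script might do it.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def find_entry_script(base_dir: str, folder: str = "src", script: str = "titanic.py") -> Path:
    """Return <base_dir>/src/titanic.py, raising FileNotFoundError if it is missing.

    Hypothetical helper: base_dir is assumed to be the directory that holds the
    submitting Python script/notebook or YAML specification file.
    """
    entry = Path(base_dir) / folder / script
    if not entry.is_file():
        raise FileNotFoundError(
            f"Expected {entry}; upload {script} to a '{folder}' folder next to the job definition."
        )
    return entry


def parse_titanic_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the two arguments described for titanic.py (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Wrangle the Titanic data set")
    # Path of the input data, for example an abfss:// URI pointing at titanic.csv.
    parser.add_argument("--titanic_data", required=True)
    # Path of the output folder that receives the wrangled data.
    parser.add_argument("--wrangled_data", required=True)
    return parser.parse_args(argv)
```

For example, calling `find_entry_script(".")` from the directory that holds the job definition fails fast when `src/titanic.py` is missing, and `parse_titanic_args` exits with a usage error if either argument is omitted.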
