
Commit 6d253e6

Minor changes to the Spark-related files . . .
1 parent 2839c17 commit 6d253e6

4 files changed: +21 -12 lines changed

articles/machine-learning/how-to-manage-synapse-spark-pool.md

Lines changed: 2 additions & 2 deletions
@@ -640,6 +640,6 @@ Some user scenarios may require access to a Synapse Spark Pool, during an Azure
 
 ## Next steps
 
-- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](/interactive-data-wrangling-with-apache-spark-azure-ml.md)
+- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
 
-- [Submit Spark jobs in Azure Machine Learning (preview)](/how-to-submit-spark-jobs.md)
+- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)

articles/machine-learning/how-to-submit-spark-jobs.md

Lines changed: 8 additions & 8 deletions
@@ -16,21 +16,21 @@ ms.custom: template-how-to
 
 [!INCLUDE [preview disclaimer](../../includes/machine-learning-preview-generic-disclaimer.md)]
 
-Azure Machine Learning provides the ability to submit standalone machine learning jobs or creating a [machine learning pipeline](/concept-ml-pipelines.md) comprising multiple steps in a machine learning workflow. Azure Machine Learning supports creation of a standalone Spark job, and creation of a reusable Spark component that can be used in Azure Machine Learning pipelines. In this article you will learn how to submit Spark jobs using:
+Azure Machine Learning provides the ability to submit standalone machine learning jobs, or to create a [machine learning pipeline](./concept-ml-pipelines.md) comprising multiple steps in a machine learning workflow. Azure Machine Learning supports creation of a standalone Spark job, and of a reusable Spark component that can be used in Azure Machine Learning pipelines. In this article, you'll learn how to submit Spark jobs using:
 - Azure Machine Learning studio UI
 - Azure Machine Learning CLI
 - Azure Machine Learning SDK
 
 ## Prerequisites
 - An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin
 - An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md)
-- [An attached Synapse Spark pool in the Azure Machine Learning workspace](/how-to-manage-synapse-spark-pool.md).
+- [An attached Synapse Spark pool in the Azure Machine Learning workspace](./how-to-manage-synapse-spark-pool.md).
 - [Configure your development environment](./how-to-configure-environment.md), or [create an Azure Machine Learning compute instance](./concept-compute-instance.md#create)
 - [Install the Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/installv2)
 - [Install Azure Machine Learning CLI](./how-to-configure-cli.md?tabs=public)
 
 ## Ensuring resource access for Spark jobs
-Spark jobs can use either user identity passthrough or a managed identity to access data and other resource. Different mechanisms for accessing resources while using attached Synapse Spark pool and Managed (Automatic) Spark compute are summarized in the following table.
+Spark jobs can use either user identity passthrough or a managed identity to access data and other resources. The different mechanisms for accessing resources with an attached Synapse Spark pool and with Managed (Automatic) Spark compute are summarized in the following table.
 
 |Spark pool|Supported identities|Default identity|
 | ---------- | -------------------- | ---------------- |
@@ -66,9 +66,9 @@ armclient PATCH https://management.azure.com/subscriptions/<SUBSCRIPTION_ID>/res
 > To ensure successful execution of the Spark job, the identity being used for the Spark job should be assigned **Contributor** and **Storage Blob Data Contributor** roles on the Azure storage account used for data input and output.
 
 ## Submit a standalone Spark job
-Once a Python script is developed by [interactive data wrangling](/interactive-data-wrangling-with-apache-spark-azure-ml.md), it can be used for submitting a batch job to process a larger volume of data after making necessary changes for parameterization of the Python script. A simple data wrangling batch job can be submitted as a standalone Spark job.
+Once a Python script has been developed through [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md), it can be used to submit a batch job that processes a larger volume of data, after the necessary changes to parameterize the script. A simple data wrangling batch job can be submitted as a standalone Spark job.
 
-A Spark job requires a Python script that takes arguments, which can be developed by modifying the Python code developed from [interactive data wrangling](/interactive-data-wrangling-with-apache-spark-azure-ml.md). A sample Python script is shown here.
+A Spark job requires a Python script that takes arguments; this script can be developed by modifying the Python code from [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md). A sample Python script is shown here.
 
 ```python
 
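The sample script itself lies beyond this hunk's boundary. As a rough, hypothetical sketch, assuming a CSV input and the illustrative argument names `--titanic_data` and `--wrangled_data`, a parameterized wrangling script of this kind might look like:

```python
# Hypothetical sketch; the article's actual sample is outside this hunk.
# Argument names and the CSV-based wrangling step are illustrative only.
import argparse

from pyspark.sql import SparkSession

# Parse the arguments that the standalone Spark job passes to the script.
parser = argparse.ArgumentParser()
parser.add_argument("--titanic_data", help="path or URI of the input CSV file")
parser.add_argument("--wrangled_data", help="output folder for the wrangled data")
args = parser.parse_args()

# Reuse the session provided by the Spark pool rather than configuring a new one.
spark = SparkSession.builder.getOrCreate()

# A minimal wrangling step: read, drop incomplete rows, write the result out.
df = spark.read.option("header", "true").csv(args.titanic_data)
df.dropna().write.mode("overwrite").option("header", "true").csv(args.wrangled_data)
```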
@@ -126,7 +126,7 @@ A standalone Spark job can be defined as a YAML specification file, which can be
 - `spark.dynamicAllocation.maxExecutors` - the maximum number of Spark executor instances, for dynamic allocation.
 - If dynamic allocation of executors is disabled, define this property:
   - `spark.executor.instances` - the number of Spark executor instances.
-- `environment` - an [Azure Machine Learning environment](/reference-yaml-environment) to run the job.
+- `environment` - an [Azure Machine Learning environment](./reference-yaml-environment.md) to run the job.
 - `args` - the command line arguments that should be passed to the job entry point Python script or class. See the YAML specification file provided below for an example.
 - `compute` - this property defines the name of an attached Synapse Spark pool, as shown in this example:
 ```yaml
@@ -429,7 +429,7 @@ To submit a standalone Spark job using the Azure Machine Learning studio UI:
 1. Select **Create** to submit the standalone Spark job.
 
 ## Spark component in a pipeline job
-A Spark component allows the flexibility to use the same component in multiple [Azure Machine Learning pipelines](/concept-ml-pipelines) as a pipeline step.
+A Spark component offers the flexibility to use the same component in multiple [Azure Machine Learning pipelines](./concept-ml-pipelines.md) as a pipeline step.
 
 # [Azure CLI](#tab/cli)
 
@@ -501,7 +501,7 @@ conf:
 
 ```
 
-The Spark component defined in the above YAML specification file can be used in an Azure Machine Learning pipeline job. See [pipeline job YAML schema](/reference-yaml-job-pipeline.md) to learn more about the YAML syntax that defines a pipeline job. This is an example YAML specification file for a pipeline job, with a Spark component:
+The Spark component defined in the above YAML specification file can be used in an Azure Machine Learning pipeline job. See the [pipeline job YAML schema](./reference-yaml-job-pipeline.md) to learn more about the YAML syntax that defines a pipeline job. This is an example YAML specification file for a pipeline job with a Spark component:
 
 ```yaml
 
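Besides the CLI and studio paths covered in these hunks, the article's prerequisites also list the Azure Machine Learning SDK. As a hypothetical sketch only, not taken from this commit, submitting the standalone job through the preview `azure-ai-ml` package could look roughly like this; the pool name, storage URIs, and script name are placeholders:

```python
# Rough sketch of submitting the standalone Spark job with the preview
# azure-ai-ml SDK; pool name, paths, and resource sizes are placeholders.
from azure.ai.ml import Input, MLClient, Output, spark
from azure.identity import DefaultAzureCredential

# Connect to the workspace (assumes a local config.json from the workspace).
ml_client = MLClient.from_config(credential=DefaultAzureCredential())

spark_job = spark(
    display_name="titanic-wrangling-job",
    code="./src",                  # folder containing the entry script
    entry={"file": "titanic.py"},  # hypothetical script name
    driver_cores=1,
    driver_memory="2g",
    executor_cores=2,
    executor_memory="2g",
    executor_instances=2,
    compute="<ATTACHED_SPARK_POOL_NAME>",  # the attached Synapse Spark pool
    inputs={
        "titanic_data": Input(
            type="uri_file",
            path="abfss://<FILESYSTEM>@<ACCOUNT>.dfs.core.windows.net/data/titanic.csv",
            mode="direct",
        )
    },
    outputs={
        "wrangled_data": Output(
            type="uri_folder",
            path="abfss://<FILESYSTEM>@<ACCOUNT>.dfs.core.windows.net/data/wrangled/",
            mode="direct",
        )
    },
    args="--titanic_data ${{inputs.titanic_data}} --wrangled_data ${{outputs.wrangled_data}}",
)

# Submit the job and print a link for monitoring it in the studio UI.
returned_job = ml_client.jobs.create_or_update(spark_job)
print(returned_job.studio_url)
```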
articles/machine-learning/interactive-data-wrangling-with-apache-spark-azure-ml.md

Lines changed: 2 additions & 2 deletions
@@ -389,5 +389,5 @@ df.head()
 
 - [Code samples for interactive data wrangling with Apache Spark in Azure Machine Learning](https://github.com/Azure/azureml-examples/tree/main/sdk/python/data-wrangling)
 - [Optimize Apache Spark jobs in Azure Synapse Analytics](../synapse-analytics/spark/apache-spark-performance.md)
-- [What are Azure Machine Learning pipelines?](/concept-ml-pipelines.md)
-- [Submit Spark jobs in Azure Machine Learning (preview)](/how-to-submit-spark-jobs.md)
+- [What are Azure Machine Learning pipelines?](./concept-ml-pipelines.md)
+- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)

articles/machine-learning/toc.yml

Lines changed: 9 additions & 0 deletions
@@ -317,6 +317,9 @@
     - name: Manage compute resources
      displayName: compute target, dsvm, Data Science Virtual Machine, local, cluster, ACI, container instance, Databricks, data lake, lake, HDI, HDInsight, low priority, managed identity
      href: how-to-create-attach-compute-studio.md
+    - name: Attach and Manage a Synapse Spark pool
+      displayName: Attach and Manage a Synapse Spark pool
+      href: how-to-manage-synapse-spark-pool.md
     - name: AKS and Azure Arc-enabled Kubernetes
      items:
      - name: What is Kubernetes compute target
@@ -408,6 +411,12 @@
     - name: Read & write data in jobs
      displayName: Read & write data in jobs
      href: how-to-read-write-data-v2.md
+    - name: Submit Spark jobs in Azure Machine Learning
+      displayName: Submit Spark jobs in Azure Machine Learning
+      href: how-to-submit-spark-jobs.md
+    - name: Interactive Data Wrangling with Apache Spark
+      displayName: Interactive Data Wrangling with Apache Spark
+      href: interactive-data-wrangling-with-apache-spark-azure-ml.md
     - name: Data administration
      displayName: Data administration
      href: how-to-administrate-data-authentication.md
