@@ -19,7 +19,7 @@ Data flows provide an entirely visual experience with no coding required. Your d
## Get started
-Data flows are created from the Develop pane in Synapse studio. To create a data flow, select the plus sign next to **Develop**, and then select **Data Flow**.
+Data flows are created from the **Develop** pane in Synapse studio. To create a data flow, select the plus sign next to **Develop**, and then select **Data Flow**.

@@ -61,7 +61,7 @@ The **Inspect** tab provides a view into the metadata of the data stream that yo

-As you change the shape of your data through transformations, you'll see the metadata changes flow in the **Inspect** pane. If there isn't a defined schema in your source transformation, then metadata won't be visible in the **Inspect** pane. Lack of metadata is common in schema drift scenarios.
+As you change the shape of your data through transformations, you see the metadata changes flow in the **Inspect** pane. If there isn't a defined schema in your source transformation, then metadata isn't visible in the **Inspect** pane. Lack of metadata is common in schema drift scenarios.
-|**Monitoring**| Monitoring of Spark Jobs for Data Flow | ✗ | ✓ *Leverage the Synapse Spark pools*|
+|**Monitoring**| Monitoring of Spark Jobs for Data Flow | ✗ | ✓ *Use the Synapse Spark pools*|
Get started with data integration in your Synapse workspace by learning how to [ingest data into an Azure Data Lake Storage gen2 account](data-integration-data-lake.md).
articles/synapse-analytics/quickstart-transform-data-using-spark-job-definition.md (11 additions & 11 deletions)
@@ -7,12 +7,12 @@ ms.reviewer: makromer
ms.service: azure-synapse-analytics
ms.subservice: pipeline
ms.topic: quickstart
-ms.date: 02/15/2022
+ms.date: 12/11/2024
---
# Quickstart: Transform data using Apache Spark job definition
-In this quickstart, you'll use Azure Synapse Analytics to create a pipeline using Apache Spark job definition.
+In this quickstart, you use Azure Synapse Analytics to create a pipeline using Apache Spark job definition.
## Prerequisites
@@ -27,13 +27,13 @@ After your Azure Synapse workspace is created, you have two ways to open Synapse
* Open your Synapse workspace in the [Azure portal](https://portal.azure.com). Select **Open** on the Open Synapse Studio card under **Getting started**.
* Open [Azure Synapse Analytics](https://web.azuresynapse.net/) and sign in to your workspace.
-In this quickstart, we use the workspace named "sampletest" as an example. It will automatically navigate you to the Synapse Studio home page.
+In this quickstart, we use the workspace named "sampletest" as an example.

## Create a pipeline with an Apache Spark job definition
-A pipeline contains the logical flow for an execution of a set of activities. In this section, you'll create a pipeline that contains an Apache Spark job definition activity.
+A pipeline contains the logical flow for an execution of a set of activities. In this section, you create a pipeline that contains an Apache Spark job definition activity.
1. Go to the **Integrate** tab. Select the plus icon next to the pipelines header and select **Pipeline**.
@@ -48,7 +48,7 @@ A pipeline contains the logical flow for an execution of a set of activities. In
## Set Apache Spark job definition canvas
-Once you create your Apache Spark job definition, you'll be automatically sent to the Spark job definition canvas.
+Once you create your Apache Spark job definition, you're automatically sent to the Spark job definition canvas.
### General settings
@@ -64,9 +64,9 @@ Once you create your Apache Spark job definition, you'll be automatically sent t
6. Retry interval: The number of seconds between each retry attempt.
-7. Secure output: When checked, output from the activity won't be captured in logging.
+7. Secure output: When checked, output from the activity isn't captured in logging.

-8. Secure input: When checked, input from the activity won't be captured in logging.
+8. Secure input: When checked, input from the activity isn't captured in logging.
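For illustration, the general settings above (timeout, retry, retry interval, secure output, and secure input) typically map to the activity's `policy` block in the underlying pipeline JSON. The following is a minimal sketch, assuming the standard pipeline activity policy property names; it is expressed as a Python dict with placeholder values, not the exact definition used in this quickstart.

```python
# Minimal sketch (assumption: standard pipeline activity "policy" property names)
# of how the general settings above surface in the pipeline's JSON definition.
activity_policy = {
    "timeout": "0.02:00:00",        # Timeout: maximum run time (here, 2 hours)
    "retry": 3,                     # Retry: maximum number of retry attempts
    "retryIntervalInSeconds": 30,   # Retry interval: seconds between retry attempts
    "secureOutput": True,           # Secure output: activity output isn't captured in logging
    "secureInput": True,            # Secure input: activity input isn't captured in logging
}

print(activity_policy)
```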
@@ -76,16 +76,16 @@ On this panel, you can reference to the Spark job definition to run.
* Expand the Spark job definition list, you can choose an existing Apache Spark job definition. You can also create a new Apache Spark job definition by selecting the **New** button to reference the Spark job definition to be run.
-* (Optional) You can fill in information for Apache Spark job definition. If the following settings are empty, the settings of the spark job definition itself will be used to run; if the following settings aren't empty, these settings will replace the settings of the spark job definition itself.
+* (Optional) You can fill in information for the Apache Spark job definition. If the following settings are empty, the settings of the Spark job definition itself are used to run the job; if the following settings aren't empty, these settings replace the settings of the Spark job definition itself.
| Property | Description |
| ----- | ----- |
|Main definition file| The main file used for the job. Select a PY/JAR/ZIP file from your storage. You can select **Upload file** to upload the file to a storage account. <br> Sample: `abfss://…/path/to/wordcount.jar`|
-| References from subfolders | Scanning subfolders from the root folder of the main definition file, these files will be added as reference files. The folders named "jars", "pyFiles", "files" or "archives" will be scanned, and the folders name are case sensitive. |
+| References from subfolders | Files in subfolders of the main definition file's root folder are added as reference files. The subfolders named "jars", "pyFiles", "files", or "archives" are scanned, and the folder names are case-sensitive. |
|Main class name| The fully qualified identifier or the main class that is in the main definition file. <br> Sample: `WordCount`|
-|Command-line arguments| You can add command-line arguments by clicking the **New** button. It should be noted that adding command-line arguments will override the command-line arguments defined by the Spark job definition. <br> *Sample: `abfss://…/path/to/shakespeare.txt``abfss://…/path/to/result`* <br> |
+|Command-line arguments| You can add command-line arguments by clicking the **New** button. Note that adding command-line arguments overrides the command-line arguments defined by the Spark job definition. <br> *Sample: `abfss://…/path/to/shakespeare.txt` `abfss://…/path/to/result`* <br> |
|Apache Spark pool| You can select Apache Spark pool from the list.|
-|Python code reference| Other Python code files used for reference in the main definition file. <br> It supports passing files (.py, .py3, .zip) to the "pyFiles" property. It will override the "pyFiles" property defined in Spark job definition. <br>|
+|Python code reference| Other Python code files used for reference in the main definition file. <br> It supports passing files (.py, .py3, .zip) to the "pyFiles" property. It overrides the "pyFiles" property defined in the Spark job definition. <br>|
|Reference files | Other files used for reference in the main definition file. |
|Dynamically allocate executors| This setting maps to the dynamic allocation property in Spark configuration for Spark Application executors allocation.|
|Min executors| Min number of executors to be allocated in the specified Spark pool for the job.|
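To make the table's samples concrete, the main definition file for a job like the one sampled above (`wordcount.jar`, main class `WordCount`, with an input path and an output path as command-line arguments) could also be written in Python. The sketch below is hypothetical; the file name `wordcount.py` and its logic are illustrative, not the files referenced in the article.

```python
# Hypothetical main definition file (e.g., wordcount.py) for a Spark job definition.
# It expects two command-line arguments, mirroring the "Command-line arguments" sample:
# an input path (text to count) and an output path (where results are written).
import sys
from pyspark.sql import SparkSession

def main(input_path: str, output_path: str) -> None:
    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    # Read the input text, split each line into words, and count occurrences.
    lines = spark.read.text(input_path)
    words = lines.selectExpr("explode(split(value, ' ')) AS word")
    counts = words.groupBy("word").count()

    # Write the result to the output path supplied as the second argument.
    counts.write.mode("overwrite").csv(output_path)
    spark.stop()

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

When a file like this is set as the main definition file, the two `abfss://…` paths supplied as command-line arguments arrive as `sys.argv[1]` and `sys.argv[2]`.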