### Spark session configuration
You can specify the timeout duration, and the number and size of executors to give to the current Spark session, in **Configure session**. Restart the Spark session for configuration changes to take effect. All cached notebook variables are cleared.
[![Screenshot of the Configure session pane for Spark session management.](./media/apache-spark-development-using-notebooks/synapse-azure-notebook-spark-session-management.png)](./media/apache-spark-development-using-notebooks/synapse-azure-notebook-spark-session-management.png#lightbox)
#### Spark session configuration magic command
You can also specify Spark session settings via the magic command **%%configure**. The Spark session needs to restart for the settings to take effect. We recommend that you run **%%configure** at the beginning of your notebook. Here's a sample; refer to https://github.com/cloudera/livy#request-body for the full list of valid parameters.
```json
%%configure
{
    //You can get a list of valid parameters to configure the session from https://github.com/cloudera/livy#request-body.
    "driverMemory":"28g", // Recommended values: ["28g", "56g", "112g", "224g", "400g", "472g"]
    "driverCores":4, // Recommended values: [4, 8, 16, 32, 64, 80]
    "executorMemory":"28g",
    "executorCores":4,
    "jars":["abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/myjar.jar","wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>/myjar1.jar"],
    "conf":
    {
        //Example of a standard Spark property. To find more available properties, see https://spark.apache.org/docs/latest/configuration.html#application-properties.
        "spark.driver.maxResultSize":"10g",
        //Example of a customized property. You can specify the count of lines that Spark SQL returns by configuring "livy.rsc.sql.num-rows".
        "livy.rsc.sql.num-rows":"3000"
    }
}
```
> [!NOTE]
> - "DriverMemory" and "ExecutorMemory" are recommended to set as same value in %%configure, so do "driverCores" and "executorCores".
343
-
> - You can use Spark session config magic command in Synapse pipelines. It only takes effect when it's called in the top level. The %%configure used in referenced notebook is going to be ignored.
344
-
> - The Spark configuration properties has to be used in the "conf" body. We do not support top level reference for the Spark configuration properties.
343
+
> - You can use %%configure in Synapse pipelines, but if it's not set in the first code cell, the pipeline run will fail due to cannot restart session.
344
+
> - The %%configure used in mssparkutils.notebook.run is going to be ignored but used in %run notebook will continue executing.
345
+
> - The standard Spark configuration properties must be used in the "conf" body. We do not support first level reference for the Spark configuration properties.
346
+
> - Some special spark properties including "spark.driver.cores", "spark.executor.cores", "spark.driver.memory", "spark.executor.memory", "spark.executor.instances" won't take effect in "conf" body.
345
347
>
346
348
349
+
350
+
#### Parameterized session configuration from a pipeline
Parameterized session configuration allows you to replace values in the %%configure magic with pipeline run (Notebook activity) parameters. When you prepare the %%configure code cell, you can override default values (also configurable, 4 and "2000" in the following example) with an object like this:
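A minimal sketch of such a cell follows; the activity parameter names `driverCoresFromNotebookActivity` and `rows` are illustrative placeholders, and the default values 4 and "2000" are the ones mentioned above:

```json
%%configure
{
    "driverCores":
    {
        "activityParameterName": "driverCoresFromNotebookActivity",
        "defaultValue": 4
    },
    "conf":
    {
        "livy.rsc.sql.num-rows":
        {
            "activityParameterName": "rows",
            "defaultValue": "2000"
        }
    }
}
```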
The notebook uses the default values if you run the notebook in interactive mode directly, or if the pipeline Notebook activity doesn't provide a parameter that matches "activityParameterName".
During the pipeline run mode, you can configure the pipeline Notebook activity settings as follows:
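For example, the Notebook activity could pass base parameters shaped like this sketch, which assumes the illustrative parameter names from the cell above:

```json
{
    "driverCoresFromNotebookActivity": 8,
    "rows": "4000"
}
```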
If you want to change the session configuration, the pipeline Notebook activity parameter name should be the same as activityParameterName in the notebook. When this pipeline runs, in this example, driverCores in %%configure is replaced by 8, and livy.rsc.sql.num-rows is replaced by 4000.
> [!NOTE]
> If the pipeline run failed because of using this new %%configure magic, you can find more error information by running the %%configure magic cell in the interactive mode of the notebook.
## Bring data to a notebook
You can load data from Azure Blob Storage, Azure Data Lake Store Gen 2, and SQL pool as shown in the code samples below.
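For instance, here's a minimal PySpark sketch of loading a CSV file from Azure Data Lake Storage Gen2 into a Spark DataFrame. The storage account, container, and file path are hypothetical placeholders:

```python
# Hypothetical storage account, container, and file path - replace with your own values.
account_name = "contosoaccount"
container_name = "contosocontainer"
relative_path = "data/sample.csv"

# Build the ABFSS URI that points at the file in ADLS Gen2.
adls_path = f"abfss://{container_name}@{account_name}.dfs.core.windows.net/{relative_path}"

# 'spark' is the SparkSession that Synapse notebooks provide by default.
df = spark.read.option("header", "true").csv(adls_path)
df.show(10)
```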
Referencing an unpublished notebook is helpful when you want to debug "locally". When you enable this feature, a notebook run fetches the current content from the web cache. If you run a cell that includes a reference notebook statement, you reference the presenting notebooks in the current notebook browser instead of a saved version in the cluster. This means that changes in your notebook editor can be referenced immediately by other notebooks without having to be published (Live mode) or committed (Git mode). With this approach, you can easily avoid polluting common libraries during the developing or debugging process.
For a comparison of the different cases, see the following table:
Notice that [%run](./apache-spark-development-using-notebooks.md) and [mssparkutils.notebook.run](./microsoft-spark-utilities.md) have the same behavior here; we use `%run` as an example.
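As a quick illustration, both of the following reference a notebook named Nb1 (a placeholder) from the current notebook:

```python
# The %run magic executes Nb1 inline and shares its variables with the
# current notebook (commented out here because magics aren't plain Python):
# %run Nb1

# mssparkutils.notebook.run executes Nb1 and returns its exit value,
# without sharing variables with the current notebook.
mssparkutils.notebook.run("Nb1")
```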
|Case|Disable|Enable|
|----|-------|------|
|**Live Mode**|||
|- Nb1 (Published) <br/> `%run Nb1`|Run published version of Nb1|Run published version of Nb1|
|- Nb1 (New) <br/> `%run Nb1`|Error|Run new Nb1|
|- Nb1 (Previously published, edited) <br/> `%run Nb1`|Run **published** version of Nb1|Run **edited** version of Nb1|
|**Git Mode**|||
|- Nb1 (Published) <br/> `%run Nb1`|Run published version of Nb1|Run published version of Nb1|