
Commit 550a673

Merge pull request #190428 from jingyanjingyan/notebook
Add session config, reference and wording change
2 parents 9461701 + 1321cd6 commit 550a673

3 files changed: +83 / -6 lines

articles/synapse-analytics/spark/apache-spark-development-using-notebooks.md

Lines changed: 82 additions & 5 deletions
@@ -312,13 +312,13 @@ The number of tasks per each job or stage help you to identify the parallel leve
![Screenshot of spark-progress-indicator](./media/apache-spark-development-using-notebooks/synapse-spark-progress-indicator.png)

### Spark session configuration

You can specify the timeout duration, the number of executors, and the size of the executors to give to the current Spark session in **Configure session**. Restart the Spark session for the configuration changes to take effect. All cached notebook variables are cleared.

[![Screenshot of session-management](./media/apache-spark-development-using-notebooks/synapse-azure-notebook-spark-session-management.png)](./media/apache-spark-development-using-notebooks/synapse-azure-notebook-spark-session-management.png#lightbox)

#### Spark session configuration magic command

You can also specify Spark session settings via the magic command **%%configure**. The Spark session needs to restart for the settings to take effect. We recommend running **%%configure** at the beginning of your notebook. Here is a sample; refer to https://github.com/cloudera/livy#request-body for the full list of valid parameters.

```json
@@ -340,10 +340,56 @@ You can also specify spark session settings via a magic command **%%configure**.
```
> [!NOTE]
> - "DriverMemory" and "ExecutorMemory" are recommended to set as same value in %%configure, so do "driverCores" and "executorCores".
> - You can use %%configure in Synapse pipelines, but if it isn't set in the first code cell, the pipeline run will fail because the session can't be restarted.
> - The %%configure used in a notebook referenced by mssparkutils.notebook.run is ignored, but %%configure used in a notebook referenced by %run continues to execute.
> - The standard Spark configuration properties must be used in the "conf" body. We don't support first-level references for the Spark configuration properties.
> - Some special Spark properties, including "spark.driver.cores", "spark.executor.cores", "spark.driver.memory", "spark.executor.memory", and "spark.executor.instances", won't take effect in the "conf" body.
>
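To make these notes concrete, here is a minimal sketch of such a %%configure cell. The memory, core, and executor values are placeholders, and the property names follow the Livy request body linked above; driver and executor memory and cores are kept identical, the sizing properties stay at the first level, and only standard Spark properties go inside "conf":

```python
%%configure

{
    "driverMemory": "8g",
    "driverCores": 4,
    "executorMemory": "8g",
    "executorCores": 4,
    "numExecutors": 2,
    "conf": {
        "spark.dynamicAllocation.enabled": "false"
    }
}
```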
#### Parameterized session configuration from pipeline
Parameterized session configuration allows you to replace values in the %%configure magic with pipeline run (Notebook activity) parameters. When preparing the %%configure code cell, you can override the default values (also configurable, 4 and "2000" in the example below) with an object like this:

```json
{
    "activityParameterName": "parameterNameInPipelineNotebookActivity",
    "defaultValue": "defaultValueIfNoParameterFromPipelineNotebookActivity"
}
```

```python
%%configure

{
    "driverCores":
    {
        "activityParameterName": "driverCoresFromNotebookActivity",
        "defaultValue": 4
    },
    "conf":
    {
        "livy.rsc.sql.num-rows":
        {
            "activityParameterName": "rows",
            "defaultValue": "2000"
        }
    }
}
```
The notebook uses the default value if you run the notebook directly in interactive mode, or if the pipeline Notebook activity doesn't pass a parameter that matches "activityParameterName".

In pipeline run mode, you can configure the pipeline Notebook activity settings as shown below:

![Screenshot of parameterized session configuration](./media/apache-spark-development-using-notebooks/parameterized-session-config.png)

If you want to change the session configuration, the pipeline Notebook activity parameter name should be the same as activityParameterName in the notebook. When this pipeline runs, in this example driverCores in %%configure will be replaced by 8 and livy.rsc.sql.num-rows will be replaced by 4000.
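For illustration only, with driverCoresFromNotebookActivity set to 8 and rows set to "4000" as in this example, the session starts as if the %%configure cell had been written with those values filled in:

```python
%%configure

{
    "driverCores": 8,
    "conf": {
        "livy.rsc.sql.num-rows": "4000"
    }
}
```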
> [!NOTE]
> If the pipeline run fails because of this new %%configure magic, you can get more error information by running the %%configure magic cell in the interactive mode of the notebook.
>
## Bring data to a notebook

You can load data from Azure Blob Storage, Azure Data Lake Storage Gen2, and SQL pool as shown in the code samples below.
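As a rough orientation, a minimal sketch of a Blob Storage read looks like the following; the container, storage account, and file path are placeholders, and it assumes the workspace already has access to the account (for example, through a linked service):

```python
# Load a CSV file from Azure Blob Storage into a Spark DataFrame.
# Replace the placeholders with your own container, account, and path.
blob_path = "wasbs://<container>@<storage-account>.blob.core.windows.net/samples/data.csv"
df = spark.read.load(blob_path, format="csv", header=True)
df.show(10)
```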
@@ -510,10 +556,41 @@ Available line magics:
[%lsmagic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-lsmagic), [%time](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time), [%timeit](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit), [%history](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-history), [%run](#notebook-reference), [%load](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-load)

Available cell magics:
[%%time](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time), [%%timeit](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit), [%%capture](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-capture), [%%writefile](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-writefile), [%%sql](#use-multiple-languages), [%%pyspark](#use-multiple-languages), [%%spark](#use-multiple-languages), [%%csharp](#use-multiple-languages), [%%html](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-html), [%%configure](#spark-session-configuration-magic-command)

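For example, a cell magic is placed on the first line of a cell and applies to the whole cell; a simple use of %%time to measure a cell might look like this:

```python
%%time
# Time the whole cell: build a small DataFrame and count its rows.
df = spark.range(0, 1000000)
print(df.count())
```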
---

## Reference unpublished notebook
Referencing an unpublished notebook is helpful when you want to debug "locally". When this feature is enabled, a notebook run fetches the current content from the web cache. If you run a cell that includes a reference notebook statement, you reference the notebooks presented in the current notebook browser instead of the saved versions in the cluster. This means that changes in your notebook editor can be referenced immediately by other notebooks without having to be published (Live mode) or committed (Git mode). By leveraging this approach, you can easily avoid common libraries getting polluted during the developing or debugging process.

For a comparison of the different cases, see the table below:

Notice that [%run](./apache-spark-development-using-notebooks.md) and [mssparkutils.notebook.run](./microsoft-spark-utilities.md) have the same behavior here. We use `%run` as an example here.
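For instance, assuming a notebook named Nb1 exists in the same workspace (and keeping `%run Nb1` in a cell of its own), the mssparkutils equivalent looks roughly like this:

```python
from notebookutils import mssparkutils

# Run the referenced notebook Nb1 with a 90-second timeout; the parameter
# name and value passed here are placeholders for this sketch.
exit_value = mssparkutils.notebook.run("Nb1", 90, {"input": 20})
print(exit_value)
```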
|Case|Feature disabled|Feature enabled|
|----|----------------|---------------|
|**Live Mode**|||
|- Nb1 (Published) <br/> `%run Nb1`|Run published version of Nb1|Run published version of Nb1|
|- Nb1 (New) <br/> `%run Nb1`|Error|Run new Nb1|
|- Nb1 (Previously published, edited) <br/> `%run Nb1`|Run **published** version of Nb1|Run **edited** version of Nb1|
|**Git Mode**|||
|- Nb1 (Published) <br/> `%run Nb1`|Run published version of Nb1|Run published version of Nb1|
|- Nb1 (New) <br/> `%run Nb1`|Error|Run new Nb1|
|- Nb1 (Not published, committed) <br/> `%run Nb1`|Error|Run committed Nb1|
|- Nb1 (Previously published, committed) <br/> `%run Nb1`|Run **published** version of Nb1|Run **committed** version of Nb1|
|- Nb1 (Previously published, new in current branch) <br/> `%run Nb1`|Run **published** version of Nb1|Run **new** Nb1|
|- Nb1 (Not published, previously committed, edited) <br/> `%run Nb1`|Error|Run **edited** version of Nb1|
|- Nb1 (Previously published and committed, edited) <br/> `%run Nb1`|Run **published** version of Nb1|Run **edited** version of Nb1|
## Conclusion
* If disabled, always run **published** version.
* If enabled, priority is: edited / new > committed > published.
## Integrate a notebook

### Add a notebook to a pipeline

articles/synapse-analytics/spark/apache-spark-notebook-concept.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ To learn more on how you can create and manage notebooks, see the following arti
- [Use multiple languages using magic commands and temporary tables](./spark/../apache-spark-development-using-notebooks.md#integrate-a-notebook)
- [Use cell magic commands](./spark/../apache-spark-development-using-notebooks.md#magic-commands)
- Development
- [Configure Spark session settings](./spark/../apache-spark-development-using-notebooks.md#spark-session-configuration)
- [Use Microsoft Spark utilities](./spark/../microsoft-spark-utilities.md)
- [Visualize data using notebooks and libraries](./spark/../apache-spark-data-visualization.md)
- [Integrate a notebook into pipelines](./spark/../apache-spark-development-using-notebooks.md#integrate-a-notebook)
