
Commit 550a673

Merge pull request #190428 from jingyanjingyan/notebook
Add session config, reference and wording change
2 parents 9461701 + 1321cd6 commit 550a673

3 files changed: +83 / -6 lines

articles/synapse-analytics/spark/apache-spark-development-using-notebooks.md

Lines changed: 82 additions & 5 deletions
@@ -312,13 +312,13 @@ The number of tasks per each job or stage help you to identify the parallel leve
![Screenshot of spark-progress-indicator](./media/apache-spark-development-using-notebooks/synapse-spark-progress-indicator.png)

### Spark session configuration

You can specify the timeout duration, the number of executors, and the size of the executors to give to the current Spark session in **Configure session**. Restart the Spark session for the configuration changes to take effect. All cached notebook variables are cleared.

[![Screenshot of session-management](./media/apache-spark-development-using-notebooks/synapse-azure-notebook-spark-session-management.png)](./media/apache-spark-development-using-notebooks/synapse-azure-notebook-spark-session-management.png#lightbox)

#### Spark session configuration magic command

You can also specify Spark session settings via the magic command **%%configure**. The Spark session needs to restart for the settings to take effect. We recommend running **%%configure** at the beginning of your notebook. Here is a sample; refer to https://github.com/cloudera/livy#request-body for the full list of valid parameters.

```json
@@ -340,10 +340,56 @@ You can also specify spark session settings via a magic command **%%configure**.
```
> [!NOTE]
> - "DriverMemory" and "ExecutorMemory" are recommended to set as same value in %%configure, so do "driverCores" and "executorCores".
> - You can use %%configure in Synapse pipelines, but if it isn't set in the first code cell, the pipeline run will fail because the session can't be restarted.
> - The %%configure used in a notebook referenced by mssparkutils.notebook.run is ignored, but %%configure used in a notebook referenced by %run continues to execute.
> - The standard Spark configuration properties must be used in the "conf" body. We don't support first-level references for the Spark configuration properties.
> - Some special Spark properties, including "spark.driver.cores", "spark.executor.cores", "spark.driver.memory", "spark.executor.memory", and "spark.executor.instances", won't take effect in the "conf" body.
>
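To make these notes concrete, here is a minimal sketch of such a %%configure cell. The memory, core, and executor values are placeholders, and the property names follow the Livy request body linked above; driver and executor memory and cores are kept identical, the sizing properties stay at the first level, and only standard Spark properties go inside "conf":

```python
%%configure

{
    "driverMemory": "8g",
    "driverCores": 4,
    "executorMemory": "8g",
    "executorCores": 4,
    "numExecutors": 2,
    "conf": {
        "spark.dynamicAllocation.enabled": "false"
    }
}
```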
#### Parameterized session configuration from pipeline
Parameterized session configuration allows you to replace values in the %%configure magic with pipeline run (Notebook activity) parameters. When preparing the %%configure code cell, you can override the default values (also configurable, 4 and "2000" in the example below) with an object like this:

```json
{
    "activityParameterName": "parameterNameInPipelineNotebookActivity",
    "defaultValue": "defaultValueIfNoParameterFromPipelineNotebookActivity"
}
```

```python
%%configure

{
    "driverCores":
    {
        "activityParameterName": "driverCoresFromNotebookActivity",
        "defaultValue": 4
    },
    "conf":
    {
        "livy.rsc.sql.num-rows":
        {
            "activityParameterName": "rows",
            "defaultValue": "2000"
        }
    }
}
```
The notebook uses the default value if you run the notebook directly in interactive mode, or if the pipeline Notebook activity doesn't pass a parameter that matches "activityParameterName".

In pipeline run mode, you can configure the pipeline Notebook activity settings as shown below:

![Screenshot of parameterized session configuration](./media/apache-spark-development-using-notebooks/parameterized-session-config.png)

If you want to change the session configuration, the pipeline Notebook activity parameter name should be the same as activityParameterName in the notebook. When this pipeline runs, in this example driverCores in %%configure will be replaced by 8 and livy.rsc.sql.num-rows will be replaced by 4000.
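For illustration only, with driverCoresFromNotebookActivity set to 8 and rows set to "4000" as in this example, the session starts as if the %%configure cell had been written with those values filled in:

```python
%%configure

{
    "driverCores": 8,
    "conf": {
        "livy.rsc.sql.num-rows": "4000"
    }
}
```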
> [!NOTE]
> If the pipeline run fails because of this new %%configure magic, you can get more error information by running the %%configure magic cell in the interactive mode of the notebook.
>
## Bring data to a notebook

You can load data from Azure Blob Storage, Azure Data Lake Storage Gen2, and SQL pool as shown in the code samples below.
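As a rough orientation, a minimal sketch of a Blob Storage read looks like the following; the container, storage account, and file path are placeholders, and it assumes the workspace already has access to the account (for example, through a linked service):

```python
# Load a CSV file from Azure Blob Storage into a Spark DataFrame.
# Replace the placeholders with your own container, account, and path.
blob_path = "wasbs://<container>@<storage-account>.blob.core.windows.net/samples/data.csv"
df = spark.read.load(blob_path, format="csv", header=True)
df.show(10)
```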
@@ -510,10 +556,41 @@ Available line magics:
[%lsmagic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-lsmagic), [%time](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time), [%timeit](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit), [%history](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-history), [%run](#notebook-reference), [%load](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-load)

Available cell magics:
[%%time](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-time), [%%timeit](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit), [%%capture](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-capture), [%%writefile](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-writefile), [%%sql](#use-multiple-languages), [%%pyspark](#use-multiple-languages), [%%spark](#use-multiple-languages), [%%csharp](#use-multiple-languages), [%%html](https://ipython.readthedocs.io/en/stable/interactive/magics.html#cellmagic-html), [%%configure](#spark-session-configuration-magic-command)

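For example, a cell magic is placed on the first line of a cell and applies to the whole cell; a simple use of %%time to measure a cell might look like this:

```python
%%time
# Time the whole cell: build a small DataFrame and count its rows.
df = spark.range(0, 1000000)
print(df.count())
```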
---

## Reference unpublished notebook
Referencing an unpublished notebook is helpful when you want to debug "locally". When this feature is enabled, a notebook run fetches the current content from the web cache. If you run a cell that includes a reference notebook statement, you reference the notebooks presented in the current notebook browser instead of the saved versions in the cluster. This means that changes in your notebook editor can be referenced immediately by other notebooks without having to be published (Live mode) or committed (Git mode). By leveraging this approach, you can easily avoid common libraries getting polluted during the developing or debugging process.

For a comparison of the different cases, see the table below:

Notice that [%run](./apache-spark-development-using-notebooks.md) and [mssparkutils.notebook.run](./microsoft-spark-utilities.md) have the same behavior here. We use `%run` as an example here.
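For instance, assuming a notebook named Nb1 exists in the same workspace (and keeping `%run Nb1` in a cell of its own), the mssparkutils equivalent looks roughly like this:

```python
from notebookutils import mssparkutils

# Run the referenced notebook Nb1 with a 90-second timeout; the parameter
# name and value passed here are placeholders for this sketch.
exit_value = mssparkutils.notebook.run("Nb1", 90, {"input": 20})
print(exit_value)
```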
|Case|Feature disabled|Feature enabled|
|----|----------------|---------------|
|**Live Mode**|||
|- Nb1 (Published) <br/> `%run Nb1`|Run published version of Nb1|Run published version of Nb1|
|- Nb1 (New) <br/> `%run Nb1`|Error|Run new Nb1|
|- Nb1 (Previously published, edited) <br/> `%run Nb1`|Run **published** version of Nb1|Run **edited** version of Nb1|
|**Git Mode**|||
|- Nb1 (Published) <br/> `%run Nb1`|Run published version of Nb1|Run published version of Nb1|
|- Nb1 (New) <br/> `%run Nb1`|Error|Run new Nb1|
|- Nb1 (Not published, committed) <br/> `%run Nb1`|Error|Run committed Nb1|
|- Nb1 (Previously published, committed) <br/> `%run Nb1`|Run **published** version of Nb1|Run **committed** version of Nb1|
|- Nb1 (Previously published, new in current branch) <br/> `%run Nb1`|Run **published** version of Nb1|Run **new** Nb1|
|- Nb1 (Not published, previously committed, edited) <br/> `%run Nb1`|Error|Run **edited** version of Nb1|
|- Nb1 (Previously published and committed, edited) <br/> `%run Nb1`|Run **published** version of Nb1|Run **edited** version of Nb1|
## Conclusion
* If disabled, always run **published** version.
* If enabled, priority is: edited / new > committed > published.
## Integrate a notebook

### Add a notebook to a pipeline

articles/synapse-analytics/spark/apache-spark-notebook-concept.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ To learn more on how you can create and manage notebooks, see the following arti
- [Use multiple languages using magic commands and temporary tables](./spark/../apache-spark-development-using-notebooks.md#integrate-a-notebook)
- [Use cell magic commands](./spark/../apache-spark-development-using-notebooks.md#magic-commands)
- Development
- [Configure Spark session settings](./spark/../apache-spark-development-using-notebooks.md#spark-session-configuration)
- [Use Microsoft Spark utilities](./spark/../microsoft-spark-utilities.md)
- [Visualize data using notebooks and libraries](./spark/../apache-spark-data-visualization.md)
- [Integrate a notebook into pipelines](./spark/../apache-spark-development-using-notebooks.md#integrate-a-notebook)
