articles/synapse-analytics/spark/apache-spark-development-using-notebooks.md
+6 −6 (6 additions, 6 deletions)
@@ -4,9 +4,9 @@ description: In this article, you learn how to create and develop Synapse notebo
 services: synapse analytics
 author: JeneZhang
 ms.service: azure-synapse-analytics
-ms.topic: conceptual
+ms.topic: how-to
 ms.subservice: spark
-ms.date: 05/08/2021
+ms.date: 09/11/2024
 ms.author: jingzh
 ms.custom: devx-track-python
 ---
@@ -26,7 +26,7 @@ This article describes how to use notebooks in Synapse Studio.

 ## Create a notebook

-You can create a new notebook or import an existing notebook to a Synapse workspace from **Object Explorer**. Select **Develop**, right-click **Notebooks**, and then select **New notebook** or **Import**. Synapse notebooks recognize standard Jupyter Notebook IPYNB files.
+You can create a new notebook or import an existing notebook to a Synapse workspace from **Object Explorer**. Select the **Develop** menu. Select the **+** button and select **Notebook** or right-click **Notebooks**, and then select **New notebook** or **Import**. Synapse notebooks recognize standard Jupyter Notebook IPYNB files.

 [screenshot]

@@ -188,7 +188,7 @@ To move a cell, select the left side of the cell and drag the cell to the desire

 ### <a name = "move-a-cell"></a>Copy a cell

-To copy a cell, create a new cell, select all the text in your original cell, copy the text, and paste the text into the new cell. When your cell is in edit mode, traditional keyboard shortcuts to select all text are limited to the cell.
+To copy a cell, first create a new cell, then select all the text in your original cell, copy the text, and paste the text into the new cell. When your cell is in edit mode, traditional keyboard shortcuts to select all text are limited to the cell.

 >[!TIP]
 >Synapse notebooks also provide [snippits](#code-snippets) of commonly used code patterns.
@@ -269,7 +269,7 @@ The `%run` magic command has these limitations:
 * The command supports nested calls but not recursive calls.
 * The command supports passing an absolute path or notebook name only as a parameter. It doesn't support relative paths.
 * The command currently supports only four parameter value types: `int`, `float`, `bool`, and `string`. It doesn't support variable replacement operations.
-* The referenced notebooks must be published. You need to publish the notebooks to reference them, unless you select the [option to enable an unpublished notebook reference](#reference-unpublished-notebook). Synapse Studio does not recognize the unpublished notebooks from the Git repo.
+* The referenced notebooks must be published. You need to publish the notebooks to reference them, unless you select the [option to enable an unpublished notebook reference](#reference-unpublished-notebook). Synapse Studio doesn't recognize the unpublished notebooks from the Git repo.
 * Referenced notebooks don't support statement depths larger than five.

 ### Use the variable explorer
@@ -299,7 +299,7 @@ The number of tasks for each job or stage helps you identify the parallel level

 ### <a name = "spark-session-configuration"></a>Configure a Spark session

-On the **Configure session** pane, you can specify the timeout duration, the number of executors, and the size of executors to give to the current Spark session. Restart the Spark session for configuration changes to take effect. All cached notebook variables are cleared.
+On the **Configure session** pane, which you can find by selecting the gear icon at the top of the notebook, you can specify the timeout duration, the number of executors, and the size of executors to give to the current Spark session. Restart the Spark session for configuration changes to take effect. All cached notebook variables are cleared.

 You can also create a configuration from the Apache Spark configuration or select an existing configuration. For details, refer to [Manage Apache Spark configuration](../../synapse-analytics/spark/apache-spark-azure-create-spark-configuration.md).
articles/synapse-analytics/sql/develop-openrowset.md
+16 −16 (16 additions, 16 deletions)
@@ -5,7 +5,7 @@ author: filippopovic
 ms.service: azure-synapse-analytics
 ms.topic: overview
 ms.subservice: sql
-ms.date: 03/23/2022
+ms.date: 09/11/2023
 ms.author: fipopovi
 ms.reviewer: whhender
 ---
@@ -21,7 +21,7 @@ The `OPENROWSET` function can be referenced in the `FROM` clause of a query as i

 ## Data source

-OPENROWSET function in Synapse SQL reads the content of the file(s) from a data source. The data source is an Azure storage account and it can be explicitly referenced in the `OPENROWSET` function or can be dynamically inferred from URL of the files that you want to read.
+OPENROWSET function in Synapse SQL reads the content of the files from a data source. The data source is an Azure storage account and it can be explicitly referenced in the `OPENROWSET` function or can be dynamically inferred from URL of the files that you want to read.
 The `OPENROWSET` function can optionally contain a `DATA_SOURCE` parameter to specify the data source that contains files.
 - `OPENROWSET` without `DATA_SOURCE` can be used to directly read the contents of the files from the URL location specified as `BULK` option:

@@ -31,7 +31,7 @@ The `OPENROWSET` function can optionally contain a `DATA_SOURCE` parameter to sp
     FORMAT ='PARQUET') AS [file]
 ```

-This is a quick and easy way to read the content of the files without pre-configuration. This option enables you to use the basic authentication option to access the storage (Microsoft Entra passthrough for Microsoft Entra logins and SAS token for SQL logins).
+This is a quick and easy way to read the content of the files without preconfiguration. This option enables you to use the basic authentication option to access the storage (Microsoft Entra passthrough for Microsoft Entra logins and SAS token for SQL logins).

 - `OPENROWSET` with `DATA_SOURCE` can be used to access files on specified storage account:

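For reference, a minimal sketch of the `DATA_SOURCE` form that the bullet above introduces; the data source name `population_ds` and the relative path are placeholders, and the external data source is assumed to already exist:

```sql
-- The relative path is resolved against the LOCATION of the named external data source (placeholder names).
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'csv/population/*.csv',
        DATA_SOURCE = 'population_ds',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) AS [rows];
```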
@@ -46,7 +46,7 @@ This is a quick and easy way to read the content of the files without pre-config
 This option enables you to configure location of the storage account in the data source and specify the authentication method that should be used to access storage.

 > [!IMPORTANT]
->`OPENROWSET` without `DATA_SOURCE` provides quick and easy way to access the storage files but offers limited authentication options. As an example, Microsoft Entra principals can access files only using their [Microsoft Entra identity](develop-storage-files-storage-access-control.md?tabs=user-identity) or publicly available files. If you need more powerful authentication options, use `DATA_SOURCE` option and define credential that you want to use to access storage.
+>`OPENROWSET` without `DATA_SOURCE` provides quick and easy way to access the storage files but offers limited authentication options. As an example, Microsoft Entra principals can access files only using their [Microsoft Entra identity](develop-storage-files-storage-access-control.md?tabs=user-identity#supported-storage-authorization-types) or publicly available files. If you need more powerful authentication options, use `DATA_SOURCE` option and define credential that you want to use to access storage.


 ## Security
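As a sketch of the more powerful authentication path the note recommends, you can define a credential and bind it to an external data source; all names and the SAS secret below are placeholders, and the database is assumed to already have a master key:

```sql
-- Placeholder credential and data source; the SAS token is illustrative only.
CREATE DATABASE SCOPED CREDENTIAL [sas_credential]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<sas-token>';

CREATE EXTERNAL DATA SOURCE [population_ds]
WITH (
    LOCATION = 'https://<storage-account>.dfs.core.windows.net/<container>',
    CREDENTIAL = [sas_credential]
);
```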
@@ -112,15 +112,15 @@ You have three choices for input files that contain the target data for querying

 - 'CSV' - Includes any delimited text file with row/column separators. Any character can be used as a field separator, such as TSV: FIELDTERMINATOR = tab.

-- 'PARQUET' - Binary file in Parquet format
+- 'PARQUET' - Binary file in Parquet format.

-- 'DELTA' - A set of Parquet files organized in Delta Lake (preview) format
+- 'DELTA' - A set of Parquet files organized in Delta Lake (preview) format.

-Values with blank spaces are not valid, e.g. 'CSV ' is not a valid value.
+Values with blank spaces aren't valid. For example, 'CSV ' isn't a valid value.

 **'unstructured_data_path'**

-The unstructured_data_path that establishes a path to the data may be an absolute or relative path:
+The unstructured_data_path that establishes a path to the data could be an absolute or relative path:
 - Absolute path in the format `\<prefix>://\<storage_account_path>/\<storage_path>` enables a user to directly read the files.
 - Relative path in the format `<storage_path>` that must be used with the `DATA_SOURCE` parameter and describes the file pattern within the <storage_account_path> location defined in `EXTERNAL DATA SOURCE`.

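To illustrate the `FORMAT` values listed above, a sketch that reads a Delta Lake folder through an absolute path (the URL is a placeholder):

```sql
-- Point BULK at the root folder of the Delta Lake table (placeholder URL).
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/delta/covid/',
        FORMAT = 'DELTA'
    ) AS [rows];
```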
@@ -137,7 +137,7 @@ Below you'll find the relevant \<storage account path> values that will link to

 '\<storage_path>'

-Specifies a path within your storage that points to the folder or file you want to read. If the path points to a container or folder, all files will be read from that particular container or folder. Files in subfolders won't be included.
+Specifies a path within your storage that points to the folder or file you want to read. If the path points to a container or folder, all files will be read from that particular container or folder. Files in subfolders won't be included.

 You can use wildcards to target multiple files or folders. Usage of multiple nonconsecutive wildcards is allowed.
 Below is an example that reads all *csv* files starting with *population* from all folders starting with */csv/population*:
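The example that the last context line refers to falls outside this hunk; a sketch of that wildcard pattern, with a placeholder storage account, might look like:

```sql
-- Matches population*.csv files in every folder starting with "population" under the csv container (placeholder account).
SELECT *
FROM OPENROWSET(
        BULK 'https://<storage-account>.blob.core.windows.net/csv/population*/population*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0'
    ) AS [rows];
```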
@@ -168,7 +168,7 @@ The WITH clause allows you to specify columns that you want to read from files.
 > Column names in Parquet and Delta Lake files are case sensitive. If you specify column name with casing different from column name casing in the files, the `NULL` values will be returned for that column.


-column_name = Name for the output column. If provided, this name overrides the column name in the source file and column name provided in JSON path if there is one. If json_path is not provided, it will be automatically added as '$.column_name'. Check json_path argument for behavior.
+column_name = Name for the output column. If provided, this name overrides the column name in the source file and column name provided in JSON path if there's one. If json_path isn't provided, it will be automatically added as '$.column_name'. Check json_path argument for behavior.

 column_type = Data type for the output column. The implicit data type conversion will take place here.

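A sketch of the `WITH` clause described here; the path and column names are placeholders, and the names must match the source column casing as the note above explains:

```sql
SELECT TOP 10 *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/population/*.parquet',
        FORMAT = 'PARQUET'
    )
    WITH (
        [country_code] VARCHAR(5),   -- placeholder columns; names are matched case sensitively for Parquet
        [country_name] VARCHAR(100),
        [population]   BIGINT
    ) AS [rows];
```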
@@ -196,7 +196,7 @@ Specifies the field terminator to be used. The default field terminator is a com

 ROWTERMINATOR ='row_terminator'`

-Specifies the row terminator to be used. If row terminator is not specified, one of default terminators will be used. Default terminators for PARSER_VERSION = '1.0' are \r\n, \n and \r. Default terminators for PARSER_VERSION = '2.0' are \r\n and \n.
+Specifies the row terminator to be used. If row terminator isn't specified, one of default terminators will be used. Default terminators for PARSER_VERSION = '1.0' are \r\n, \n and \r. Default terminators for PARSER_VERSION = '2.0' are \r\n and \n.

 > [!NOTE]
 > When you use PARSER_VERSION='1.0' and specify \n (newline) as the row terminator, it will be automatically prefixed with a \r (carriage return) character, which results in a row terminator of \r\n.
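As an illustration of the terminator arguments, a sketch that reads a tab-separated file with explicit field and row terminators (placeholder path; `HEADER_ROW` assumes parser version 2.0):

```sql
SELECT *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/exports/data.tsv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIELDTERMINATOR = '\t',   -- tab-separated fields
        ROWTERMINATOR = '\n',
        HEADER_ROW = TRUE
    ) AS [rows];
```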
@@ -228,7 +228,7 @@ Specifies parser version to be used when reading files. Currently supported CSV
 - PARSER_VERSION = '1.0'
 - PARSER_VERSION = '2.0'

-CSV parser version 1.0 is default and feature rich. Version 2.0 is built for performance and does not support all options and encodings.
+CSV parser version 1.0 is default and feature rich. Version 2.0 is built for performance and doesn't support all options and encodings.

 CSV parser version 1.0 specifics:

@@ -243,7 +243,7 @@ CSV parser version 2.0 specifics:
 - Maximum row size limit is 8 MB.
 - Following options aren't supported: DATA_COMPRESSION.
 - Quoted empty string ("") is interpreted as empty string.
-- DATEFORMAT SET option is not honored.
+- DATEFORMAT SET option isn't honored.
 - Supported format for DATE data type: YYYY-MM-DD
 - Supported format for TIME data type: HH:MM:SS[.fractional seconds]
 - Supported format for DATETIME2 data type: YYYY-MM-DD HH:MM:SS[.fractional seconds]
@@ -263,7 +263,7 @@ Specifies the code page of the data in the data file. The default value is 65001
-This option will disable the file modification check during the query execution, and read the files that are updated while the query is running. This is useful option when you need to read append-only files that are appended while the query is running. In the appendable files, the existing content is not updated, and only new rows are added. Therefore, the probability of wrong results is minimized compared to the updateable files. This option might enable you to read the frequently appended files without handling the errors. See more information in [querying appendable CSV files](query-single-csv-file.md#querying-appendable-files) section.
+This option will disable the file modification check during the query execution, and read the files that are updated while the query is running. This is useful option when you need to read append-only files that are appended while the query is running. In the appendable files, the existing content isn't updated, and only new rows are added. Therefore, the probability of wrong results is minimized compared to the updateable files. This option might enable you to read the frequently appended files without handling the errors. See more information in [querying appendable CSV files](query-single-csv-file.md#querying-appendable-files) section.

 Reject Options

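A sketch of the append-only read described above; the path is a placeholder, and the `ROWSET_OPTIONS` value is assumed to follow the appendable-files article linked in the hunk:

```sql
-- Assumed read option from the appendable-files article; disables the file modification check.
SELECT *
FROM OPENROWSET(
        BULK 'https://<storage-account>.dfs.core.windows.net/<container>/logs/*.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        ROWSET_OPTIONS = '{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
    ) AS [rows];
```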
@@ -311,10 +311,10 @@ Parquet files contain column metadata, which will be read, type mappings can be

 For the CSV files, column names can be read from header row. You can specify whether header row exists using HEADER_ROW argument. If HEADER_ROW = FALSE, generic column names will be used: C1, C2, ... Cn where n is number of columns in file. Data types will be inferred from first 100 data rows. Check [reading CSV files without specifying schema](#read-csv-files-without-specifying-schema) for samples.

-Have in mind that if you are reading number of files at once, the schema will be inferred from the first file service gets from the storage. This can mean that some of the columns expected are omitted, all because the file used by the service to define the schema did not contain these columns. In that case, please use OPENROWSET WITH clause.
+Have in mind that if you're reading number of files at once, the schema will be inferred from the first file service gets from the storage. This can mean that some of the columns expected are omitted, all because the file used by the service to define the schema didn't contain these columns. In that case, use OPENROWSET WITH clause.

 > [!IMPORTANT]
-> There are cases when appropriate data type cannot be inferred due to lack of information and larger data type will be used instead. This brings performance overhead and is particularly important for character columns which will be inferred as varchar(8000). For optimal performance, please [check inferred data types](./best-practices-serverless-sql-pool.md#check-inferred-data-types) and [use appropriate data types](./best-practices-serverless-sql-pool.md#use-appropriate-data-types).
+> There are cases when appropriate data type cannot be inferred due to lack of information and larger data type will be used instead. This brings performance overhead and is particularly important for character columns which will be inferred as varchar(8000). For optimal performance, [check inferred data types](./best-practices-serverless-sql-pool.md#check-inferred-data-types) and [use appropriate data types](./best-practices-serverless-sql-pool.md#use-appropriate-data-types).
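One way to act on that guidance is to inspect the inferred schema before depending on it; a sketch using `sp_describe_first_result_set` with a placeholder path:

```sql
-- Returns the column names and data types that the serverless SQL pool infers for the query.
EXEC sp_describe_first_result_set N'
    SELECT *
    FROM OPENROWSET(
            BULK ''https://<storage-account>.dfs.core.windows.net/<container>/population/*.csv'',
            FORMAT = ''CSV'',
            PARSER_VERSION = ''2.0'',
            HEADER_ROW = TRUE
        ) AS [rows]';
```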