
Commit 1e38d32

Authored by Jill Grant

Merge pull request #286573 from whhender/synapse-freshness-sept-2024

Synapse freshness sept 2024

2 parents 837fcff + fdcd542 · commit 1e38d32

File tree

3 files changed: +22 −22 lines changed

3 files changed

+22
-22
lines changed

articles/synapse-analytics/spark/apache-spark-development-using-notebooks.md

Lines changed: 6 additions & 6 deletions
@@ -4,9 +4,9 @@ description: In this article, you learn how to create and develop Synapse notebo
 services: synapse analytics
 author: JeneZhang
 ms.service: azure-synapse-analytics
-ms.topic: conceptual
+ms.topic: how-to
 ms.subservice: spark
-ms.date: 05/08/2021
+ms.date: 09/11/2024
 ms.author: jingzh
 ms.custom: devx-track-python
 ---
@@ -26,7 +26,7 @@ This article describes how to use notebooks in Synapse Studio.
 
 ## Create a notebook
 
-You can create a new notebook or import an existing notebook to a Synapse workspace from **Object Explorer**. Select **Develop**, right-click **Notebooks**, and then select **New notebook** or **Import**. Synapse notebooks recognize standard Jupyter Notebook IPYNB files.
+You can create a new notebook or import an existing notebook to a Synapse workspace from **Object Explorer**. Select the **Develop** menu. Select the **+** button and select **Notebook**, or right-click **Notebooks** and then select **New notebook** or **Import**. Synapse notebooks recognize standard Jupyter Notebook IPYNB files.
 
 ![Screenshot of selections for creating or importing a notebook.](./media/apache-spark-development-using-notebooks/synapse-create-import-notebook-2.png)
 
@@ -188,7 +188,7 @@ To move a cell, select the left side of the cell and drag the cell to the desire
 
 ### <a name = "move-a-cell"></a>Copy a cell
 
-To copy a cell, create a new cell, select all the text in your original cell, copy the text, and paste the text into the new cell. When your cell is in edit mode, traditional keyboard shortcuts to select all text are limited to the cell.
+To copy a cell, first create a new cell, then select all the text in your original cell, copy the text, and paste the text into the new cell. When your cell is in edit mode, traditional keyboard shortcuts to select all text are limited to the cell.
 
 >[!TIP]
 >Synapse notebooks also provide [snippets](#code-snippets) of commonly used code patterns.
@@ -269,7 +269,7 @@ The `%run` magic command has these limitations:
 * The command supports nested calls but not recursive calls.
 * The command supports passing an absolute path or notebook name only as a parameter. It doesn't support relative paths.
 * The command currently supports only four parameter value types: `int`, `float`, `bool`, and `string`. It doesn't support variable replacement operations.
-* The referenced notebooks must be published. You need to publish the notebooks to reference them, unless you select the [option to enable an unpublished notebook reference](#reference-unpublished-notebook). Synapse Studio does not recognize the unpublished notebooks from the Git repo.
+* The referenced notebooks must be published. You need to publish the notebooks to reference them, unless you select the [option to enable an unpublished notebook reference](#reference-unpublished-notebook). Synapse Studio doesn't recognize the unpublished notebooks from the Git repo.
 * Referenced notebooks don't support statement depths larger than five.
 
 ### Use the variable explorer
@@ -299,7 +299,7 @@ The number of tasks for each job or stage helps you identify the parallel level
 
 ### <a name = "spark-session-configuration"></a>Configure a Spark session
 
-On the **Configure session** pane, you can specify the timeout duration, the number of executors, and the size of executors to give to the current Spark session. Restart the Spark session for configuration changes to take effect. All cached notebook variables are cleared.
+On the **Configure session** pane, which you can find by selecting the gear icon at the top of the notebook, you can specify the timeout duration, the number of executors, and the size of executors to give to the current Spark session. Restart the Spark session for configuration changes to take effect. All cached notebook variables are cleared.
 
 You can also create a configuration from the Apache Spark configuration or select an existing configuration. For details, refer to [Manage Apache Spark configuration](../../synapse-analytics/spark/apache-spark-azure-create-spark-configuration.md).
 

articles/synapse-analytics/sql/develop-openrowset.md

Lines changed: 16 additions & 16 deletions
@@ -5,7 +5,7 @@ author: filippopovic
 ms.service: azure-synapse-analytics
 ms.topic: overview
 ms.subservice: sql
-ms.date: 03/23/2022
+ms.date: 09/11/2023
 ms.author: fipopovi
 ms.reviewer: whhender
 ---
@@ -21,7 +21,7 @@ The `OPENROWSET` function can be referenced in the `FROM` clause of a query as i
 
 ## Data source
 
-OPENROWSET function in Synapse SQL reads the content of the file(s) from a data source. The data source is an Azure storage account and it can be explicitly referenced in the `OPENROWSET` function or can be dynamically inferred from URL of the files that you want to read.
+The OPENROWSET function in Synapse SQL reads the content of the files from a data source. The data source is an Azure storage account, and it can be explicitly referenced in the `OPENROWSET` function or dynamically inferred from the URL of the files that you want to read.
 The `OPENROWSET` function can optionally contain a `DATA_SOURCE` parameter to specify the data source that contains files.
 - `OPENROWSET` without `DATA_SOURCE` can be used to directly read the contents of the files from the URL location specified as the `BULK` option:
 
@@ -31,7 +31,7 @@ The `OPENROWSET` function can optionally contain a `DATA_SOURCE` parameter to sp
 FORMAT = 'PARQUET') AS [file]
 ```
 
-This is a quick and easy way to read the content of the files without pre-configuration. This option enables you to use the basic authentication option to access the storage (Microsoft Entra passthrough for Microsoft Entra logins and SAS token for SQL logins).
+This is a quick and easy way to read the content of the files without preconfiguration. This option enables you to use the basic authentication option to access the storage (Microsoft Entra passthrough for Microsoft Entra logins and SAS token for SQL logins).
 
 - `OPENROWSET` with `DATA_SOURCE` can be used to access files on specified storage account:
 
@@ -46,7 +46,7 @@ This is a quick and easy way to read the content of the files without pre-config
 This option enables you to configure the location of the storage account in the data source and specify the authentication method that should be used to access storage.
 
 > [!IMPORTANT]
-> `OPENROWSET` without `DATA_SOURCE` provides quick and easy way to access the storage files but offers limited authentication options. As an example, Microsoft Entra principals can access files only using their [Microsoft Entra identity](develop-storage-files-storage-access-control.md?tabs=user-identity) or publicly available files. If you need more powerful authentication options, use `DATA_SOURCE` option and define credential that you want to use to access storage.
+> `OPENROWSET` without `DATA_SOURCE` provides a quick and easy way to access the storage files but offers limited authentication options. As an example, Microsoft Entra principals can access files only using their [Microsoft Entra identity](develop-storage-files-storage-access-control.md?tabs=user-identity#supported-storage-authorization-types) or publicly available files. If you need more powerful authentication options, use the `DATA_SOURCE` option and define the credential that you want to use to access storage.
 
 
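To make the two flavors above concrete, here's a minimal T-SQL sketch for a serverless SQL pool. It isn't taken from the changed article; the storage account, container, paths, and the `population_ds` data source name are hypothetical.

```sql
-- Direct URL, no DATA_SOURCE: quick ad-hoc access with the basic authentication options.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://contosostorage.blob.core.windows.net/public/population/*.parquet',
    FORMAT = 'PARQUET'
) AS [file];

-- With DATA_SOURCE: the storage location (and, optionally, a credential) is defined once,
-- and BULK then takes a path relative to that location.
CREATE EXTERNAL DATA SOURCE population_ds
WITH ( LOCATION = 'https://contosostorage.blob.core.windows.net/public' );
GO

SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'population/*.parquet',
    DATA_SOURCE = 'population_ds',
    FORMAT = 'PARQUET'
) AS [file];
```
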
 ## Security
@@ -112,15 +112,15 @@ You have three choices for input files that contain the target data for querying
 
 - 'CSV' - Includes any delimited text file with row/column separators. Any character can be used as a field separator, such as TSV: FIELDTERMINATOR = tab.
 
-- 'PARQUET' - Binary file in Parquet format
+- 'PARQUET' - Binary file in Parquet format.
 
-- 'DELTA' - A set of Parquet files organized in Delta Lake (preview) format
+- 'DELTA' - A set of Parquet files organized in Delta Lake (preview) format.
 
-Values with blank spaces are not valid, e.g. 'CSV ' is not a valid value.
+Values with blank spaces aren't valid. For example, 'CSV ' isn't a valid value.
 
 **'unstructured_data_path'**
 
-The unstructured_data_path that establishes a path to the data may be an absolute or relative path:
+The unstructured_data_path that establishes a path to the data could be an absolute or relative path:
 - Absolute path in the format `\<prefix>://\<storage_account_path>/\<storage_path>` enables a user to directly read the files.
 - Relative path in the format `<storage_path>` that must be used with the `DATA_SOURCE` parameter and describes the file pattern within the <storage_account_path> location defined in `EXTERNAL DATA SOURCE`.
 
@@ -137,7 +137,7 @@ Below you'll find the relevant \<storage account path> values that will link to
 
 '\<storage_path>'
 
-Specifies a path within your storage that points to the folder or file you want to read. If the path points to a container or folder, all files will be read from that particular container or folder. Files in subfolders won't be included.
+Specifies a path within your storage that points to the folder or file you want to read. If the path points to a container or folder, all files will be read from that particular container or folder. Files in subfolders won't be included.
 
 You can use wildcards to target multiple files or folders. Usage of multiple nonconsecutive wildcards is allowed.
 Below is an example that reads all *csv* files starting with *population* from all folders starting with */csv/population*:
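The example itself isn't part of this hunk; a query consistent with that description could look roughly like the following sketch (the storage account and container are hypothetical):

```sql
-- Wildcards in the BULK path: every folder starting with /csv/population,
-- and within those folders every file starting with population and ending in .csv.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://contosostorage.blob.core.windows.net/public/csv/population*/population*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0'
) AS [result];
```
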
@@ -168,7 +168,7 @@ The WITH clause allows you to specify columns that you want to read from files.
 > Column names in Parquet and Delta Lake files are case sensitive. If you specify column name with casing different from column name casing in the files, the `NULL` values will be returned for that column.
 
 
-column_name = Name for the output column. If provided, this name overrides the column name in the source file and column name provided in JSON path if there is one. If json_path is not provided, it will be automatically added as '$.column_name'. Check json_path argument for behavior.
+column_name = Name for the output column. If provided, this name overrides the column name in the source file and column name provided in JSON path if there's one. If json_path isn't provided, it will be automatically added as '$.column_name'. Check json_path argument for behavior.
 
 column_type = Data type for the output column. The implicit data type conversion will take place here.
 
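As a rough illustration of the `WITH` clause arguments described above (the storage path and column names are hypothetical, not from the article):

```sql
SELECT *
FROM OPENROWSET(
    BULK 'https://contosostorage.blob.core.windows.net/public/population/*.parquet',
    FORMAT = 'PARQUET'
)
WITH (
    -- column_name followed by column_type, as described above; an optional third
    -- element (for example, a JSON path) can also be supplied per the json_path argument.
    [country_code] VARCHAR(5),
    [population]   BIGINT
) AS [result];
```
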
@@ -196,7 +196,7 @@ Specifies the field terminator to be used. The default field terminator is a com
 
 ROWTERMINATOR ='row_terminator'`
 
-Specifies the row terminator to be used. If row terminator is not specified, one of default terminators will be used. Default terminators for PARSER_VERSION = '1.0' are \r\n, \n and \r. Default terminators for PARSER_VERSION = '2.0' are \r\n and \n.
+Specifies the row terminator to be used. If the row terminator isn't specified, one of the default terminators will be used. Default terminators for PARSER_VERSION = '1.0' are \r\n, \n, and \r. Default terminators for PARSER_VERSION = '2.0' are \r\n and \n.
 
 > [!NOTE]
 > When you use PARSER_VERSION='1.0' and specify \n (newline) as the row terminator, it will be automatically prefixed with a \r (carriage return) character, which results in a row terminator of \r\n.
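An illustrative sketch of how the terminator options above are passed to `OPENROWSET` (the storage path is hypothetical; parser version 1.0 is used, so the note above about \n applies):

```sql
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://contosostorage.blob.core.windows.net/public/population/population.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '1.0',
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '\n'   -- with parser 1.0, \n is treated as \r\n, per the note above
) AS [result];
```
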
@@ -228,7 +228,7 @@ Specifies parser version to be used when reading files. Currently supported CSV
 - PARSER_VERSION = '1.0'
 - PARSER_VERSION = '2.0'
 
-CSV parser version 1.0 is default and feature rich. Version 2.0 is built for performance and does not support all options and encodings.
+CSV parser version 1.0 is the default and is feature rich. Version 2.0 is built for performance and doesn't support all options and encodings.
 
 CSV parser version 1.0 specifics:
 
@@ -243,7 +243,7 @@ CSV parser version 2.0 specifics:
 - Maximum row size limit is 8 MB.
 - Following options aren't supported: DATA_COMPRESSION.
 - Quoted empty string ("") is interpreted as empty string.
-- DATEFORMAT SET option is not honored.
+- DATEFORMAT SET option isn't honored.
 - Supported format for DATE data type: YYYY-MM-DD
 - Supported format for TIME data type: HH:MM:SS[.fractional seconds]
 - Supported format for DATETIME2 data type: YYYY-MM-DD HH:MM:SS[.fractional seconds]
@@ -263,7 +263,7 @@ Specifies the code page of the data in the data file. The default value is 65001
 
 ROWSET_OPTIONS = '{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
 
-This option will disable the file modification check during the query execution, and read the files that are updated while the query is running. This is useful option when you need to read append-only files that are appended while the query is running. In the appendable files, the existing content is not updated, and only new rows are added. Therefore, the probability of wrong results is minimized compared to the updateable files. This option might enable you to read the frequently appended files without handling the errors. See more information in [querying appendable CSV files](query-single-csv-file.md#querying-appendable-files) section.
+This option disables the file modification check during the query execution, and reads the files that are updated while the query is running. This is a useful option when you need to read append-only files that are appended while the query is running. In the appendable files, the existing content isn't updated, and only new rows are added. Therefore, the probability of wrong results is minimized compared to the updateable files. This option might enable you to read the frequently appended files without handling the errors. For more information, see the [querying appendable CSV files](query-single-csv-file.md#querying-appendable-files) section.
 
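The `ROWSET_OPTIONS` argument described above might be used along these lines (an illustrative sketch; the path is hypothetical):

```sql
-- Tolerate files that are appended to while the query runs.
SELECT COUNT(*)
FROM OPENROWSET(
    BULK 'https://contosostorage.blob.core.windows.net/public/logs/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    ROWSET_OPTIONS = '{"READ_OPTIONS":["ALLOW_INCONSISTENT_READS"]}'
) AS [result];
```
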
 Reject Options
 
@@ -311,10 +311,10 @@ Parquet files contain column metadata, which will be read, type mappings can be
311311
312312
For the CSV files, column names can be read from header row. You can specify whether header row exists using HEADER_ROW argument. If HEADER_ROW = FALSE, generic column names will be used: C1, C2, ... Cn where n is number of columns in file. Data types will be inferred from first 100 data rows. Check [reading CSV files without specifying schema](#read-csv-files-without-specifying-schema) for samples.
313313
314-
Have in mind that if you are reading number of files at once, the schema will be inferred from the first file service gets from the storage. This can mean that some of the columns expected are omitted, all because the file used by the service to define the schema did not contain these columns. In that case, please use OPENROWSET WITH clause.
314+
Have in mind that if you're reading number of files at once, the schema will be inferred from the first file service gets from the storage. This can mean that some of the columns expected are omitted, all because the file used by the service to define the schema didn't contain these columns. In that case, use OPENROWSET WITH clause.
315315
316316
> [!IMPORTANT]
317-
> There are cases when appropriate data type cannot be inferred due to lack of information and larger data type will be used instead. This brings performance overhead and is particularly important for character columns which will be inferred as varchar(8000). For optimal performance, please [check inferred data types](./best-practices-serverless-sql-pool.md#check-inferred-data-types) and [use appropriate data types](./best-practices-serverless-sql-pool.md#use-appropriate-data-types).
317+
> There are cases when appropriate data type cannot be inferred due to lack of information and larger data type will be used instead. This brings performance overhead and is particularly important for character columns which will be inferred as varchar(8000). For optimal performance, [check inferred data types](./best-practices-serverless-sql-pool.md#check-inferred-data-types) and [use appropriate data types](./best-practices-serverless-sql-pool.md#use-appropriate-data-types).
318318
319319
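One way to follow the guidance in the note above is to inspect the inferred schema with the standard `sp_describe_first_result_set` procedure and then pin the types in a `WITH` clause; a sketch with a hypothetical storage path:

```sql
-- Inspect the data types that the serverless SQL pool infers for a query.
EXEC sp_describe_first_result_set N'
    SELECT TOP 100 *
    FROM OPENROWSET(
        BULK ''https://contosostorage.blob.core.windows.net/public/population/*.csv'',
        FORMAT = ''CSV'',
        PARSER_VERSION = ''2.0'',
        HEADER_ROW = TRUE
    ) AS [result]';
```
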
 ### Type mapping for Parquet
 