Commit 9290feb

Merge pull request #269840 from jonburchel/patch-39
Adds reference to SQL isolation levels to Source Options section
2 parents b3655b1 + 28ff58f commit 9290feb

File tree

1 file changed: +17 −17 lines changed


articles/data-factory/data-flow-source.md

Lines changed: 17 additions & 17 deletions
```diff
@@ -1,5 +1,5 @@
 ---
-title: Source transformation in mapping data flow
+title: Source transformation in mapping data flows
 titleSuffix: Azure Data Factory & Azure Synapse
 description: Learn how to set up a source transformation in a mapping data flow in Azure Data Factory or Azure Synapse Analytics pipelines.
 author: kromerm
@@ -10,7 +10,7 @@ ms.topic: conceptual
 ms.date: 10/20/2023
 ---
 
-# Source transformation in mapping data flow
+# Source transformation in mapping data flows
 
 [!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]
 
```
````diff
@@ -34,18 +34,18 @@ To use an inline dataset, select the format you want in the **Source type** sele
 
 ### Schema options
 
-Because an inline dataset is defined inside the data flow, there is not a defined schema associated with the inline dataset. On the Projection tab, you can import the source data schema and store that schema as your source projection. On this tab, you will find a "Schema options" button that allows you to define the behavior of ADF's schema discovery service.
+Because an inline dataset is defined inside the data flow, there isn't a defined schema associated with the inline dataset. On the Projection tab, you can import the source data schema and store that schema as your source projection. On this tab, you find a "Schema options" button that allows you to define the behavior of ADF's schema discovery service.
 
-* Use projected schema: This option is useful when you have a large number of source files that ADF will scan as your source. ADF's default behavior is to discover the schema of every source file. But if you have a pre-defined projection already stored in your source transformation, you can set this to true and ADF will skip auto-discovery of every schema. With this option turned on, the source transformation can read all files in a much faster manner, applying the pre-defined schema to every file.
-* Allow schema drift: Turn on schema drift so that your data flow will allow new columns that are not already defined in the source schema.
-* Validate schema: Setting this option will cause data flow to fail if any column and type defined in the projection does not match the discovered schema of the source data.
-* Infer drifted column types: When new drifted columns are identified by ADF, those new columns will be cast to the appropriate data type using ADF's automatic type inference.
+* Use projected schema: This option is useful when you have a large number of source files that ADF scans as your source. ADF's default behavior is to discover the schema of every source file. But if you have a pre-defined projection already stored in your source transformation, you can set this to true and ADF skips auto-discovery of every schema. With this option turned on, the source transformation can read all files in a much faster manner, applying the pre-defined schema to every file.
+* Allow schema drift: Turn on schema drift so that your data flow allows new columns that aren't already defined in the source schema.
+* Validate schema: Setting this option causes the data flow to fail if any column and type defined in the projection doesn't match the discovered schema of the source data.
+* Infer drifted column types: When new drifted columns are identified by ADF, those new columns are cast to the appropriate data type using ADF's automatic type inference.
 
 :::image type="content" source="media/data-flow/inline-selector.png" alt-text="Screenshot that shows Inline selected.":::
 
 ## Workspace DB (Synapse workspaces only)
 
-In Azure Synapse workspaces, an additional option is present in data flow source transformations called ```Workspace DB```. This will allow you to directly pick a workspace database of any available type as your source data without requiring additional linked services or datasets. The databases created through the [Azure Synapse database templates](../synapse-analytics/database-designer/overview-database-templates.md) are also accessible when you select Workspace DB.
+In Azure Synapse workspaces, an additional option is present in data flow source transformations called ```Workspace DB```. This allows you to directly pick a workspace database of any available type as your source data without requiring additional linked services or datasets. The databases created through the [Azure Synapse database templates](../synapse-analytics/database-designer/overview-database-templates.md) are also accessible when you select Workspace DB.
 
 :::image type="content" source="media/data-flow/syms-source.png" alt-text="Screenshot that shows workspacedb selected.":::
 
````
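The schema options above are stored in the data flow script behind the designer. A minimal sketch of a source transformation with these flags set, in mapping data flow script (the transformation name `MoviesSource` and the projected columns are hypothetical, for illustration only):

```
source(output(
    movieId as string,
    title as string
  ),
  allowSchemaDrift: true,
  validateSchema: false,
  inferDriftedColumnTypes: true) ~> MoviesSource
```

With `allowSchemaDrift: true` and `inferDriftedColumnTypes: true`, columns beyond `movieId` and `title` flow through and are typed automatically; with `validateSchema: true` instead, a file whose discovered schema deviates from the projection would fail the run.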
```diff
@@ -104,11 +104,11 @@ Development values for dataset parameters can be configured in [debug settings](
 
 **Schema drift**: [Schema drift](concepts-data-flow-schema-drift.md) is the ability of the service to natively handle flexible schemas in your data flows without needing to explicitly define column changes.
 
-* Select the **Allow schema drift** check box if the source columns will change often. This setting allows all incoming source fields to flow through the transformations to the sink.
+* Select the **Allow schema drift** check box if the source columns change often. This setting allows all incoming source fields to flow through the transformations to the sink.
 
-* Selecting **Infer drifted column types** instructs the service to detect and define data types for each new column discovered. With this feature turned off, all drifted columns will be of type string.
+* Selecting **Infer drifted column types** instructs the service to detect and define data types for each new column discovered. With this feature turned off, all drifted columns are of type string.
 
-**Validate schema:** If **Validate schema** is selected, the data flow will fail to run if the incoming source data doesn't match the defined schema of the dataset.
+**Validate schema:** If **Validate schema** is selected, the data flow fails to run if the incoming source data doesn't match the defined schema of the dataset.
 
 **Skip line count**: The **Skip line count** field specifies how many lines to ignore at the beginning of the dataset.
 
```
```diff
@@ -117,9 +117,9 @@
 To validate your source is configured correctly, turn on debug mode and fetch a data preview. For more information, see [Debug mode](concepts-data-flow-debug-mode.md).
 
 > [!NOTE]
-> When debug mode is turned on, the row limit configuration in debug settings will overwrite the sampling setting in the source during data preview.
+> When debug mode is turned on, the row limit configuration in debug settings overwrites the sampling setting in the source during data preview.
 
 ## Source options
 
-The **Source options** tab contains settings specific to the connector and format chosen. For more information and examples, see the relevant [connector documentation](#supported-sources).
+The **Source options** tab contains settings specific to the connector and format chosen. For more information and examples, see the relevant [connector documentation](#supported-sources). This includes details like the isolation level for data sources that support it (such as on-premises SQL Server, Azure SQL Database, and Azure SQL Managed Instance), and other data source specific settings.
 
```
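For SQL sources, the isolation level chosen on the **Source options** tab also lands in the data flow script. A hedged sketch of what this might look like (the transformation name `SQLSource` is hypothetical; `READ_UNCOMMITTED` is one of the supported values, useful to avoid blocking readers at the cost of dirty reads):

```
source(allowSchemaDrift: true,
  validateSchema: false,
  isolationLevel: 'READ_UNCOMMITTED',
  format: 'table') ~> SQLSource
```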
```diff
@@ -126,26 +126,26 @@
 ## Projection
 
 Like schemas in datasets, the projection in a source defines the data columns, types, and formats from the source data. For most dataset types, such as SQL and Parquet, the projection in a source is fixed to reflect the schema defined in a dataset. When your source files aren't strongly typed (for example, flat .csv files rather than Parquet files), you can define the data types for each field in the source transformation.
 
 :::image type="content" source="media/data-flow/source-3.png" alt-text="Screenshot that shows settings on the Projection tab.":::
 
-If your text file has no defined schema, select **Detect data type** so that the service will sample and infer the data types. Select **Define default format** to autodetect the default data formats.
+If your text file has no defined schema, select **Detect data type** so that the service samples and infers the data types. Select **Define default format** to autodetect the default data formats.
 
 **Reset schema** resets the projection to what is defined in the referenced dataset.
 
 **Overwrite schema** allows you to modify the projected data types here in the source, overwriting the schema-defined data types. You can alternatively modify the column data types in a downstream derived-column transformation. Use a select transformation to modify the column names.
 
 ### Import schema
 
-Select the **Import schema** button on the **Projection** tab to use an active debug cluster to create a schema projection. It's available in every source type. Importing the schema here will override the projection defined in the dataset. The dataset object won't be changed.
+Select the **Import schema** button on the **Projection** tab to use an active debug cluster to create a schema projection. It's available in every source type. Importing the schema here overrides the projection defined in the dataset. The dataset object won't be changed.
 
 Importing schema is useful in datasets like Avro and Azure Cosmos DB that support complex data structures that don't require schema definitions to exist in the dataset. For inline datasets, importing schema is the only way to reference column metadata without schema drift.
 
 ## Optimize the source transformation
 
-The **Optimize** tab allows for editing of partition information at each transformation step. In most cases, **Use current partitioning** will optimize for the ideal partitioning structure for a source.
+The **Optimize** tab allows for editing of partition information at each transformation step. In most cases, **Use current partitioning** optimizes for the ideal partitioning structure for a source.
 
-If you're reading from an Azure SQL Database source, custom **Source** partitioning will likely read data the fastest. The service will read large queries by making connections to your database in parallel. This source partitioning can be done on a column or by using a query.
+If you're reading from an Azure SQL Database source, custom **Source** partitioning likely reads data the fastest. The service reads large queries by making connections to your database in parallel. This source partitioning can be done on a column or by using a query.
 
 :::image type="content" source="media/data-flow/sourcepart3.png" alt-text="Screenshot that shows the Source partition settings.":::
 
```
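Explicit partitioning choices made on the **Optimize** tab are likewise recorded in the data flow script. A sketch, assuming a hypothetical source named `SQLSource`: omitting the `partitionBy` clause keeps **Use current partitioning**, while an explicit clause such as round robin forces a repartition into a fixed number of partitions:

```
source(allowSchemaDrift: true,
  validateSchema: false,
  partitionBy('roundRobin', 20)) ~> SQLSource
```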