Commit 88b8337

moving files
1 parent 2eb8f17 commit 88b8337

File tree: 45 files changed (+1594, -1546 lines)


articles/data-factory/connector-azure-sql-data-warehouse.md

Lines changed: 1 addition & 1 deletion
@@ -526,7 +526,7 @@ To use this feature, create an [Azure Blob Storage linked service](connector-azu
 
 ### Best practices for using PolyBase
 
-The following sections provide best practices in addition to those mentioned in [Best practices for Azure Synapse Analytics](../sql-data-warehouse/sql-data-warehouse-best-practices.md).
+The following sections provide best practices in addition to those mentioned in [Best practices for Azure Synapse Analytics](../synapse-analytics/sql-data-warehouse/sql-data-warehouse-best-practices.md).
 
 #### Required database permission
 

articles/data-factory/v1/data-factory-azure-sql-data-warehouse-connector.md

Lines changed: 1 addition & 1 deletion
@@ -256,7 +256,7 @@ To use this feature, create an [Azure Storage linked service](data-factory-azure
 ```
 
 ## Best practices when using PolyBase
-The following sections provide additional best practices to the ones that are mentioned in [Best practices for Azure SQL Data Warehouse](../../sql-data-warehouse/sql-data-warehouse-best-practices.md).
+The following sections provide additional best practices to the ones that are mentioned in [Best practices for Azure SQL Data Warehouse](../../synapse-analytics/sql-data-warehouse/sql-data-warehouse-best-practices.md).
 
 ### Required database permission
 To use PolyBase, it requires the user being used to load data into SQL Data Warehouse has the ["CONTROL" permission](https://msdn.microsoft.com/library/ms191291.aspx) on the target database. One way to achieve that is to add that user as a member of "db_owner" role. Learn how to do that by following [this section](../../synapse-analytics/sql-data-warehouse/sql-data-warehouse-overview-manage-security.md#authorization).
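
The CONTROL requirement described in the context line above can be met in either of two ways; a minimal T-SQL sketch, with `LoaderUser` and the database name as placeholder values:

```sql
-- Placeholder principal and database names; run against the target database.

-- Option 1: grant CONTROL on the database directly.
GRANT CONTROL ON DATABASE::[MySqlDw] TO [LoaderUser];

-- Option 2: add the loading user to the db_owner role.
EXEC sp_addrolemember 'db_owner', 'LoaderUser';
```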

articles/data-factory/v1/data-factory-load-sql-data-warehouse.md

Lines changed: 1 addition & 1 deletion
@@ -211,7 +211,7 @@ Here are a few best practices for running your Azure SQL Data Warehouse database
 * For faster load speeds, consider using heap for transient data.
 * Create statistics after you finish loading Azure SQL Data Warehouse.
 
-See [Best practices for Azure SQL Data Warehouse](../../sql-data-warehouse/sql-data-warehouse-best-practices.md) for details.
+See [Best practices for Azure SQL Data Warehouse](../../synapse-analytics/sql-data-warehouse/sql-data-warehouse-best-practices.md) for details.
 
 ## Next steps
 * [Data Factory Copy Wizard](data-factory-copy-wizard.md) - This article provides details about the Copy Wizard.
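
The two context lines above (heap tables for transient data, statistics after loading) translate into T-SQL along these lines; the schema, table, and column names are placeholders:

```sql
-- Placeholder names; a transient staging table kept as a heap loads faster
-- than a clustered columnstore table.
CREATE TABLE dbo.StageSales
(
    SaleId bigint         NOT NULL,
    Amount decimal(18, 2) NOT NULL
)
WITH (HEAP, DISTRIBUTION = ROUND_ROBIN);

-- After the load finishes, create statistics on columns used in joins and filters.
CREATE STATISTICS stats_StageSales_Amount ON dbo.StageSales (Amount);
```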

articles/machine-learning/team-data-science-process/sqldw-walkthrough.md

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ Follow the documentation at [Create and query an Azure SQL Data Warehouse in the
 
 **Install Visual Studio and SQL Server Data Tools.** For instructions, see [Getting started with Visual Studio 2019 for SQL Data Warehouse](../../synapse-analytics/sql-data-warehouse/sql-data-warehouse-install-visual-studio.md).
 
-**Connect to your Azure Synapse Analytics with Visual Studio.** For instructions, see steps 1 & 2 in [Connect to Azure SQL Data Warehouse](../../sql-data-warehouse/sql-data-warehouse-connect-overview.md).
+**Connect to your Azure Synapse Analytics with Visual Studio.** For instructions, see steps 1 & 2 in [Connect to Azure SQL Data Warehouse](../../synapse-analytics/sql-data-warehouse/sql-data-warehouse-connect-overview.md).
 
 > [!NOTE]
 > Run the following SQL query on the database you created in your Azure Synapse Analytics (instead of the query provided in step 3 of the connect topic,) to **create a master key**.
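
The master key referred to in the note is created with a statement along these lines (a sketch, not necessarily the exact query from the walkthrough; in a dedicated SQL pool the password clause can be omitted):

```sql
-- Creates the database master key used to protect credentials for loading.
CREATE MASTER KEY;
```
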
Lines changed: 2 additions & 152 deletions
@@ -1,154 +1,4 @@
 ---
-title: Instead of ETL, design ELT
-description: Implement flexible data loading strategies for SQL Analytics within Azure Synapse Analytics
-services: sql-data-warehouse
-author: kevinvngo
-manager: craigg
-ms.service: sql-data-warehouse
-ms.topic: conceptual
-ms.subservice: load-data
-ms.date: 02/19/2020
-ms.author: kevin
-ms.reviewer: igorstan
-ms.custom: azure-synapse
+redirect_url: /azure/synapse-analytics/sql-data-warehouse/design-elt-data-loading
+redirect_document_id: true
 ---
-
-# Data loading strategies for data warehousing
-
-Traditional SMP data warehouses use an Extract, Transform, and Load (ETL) process for loading data. SQL pools in Azure Synapse Analytics have a massively parallel processing (MPP) architecture that takes advantage of the scalability and flexibility of compute and storage resources. Utilizing an Extract, Load, and Transform (ELT) process can take advantage of MPP and eliminate resources needed to transform the data prior to loading. While SQL pools support many loading methods including popular SQL Server options such as BCP and the SQL BulkCopy API, the fastest and most scalable way to load data is through PolyBase external tables and the [COPY statement](/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest) (preview). With PolyBase and the COPY statement, you can access external data stored in Azure Blob storage or Azure Data Lake Store via the T-SQL language. For the most flexibility when loading, we recommend using the COPY statement.
-
-> [!NOTE]
-> The COPY statement is currently in public preview. To provide feedback, send email to the following distribution list: [email protected].
-
-> [!VIDEO https://www.youtube.com/embed/l9-wP7OdhDk]
-
-## What is ELT?
-
-Extract, Load, and Transform (ELT) is a process by which data is extracted from a source system, loaded into a data warehouse, and then transformed.
-
-The basic steps for implementing ELT are:
-
-1. Extract the source data into text files.
-2. Land the data into Azure Blob storage or Azure Data Lake Store.
-3. Prepare the data for loading.
-4. Load the data into staging tables with PolyBase or the COPY command.
-5. Transform the data.
-6. Insert the data into production tables.
-
-For a PolyBase loading tutorial, see [Use PolyBase to load data from Azure blob storage](../synapse-analytics/sql-data-warehouse/load-data-from-azure-blob-storage-using-polybase.md).
-
-For more information, see [Loading patterns blog](https://blogs.msdn.microsoft.com/sqlcat/20../../azure-sql-data-warehouse-loading-patterns-and-strategies/).
-
-## 1. Extract the source data into text files
-
-Getting data out of your source system depends on the storage location. The goal is to move the data into PolyBase and the COPY supported delimited text or CSV files.
-
-### PolyBase and COPY external file formats
-
-With PolyBase and the COPY statement, you can load data from UTF-8 and UTF-16 encoded delimited text or CSV files. In addition to delimited text or CSV files, it loads from the Hadoop file formats such as ORC and Parquet. PolyBase and the COPY statement can also load data from Gzip and Snappy compressed files. Extended ASCII, fixed-width format, and nested formats such as WinZip or XML are not supported. If you are exporting from SQL Server, you can use the [bcp command-line tool](/sql/tools/bcp-utility?view=azure-sqldw-latest) to export the data into delimited text files.
-
-## 2. Land the data into Azure Blob storage or Azure Data Lake Store
-
-To land the data in Azure storage, you can move it to [Azure Blob storage](../storage/blobs/storage-blobs-introduction.md) or [Azure Data Lake Store Gen2](../data-lake-store/data-lake-store-overview.md). In either location, the data should be stored in text files. PolyBase and the COPY statement can load from either location.
-
-Tools and services you can use to move data to Azure Storage:
-
-- [Azure ExpressRoute](../expressroute/expressroute-introduction.md) service enhances network throughput, performance, and predictability. ExpressRoute is a service that routes your data through a dedicated private connection to Azure. ExpressRoute connections do not route data through the public internet. The connections offer more reliability, faster speeds, lower latencies, and higher security than typical connections over the public internet.
-- [AZCopy utility](../storage/common/storage-moving-data.md) moves data to Azure Storage over the public internet. This works if your data sizes are less than 10 TB. To perform loads on a regular basis with AZCopy, test the network speed to see if it is acceptable.
-- [Azure Data Factory (ADF)](../data-factory/introduction.md) has a gateway that you can install on your local server. Then you can create a pipeline to move data from your local server up to Azure Storage. To use Data Factory with SQL Analytics, see [Loading data for SQL Analytics](/azure/data-factory/load-azure-sql-data-warehouse).
-
-## 3. Prepare the data for loading
-
-You might need to prepare and clean the data in your storage account before loading. Data preparation can be performed while your data is in the source, as you export the data to text files, or after the data is in Azure Storage. It is easiest to work with the data as early in the process as possible.
-
-### Define external tables
-
-If you are using PolyBase, you need to define external tables in your data warehouse before loading. External tables are not required by the COPY statement. PolyBase uses external tables to define and access the data in Azure Storage. An external table is similar to a database view. The external table contains the table schema and points to data that is stored outside the data warehouse.
-
-Defining external tables involves specifying the data source, the format of the text files, and the table definitions. T-SQL syntax topics that you will need are:
-- [CREATE EXTERNAL DATA SOURCE](/sql/t-sql/statements/create-external-data-source-transact-sql?view=azure-sqldw-latest)
-- [CREATE EXTERNAL FILE FORMAT](/sql/t-sql/statements/create-external-file-format-transact-sql?view=azure-sqldw-latest)
-- [CREATE EXTERNAL TABLE](/sql/t-sql/statements/create-external-table-transact-sql?view=azure-sqldw-latest)
-
-When loading Parquet, the SQL data type mapping is:
-
-| **Parquet Data Type** | **SQL Data Type** |
-| :-------------------: | :---------------: |
-| tinyint               | tinyint           |
-| smallint              | smallint          |
-| int                   | int               |
-| bigint                | bigint            |
-| boolean               | bit               |
-| double                | float             |
-| float                 | real              |
-| double                | money             |
-| double                | smallmoney        |
-| string                | nchar             |
-| string                | nvarchar          |
-| string                | char              |
-| string                | varchar           |
-| binary                | binary            |
-| binary                | varbinary         |
-| timestamp             | date              |
-| timestamp             | smalldatetime     |
-| timestamp             | datetime2         |
-| timestamp             | datetime          |
-| timestamp             | time              |
-| date                  | date              |
-| decimal               | decimal           |
-
-For an example of creating external objects, see the [Create external tables](../synapse-analytics/sql-data-warehouse/load-data-from-azure-blob-storage-using-polybase.md#create-external-tables-for-the-sample-data) step in the loading tutorial.
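
As a rough illustration of the three statements listed above, a minimal set of external objects might look like the following sketch; the storage location, format options, and object names are placeholders, and non-public storage would also need a database scoped credential:

```sql
-- Placeholder names and locations; illustrative only.
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://<container>@<storageaccount>.blob.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|', USE_TYPE_DEFAULT = FALSE)
);

-- The external table's columns must match the layout of the files under LOCATION.
CREATE EXTERNAL TABLE dbo.ExtTrip
(
    TripId  int            NOT NULL,
    FareAmt decimal(18, 2) NOT NULL
)
WITH (
    LOCATION    = '/Trip/',
    DATA_SOURCE = AzureStorage,
    FILE_FORMAT = TextFileFormat
);
```
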
-### Format text files
-
-If you are using PolyBase, the external objects defined need to align the rows of the text files with the external table and file format definition. The data in each row of the text file must align with the table definition.
-To format the text files:
-
-- If your data is coming from a non-relational source, you need to transform it into rows and columns. Whether the data is from a relational or non-relational source, the data must be transformed to align with the column definitions for the table into which you plan to load the data.
-- Format data in the text file to align with the columns and data types in the destination table. Misalignment between data types in the external text files and the data warehouse table causes rows to be rejected during the load.
-- Separate fields in the text file with a terminator. Be sure to use a character or a character sequence that is not found in your source data. Use the terminator you specified with [CREATE EXTERNAL FILE FORMAT](/sql/t-sql/statements/create-external-file-format-transact-sql).
-
-## 4. Load the data using PolyBase or the COPY statement
-
-It is best practice to load data into a staging table. Staging tables allow you to handle errors without interfering with the production tables. A staging table also gives you the opportunity to use the SQL pool MPP for data transformations before inserting the data into production tables. The table will need to be pre-created when loading into a staging table with COPY.
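
A minimal sketch of that staging pattern, using placeholder names and the hypothetical external table from the previous sketch; COPY needs the target table to exist first, while PolyBase can create it through CTAS:

```sql
-- Pre-create the staging table (required when loading with COPY).
CREATE TABLE dbo.StageTrip
(
    TripId  int            NOT NULL,
    FareAmt decimal(18, 2) NOT NULL
)
WITH (HEAP, DISTRIBUTION = ROUND_ROBIN);

-- Load with the COPY statement; add CREDENTIAL = (...) for non-public storage.
COPY INTO dbo.StageTrip
FROM 'https://<storageaccount>.blob.core.windows.net/<container>/Trip/'
WITH (
    FILE_TYPE       = 'CSV',
    FIELDTERMINATOR = '|'
);

-- Or load with PolyBase by selecting from the external table into a new staging table.
CREATE TABLE dbo.StageTripPolyBase
WITH (HEAP, DISTRIBUTION = ROUND_ROBIN)
AS
SELECT * FROM dbo.ExtTrip;
```
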
-
-### Options for loading with PolyBase and COPY statement
-
-To load data with PolyBase, you can use any of these loading options:
-
-- [PolyBase with T-SQL](../synapse-analytics/sql-data-warehouse/load-data-from-azure-blob-storage-using-polybase.md) works well when your data is in Azure Blob storage or Azure Data Lake Store. It gives you the most control over the loading process, but also requires you to define external data objects. The other methods define these objects behind the scenes as you map source tables to destination tables. To orchestrate T-SQL loads, you can use Azure Data Factory, SSIS, or Azure functions.
-- [PolyBase with SSIS](/sql/integration-services/load-data-to-sql-data-warehouse) works well when your source data is in SQL Server, either SQL Server on-premises or in the cloud. SSIS defines the source to destination table mappings, and also orchestrates the load. If you already have SSIS packages, you can modify the packages to work with the new data warehouse destination.
-- [PolyBase and COPY statement with Azure Data Factory (ADF)](sql-data-warehouse-load-with-data-factory.md) is another orchestration tool. It defines a pipeline and schedules jobs.
-- [PolyBase with Azure Databricks](../azure-databricks/databricks-extract-load-sql-data-warehouse.md) transfers data from a table to a Databricks dataframe and/or writes data from a Databricks dataframe to a table using PolyBase.
-
-### Other loading options
-
-In addition to PolyBase and the COPY statement, you can use [bcp](/sql/tools/bcp-utility?view=azure-sqldw-latest) or the [SQLBulkCopy API](https://msdn.microsoft.com/library/system.data.sqlclient.sqlbulkcopy.aspx). bcp loads directly to the database without going through Azure Blob storage, and is intended only for small loads. Note, the load performance of these options is slower than PolyBase and the COPY statement.
-
-## 5. Transform the data
-
-While data is in the staging table, perform transformations that your workload requires. Then move the data into a production table.
-
-## 6. Insert the data into production tables
-
-The INSERT INTO ... SELECT statement moves the data from the staging table to the permanent table.
-
-As you design an ETL process, try running the process on a small test sample. Try extracting 1000 rows from the table to a file, move it to Azure, and then try loading it into a staging table.
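
A minimal sketch of that final step, assuming the placeholder staging table from the earlier sketches and an existing production table:

```sql
-- Move transformed rows from the staging table into the production table.
INSERT INTO dbo.FactTrip (TripId, FareAmt)
SELECT TripId, FareAmt
FROM dbo.StageTrip;
```
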
-
-## Partner loading solutions
-
-Many of our partners have loading solutions. To find out more, see a list of our [solution partners](../synapse-analytics/sql-data-warehouse/sql-data-warehouse-partner-business-intelligence.md).
-
-## Next steps
-
-For loading guidance, see [Guidance for loading data](guidance-for-loading-data.md).
