---
title: 'Quickstart: Bulk load data using a single T-SQL statement'
description: Bulk load data using the COPY statement
services: synapse-analytics
author: kevinvngo
manager: craigg
ms.service: synapse-analytics
ms.topic: quickstart
ms.subservice:
ms.date: 04/08/2020
ms.author: kevin
ms.reviewer: jrasnick
ms.custom: azure-synapse
---

# Quickstart: Bulk load data using the COPY statement

In this quickstart, you bulk load data into your SQL pool using the [COPY statement](https://docs.microsoft.com/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest), a simple, flexible utility for high-throughput data ingestion. COPY is the recommended loading method because it lets you:
19+
20+
- Allow lower privileged users to load without needing strict CONTROL permissions on the data warehouse
21+
- Leverage only a single T-SQL statement without having to create any additional database objects
22+
- Leverage a finer permission model without exposing storage account keys using Share Access Signatures (SAS)
23+
- Specify a different storage account for the ERRORFILE location (REJECTED_ROW_LOCATION)
24+
- Customize default values for each target column and specify source data fields to load into specific target columns
25+
- Specify a custom row terminator for CSV files
26+
- Escape string, field, and row delimiters for CSV files
27+
- Leverage SQL Server Date formats for CSV files
28+
- Specify wildcards and multiple files in the storage location path
29+
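To illustrate how several of these options combine, the following sketch loads CSV files matching a wildcard, authenticates with a SAS instead of a storage key, and redirects rejected rows to a separate storage account. The storage URLs, SAS token, and MAXERRORS value are placeholders for illustration, not part of this quickstart:

```sql
-- Hypothetical example: storage URLs and the SAS token are placeholders.
COPY INTO [dbo].[Trip]
FROM 'https://<yourstorage>.blob.core.windows.net/csv/trip/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>'),
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '0x0A',
    -- Rejected rows can land in a different storage account than the source.
    ERRORFILE = 'https://<errorstorage>.blob.core.windows.net/errors/',
    MAXERRORS = 100
);
```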

## Prerequisites

This quickstart assumes you already have a SQL pool. If you haven't created one, use the [Create and Connect - portal](create-data-warehouse-portal.md) quickstart.
33+
34+
## Create the target table
35+
36+
In this example, we will be loading data from the New York taxi dataset. We will be loading a table called Trip which represents taxi trips taken within a single year. Run the following to create the table:

```sql
CREATE TABLE [dbo].[Trip]
(
    [DateID] int NOT NULL,
    [MedallionID] int NOT NULL,
    [HackneyLicenseID] int NOT NULL,
    [PickupTimeID] int NOT NULL,
    [DropoffTimeID] int NOT NULL,
    [PickupGeographyID] int NULL,
    [DropoffGeographyID] int NULL,
    [PickupLatitude] float NULL,
    [PickupLongitude] float NULL,
    [PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [DropoffLatitude] float NULL,
    [DropoffLongitude] float NULL,
    [DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [PassengerCount] int NULL,
    [TripDurationSeconds] int NULL,
    [TripDistanceMiles] float NULL,
    [PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [FareAmount] money NULL,
    [SurchargeAmount] money NULL,
    [TaxAmount] money NULL,
    [TipAmount] money NULL,
    [TollsAmount] money NULL,
    [TotalAmount] money NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
);
```
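ROUND_ROBIN distribution is the simplest choice for a loading quickstart, but for large fact tables that are frequently joined, a hash distribution can reduce data movement at query time. A hedged variant is sketched below; the table name and distribution column are illustrative choices, not a recommendation from this quickstart:

```sql
-- Hypothetical variant: hash-distribute on a column that joins often.
-- The column choice here is illustrative only.
CREATE TABLE [dbo].[Trip_Hash]
(
    [DateID] int NOT NULL,
    [MedallionID] int NOT NULL
    -- ... remaining columns as in dbo.Trip ...
)
WITH
(
    DISTRIBUTION = HASH([MedallionID]),
    CLUSTERED COLUMNSTORE INDEX
);
```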

## Run the COPY statement

Run the following COPY statement, which loads data from the Azure Blob storage account into the Trip table:

```sql
COPY INTO [dbo].[Trip]
FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013/'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '0x0A'
) OPTION (LABEL = 'COPY: dbo.trip');
```
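After the statement completes, a quick sanity check is to count the rows that landed in the target table (the exact count depends on the dataset, so none is asserted here):

```sql
-- Sanity check after the load completes.
SELECT COUNT_BIG(*) AS loaded_rows
FROM [dbo].[Trip];
```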

## Monitor the load

Check whether your load is making progress by periodically running the following query:
```sql
89+
SELECT r.[request_id]
90+
, r.[status]
91+
, r.resource_class
92+
, r.command
93+
, sum(bytes_processed) AS bytes_processed
94+
, sum(rows_processed) AS rows_processed
95+
FROM sys.dm_pdw_exec_requests r
96+
JOIN sys.dm_pdw_dms_workers w
97+
ON r.[request_id] = w.request_id
98+
WHERE [label] = 'COPY: dbo.trip' and session_id <> session_id() and type = 'WRITER'
99+
GROUP BY r.[request_id]
100+
, r.[status]
101+
, r.resource_class
102+
, r.command
103+
, [type];
104+
105+
```
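Because the COPY statement above was tagged with a LABEL, you can also check its final state directly in `sys.dm_pdw_exec_requests` once it finishes. This is a follow-up sketch rather than part of the original quickstart:

```sql
-- Check the final state of the labeled COPY request.
SELECT request_id
     , status
     , submit_time
     , end_time
     , total_elapsed_time
     , command
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'COPY: dbo.trip'
ORDER BY submit_time DESC;
```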

## Next steps

- For best practices on data loading, see [Best Practices for Loading Data](https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/guidance-for-loading-data).
- For information on how to manage the resources for your data loads, see [Workload Isolation](https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/quickstart-configure-workload-isolation-tsql).
