Commit 26ff709

Merge pull request #110769 from kevinvngo/patch-148
Updated TOC with data load quickstart
2 parents a8a8e58 + 004fe67 commit 26ff709

File tree: 2 files changed, +119 −6 lines changed

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
---
title: 'Quickstart: Bulk load data using a single T-SQL statement'
description: Bulk load data using the COPY statement
services: synapse-analytics
author: kevinvngo
manager: craigg
ms.service: synapse-analytics
ms.topic: quickstart
ms.subservice:
ms.date: 04/08/2020
ms.author: kevin
ms.reviewer: jrasnick
ms.custom: azure-synapse
---

# Quickstart: Bulk load data using the COPY statement

In this quickstart, you'll bulk load data into your SQL pool using the simple and flexible [COPY statement](https://docs.microsoft.com/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest) for high-throughput data ingestion. The COPY statement is the recommended loading utility because it lets you seamlessly and flexibly load data with the ability to:

- Allow lower-privileged users to load data without needing strict CONTROL permissions on the data warehouse
- Use a single T-SQL statement without having to create any additional database objects
- Apply a finer-grained permission model by using Shared Access Signatures (SAS) instead of exposing storage account keys
- Specify a different storage account for the ERRORFILE location (REJECTED_ROW_LOCATION)
- Customize default values for each target column and specify source data fields to load into specific target columns
- Specify a custom row terminator for CSV files
- Escape string, field, and row delimiters for CSV files
- Use SQL Server date formats for CSV files
- Specify wildcards and multiple files in the storage location path

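Several of the options above (SAS credentials, error-file redirection, date formats, and wildcard paths) can be combined in one statement. The sketch below is purely illustrative: the storage accounts, container path, and SAS token are placeholders, not part of this quickstart's dataset.

```sql
-- Illustrative only: <yourstorageaccount>, <container>, the SAS token,
-- and the error-file location below are placeholders.
COPY INTO [dbo].[Trip]
FROM 'https://<yourstorageaccount>.blob.core.windows.net/<container>/2013/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    -- SAS keeps the storage account key out of the statement
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<your-SAS-token>'),
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    FIELDQUOTE = '"',
    DATEFORMAT = 'ymd',
    -- Rejected rows are written to a separate location
    ERRORFILE = 'https://<errorstorageaccount>.blob.core.windows.net/errors/',
    MAXERRORS = 10
);
```

Running this requires a Synapse SQL pool connection; it is a template to adapt, not a statement to copy verbatim.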
## Prerequisites

This quickstart assumes you already have a SQL pool. If a SQL pool hasn't been created, use the [Create and Connect - portal](create-data-warehouse-portal.md) quickstart.

34+
## Create the target table
35+
36+
In this example, we'll be loading data from the New York taxi dataset. we'll load a table called Trip that represents taxi trips taken within a single year. Run the following to create the table:
37+
38+
```sql
39+
CREATE TABLE [dbo].[Trip]
40+
(
41+
[DateID] int NOT NULL,
42+
[MedallionID] int NOT NULL,
43+
[HackneyLicenseID] int NOT NULL,
44+
[PickupTimeID] int NOT NULL,
45+
[DropoffTimeID] int NOT NULL,
46+
[PickupGeographyID] int NULL,
47+
[DropoffGeographyID] int NULL,
48+
[PickupLatitude] float NULL,
49+
[PickupLongitude] float NULL,
50+
[PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
51+
[DropoffLatitude] float NULL,
52+
[DropoffLongitude] float NULL,
53+
[DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
54+
[PassengerCount] int NULL,
55+
[TripDurationSeconds] int NULL,
56+
[TripDistanceMiles] float NULL,
57+
[PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
58+
[FareAmount] money NULL,
59+
[SurchargeAmount] money NULL,
60+
[TaxAmount] money NULL,
61+
[TipAmount] money NULL,
62+
[TollsAmount] money NULL,
63+
[TotalAmount] money NULL
64+
)
65+
WITH
66+
(
67+
DISTRIBUTION = ROUND_ROBIN,
68+
CLUSTERED COLUMNSTORE INDEX
69+
);
70+
```
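The WITH clause picks ROUND_ROBIN distribution and a clustered columnstore index, a common pairing for load targets. If you want to confirm the distribution choice after creation, one way (a sketch, assuming the table exists in the current database) is to query the catalog views:

```sql
-- Check the distribution policy recorded for the new table.
SELECT t.name
     , dp.distribution_policy_desc
FROM sys.tables t
JOIN sys.pdw_table_distribution_properties dp
    ON t.object_id = dp.object_id
WHERE t.name = 'Trip';
```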

## Run the COPY statement

Run the following COPY statement to load data from the Azure Blob storage account into the Trip table:

```sql
COPY INTO [dbo].[Trip] FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013/'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '0x0A'
) OPTION (LABEL = 'COPY: dbo.trip');
```

## Monitor the load

Check whether your load is making progress by periodically running the following query:

```sql
SELECT r.[request_id]
     , r.[status]
     , r.resource_class
     , r.command
     , SUM(bytes_processed) AS bytes_processed
     , SUM(rows_processed) AS rows_processed
FROM sys.dm_pdw_exec_requests r
JOIN sys.dm_pdw_dms_workers w
    ON r.[request_id] = w.request_id
WHERE [label] = 'COPY: dbo.trip'
    AND session_id <> session_id()
    AND type = 'WRITER'
GROUP BY r.[request_id]
       , r.[status]
       , r.resource_class
       , r.command;
```
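Once the load finishes, a quick sanity check (a sketch; the exact count depends on which files were loaded) is to count the rows that landed in the target table:

```sql
-- Confirm the load by counting rows in the target table.
SELECT COUNT_BIG(*) AS loaded_rows
FROM [dbo].[Trip];
```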

## Next steps

- For best practices on data loading, see [Best practices for loading data](https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/guidance-for-loading-data).
- For information on how to manage the resources for your data loads, see [Workload isolation](https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/quickstart-configure-workload-isolation-tsql).

articles/synapse-analytics/sql-data-warehouse/toc.yml

Lines changed: 10 additions & 6 deletions

@@ -31,12 +31,10 @@
       href: create-data-warehouse-portal.md
   - name: PowerShell
     href: create-data-warehouse-powershell.md
-  - name: Pause and resume
-    items:
-    - name: Portal
-      href: pause-and-resume-compute-portal.md
-    - name: PowerShell
-      href: pause-and-resume-compute-powershell.md
+  - name: Load data
+    items:
+    - name: COPY statement
+      href: quickstart-bulk-load-copy-tsql.md
   - name: Scale
     items:
     - name: Portal
@@ -55,6 +53,12 @@
     items:
     - name: T-SQL
       href: quickstart-configure-workload-isolation-tsql.md
+  - name: Pause and resume
+    items:
+    - name: Portal
+      href: pause-and-resume-compute-portal.md
+    - name: PowerShell
+      href: pause-and-resume-compute-powershell.md
   - name: Concepts
     items:
     - name: Security
