---
title: 'Quickstart: Bulk load data using a single T-SQL statement'
description: Bulk load data using the COPY statement
services: synapse-analytics
author: kevinvngo
manager: craigg
ms.service: synapse-analytics
ms.topic: quickstart
ms.subservice:
ms.date: 04/08/2020
ms.author: kevin
ms.reviewer: jrasnick
ms.custom: azure-synapse
---

# Quickstart: Bulk load data using the COPY statement

In this quickstart, you'll bulk load data into your SQL pool using the simple and flexible [COPY statement](https://docs.microsoft.com/sql/t-sql/statements/copy-into-transact-sql?view=azure-sqldw-latest) for high-throughput data ingestion. The COPY statement is the recommended loading utility because it enables you to:

- Allow lower-privileged users to load without needing strict CONTROL permissions on the data warehouse
- Run a single T-SQL statement without having to create any additional database objects
- Use a finer permission model without exposing storage account keys, by using Shared Access Signatures (SAS)
- Specify a different storage account for the ERRORFILE location (REJECTED_ROW_LOCATION)
- Customize default values for each target column and specify source data fields to load into specific target columns
- Specify a custom row terminator for CSV files
- Escape string, field, and row delimiters for CSV files
- Use SQL Server date formats for CSV files
- Specify wildcards and multiple files in the storage location path

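As an illustrative sketch of a few of these options, the following statement uses a wildcard path, custom CSV delimiters, and a date format. The storage account, container, and path here are placeholders, not part of this quickstart:

```sql
-- Illustrative only: <account>, <container>, and the path are placeholders.
-- The wildcard loads every matching CSV file under trips/2013/.
COPY INTO [dbo].[Trip]
FROM 'https://<account>.blob.core.windows.net/<container>/trips/2013/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDQUOTE = '"',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    DATEFORMAT = 'ymd'
);
```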
## Prerequisites

This quickstart assumes you already have a SQL pool. If a SQL pool hasn't been created, use the [Create and Connect - portal](create-data-warehouse-portal.md) quickstart.

## Create the target table

In this example, we'll load data from the New York taxi dataset into a table called Trip that represents taxi trips taken within a single year. Run the following statement to create the table:

```sql
CREATE TABLE [dbo].[Trip]
(
    [DateID] int NOT NULL,
    [MedallionID] int NOT NULL,
    [HackneyLicenseID] int NOT NULL,
    [PickupTimeID] int NOT NULL,
    [DropoffTimeID] int NOT NULL,
    [PickupGeographyID] int NULL,
    [DropoffGeographyID] int NULL,
    [PickupLatitude] float NULL,
    [PickupLongitude] float NULL,
    [PickupLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [DropoffLatitude] float NULL,
    [DropoffLongitude] float NULL,
    [DropoffLatLong] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [PassengerCount] int NULL,
    [TripDurationSeconds] int NULL,
    [TripDistanceMiles] float NULL,
    [PaymentType] varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [FareAmount] money NULL,
    [SurchargeAmount] money NULL,
    [TaxAmount] money NULL,
    [TipAmount] money NULL,
    [TollsAmount] money NULL,
    [TotalAmount] money NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
);
```

## Run the COPY statement

Run the following COPY statement to load data from the Azure Blob storage account into the Trip table:

```sql
COPY INTO [dbo].[Trip] FROM 'https://nytaxiblob.blob.core.windows.net/2013/Trip2013/'
WITH (
    FIELDTERMINATOR='|',
    ROWTERMINATOR='0x0A'
) OPTION (LABEL = 'COPY: dbo.trip');
```
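The nytaxiblob storage account above is publicly readable, so no credential is required. If your own storage account is secured, the same statement can take a SAS credential and redirect rejected rows to a separate account. The following is a hedged sketch only; the accounts, containers, and SAS tokens are placeholders:

```sql
-- Illustrative only: accounts, containers, and SAS secrets are placeholders.
COPY INTO [dbo].[Trip]
FROM 'https://<account>.blob.core.windows.net/<container>/Trip2013/'
WITH (
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '0x0A',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>'),
    ERRORFILE = 'https://<erroraccount>.blob.core.windows.net/<errorcontainer>/errors/',
    ERRORFILE_CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>'),
    MAXERRORS = 10
) OPTION (LABEL = 'COPY: dbo.trip');
```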

## Monitor the load

Check whether your load is making progress by periodically running the following query:

```sql
SELECT r.[request_id]
, r.[status]
, r.resource_class
, r.command
, SUM(bytes_processed) AS bytes_processed
, SUM(rows_processed) AS rows_processed
FROM sys.dm_pdw_exec_requests r
    JOIN sys.dm_pdw_dms_workers w
        ON r.[request_id] = w.request_id
WHERE r.[label] = 'COPY: dbo.trip' AND session_id <> session_id() AND type = 'WRITER'
GROUP BY r.[request_id]
, r.[status]
, r.resource_class
, r.command;
```
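Once the query above shows the request has completed, you can sanity-check the target table with standard T-SQL, for example:

```sql
-- Confirm how many rows landed in the table, then inspect a sample.
SELECT COUNT_BIG(*) AS loaded_rows FROM [dbo].[Trip];

SELECT TOP 10 * FROM [dbo].[Trip];
```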

## Next steps

- For best practices on data loading, see [Best Practices for Loading Data](https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/guidance-for-loading-data).
- For information on how to manage the resources for your data loads, see [Workload Isolation](https://docs.microsoft.com/azure/synapse-analytics/sql-data-warehouse/quickstart-configure-workload-isolation-tsql).