Commit 39a0f89

Update query-folders-multiple-csv-files.md
1 parent a7e964f commit 39a0f89

1 file changed

+22
-62
lines changed

articles/synapse-analytics/sql/query-folders-multiple-csv-files.md

Lines changed: 22 additions & 62 deletions
@@ -19,25 +19,10 @@ SQL on-demand supports reading multiple files/folders by using wildcards, which
 
 ## Prerequisites
 
-Before reading the rest of this article, make sure to review the articles listed below:
-
-- [First-time setup](query-data-storage.md#first-time-setup)
-- [Prerequisites](query-data-storage.md#prerequisites)
-
-## Read multiple files in folder
-
-You'll use the folder *csv/taxi* to follow the sample queries. It contains NYC Taxi - Yellow Taxi Trip Records data from July 2016 to June 2018.
-
-Files in *csv/taxi* are named after year and month:
-
-- yellow_tripdata_2016-07.csv
-- yellow_tripdata_2016-08.csv
-- yellow_tripdata_2016-09.csv
-- ...
-- yellow_tripdata_2018-04.csv
-- yellow_tripdata_2018-05.csv
-- yellow_tripdata_2018-06.csv
+Your first step is to **create a database** where you will execute the queries. Then initialize the objects by executing [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql) on that database. This setup script will create the data sources, database scoped credentials, and external file formats that are used in these samples.
 
+You'll use the folder *csv/taxi* to follow the sample queries. It contains NYC Taxi - Yellow Taxi Trip Records data from July 2016 to June 2018. Files in *csv/taxi* are named after year and month using the following pattern: yellow_tripdata_<year>-<month>.csv
+
 Each file has the following structure:
 
 [First 10 rows of the CSV file](./media/querying-folders-and-multiple-csv-files/nyc-taxi.png)
@@ -52,28 +37,14 @@ SELECT
     SUM(passenger_count) AS passengers_total,
     COUNT(*) AS [rides_total]
 FROM OPENROWSET(
-        BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/taxi/*.*',
-        FORMAT = 'CSV',
+        BULK 'csv/taxi/*.csv',
+        DATA_SOURCE = 'sqlondemanddemo',
+        FORMAT = 'CSV', PARSER_VERSION = '2.0',
         FIRSTROW = 2
     )
     WITH (
-        vendor_id VARCHAR(100) COLLATE Latin1_General_BIN2,
-        pickup_datetime DATETIME2,
-        dropoff_datetime DATETIME2,
-        passenger_count INT,
-        trip_distance FLOAT,
-        rate_code INT,
-        store_and_fwd_flag VARCHAR(100) COLLATE Latin1_General_BIN2,
-        pickup_location_id INT,
-        dropoff_location_id INT,
-        payment_type INT,
-        fare_amount FLOAT,
-        extra FLOAT,
-        mta_tax FLOAT,
-        tip_amount FLOAT,
-        tolls_amount FLOAT,
-        improvement_surcharge FLOAT,
-        total_amount FLOAT
+        pickup_datetime DATETIME2 2,
+        passenger_count INT 4
     ) AS nyc
 GROUP BY
     YEAR(pickup_datetime)
@@ -93,28 +64,14 @@ SELECT
     payment_type,
     SUM(fare_amount) AS fare_total
 FROM OPENROWSET(
-        BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/taxi/yellow_tripdata_2017-*.csv',
-        FORMAT = 'CSV',
+        BULK 'csv/taxi/yellow_tripdata_2017-*.csv',
+        DATA_SOURCE = 'sqlondemanddemo',
+        FORMAT = 'CSV', PARSER_VERSION = '2.0',
         FIRSTROW = 2
     )
     WITH (
-        vendor_id VARCHAR(100) COLLATE Latin1_General_BIN2,
-        pickup_datetime DATETIME2,
-        dropoff_datetime DATETIME2,
-        passenger_count INT,
-        trip_distance FLOAT,
-        rate_code INT,
-        store_and_fwd_flag VARCHAR(100) COLLATE Latin1_General_BIN2,
-        pickup_location_id INT,
-        dropoff_location_id INT,
-        payment_type INT,
-        fare_amount FLOAT,
-        extra FLOAT,
-        mta_tax FLOAT,
-        tip_amount FLOAT,
-        tolls_amount FLOAT,
-        improvement_surcharge FLOAT,
-        total_amount FLOAT
+        payment_type INT 10,
+        fare_amount FLOAT 11
     ) AS nyc
 GROUP BY payment_type
 ORDER BY payment_type;
@@ -142,8 +99,9 @@ SELECT
     SUM(passenger_count) AS passengers_total,
     COUNT(*) AS [rides_total]
 FROM OPENROWSET(
-        BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/taxi/',
-        FORMAT = 'CSV',
+        BULK 'csv/taxi/',
+        DATA_SOURCE = 'sqlondemanddemo',
+        FORMAT = 'CSV', PARSER_VERSION = '2.0',
         FIRSTROW = 2
     )
     WITH (
@@ -187,8 +145,9 @@ SELECT
     SUM(passenger_count) AS passengers_total,
     COUNT(*) AS [rides_total]
 FROM OPENROWSET(
-        BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/t*i/',
-        FORMAT = 'CSV',
+        BULK 'csv/t*i/',
+        DATA_SOURCE = 'sqlondemanddemo',
+        FORMAT = 'CSV', PARSER_VERSION = '2.0',
         FIRSTROW = 2
     )
     WITH (
@@ -235,8 +194,9 @@ SELECT
     SUM(passenger_count) AS passengers_total,
     COUNT(*) AS [rides_total]
 FROM OPENROWSET(
-        BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/t*i/yellow_tripdata_2017-*.csv',
-        FORMAT = 'CSV',
+        BULK 'csv/t*i/yellow_tripdata_2017-*.csv',
+        DATA_SOURCE = 'sqlondemanddemo',
+        FORMAT = 'CSV', PARSER_VERSION = '2.0',
         FIRSTROW = 2
     )
     WITH (
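
Note on the change: every hunk replaces the full storage URL in `BULK` with a path relative to `DATA_SOURCE = 'sqlondemanddemo'`. The sketch below shows what that likely implies. It assumes the data source created by the setup script points at the storage account root that appears in the removed lines; that definition is not part of this diff, and the actual script may differ (for example, by attaching a database scoped credential). The query itself is assembled from the added lines of the second sample.

```sql
-- Assumption: the setup script defines the data source roughly like this.
-- The LOCATION value is inferred from the URLs removed in this commit.
CREATE EXTERNAL DATA SOURCE sqlondemanddemo
WITH (
    LOCATION = 'https://sqlondemandstorage.blob.core.windows.net'
);

-- The BULK path is resolved relative to the data source LOCATION,
-- so this reads csv/taxi/yellow_tripdata_2017-*.csv from that account.
SELECT
    payment_type,
    SUM(fare_amount) AS fare_total
FROM OPENROWSET(
        BULK 'csv/taxi/yellow_tripdata_2017-*.csv',
        DATA_SOURCE = 'sqlondemanddemo',
        FORMAT = 'CSV', PARSER_VERSION = '2.0',
        FIRSTROW = 2            -- skip the header row
    )
    WITH (
        payment_type INT 10,    -- 10th column in the file
        fare_amount FLOAT 11    -- 11th column in the file
    ) AS nyc
GROUP BY payment_type
ORDER BY payment_type;
```

The trailing numbers in the `WITH` clause are column ordinals. They match the positions of the same columns in the 17-column schema that this commit removes (`pickup_datetime` is 2nd, `passenger_count` 4th, `payment_type` 10th, `fare_amount` 11th), which is why the shortened column list still reads the right fields.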
