Commit fe52358

Merge pull request #115851 from jovanpop-msft/patch-178
Update query-specific-files.md
2 parents c28dec2 + dea1849 commit fe52358

1 file changed: +25 -52 lines changed

articles/synapse-analytics/sql/query-specific-files.md

Lines changed: 25 additions & 52 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
 ms.service: synapse-analytics
 ms.topic: how-to
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/20/2020
 ms.author: v-stazar
 ms.reviewer: jrasnick, carlrab
 ---
@@ -21,10 +21,7 @@ You can use function `filepath` and `filename` to return file names and/or the p
 
 ## Prerequisites
 
-Before reading the rest of this article, review the following prerequisites:
-
-- [First-time setup](query-data-storage.md#first-time-setup)
-- [Prerequisites](query-data-storage.md#prerequisites)
+Your first step is to **create a database** with a datasource that references storage account. Then initialize the objects by executing [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql) on that database. This setup script will create the data sources, database scoped credentials, and external file formats that are used in these samples.
 
 ## Functions
 
@@ -36,15 +33,15 @@ The following sample reads the NYC Yellow Taxi data files for the last three mon
 
 ```sql
 SELECT
-r.filename() AS [filename]
+nyc.filename() AS [filename]
 ,COUNT_BIG(*) AS [rows]
-FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/year=2017/month=9/*.parquet',
-FORMAT='PARQUET') AS [r]
-GROUP BY
-r.filename()
-ORDER BY
-[filename];
+FROM
+OPENROWSET(
+BULK 'parquet/taxi/year=2017/month=9/*.parquet',
+DATA_SOURCE = 'SqlOnDemandDemo',
+FORMAT='PARQUET'
+) nyc
+GROUP BY nyc.filename();
 ```
 
 The following example shows how *filename()* can be used in the WHERE clause to filter the files to be read. It accesses the entire folder in the OPENROWSET part of the query and filters files in the WHERE clause.
@@ -56,10 +53,14 @@ SELECT
 r.filename() AS [filename]
 ,COUNT_BIG(*) AS [rows]
 FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/year=2017/month=9/*.parquet',
-FORMAT='PARQUET') AS [r]
+BULK 'csv/taxi/yellow_tripdata_2017-*.csv',
+DATA_SOURCE = 'SqlOnDemandDemo',
+FORMAT = 'CSV',
+PARSER_VERSION = '2.0',
+FIRSTROW = 2)
+WITH (C1 varchar(200) ) AS [r]
 WHERE
-r.filename() IN ('yellow_tripdata_2017-10.parquet', 'yellow_tripdata_2017-11.parquet', 'yellow_tripdata_2017-12.parquet')
+r.filename() IN ('yellow_tripdata_2017-10.csv', 'yellow_tripdata_2017-11.csv', 'yellow_tripdata_2017-12.csv')
 GROUP BY
 r.filename()
 ORDER BY
@@ -80,28 +81,14 @@ SELECT
 r.filepath() AS filepath
 ,COUNT_BIG(*) AS [rows]
 FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/taxi/yellow_tripdata_2017-1*.csv',
+BULK 'csv/taxi/yellow_tripdata_2017-1*.csv',
+DATA_SOURCE = 'SqlOnDemandDemo',
 FORMAT = 'CSV',
+PARSER_VERSION = '2.0',
 FIRSTROW = 2
 )
 WITH (
-vendor_id INT,
-pickup_datetime DATETIME2,
-dropoff_datetime DATETIME2,
-passenger_count SMALLINT,
-trip_distance FLOAT,
-rate_code SMALLINT,
-store_and_fwd_flag SMALLINT,
-pickup_location_id INT,
-dropoff_location_id INT,
-payment_type SMALLINT,
-fare_amount FLOAT,
-extra FLOAT,
-mta_tax FLOAT,
-tip_amount FLOAT,
-tolls_amount FLOAT,
-improvement_surcharge FLOAT,
-total_amount FLOAT
+vendor_id INT
 ) AS [r]
 GROUP BY
 r.filepath()
@@ -120,28 +107,14 @@ SELECT
 ,r.filepath(2) AS [month]
 ,COUNT_BIG(*) AS [rows]
 FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/taxi/yellow_tripdata_*-*.csv',
+BULK 'csv/taxi/yellow_tripdata_*-*.csv',
+DATA_SOURCE = 'SqlOnDemandDemo',
 FORMAT = 'CSV',
+PARSER_VERSION = '2.0',
 FIRSTROW = 2
 )
 WITH (
-vendor_id INT,
-pickup_datetime DATETIME2,
-dropoff_datetime DATETIME2,
-passenger_count SMALLINT,
-trip_distance FLOAT,
-rate_code SMALLINT,
-store_and_fwd_flag SMALLINT,
-pickup_location_id INT,
-dropoff_location_id INT,
-payment_type SMALLINT,
-fare_amount FLOAT,
-extra FLOAT,
-mta_tax FLOAT,
-tip_amount FLOAT,
-tolls_amount FLOAT,
-improvement_surcharge FLOAT,
-total_amount FLOAT
+vendor_id INT
 ) AS [r]
 WHERE
 r.filepath(1) IN ('2017')
