Commit 9d3378a

Merge pull request #115927 from julieMSFT/20200520_mtm
20200520 mtm
2 parents 483c5f1 + 1aa0864 commit 9d3378a

6 files changed, with 103 additions and 163 deletions

articles/synapse-analytics/sql/create-use-views.md

Lines changed: 5 additions & 4 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/20/2020
 ms.author: v-stazar
 ms.reviewer: jrasnick, carlrab
 ---
@@ -36,8 +36,9 @@ GO
 CREATE VIEW populationView AS
 SELECT *
 FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/population/population.csv',
-FORMAT = 'CSV',
+BULK 'csv/population/population.csv',
+DATA_SOURCE = 'SqlOnDemandDemo',
+FORMAT = 'CSV',
 FIELDTERMINATOR =',',
 ROWTERMINATOR = '\n'
 )
@@ -57,7 +58,7 @@ AS SELECT *, nyc.filepath(1) AS [year], nyc.filepath(2) AS [month]
 FROM
 OPENROWSET(
 BULK 'parquet/taxi/year=*/month=*/*.parquet',
-DATA_SOURCE = 'sqlondemandstorage',
+DATA_SOURCE = 'sqlondemanddemo',
 FORMAT='PARQUET'
 ) AS nyc
 ```

articles/synapse-analytics/sql/query-parquet-files.md

Lines changed: 30 additions & 54 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
 ms.service: synapse-analytics
 ms.topic: how-to
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/20/2020
 ms.author: v-stazar
 ms.reviewer: jrasnick, carlrab
 ---
@@ -17,58 +17,36 @@ In this article, you'll learn how to write a query using SQL on-demand (preview)

 ## Prerequisites

-Before reading rest of this article, review the following articles:
-
-- [First-time setup](query-data-storage.md#first-time-setup)
-- [Prerequisites](query-data-storage.md#prerequisites)
+Your first step is to **create a database** with a datasource that references [NYC Yellow Taxi](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/) storage account. Then initialize the objects by executing [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql) on that database. This setup script will create the data sources, database scoped credentials, and external file formats that are used in these samples.

 ## Dataset

-You can query Parquet files the same way you read CSV files. The only difference is that the FILEFORMAT parameter should be set to PARQUET. Examples in this article show the specifics of reading Parquet files.
-
-> [!NOTE]
-> You do not have to specify columns in the OPENROWSET WITH clause when reading parquet files. SQL on-demand will utilize metadata in the Parquet file and bind columns by name.
-
-You'll use the folder *parquet/taxi* for the sample queries. It contains NYC Taxi - Yellow Taxi Trip Records data from July 2016. to June 2018.
-
-Data is partitioned by year and month and the folder structure is as follows:
-
-- year=2016
-- month=6
-- ...
-- month=12
-- year=2017
-- month=1
-- ...
-- month=12
-- year=2018
-- month=1
-- ...
-- month=6
+[NYC Yellow Taxi](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/) dataset is used in this sample. You can query Parquet files the same way you [read CSV files](query-parquet-files.md). The only difference is that the `FILEFORMAT` parameter should be set to `PARQUET`. Examples in this article show the specifics of reading Parquet files.

 ## Query set of parquet files

 You can specify only the columns of interest when you query Parquet files.

 ```sql
 SELECT
-YEAR(pickup_datetime),
-passenger_count,
+YEAR(tpepPickupDateTime),
+passengerCount,
 COUNT(*) AS cnt
 FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/*/*/*',
+BULK 'puYear=2018/puMonth=*/*.snappy.parquet',
+DATA_SOURCE = 'YellowTaxi',
 FORMAT='PARQUET'
 ) WITH (
-pickup_datetime DATETIME2,
-passenger_count INT
+tpepPickupDateTime DATETIME2,
+passengerCount INT
 ) AS nyc
 GROUP BY
-passenger_count,
-YEAR(pickup_datetime)
+passengerCount,
+YEAR(tpepPickupDateTime)
 ORDER BY
-YEAR(pickup_datetime),
-passenger_count;
+YEAR(tpepPickupDateTime),
+passengerCount;
 ```

 ## Automatic schema inference
@@ -81,13 +59,13 @@ The sample below shows the automatic schema inference capabilities for Parquet f
 > You don't have to specify columns in the OPENROWSET WITH clause when reading Parquet files. In that case, SQL on-demand Query service will utilize metadata in the Parquet file and bind columns by name.

 ```sql
-SELECT
-COUNT_BIG(*)
-FROM
+SELECT TOP 10 *
+FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/year=2017/month=9/*.parquet',
+BULK 'puYear=2018/puMonth=*/*.snappy.parquet',
+DATA_SOURCE = 'YellowTaxi',
 FORMAT='PARQUET'
-) AS nyc;
+) AS nyc
 ```

 ### Query partitioned data
@@ -99,27 +77,25 @@ The data set provided in this sample is divided (partitioned) into separate subf

 ```sql
 SELECT
-nyc.filepath(1) AS [year],
-nyc.filepath(2) AS [month],
-payment_type,
-SUM(fare_amount) AS fare_total
-FROM
+YEAR(tpepPickupDateTime),
+passengerCount,
+COUNT(*) AS cnt
+FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/year=*/month=*/*.parquet',
+BULK 'puYear=*/puMonth=*/*.snappy.parquet',
+DATA_SOURCE = 'YellowTaxi',
 FORMAT='PARQUET'
-) AS nyc
+) nyc
 WHERE
 nyc.filepath(1) = 2017
 AND nyc.filepath(2) IN (1, 2, 3)
-AND pickup_datetime BETWEEN CAST('1/1/2017' AS datetime) AND CAST('3/31/2017' AS datetime)
+AND tpepPickupDateTime BETWEEN CAST('1/1/2017' AS datetime) AND CAST('3/31/2017' AS datetime)
 GROUP BY
-nyc.filepath(1),
-nyc.filepath(2),
-payment_type
+passengerCount,
+YEAR(tpepPickupDateTime)
 ORDER BY
-nyc.filepath(1),
-nyc.filepath(2),
-payment_type;
+YEAR(tpepPickupDateTime),
+passengerCount;
 ```

 ## Type mapping

articles/synapse-analytics/sql/query-parquet-nested-types.md

Lines changed: 10 additions & 9 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
 ms.service: synapse-analytics
 ms.topic: how-to
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/20/2020
 ms.author: v-stazar
 ms.reviewer: jrasnick, carlrab
 ---
@@ -17,10 +17,7 @@ In this article, you'll learn how to write a query using SQL on-demand (preview)

 ## Prerequisites

-Before reading the rest of this article, review the following articles:
-
-- [First-time setup](query-data-storage.md#first-time-setup)
-- [Prerequisites](query-data-storage.md#prerequisites)
+Your first step is to **create a database** with a datasource that references. Then initialize the objects by executing [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql) on that database. This setup script will create the data sources, database scoped credentials, and external file formats that are used in these samples.

 ## Project nested or repeated data

@@ -31,7 +28,8 @@ SELECT
 *
 FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/nested/justSimpleArray.parquet',
+BULK 'parquet/nested/justSimpleArray.parquet',
+DATA_SOURCE = 'SqlOnDemandDemo',
 FORMAT='PARQUET'
 ) AS [r];
 ```
@@ -45,7 +43,8 @@ SELECT
 *
 FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/nested/structExample.parquet',
+BULK 'parquet/nested/structExample.parquet',
+DATA_SOURCE = 'SqlOnDemandDemo',
 FORMAT='PARQUET'
 )
 WITH (
@@ -75,7 +74,8 @@ SELECT
 JSON_VALUE(SimpleArray, '$[2]') AS ThirdElement
 FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/nested/justSimpleArray.parquet',
+BULK 'parquet/nested/justSimpleArray.parquet',
+DATA_SOURCE = 'SqlOnDemandDemo',
 FORMAT='PARQUET'
 ) AS [r];
 ```
@@ -88,7 +88,8 @@ SELECT
 JSON_QUERY(MapOfPersons, '$."John Doe"') AS [John]
 FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/nested/mapExample.parquet',
+BULK 'parquet/nested/mapExample.parquet',
+DATA_SOURCE = 'SqlOnDemandDemo',
 FORMAT='PARQUET'
 ) AS [r];
 ```

articles/synapse-analytics/sql/query-specific-files.md

Lines changed: 25 additions & 52 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
 ms.service: synapse-analytics
 ms.topic: how-to
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/20/2020
 ms.author: v-stazar
 ms.reviewer: jrasnick, carlrab
 ---
@@ -21,10 +21,7 @@ You can use function `filepath` and `filename` to return file names and/or the p

 ## Prerequisites

-Before reading the rest of this article, review the following prerequisites:
-
-- [First-time setup](query-data-storage.md#first-time-setup)
-- [Prerequisites](query-data-storage.md#prerequisites)
+Your first step is to **create a database** with a datasource that references storage account. Then initialize the objects by executing [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql) on that database. This setup script will create the data sources, database scoped credentials, and external file formats that are used in these samples.

 ## Functions

@@ -36,15 +33,15 @@ The following sample reads the NYC Yellow Taxi data files for the last three mon

 ```sql
 SELECT
-r.filename() AS [filename]
+nyc.filename() AS [filename]
 ,COUNT_BIG(*) AS [rows]
-FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/year=2017/month=9/*.parquet',
-FORMAT='PARQUET') AS [r]
-GROUP BY
-r.filename()
-ORDER BY
-[filename];
+FROM
+OPENROWSET(
+BULK 'parquet/taxi/year=2017/month=9/*.parquet',
+DATA_SOURCE = 'SqlOnDemandDemo',
+FORMAT='PARQUET'
+) nyc
+GROUP BY nyc.filename();
 ```

 The following example shows how *filename()* can be used in the WHERE clause to filter the files to be read. It accesses the entire folder in the OPENROWSET part of the query and filters files in the WHERE clause.
@@ -56,10 +53,14 @@ SELECT
 r.filename() AS [filename]
 ,COUNT_BIG(*) AS [rows]
 FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/year=2017/month=9/*.parquet',
-FORMAT='PARQUET') AS [r]
+BULK 'csv/taxi/yellow_tripdata_2017-*.csv',
+DATA_SOURCE = 'SqlOnDemandDemo',
+FORMAT = 'CSV',
+PARSER_VERSION = '2.0',
+FIRSTROW = 2)
+WITH (C1 varchar(200) ) AS [r]
 WHERE
-r.filename() IN ('yellow_tripdata_2017-10.parquet', 'yellow_tripdata_2017-11.parquet', 'yellow_tripdata_2017-12.parquet')
+r.filename() IN ('yellow_tripdata_2017-10.csv', 'yellow_tripdata_2017-11.csv', 'yellow_tripdata_2017-12.csv')
 GROUP BY
 r.filename()
 ORDER BY
@@ -80,28 +81,14 @@ SELECT
 r.filepath() AS filepath
 ,COUNT_BIG(*) AS [rows]
 FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/taxi/yellow_tripdata_2017-1*.csv',
+BULK 'csv/taxi/yellow_tripdata_2017-1*.csv',
+DATA_SOURCE = 'SqlOnDemandDemo',
 FORMAT = 'CSV',
+PARSER_VERSION = '2.0',
 FIRSTROW = 2
 )
 WITH (
-vendor_id INT,
-pickup_datetime DATETIME2,
-dropoff_datetime DATETIME2,
-passenger_count SMALLINT,
-trip_distance FLOAT,
-rate_code SMALLINT,
-store_and_fwd_flag SMALLINT,
-pickup_location_id INT,
-dropoff_location_id INT,
-payment_type SMALLINT,
-fare_amount FLOAT,
-extra FLOAT,
-mta_tax FLOAT,
-tip_amount FLOAT,
-tolls_amount FLOAT,
-improvement_surcharge FLOAT,
-total_amount FLOAT
+vendor_id INT
 ) AS [r]
 GROUP BY
 r.filepath()
@@ -120,28 +107,14 @@ SELECT
 ,r.filepath(2) AS [month]
 ,COUNT_BIG(*) AS [rows]
 FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/taxi/yellow_tripdata_*-*.csv',
+BULK 'csv/taxi/yellow_tripdata_*-*.csv',
+DATA_SOURCE = 'SqlOnDemandDemo',
 FORMAT = 'CSV',
+PARSER_VERSION = '2.0',
 FIRSTROW = 2
 )
 WITH (
-vendor_id INT,
-pickup_datetime DATETIME2,
-dropoff_datetime DATETIME2,
-passenger_count SMALLINT,
-trip_distance FLOAT,
-rate_code SMALLINT,
-store_and_fwd_flag SMALLINT,
-pickup_location_id INT,
-dropoff_location_id INT,
-payment_type SMALLINT,
-fare_amount FLOAT,
-extra FLOAT,
-mta_tax FLOAT,
-tip_amount FLOAT,
-tolls_amount FLOAT,
-improvement_surcharge FLOAT,
-total_amount FLOAT
+vendor_id INT
 ) AS [r]
 WHERE
 r.filepath(1) IN ('2017')
