Commit 08a0b72

Merge pull request #115832 from jovanpop-msft/patch-172
Refactoring to use db-scoped credentials
2 parents 8a5d771 + 8767448 commit 08a0b72
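
For orientation, the pattern this commit moves the samples toward can be sketched roughly as follows. This is a minimal sketch, not the setup script the articles link to: the credential name `DemoStorageCredential`, the placeholder password and SAS token, and the `LOCATION` value (inferred from the absolute URL removed in the diff) are all assumptions. Only the data source name `SqlOnDemandDemo` and the relative `BULK` path come from the changed files.

```sql
-- Assumption: the database has no master key yet; one is needed
-- before a credential with a secret can be stored.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
GO

-- Hypothetical credential name and placeholder secret.
CREATE DATABASE SCOPED CREDENTIAL [DemoStorageCredential]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token>';
GO

-- Data source name taken from the diff; LOCATION is an assumption
-- based on the storage URL that the commit removes from the queries.
CREATE EXTERNAL DATA SOURCE SqlOnDemandDemo
WITH (
    LOCATION = 'https://sqlondemandstorage.blob.core.windows.net',
    CREDENTIAL = DemoStorageCredential
);
GO

-- Queries then pass a path relative to the data source root
-- instead of a full storage URL.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'parquet/taxi/year=*/month=*/*.parquet',
    DATA_SOURCE = 'SqlOnDemandDemo',
    FORMAT = 'PARQUET'
) AS nyc;
```

With this layout the storage account and its credential are defined once, so a change of account or access method touches the data source definition rather than every query.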

2 files changed, +35 -58 lines changed

articles/synapse-analytics/sql/create-use-views.md

Lines changed: 5 additions & 4 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/20/2020
 ms.author: v-stazar
 ms.reviewer: jrasnick, carlrab
 ---
@@ -36,8 +36,9 @@ GO
 CREATE VIEW populationView AS
 SELECT *
 FROM OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/csv/population/population.csv',
-FORMAT = 'CSV',
+BULK 'csv/population/population.csv',
+DATA_SOURCE = 'SqlOnDemandDemo',
+FORMAT = 'CSV',
 FIELDTERMINATOR =',',
 ROWTERMINATOR = '\n'
 )
@@ -57,7 +58,7 @@ AS SELECT *, nyc.filepath(1) AS [year], nyc.filepath(2) AS [month]
 FROM
 OPENROWSET(
 BULK 'parquet/taxi/year=*/month=*/*.parquet',
-DATA_SOURCE = 'sqlondemandstorage',
+DATA_SOURCE = 'sqlondemanddemo',
 FORMAT='PARQUET'
 ) AS nyc
 ```

articles/synapse-analytics/sql/query-parquet-files.md

Lines changed: 30 additions & 54 deletions
@@ -6,7 +6,7 @@ author: azaricstefan
 ms.service: synapse-analytics
 ms.topic: how-to
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/20/2020
 ms.author: v-stazar
 ms.reviewer: jrasnick, carlrab
 ---
@@ -17,58 +17,36 @@ In this article, you'll learn how to write a query using SQL on-demand (preview)

 ## Prerequisites

-Before reading rest of this article, review the following articles:
-
-- [First-time setup](query-data-storage.md#first-time-setup)
-- [Prerequisites](query-data-storage.md#prerequisites)
+Your first step is to **create a database** with a datasource that references [NYC Yellow Taxi](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/) storage account. Then initialize the objects by executing [setup script](https://github.com/Azure-Samples/Synapse/blob/master/SQL/Samples/LdwSample/SampleDB.sql) on that database. This setup script will create the data sources, database scoped credentials, and external file formats that are used in these samples.

 ## Dataset

-You can query Parquet files the same way you read CSV files. The only difference is that the FILEFORMAT parameter should be set to PARQUET. Examples in this article show the specifics of reading Parquet files.
-
-> [!NOTE]
-> You do not have to specify columns in the OPENROWSET WITH clause when reading parquet files. SQL on-demand will utilize metadata in the Parquet file and bind columns by name.
-
-You'll use the folder *parquet/taxi* for the sample queries. It contains NYC Taxi - Yellow Taxi Trip Records data from July 2016. to June 2018.
-
-Data is partitioned by year and month and the folder structure is as follows:
-
-- year=2016
-- month=6
-- ...
-- month=12
-- year=2017
-- month=1
-- ...
-- month=12
-- year=2018
-- month=1
-- ...
-- month=6
+[NYC Yellow Taxi](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/) dataset is used in this sample. You can query Parquet files the same way you [read CSV files](query-parquet-files.md). The only difference is that the `FILEFORMAT` parameter should be set to `PARQUET`. Examples in this article show the specifics of reading Parquet files.

 ## Query set of parquet files

 You can specify only the columns of interest when you query Parquet files.

 ```sql
 SELECT
-YEAR(pickup_datetime),
-passenger_count,
+YEAR(tpepPickupDateTime),
+passengerCount,
 COUNT(*) AS cnt
 FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/*/*/*',
+BULK 'puYear=2018/puMonth=*/*.snappy.parquet',
+DATA_SOURCE = 'YellowTaxi',
 FORMAT='PARQUET'
 ) WITH (
-pickup_datetime DATETIME2,
-passenger_count INT
+tpepPickupDateTime DATETIME2,
+passengerCount INT
 ) AS nyc
 GROUP BY
-passenger_count,
-YEAR(pickup_datetime)
+passengerCount,
+YEAR(tpepPickupDateTime)
 ORDER BY
-YEAR(pickup_datetime),
-passenger_count;
+YEAR(tpepPickupDateTime),
+passengerCount;
 ```

 ## Automatic schema inference
@@ -81,13 +59,13 @@ The sample below shows the automatic schema inference capabilities for Parquet f
 > You don't have to specify columns in the OPENROWSET WITH clause when reading Parquet files. In that case, SQL on-demand Query service will utilize metadata in the Parquet file and bind columns by name.

 ```sql
-SELECT
-COUNT_BIG(*)
-FROM
+SELECT TOP 10 *
+FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/year=2017/month=9/*.parquet',
+BULK 'puYear=2018/puMonth=*/*.snappy.parquet',
+DATA_SOURCE = 'YellowTaxi',
 FORMAT='PARQUET'
-) AS nyc;
+) AS nyc
 ```

 ### Query partitioned data
@@ -99,27 +77,25 @@ The data set provided in this sample is divided (partitioned) into separate subf

 ```sql
 SELECT
-nyc.filepath(1) AS [year],
-nyc.filepath(2) AS [month],
-payment_type,
-SUM(fare_amount) AS fare_total
-FROM
+YEAR(tpepPickupDateTime),
+passengerCount,
+COUNT(*) AS cnt
+FROM
 OPENROWSET(
-BULK 'https://sqlondemandstorage.blob.core.windows.net/parquet/taxi/year=*/month=*/*.parquet',
+BULK 'puYear=*/puMonth=*/*.snappy.parquet',
+DATA_SOURCE = 'YellowTaxi',
 FORMAT='PARQUET'
-) AS nyc
+) nyc
 WHERE
 nyc.filepath(1) = 2017
 AND nyc.filepath(2) IN (1, 2, 3)
-AND pickup_datetime BETWEEN CAST('1/1/2017' AS datetime) AND CAST('3/31/2017' AS datetime)
+AND tpepPickupDateTime BETWEEN CAST('1/1/2017' AS datetime) AND CAST('3/31/2017' AS datetime)
 GROUP BY
-nyc.filepath(1),
-nyc.filepath(2),
-payment_type
+passengerCount,
+YEAR(tpepPickupDateTime)
 ORDER BY
-nyc.filepath(1),
-nyc.filepath(2),
-payment_type;
+YEAR(tpepPickupDateTime),
+passengerCount;
 ```

 ## Type mapping
