Commit 5d86bbc

Stefan Azaric authored and committed
Updated tutorial
1 parent 5f404dd commit 5d86bbc


3 files changed, +35 -26 lines changed


articles/synapse-analytics/sql/tutorial-data-analyst.md

Lines changed: 35 additions & 26 deletions
@@ -135,41 +135,50 @@ From the plot chart, you can see there's a weekly pattern, with Saturdays as the
 Next, let's see if the drop in rides correlates with public holidays. We can see if there is a correlation by joining the NYC Taxi rides dataset with the Public Holidays dataset:

 ```sql
-WITH taxi_rides AS
-(
-SELECT
-CAST([tpepPickupDateTime] AS DATE) AS [current_day],
-COUNT(*) as rides_per_day
-FROM
-OPENROWSET(
-BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
-FORMAT='PARQUET'
-) AS [nyc]
-WHERE nyc.filepath(1) = '2016'
-GROUP BY CAST([tpepPickupDateTime] AS DATE)
+WITH taxi_rides AS (
+SELECT
+CAST([tpepPickupDateTime] AS DATE) AS [current_day],
+COUNT(*) as rides_per_day
+FROM
+OPENROWSET(
+BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
+FORMAT='PARQUET'
+) AS [nyc]
+WHERE nyc.filepath(1) = '2016'
+GROUP BY CAST([tpepPickupDateTime] AS DATE)
 ),
-public_holidays AS
-(
-SELECT
-holidayname as holiday,
-date
-FROM
-OPENROWSET(
-BULK 'https://azureopendatastorage.blob.core.windows.net/holidaydatacontainer/Processed/*.parquet',
-FORMAT='PARQUET'
-) AS [holidays]
-WHERE countryorregion = 'United States' AND YEAR(date) = 2016
-)
+public_holidays AS (
+SELECT
+holidayname as holiday,
+date
+FROM
+OPENROWSET(
+BULK 'https://azureopendatastorage.blob.core.windows.net/holidaydatacontainer/Processed/*.parquet',
+FORMAT='PARQUET'
+) AS [holidays]
+WHERE countryorregion = 'United States' AND YEAR(date) = 2016
+),
+joined_data AS (
 SELECT
-*
+*
 FROM taxi_rides t
 LEFT OUTER JOIN public_holidays p on t.current_day = p.date
+)
+
+SELECT
+*,
+holiday_rides =
+CASE
+WHEN holiday is null THEN 0
+WHEN holiday is not null THEN rides_per_day
+END
+FROM joined_data
 ORDER BY current_day ASC
 ```

 ![NYC Taxi rides and Public Holidays datasets result visualization](./media/tutorial-data-analyst/rides-public-holidays.png)

-This time, we want to highlight the number of taxi rides during public holidays. For that purpose, we choose **none** for the **Category** column and **rides_per_day** and **holiday** as the **Legend (series)** columns.
+This time, we want to highlight the number of taxi rides during public holidays. For that purpose, we choose **current_day** for the **Category** column and **rides_per_day** and **holiday_rides** as the **Legend (series)** columns.

 ![Number of taxi rides during public holidays plot chart](./media/tutorial-data-analyst/plot-chart-public-holidays.png)
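
The `holiday_rides` column introduced in this change can be checked without reading the full Parquet datasets. The following is a minimal sketch, not part of the commit: it swaps the two `OPENROWSET` sources for a few inline rows built with `VALUES`, and the ride counts are made-up illustration values. It assumes a T-SQL engine that accepts a table value constructor in the `FROM` clause, and it uses `ELSE` in place of the second `WHEN` branch, which should be equivalent here.

```sql
-- Hypothetical sample rows standing in for the NYC Taxi dataset (counts are made up).
WITH taxi_rides AS (
    SELECT CAST(d AS DATE) AS current_day, n AS rides_per_day
    FROM (VALUES
        ('2016-07-03', 300),
        ('2016-07-04', 200),
        ('2016-07-05', 350)
    ) AS v(d, n)
),
-- Hypothetical stand-in for the Public Holidays dataset: a single holiday row.
public_holidays AS (
    SELECT holiday, CAST(d AS DATE) AS [date]
    FROM (VALUES ('Independence Day', '2016-07-04')) AS h(holiday, d)
),
-- Same LEFT OUTER JOIN as in the tutorial query: non-holiday days get a NULL holiday.
joined_data AS (
    SELECT t.current_day, t.rides_per_day, p.holiday
    FROM taxi_rides t
    LEFT OUTER JOIN public_holidays p ON t.current_day = p.[date]
)
SELECT
    current_day,
    rides_per_day,
    holiday,
    -- 0 on ordinary days, the day's ride count on public holidays
    CASE WHEN holiday IS NULL THEN 0 ELSE rides_per_day END AS holiday_rides
FROM joined_data
ORDER BY current_day ASC;
```

With these sample rows, `holiday_rides` is 0 for 2016-07-03 and 2016-07-05 and 200 for 2016-07-04. Because every row has a numeric value, `holiday_rides` can be plotted as a second series next to `rides_per_day` with `current_day` as the category.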