Commit 866149b

committed
Clarifying instructions
1 parent 8a99ea5 commit 866149b

1 file changed

+25
-9
lines changed

articles/synapse-analytics/sql/tutorial-data-analyst.md

Lines changed: 25 additions & 9 deletions
@@ -13,20 +13,23 @@ ms.reviewer: whhender
1313

1414
# Tutorial: Explore and Analyze data lakes with serverless SQL pool
1515

16-
In this tutorial, you learn how to perform exploratory data analysis. You combine different Azure Open Datasets using serverless SQL pool. You then visualize the results in Synapse Studio for Azure Synapse Analytics.
16+
In this tutorial, you learn how to perform exploratory data analysis using existing open datasets, with no storage setup required. You combine different Azure Open Datasets using serverless SQL pool. You then visualize the results in Synapse Studio for Azure Synapse Analytics.
1717

1818
The `OPENROWSET(BULK...)` function allows you to access files in Azure Storage. [OPENROWSET](develop-openrowset.md) reads the content of a remote data source, such as a file, and returns the content as a set of rows.
1919

20-
## Automatic schema inference
20+
## Access the serverless SQL pool
2121

22-
Since data is stored in the Parquet file format, automatic schema inference is available. You can query the data without listing the data types of all columns in the files. You also can use the virtual column mechanism and the `filepath` function to filter out a certain subset of files.
22+
Every workspace comes with a preconfigured serverless SQL pool called *Built-in*. To access it:
2323

24-
> [!NOTE]
25-
> The default collation is `SQL_Latin1_General_CP1_CI_ASIf`. For a non-default collation, take into account case sensitivity.
26-
>
27-
> If you create a database with case sensitive collation when you specify columns, make sure to use correct name of the column.
28-
>
29-
> A column name `tpepPickupDateTime` would be correct while `tpeppickupdatetime` wouldn't work in a non-default collation.
24+
1. Open your workspace and select the **Develop** hub.
25+
1. Select the **+** *Add new resource* button.
26+
1. Select **SQL script**.
27+
28+
You can use this script to explore your data without having to reserve SQL capacity.
29+
30+
## Access the tutorial data
31+
32+
All the data we use in this tutorial is housed in the storage account *azureopendatastorage*, which holds Azure Open Datasets for open use in tutorials like this one. You can run all the scripts as-is directly from your workspace as long as your workspace can access a public network.
3033

3134
This tutorial uses a dataset about [New York City (NYC) Taxi](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/):
3235

@@ -48,6 +51,8 @@ SELECT TOP 100 * FROM
4851
) AS [nyc]
4952
```
5053

54+
### Other accessible datasets
55+
5156
Similarly, you can query the Public Holidays dataset by using the following query:
5257

5358
```sql
@@ -76,6 +81,17 @@ You can learn more about the meaning of the individual columns in the descriptio
7681
- [Public Holidays](https://azure.microsoft.com/services/open-datasets/catalog/public-holidays/)
7782
- [Weather Data](https://azure.microsoft.com/services/open-datasets/catalog/noaa-integrated-surface-data/)
7883

84+
## Automatic schema inference
85+
86+
Because the data is stored in the Parquet file format, automatic schema inference is available. You can query the data without listing the data types of all columns in the files. You can also use the virtual column mechanism and the `filepath` function to restrict a query to a specific subset of files.
87+
88+
> [!NOTE]
89+
> The default collation is `SQL_Latin1_General_CP1_CI_AS`. For a non-default collation, take case sensitivity into account.
90+
>
91+
> If you create a database with a case-sensitive collation, make sure to use the exact column name when you specify columns.
92+
>
93+
> For example, the column name `tpepPickupDateTime` is correct, while `tpeppickupdatetime` doesn't work in a case-sensitive (non-default) collation.
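
As a minimal sketch of the `filepath` filtering described above, the following query counts rides for two years only. It assumes the NYC Taxi files sit under a `puYear=*/puMonth=*` folder layout in the *azureopendatastorage* account; the exact URL and the chosen years aren't shown in this excerpt and are illustrative assumptions. No `WITH` clause is needed because the schema is inferred from the Parquet files.

```sql
-- Assumption: the BULK path and its puYear=*/puMonth=* layout are illustrative;
-- adjust them to match the location of the dataset you query.
SELECT
    nyc.filepath(1) AS [year],      -- value matched by the first wildcard (puYear=*)
    COUNT_BIG(*) AS rides_per_year
FROM
    OPENROWSET(
        BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
        FORMAT = 'PARQUET'
    ) AS [nyc]
WHERE
    nyc.filepath(1) IN ('2017', '2018')  -- only files in the matching year folders are read
GROUP BY
    nyc.filepath(1)
ORDER BY
    [year];
```

Filtering on `filepath(1)` rather than on a date column lets serverless SQL pool skip the folders that don't match, which typically reduces the amount of data scanned.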
94+
7995
## Time series, seasonality, and outlier analysis
8096

8197
You can summarize the yearly number of taxi rides by using the following query:
