articles/synapse-analytics/sql/tutorial-data-analyst.md
25 additions & 9 deletions
@@ -13,20 +13,23 @@ ms.reviewer: whhender
# Tutorial: Explore and Analyze data lakes with serverless SQL pool
-In this tutorial, you learn how to perform exploratory data analysis. You combine different Azure Open Datasets using serverless SQL pool. You then visualize the results in Synapse Studio for Azure Synapse Analytics.
+In this tutorial, you learn how to perform exploratory data analysis using existing open datasets, with no storage setup required. You combine different Azure Open Datasets using serverless SQL pool. You then visualize the results in Synapse Studio for Azure Synapse Analytics.
The `OPENROWSET(BULK...)` function allows you to access files in Azure Storage. [`OPENROWSET`](develop-openrowset.md) reads the content of a remote data source, such as a file, and returns the content as a set of rows.
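As a minimal sketch of how such a query looks (the dataset path here is only illustrative, not part of this change), `OPENROWSET` can read Parquet files directly over HTTPS:

```sql
-- Illustrative example: read a few rows straight from public Parquet files.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
    FORMAT = 'PARQUET'
) AS [nyc];
```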
-## Automatic schema inference
+## Access the serverless SQL pool

-Since data is stored in the Parquet file format, automatic schema inference is available. You can query the data without listing the data types of all columns in the files. You also can use the virtual column mechanism and the `filepath` function to filter out a certain subset of files.
+Every workspace comes with a preconfigured serverless SQL pool called *Built-in*. To access it:
-> [!NOTE]
-> The default collation is `SQL_Latin1_General_CP1_CI_AS`. For a non-default collation, take into account case sensitivity.
->
-> If you create a database with a case-sensitive collation, make sure to use the correct name of the column when you specify columns.
->
-> A column name `tpepPickupDateTime` would be correct, while `tpeppickupdatetime` wouldn't work in a non-default collation.
+1. Open your workspace and select the **Develop** hub.
+1. Select the **+** (*Add new resource*) button.
+1. Select **SQL script**.
+
+You can use this script to explore your data without having to reserve SQL capacity.
+
+## Access the tutorial data
+
+All the data we use in this tutorial is housed in the storage account *azureopendatastorage*, which holds Azure Open Datasets for open use in tutorials like this one. You can run all the scripts as-is directly from your workspace, as long as your workspace has public network access.
This tutorial uses a dataset about [New York City (NYC) Taxi](https://azure.microsoft.com/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/):
@@ -48,6 +51,8 @@ SELECT TOP 100 * FROM
) AS [nyc]
```
+### Other accessible datasets
+
Similarly, you can query the Public Holidays dataset by using the following query:
```sql
@@ -76,6 +81,17 @@ You can learn more about the meaning of the individual columns in the descriptio
Since the data is stored in the Parquet file format, automatic schema inference is available. You can query the data without listing the data types of all columns in the files. You also can use the virtual column mechanism and the `filepath` function to filter out a certain subset of files.
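For instance, the virtual `filepath()` function returns the fragment matched by each wildcard in the `BULK` path, so a query can be restricted to a subset of files. A small sketch (the path and year value are illustrative):

```sql
-- Illustrative example: filepath(1) is the value matched by the first wildcard (puYear=*).
SELECT TOP 10
    nyc.filepath(1) AS [year],
    nyc.filepath(2) AS [month],
    *
FROM OPENROWSET(
    BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
    FORMAT = 'PARQUET'
) AS [nyc]
WHERE nyc.filepath(1) = '2017';
```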
+
+> [!NOTE]
+> The default collation is `SQL_Latin1_General_CP1_CI_AS`. For a non-default collation, take into account case sensitivity.
+>
+> If you create a database with a case-sensitive collation, make sure to use the correct name of the column when you specify columns.
+>
+> A column name `tpepPickupDateTime` would be correct, while `tpeppickupdatetime` wouldn't work in a non-default collation.
+
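To illustrate why this matters (the database name and collation below are only an example, not part of the tutorial), collation is fixed when a database is created, and a case-sensitive collation makes column-name casing significant:

```sql
-- Hypothetical example: a case-sensitive (binary), UTF-8 collation.
-- With this collation, the column must be referenced exactly as tpepPickupDateTime.
CREATE DATABASE NycTaxiExploration
    COLLATE Latin1_General_100_BIN2_UTF8;
```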
## Time series, seasonality, and outlier analysis
You can summarize the yearly number of taxi rides by using the following query:
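A sketch of a query of that shape (illustrative path and aggregation; not necessarily the tutorial's exact query):

```sql
-- Illustrative sketch: count yellow-taxi rides per pickup year.
SELECT
    YEAR(tpepPickupDateTime) AS pickup_year,
    COUNT(*) AS rides_per_year
FROM OPENROWSET(
    BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=*/puMonth=*/*.parquet',
    FORMAT = 'PARQUET'
) AS [nyc]
GROUP BY YEAR(tpepPickupDateTime)
ORDER BY pickup_year;
```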