| Azure Data Lake Store Gen1 | https |\<storage_account>.azuredatalakestore.net/webhdfs/v1 |
70
+
| Azure Data Lake Store Gen2 | https |\<storage_account>.dfs.core.windows.net |
71
+
||||
72
72
73
-
'<storage_path>'
73
+
'\<storage_path>'
74
74
75
75
Specifies a path within your storage that points to the folder or file you want to read. If the path points to a container or folder, all files will be read from that particular container or folder. Files in subfolders won't be included.
76
76
77
77
You can use wildcards to target multiple files or folders. Usage of multiple nonconsecutive wildcards is allowed.
78
-
Below is an example that reads all *csv* files starting with *population* from all folders starting with */csv/population*: 'https://sqlondemandstorage.blob.core.windows.net/csv/population*/population*.csv'
78
+
Below is an example that reads all *csv* files starting with *population* from all folders starting with */csv/population*:
If you specify the unstructured_data_path to be a folder, a SQL on-demand query will retrieve files from that folder.
81
82
82
83
> [!NOTE]
83
84
> Unlike Hadoop and PolyBase, SQL on-demand doesn't return subfolders. Also, unlike Hadoop and PolyBase, SQL on-demand does return files for which the file name begins with an underscore (_) or a period (.).
84
85
85
-
In the example below, if the unstructured_data_path='https://mystorageaccount.dfs.core.windows.net/webdata/', a SQL on-demand query will return rows from mydata.txt and _hidden.txt. It won't return mydata2.txt and mydata3.txt because they are located in a subfolder.
86
+
In the example below, if the unstructured_data_path=`https://mystorageaccount.dfs.core.windows.net/webdata/`, a SQL on-demand query will return rows from mydata.txt and _hidden.txt. It won't return mydata2.txt and mydata3.txt because they are located in a subfolder.
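To check which files a folder path like this actually resolves to, one option is to group rows by the `filename()` function. The query below is an illustrative sketch only: the single-column schema, the column name `line`, and the CSV terminators are assumptions, and access to the storage account is assumed to be configured already.

```sql
-- Sketch: list the files that a folder path resolves to and count their rows.
-- Assumptions: the files are comma-delimited text and only the first column is read.
SELECT
    [r].filename() AS [file_name],   -- file each row was read from
    COUNT_BIG(*)   AS [row_count]
FROM OPENROWSET(
        BULK 'https://mystorageaccount.dfs.core.windows.net/webdata/',
        FORMAT = 'CSV',
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = '\n'
    )
    WITH (
        [line] VARCHAR(8000)         -- hypothetical column name and length
    ) AS [r]
GROUP BY [r].filename();
```

If the folder behavior described here holds, the result would list mydata.txt and _hidden.txt, but not mydata2.txt or mydata3.txt.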
86
87
87
88

articles/synapse-analytics/sql/develop-storage-files-overview.md
+4 -6 (4 additions & 6 deletions)
@@ -6,11 +6,10 @@ author: azaricstefan
6
6
ms.service: synapse-analytics
7
7
ms.topic: overview
8
8
ms.subservice:
9
-
ms.date: 04/15/2020
9
+
ms.date: 04/19/2020
10
10
ms.author: v-stazar
11
11
ms.reviewer: jrasnick, carlrab
12
12
---
13
-
14
13
# Query storage files using SQL on-demand (preview) resources within Synapse SQL
15
14
16
15
SQL on-demand (preview) enables you to query data in your data lake. It offers a T-SQL query surface area that accommodates semi-structured and unstructured data queries.
@@ -57,7 +56,7 @@ Refer to [Query folders and multiple files](query-folders-multiple-csv-files.md)
57
56
58
57
To query Parquet source data, use FORMAT = 'PARQUET'
59
58
60
-
```sql
59
+
```syntaxsql
61
60
OPENROWSET
62
61
(
63
62
{ BULK 'data_file' ,
@@ -119,7 +118,6 @@ By omitting the WITH clause from OPENROWSET statement, you can instruct the serv
119
118
```sql
120
119
OPENROWSET(
121
120
BULK N'path_to_file(s)', FORMAT='PARQUET');
122
-
123
121
```
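As a rough illustration, the fragment above can be placed in a complete query. The storage URL is a placeholder rather than a path from the article, and the sketch assumes that omitting the WITH clause lets the service take column names and types from the Parquet metadata.

```sql
-- Sketch: query Parquet files without a WITH clause.
-- Assumption: the schema is inferred from the Parquet file metadata.
SELECT TOP 10 *
FROM OPENROWSET(
        BULK N'https://mystorageaccount.dfs.core.windows.net/webdata/*.parquet',  -- hypothetical path
        FORMAT = 'PARQUET'
    ) AS [rows];
```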
124
122
125
123
### Filename function
@@ -161,7 +159,7 @@ To access nested elements from a nested column, such as Struct, use "dot notatio
161
159
162
160
The syntax fragment example is as follows:
163
161
164
-
```sql
162
+
```syntaxsql
165
163
OPENROWSET
166
164
( BULK 'unstructured_data_path' ,
167
165
FORMAT = 'PARQUET' )
@@ -195,7 +193,7 @@ To access non-scalar elements from a repeated column, use the [JSON_QUERY](/sql/
articles/synapse-analytics/sql/develop-tables-statistics.md
+43 -20 (43 additions & 20 deletions)
@@ -7,12 +7,11 @@ manager: craigg
7
7
ms.service: synapse-analytics
8
8
ms.topic: conceptual
9
9
ms.subservice:
10
-
ms.date: 04/15/2020
10
+
ms.date: 04/19/2020
11
11
ms.author: fipopovi
12
12
ms.reviewer: jrasnick
13
13
ms.custom:
14
14
---
15
-
16
15
# Statistics in Synapse SQL
17
16
18
17
This article provides recommendations and examples for creating and updating query-optimization statistics using the Synapse SQL resources: SQL pool and SQL on-demand (preview).
@@ -158,35 +157,43 @@ To create statistics on a column, provide a name for the statistics object and t
158
157
This syntax uses all of the default options. By default, SQL pool samples **20 percent** of the table when it creates statistics.
159
158
160
159
```sql
161
-
CREATE STATISTICS [statistics_name] ON [schema_name].[table_name]([column_name]);
160
+
CREATE STATISTICS [statistics_name]
161
+
ON [schema_name].[table_name]([column_name]);
162
162
```
163
163
164
164
For example:
165
165
166
166
```sql
167
-
CREATE STATISTICS col1_stats ON dbo.table1 (col1);
167
+
CREATE STATISTICS col1_stats
168
+
ON dbo.table1 (col1);
168
169
```
169
170
170
171
#### Create single-column statistics by examining every row
171
172
172
173
The default sampling rate of 20 percent is sufficient for most situations. However, you can adjust the sampling rate. To sample the full table, use this syntax:
173
174
174
175
```sql
175
-
CREATE STATISTICS [statistics_name] ON [schema_name].[table_name]([column_name]) WITH FULLSCAN;
176
+
CREATE STATISTICS [statistics_name]
177
+
ON [schema_name].[table_name]([column_name])
178
+
WITH FULLSCAN;
176
179
```
177
180
178
181
For example:
179
182
180
183
```sql
181
-
CREATE STATISTICS col1_stats ON dbo.table1 (col1) WITH FULLSCAN;
184
+
CREATE STATISTICS col1_stats
185
+
ON dbo.table1 (col1)
186
+
WITH FULLSCAN;
182
187
```
183
188
184
189
#### Create single-column statistics by specifying the sample size
185
190
186
191
Another option you have is to specify the sample size as a percent:
187
192
188
193
```sql
189
-
CREATE STATISTICS col1_stats ON dbo.table1 (col1) WITH SAMPLE = 50 PERCENT;
194
+
CREATE STATISTICS col1_stats
195
+
ON dbo.table1 (col1)
196
+
WITH SAMPLE = 50 PERCENT;
190
197
```
191
198
192
199
#### Create single-column statistics on only some of the rows
@@ -198,7 +205,9 @@ For example, you can use filtered statistics when you plan to query a specific p
198
205
This example creates statistics on a range of values. The values can easily be defined to match the range of values in a partition.
199
206
200
207
```sql
201
-
CREATE STATISTICS stats_col1 ON table1(col1) WHERE col1 > '2000101' AND col1 < '20001231';
208
+
CREATE STATISTICS stats_col1
209
+
ON table1(col1)
210
+
WHERE col1 > '2000101' AND col1 < '20001231';
202
211
```
203
212
204
213
> [!NOTE]
@@ -209,7 +218,10 @@ CREATE STATISTICS stats_col1 ON table1(col1) WHERE col1 > '2000101' AND col1 < '
209
218
You can also combine the options together. The following example creates a filtered statistics object with a custom sample size:
210
219
211
220
```sql
212
-
CREATE STATISTICS stats_col1 ON table1 (col1) WHERE col1 > '2000101' AND col1 < '20001231' WITH SAMPLE = 50 PERCENT;
221
+
CREATE STATISTICS stats_col1
222
+
ON table1 (col1)
223
+
WHERE col1 > '2000101' AND col1 < '20001231'
224
+
WITH SAMPLE = 50 PERCENT;
213
225
```
214
226
215
227
For the full reference, see [CREATE STATISTICS](/sql/t-sql/statements/create-statistics-transact-sql?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json&view=azure-sqldw-latest).
@@ -224,7 +236,10 @@ To create a multi-column statistics object, use the previous examples, but speci
224
236
In this example, the histogram is on *product\_category*. Cross-column statistics are calculated on *product\_category* and *product\_sub_category*:
225
237
226
238
```sql
227
-
CREATE STATISTICS stats_2cols ON table1 (product_category, product_sub_category) WHERE product_category > '2000101' AND product_category < '20001231' WITH SAMPLE = 50 PERCENT;
239
+
CREATE STATISTICS stats_2cols
240
+
ON table1 (product_category, product_sub_category)
241
+
WHERE product_category > '2000101' AND product_category < '20001231'
242
+
WITH SAMPLE = 50 PERCENT;
228
243
```
229
244
230
245
Because a correlation exists between *product\_category* and *product\_sub\_category*, a multi-column statistics object can be useful if these columns are accessed at the same time.
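To confirm that a multi-column statistics object such as `stats_2cols` covers the intended columns, the catalog views can be queried. This is a minimal sketch, not part of the original article: it assumes `table1` lives in the `dbo` schema and that `sys.stats`, `sys.stats_columns`, and `sys.columns` are available in the target SQL pool.

```sql
-- Sketch: list each statistics object on a table with its columns in order.
-- Assumption: the table is dbo.table1.
SELECT
    s.[name]             AS [stats_name],
    sc.[stats_column_id] AS [column_position],  -- 1 = leading column (histogram)
    c.[name]             AS [column_name]
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON  s.[object_id] = sc.[object_id]
    AND s.[stats_id]  = sc.[stats_id]
JOIN sys.columns AS c
    ON  sc.[object_id] = c.[object_id]
    AND sc.[column_id] = c.[column_id]
WHERE s.[object_id] = OBJECT_ID('dbo.table1')
ORDER BY s.[name], sc.[stats_column_id];
```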
@@ -258,7 +273,7 @@ The following example will help you get started with your database design. Feel
DBCC SHOW_STATISTICS (dbo.table1, stats_col1) WITH histogram, density_vector
526
+
DBCC SHOW_STATISTICS (dbo.table1, stats_col1)
527
+
WITH histogram, density_vector
511
528
```
512
529
513
530
### DBCC SHOW_STATISTICS() differences
514
531
515
-
DBCC SHOW_STATISTICS() is more strictly implemented in SQL pool compared to SQL Server:
532
+
`DBCC SHOW_STATISTICS()` is more strictly implemented in SQL pool compared to SQL Server:
516
533
517
534
- Undocumented features aren't supported.
518
535
- Can't use Stats_stream.
@@ -599,7 +616,7 @@ Arguments:
599
616
[@stmt = ] N'statement_text' -
600
617
Specifies a Transact-SQL statement that will return column values to be used for statistics. You can use TABLESAMPLE to specify samples of data to be used. If TABLESAMPLE isn't specified, FULLSCAN will be used.