articles/azure-sql/managed-instance/data-virtualization-overview.md
9 additions & 7 deletions
@@ -11,7 +11,7 @@ ms.topic: conceptual
11
11
author: MladjoA
12
12
ms.author: mlandzic
13
13
ms.reviewer: mathoma, MashaMSFT
14
-
ms.date: 03/02/2022
14
+
ms.date: 03/08/2022
15
15
---
16
16
17
17
# Data virtualization with Azure SQL Managed Instance (Preview)
@@ -332,9 +332,13 @@ Just like `OPENROWSET`, external tables allow querying multiple files and folder
332
332
333
333
There's no hard limit in terms of number of files or amount of data that can be queried, but query performance depends on the amount of data, data format, and complexity of queries and joins.
334
334
335
-
Collecting statistics on your external data is one of the most important things you can do for query optimization. The more the instance knows about your data, the faster it can execute queries. Automatic creation of statistics isn't supported, but you can and should create statistics manually.
335
+
Collecting statistics on your external data is one of the most important things you can do for query optimization. The more the instance knows about your data, the faster it can execute queries. The SQL engine query optimizer is a cost-based optimizer. It compares the cost of various query plans, and then chooses the plan with the lowest cost. In most cases, it chooses the plan that will execute the fastest.
336
336
337
-
### OPENROWSET statistics
337
+
### Automatic creation of statistics
338
+
339
+
Managed Instance analyzes incoming user queries for missing statistics. If statistics are missing, the query optimizer automatically creates statistics on individual columns in the query predicate or join condition to improve cardinality estimates for the query plan. Automatic creation of statistics is done synchronously, so you may see slightly degraded query performance if your columns are missing statistics. The time to create statistics for a single column depends on the size of the files targeted.
340
+
341
+
### OPENROWSET manual statistics
338
342
339
343
Single-column statistics for the `OPENROWSET` path can be created using the `sp_create_openrowset_statistics` stored procedure, by passing the select query with a single column as a parameter:
340
344
@@ -349,9 +353,7 @@ FROM OPENROWSET(
349
353
350
354
By default, the instance uses 100% of the data provided in the dataset to create statistics. You can optionally specify the sample size as a percentage using the `TABLESAMPLE` options. To create single-column statistics for multiple columns, execute the stored procedure for each of the columns. You can't create multi-column statistics for the `OPENROWSET` path.
351
355
352
-
To update existing statistics, drop them first using the `sp_drop_openrowset_statistics` stored procedure, and then recreate them using the `sp_create_openrowset_statistics`.
353
-
354
-
To drop existing statistics, use the following example:
356
+
To update existing statistics, drop them first using the `sp_drop_openrowset_statistics` stored procedure, and then recreate them using the `sp_create_openrowset_statistics` stored procedure:
355
357
356
358
```sql
357
359
EXEC sys.sp_drop_openrowset_statistics N'
@@ -362,7 +364,7 @@ FROM OPENROWSET(
362
364
'
363
365
```
364
366
365
-
### External table statistics
367
+
### External table manual statistics
366
368
367
369
The syntax for creating statistics on external tables resembles the one used for ordinary user tables. To create statistics on a column, provide a name for the statistics object and the name of the column:
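The hunk ends before the article's own example, so the following is only a minimal sketch of the pattern described above, not the text of the commit. The statistics name `stats_vendor_id`, the external table `dbo.tbl_TaxiRides`, and the column `vendor_id` are hypothetical, and the sketch assumes the standard `CREATE STATISTICS` options `FULLSCAN` and `NORECOMPUTE` are accepted for external tables:

```sql
-- Hypothetical names: stats_vendor_id, dbo.tbl_TaxiRides, vendor_id.
-- Creates single-column statistics on an external table column.
-- Assumption: FULLSCAN (read all rows) and NORECOMPUTE (no automatic
-- refresh) are valid options for external tables in this preview.
CREATE STATISTICS stats_vendor_id
ON dbo.tbl_TaxiRides (vendor_id)
WITH FULLSCAN, NORECOMPUTE;
```

A statement like this would be run once for each column that appears in query predicates or join conditions.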