Commit 08a6a66

Merge pull request #190928 from MladjoA/patch-3
Update data-virtualization-overview.md
2 parents 3813739 + 398c436 commit 08a6a66

1 file changed: +9 -7 lines changed

articles/azure-sql/managed-instance/data-virtualization-overview.md

Lines changed: 9 additions & 7 deletions
@@ -11,7 +11,7 @@ ms.topic: conceptual
 author: MladjoA
 ms.author: mlandzic
 ms.reviewer: mathoma, MashaMSFT
-ms.date: 03/02/2022
+ms.date: 03/08/2022
 ---
 
 # Data virtualization with Azure SQL Managed Instance (Preview)
@@ -332,9 +332,13 @@ Just like `OPENROWSET`, external tables allow querying multiple files and folder
 
 There's no hard limit in terms of number of files or amount of data that can be queried, but query performance depends on the amount of data, data format, and complexity of queries and joins.
 
-Collecting statistics on your external data is one of the most important things you can do for query optimization. The more the instance knows about your data, the faster it can execute queries. Automatic creation of statistics isn't supported, but you can and should create statistics manually.
+Collecting statistics on your external data is one of the most important things you can do for query optimization. The more the instance knows about your data, the faster it can execute queries. The SQL engine query optimizer is a cost-based optimizer. It compares the cost of various query plans, and then chooses the plan with the lowest cost. In most cases, it chooses the plan that will execute the fastest.
 
-### OPENROWSET statistics
+### Automatic creation of statistics
+
+Managed Instance analyzes incoming user queries for missing statistics. If statistics are missing, the query optimizer automatically creates statistics on individual columns in the query predicate or join condition to improve cardinality estimates for the query plan. Automatic creation of statistics is done synchronously so you may incur slightly degraded query performance if your columns are missing statistics. The time to create statistics for a single column depends on the size of the files targeted.
+
+### OPENROWSET manual statistics
 
 Single-column statistics for the `OPENROWSET` path can be created using the `sp_create_openrowset_statistics` stored procedure, by passing the select query with a single column as a parameter:
 
@@ -349,9 +353,7 @@ FROM OPENROWSET(
 
 By default, the instance uses 100% of the data provided in the dataset to create statistics. You can optionally specify the sample size as a percentage using the `TABLESAMPLE` options. To create single-column statistics for multiple columns, execute the stored procedure for each of the columns. You can't create multi-column statistics for the `OPENROWSET` path.
 
-To update existing statistics, drop them first using the `sp_drop_openrowset_statistics` stored procedure, and then recreate them using the `sp_create_openrowset_statistics`.
-
-To drop existing statistics, use the following example:
+To update existing statistics, drop them first using the `sp_drop_openrowset_statistics` stored procedure, and then recreate them using the `sp_create_openrowset_statistics`:
 
 ```sql
 EXEC sys.sp_drop_openrowset_statistics N'
@@ -362,7 +364,7 @@ FROM OPENROWSET(
 '
 ```
 
-### External table statistics
+### External table manual statistics
 
 The syntax for creating statistics on external tables resembles the one used for ordinary user tables. To create statistics on a column, provide a name for the statistics object and the name of the column:
 
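
The last hunk stops at the sentence that introduces the external table example, so the statement itself is not part of this diff. As a rough sketch only, creating single-column statistics on an external table might look like the following. The table name `dbo.ext_taxi_rides`, the column `vendor_id`, and the statistics name `stats_vendor_id` are hypothetical, and the `WITH FULLSCAN, NORECOMPUTE` options are an assumption about what the preview accepts rather than something shown in the diff.

```sql
-- Hypothetical names: dbo.ext_taxi_rides (external table), vendor_id (column),
-- stats_vendor_id (statistics object). None of these appear in the diff.
-- Assumption: the WITH options below are accepted for external tables.
CREATE STATISTICS stats_vendor_id
ON dbo.ext_taxi_rides (vendor_id)
WITH FULLSCAN, NORECOMPUTE;
```

This follows the same shape as `CREATE STATISTICS` on an ordinary user table: a name for the statistics object, then the table and the single column to sample.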