Commit 2b12d3e

Merge pull request #247677 from ilijazagorac/main
Update on autostatistics and CSV parser version 1.0
2 parents a1363ab + 4fbb193 commit 2b12d3e

1 file changed, with 5 additions and 5 deletions

articles/synapse-analytics/sql/develop-tables-statistics.md

Lines changed: 5 additions & 5 deletions
@@ -572,13 +572,13 @@ Serverless SQL pool analyzes incoming user queries for missing statistics. If st
 The SELECT statement will trigger automatic creation of statistics.

 > [!NOTE]
-> Automatic creation of statistics is turned on for Parquet files. For CSV files, statistics will be automatically created if you use OPENROWSET. You need to create statistics manually you use CSV external tables.
+> Automatic creation of statistics uses sampling, and in most cases the sampling percentage is less than 100%. This flow is the same for every file format. Keep in mind that sampling is not supported when reading CSV with parser version 1.0, so automatic creation of statistics will not happen when the sampling percentage is less than 100%. For small tables with an estimated low cardinality (number of rows), automatic statistics creation is triggered with a sampling percentage of 100%. In that case a full scan is triggered, and automatic statistics are created even for CSV with parser version 1.0.

 Automatic creation of statistics is done synchronously so you may incur slightly degraded query performance if your columns are missing statistics. The time to create statistics for a single column depends on the size of the files targeted.

 ### Manual creation of statistics

-Serverless SQL pool lets you create statistics manually. For CSV external tables, you have to create statistics manually because automatic creation of statistics isn't turned on for CSV external tables.
+Serverless SQL pool lets you create statistics manually. If you use parser version 1.0 with CSV, you will probably have to create statistics manually, because this parser version does not support sampling. With parser version 1.0, automatic creation of statistics will not happen unless the sampling percentage is 100%.

 See the following examples for instructions on how to manually create statistics.

@@ -593,7 +593,7 @@ When statistics are stale, new ones will be created. The algorithm goes through
 Manual stats are never declared stale.

 > [!NOTE]
-> Automatic recreation of statistics is turned on for Parquet files. For CSV files, statistics will be recreated if you use OPENROWSET. You need to drop and create statistics manually for CSV external tables. Check the examples below on how to drop and create statistics.
+> Automatic recreation of statistics uses sampling, and in most cases the sampling percentage is less than 100%. This flow is the same for every file format. Keep in mind that sampling is not supported when reading CSV with parser version 1.0, so automatic recreation of statistics will not happen when the sampling percentage is less than 100%. In that case, you need to drop and recreate statistics manually. Check the examples below on how to drop and create statistics. For small tables with an estimated low cardinality (number of rows), automatic statistics recreation is triggered with a sampling percentage of 100%. In that case a full scan is triggered, and automatic statistics are created even for CSV with parser version 1.0.

 One of the first questions to ask when you're troubleshooting a query is, **"Are the statistics up to date?"**

@@ -639,7 +639,7 @@ Specifies a Transact-SQL statement that will return column values to be used for
 ```

 > [!NOTE]
-> CSV sampling does not work at this time, only FULLSCAN is supported for CSV.
+> CSV sampling does not work if you are using parser version 1.0; only FULLSCAN is supported for CSV with parser version 1.0.

 #### Create single-column statistics by examining every row

@@ -767,7 +767,7 @@ Specifies the approximate percentage or number of rows in the table or indexed v
 SAMPLE can't be used with the FULLSCAN option.

 > [!NOTE]
-> CSV sampling does not work at this time, only FULLSCAN is supported for CSV.
+> CSV sampling does not work if you are using parser version 1.0; only FULLSCAN is supported for CSV with parser version 1.0.

 #### Create single-column statistics by examining every row

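The rule that the updated notes describe can be summarized as a small decision function. The following is only an illustrative sketch in Python, not part of the article or of any Synapse API: the function names `supports_sampling` and `auto_stats_created` and their parameters are hypothetical, and the logic encodes just what the notes state (sampling is unavailable for CSV with parser version 1.0, and a 100% sampling percentage amounts to a full scan).

```python
from typing import Optional


def supports_sampling(file_format: str, parser_version: Optional[str] = None) -> bool:
    """Hypothetical helper: sampling is unavailable only for CSV read with parser version 1.0."""
    return not (file_format.upper() == "CSV" and parser_version == "1.0")


def auto_stats_created(
    file_format: str,
    sample_percent: float,
    parser_version: Optional[str] = None,
) -> bool:
    """Hypothetical model of whether statistics are created automatically.

    sample_percent stands for the sampling percentage the engine would pick;
    how the engine chooses that value is not modeled here.
    """
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in the range (0, 100]")
    if sample_percent == 100:
        # A 100% sample is a full scan, which works for every format and parser.
        return True
    # Below 100%, the reader has to support sampling.
    return supports_sampling(file_format, parser_version)
```

Under this model, a CSV source read with parser version 1.0 gets automatic statistics only in the small-table case where the sampling percentage is 100%; in every other case the notes point to manual creation with FULLSCAN.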