Commit 94d80bc

20250617 Azure SQL Database virtualization
1 parent 7545619 commit 94d80bc

4 files changed, with 28 additions and 114 deletions.

azure-sql/database/data-virtualization-overview.md

Lines changed: 10 additions & 55 deletions
@@ -71,7 +71,7 @@ SELECT TOP 10 *
 FROM OPENROWSET(
 BULK 'abs://[email protected]/curated/covid-19/bing_covid-19_data/latest/bing_covid-19_data.parquet',
 FORMAT = 'parquet'
-) AS filerows
+) AS filerows;
 ```

 You can continue data set exploration by appending `WHERE`, `GROUP BY` and other clauses based on the result set of the first query.
@@ -101,8 +101,8 @@ A shared access signature (SAS) provides delegated access to files in a storage

 ```sql
 -- Create MASTER KEY if it doesn't exist in the database:
-CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Some Very Strong Password Here>'
-GO
+CREATE MASTER KEY
+ENCRYPTION BY PASSWORD = '<Some Very Strong Password Here>';
 ```

 1. When a SAS token is generated, it includes a question mark (`?`) at the beginning of the token. To use the token, you must remove the question mark (`?`) when creating a credential. For example:
@@ -111,7 +111,6 @@ A shared access signature (SAS) provides delegated access to files in a storage
 CREATE DATABASE SCOPED CREDENTIAL MyCredential
 WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
 SECRET = 'sv=secret string here';
-GO
 ```

 ### [Managed identity](#tab/managed-identity)
@@ -166,7 +165,7 @@ An external data source is an abstraction that enables easy referencing of a fil
 CREATE EXTERNAL DATA SOURCE MyExternalDataSource
 WITH (
 LOCATION = 'abs://[email protected]/curated/covid-19/bing_covid-19_data/latest'
-)
+);
 ```

 When accessing nonpublic storage accounts, along with the location, you also need to reference a database scoped credential with encapsulated authentication parameters. The following script creates an external data source pointing to the file path, and referencing a database-scoped credential.
@@ -176,8 +175,8 @@ When accessing nonpublic storage accounts, along with the location, you also nee
 CREATE EXTERNAL DATA SOURCE MyPrivateExternalDataSource
 WITH (
 LOCATION = 'abs://<privatecontainer>@privatestorageaccount.blob.core.windows.net/dataset/'
-CREDENTIAL = [MyCredential];
-)
+CREDENTIAL = [MyCredential]
+);
 ```

 ## Query data sources using OPENROWSET
@@ -336,7 +335,7 @@ FROM OPENROWSET(
 BULK 'yellow/puYear=*/puMonth=*/*.parquet',
 DATA_SOURCE = 'NYCTaxiExternalDataSource',
 FORMAT = 'parquet'
-) AS filerows
+) AS filerows;
 ```

 It's also convenient to add columns with the file location data to a view using the `filepath()` function for easier and more performant filtering. Using views can reduce the number of files and the amount of data the query on top of the view needs to read and process when filtered by any of those columns:
@@ -350,7 +349,7 @@ FROM OPENROWSET(
 BULK 'yellow/puYear=*/puMonth=*/*.parquet',
 DATA_SOURCE = 'NYCTaxiExternalDataSource',
 FORMAT = 'parquet'
-) AS filerows
+) AS filerows;
 ```

 Views also enable reporting and analytic tools like Power BI to consume results of `OPENROWSET`.
@@ -364,8 +363,7 @@ External tables encapsulate access to files making the querying experience almos
 CREATE EXTERNAL FILE FORMAT DemoFileFormat
 WITH (
 FORMAT_TYPE=PARQUET
-)
-GO
+);

 --Create external table:
 CREATE EXTERNAL TABLE tbl_TaxiRides(
@@ -396,7 +394,6 @@ WITH (
 DATA_SOURCE = NYCTaxiExternalDataSource,
 FILE_FORMAT = DemoFileFormat
 );
-GO
 ```

 Once the external table is created, you can query it just like any other table:
@@ -445,49 +442,7 @@ ORDER BY

 If your stored data isn't partitioned, consider partitioning it to improve query performance.

-If you are using external tables, `filepath()` and `filename()` functions are supported but not in the WHERE clause. You can still filter by `filename` or `filepath` if you use them in computed columns. The following example demonstrates this:
-
-```sql
-CREATE EXTERNAL TABLE tbl_TaxiRides (
-vendorID VARCHAR(100) COLLATE Latin1_General_BIN2,
-tpepPickupDateTime DATETIME2,
-tpepDropoffDateTime DATETIME2,
-passengerCount INT,
-tripDistance FLOAT,
-puLocationId VARCHAR(8000),
-doLocationId VARCHAR(8000),
-startLon FLOAT,
-startLat FLOAT,
-endLon FLOAT,
-endLat FLOAT,
-rateCodeId SMALLINT,
-storeAndFwdFlag VARCHAR(8000),
-paymentType VARCHAR(8000),
-fareAmount FLOAT,
-extra FLOAT,
-mtaTax FLOAT,
-improvementSurcharge VARCHAR(8000),
-tipAmount FLOAT,
-tollsAmount FLOAT,
-totalAmount FLOAT,
-[Year] AS CAST(filepath(1) AS INT), --use filepath() for partitioning
-[Month] AS CAST(filepath(2) AS INT) --use filepath() for partitioning
-)
-WITH (
-LOCATION = 'yellow/puYear=*/puMonth=*/*.parquet',
-DATA_SOURCE = NYCTaxiExternalDataSource,
-FILE_FORMAT = DemoFileFormat
-);
-GO
-
-SELECT *
-      FROM tbl_TaxiRides
-WHERE
-      [year]=2017
-      AND [month] in (10,11,12);
-```
-
-If your stored data isn't partitioned, consider partitioning it to improve query performance.
+If you are using external tables, `filepath()` and `filename()` functions are supported but not in the `WHERE` clause.

 <!--
 ### Statistics

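Context for the last hunk above: it drops the computed-column workaround and keeps only the statement that `filepath()` and `filename()` aren't supported in the `WHERE` clause of an external table. A minimal sketch of one alternative, assuming the `NYCTaxiExternalDataSource` data source and the `yellow/puYear=*/puMonth=*/*.parquet` folder layout that appear elsewhere in this file, is to filter through `OPENROWSET`, where `filepath()` can be referenced in `WHERE`. The column aliases and filter values here are illustrative, not part of the commit.

```sql
-- Sketch only: assumes NYCTaxiExternalDataSource already exists and the
-- files are laid out as yellow/puYear=<year>/puMonth=<month>/*.parquet.
SELECT
    filerows.filepath(1) AS [year],   -- value matched by the first wildcard
    filerows.filepath(2) AS [month],  -- value matched by the second wildcard
    COUNT_BIG(*) AS [rows]
FROM OPENROWSET(
        BULK 'yellow/puYear=*/puMonth=*/*.parquet',
        DATA_SOURCE = 'NYCTaxiExternalDataSource',
        FORMAT = 'parquet'
    ) AS filerows
WHERE filerows.filepath(1) = '2017'               -- filepath() returns a string
  AND filerows.filepath(2) IN ('10', '11', '12')
GROUP BY filerows.filepath(1), filerows.filepath(2)
ORDER BY [year], [month];
```

Filtering on the wildcard values lets the engine skip folders that don't match, which is the same benefit the removed computed-column example was after.
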
docs/t-sql/functions/openrowset-bulk-transact-sql.md

Lines changed: 5 additions & 46 deletions
@@ -44,7 +44,7 @@ OPENROWSET( BULK 'data_file' ,
 )

 <bulk_options> ::=
-[ , DATASOURCE = 'data_source_name' ]
+[ , DATA_SOURCE = 'data_source_name' ]

 -- bulk_options related to input file format
 [ , CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } ]
@@ -66,51 +66,6 @@ OPENROWSET( BULK 'data_file' ,

 ## Arguments

-#### '*provider_name*'
-
-A character string that represents the friendly name (or `PROGID`) of the data provider as specified in the registry. *provider_name* has no default value. Provider name examples are `Microsoft.Jet.OLEDB.4.0`, `SQLNCLI`, or `MSDASQL`.
-
-#### '*datasource*'
-
-A string constant that corresponds to a particular OLE DB data source. *datasource* is the `DBPROP_INIT_DATASOURCE` property to be passed to the `IDBProperties` interface of the provider to initialize the provider. Typically, this string includes the name of the database file, the name of a database server, or a name that the provider understands for locating the database or databases.
-
-Data source can be file path `C:\SAMPLES\Northwind.mdb'` for `Microsoft.Jet.OLEDB.4.0` provider, or connection string `Server=Seattle1;Trusted_Connection=yes;` for `SQLNCLI` provider.
-
-#### '*user_id*'
-
-A string constant that is the user name passed to the specified data provider. *user_id* specifies the security context for the connection and is passed in as the `DBPROP_AUTH_USERID` property to initialize the provider. *user_id* can't be a Microsoft Windows login name.
-
-#### '*password*'
-
-A string constant that is the user password to be passed to the data provider. *password* is passed in as the `DBPROP_AUTH_PASSWORD` property when initializing the provider. *password* can't be a Microsoft Windows password.
-
-#### '*provider_string*'
-
-A provider-specific connection string that is passed in as the `DBPROP_INIT_PROVIDERSTRING` property to initialize the OLE DB provider. *provider_string* typically encapsulates all the connection information required to initialize the provider. For a list of keywords that the [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)] Native Client OLE DB provider recognizes, see [Initialization and Authorization Properties (Native Client OLE DB Provider)](../../relational-databases/native-client-ole-db-data-source-objects/initialization-and-authorization-properties.md).
-
-<a id="table_or_view"></a>
-
-#### [ catalog. ] [ schema. ] object
-
-Remote table or view containing the data that `OPENROWSET` should read. It can be three-part-name object with the following components:
-
-- *catalog* (optional) - the name of the catalog or database in which the specified object resides.
-- *schema* (optional) - the name of the schema or object owner for the specified object.
-- *object* - the object name that uniquely identifies the object to work with.
-
-#### '*query*'
-
-A string constant sent to and executed by the provider. The local instance of [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)] doesn't process this query, but processes query results returned by the provider, a pass-through query. Pass-through queries are useful when used on providers that don't make available their tabular data through table names, but only through a command language. Pass-through queries are supported on the remote server, as long as the query provider supports the OLE DB Command object and its mandatory interfaces. For more information, see [SQL Server Native Client (OLE DB) Interfaces](../../relational-databases/native-client-ole-db-interfaces/sql-server-native-client-ole-db-interfaces.md).
-
-```sql
-SELECT a.*
-FROM OPENROWSET(
-'SQLNCLI',
-'Server=Seattle1;Trusted_Connection=yes;',
-'SELECT TOP 10 GroupName, Name FROM AdventureWorks2022.HumanResources.Department'
-) AS a;
-```
-
 ### BULK arguments

 Uses the `BULK` rowset provider for `OPENROWSET` to read data from a file. In [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)], `OPENROWSET` can read from a data file without loading the data into a target table. This lets you use `OPENROWSET` with a basic `SELECT` statement.
@@ -162,6 +117,10 @@ The default for *maximum_errors* is 10.

 ### BULK data processing options

+#### DATA_SOURCE
+
+`DATA_SOURCE` is the external location created with [CREATE EXTERNAL DATA SOURCE](../statements/create-external-data-source-transact-sql?view=azuresqldb-current&preserve-view=true).
+
 #### FIRSTROW = *first_row*

 Specifies the number of the first row to load. The default is 1. This indicates the first row in the specified data file. The row numbers are determined by counting the row terminators. `FIRSTROW` is 1-based.

docs/t-sql/functions/openrowset-transact-sql.md

Lines changed: 5 additions & 5 deletions
@@ -60,13 +60,13 @@ OPENROWSET

 #### '*provider_name*'

-A character string that represents the friendly name (or `PROGID`) of the data provider as specified in the registry. *provider_name* has no default value. Provider name examples are `Microsoft.Jet.OLEDB.4.0`, `SQLNCLI`, or `MSDASQL`.
+A character string that represents the friendly name (or `PROGID`) of the data provider as specified in the registry. *provider_name* has no default value. Provider name examples are `MSOLEDBSQL`, `Microsoft.Jet.OLEDB.4.0`, or `MSDASQL`.

 #### '*datasource*'

 A string constant that corresponds to a particular data source. *datasource* is the `DBPROP_INIT_DATASOURCE` property to be passed to the `IDBProperties` interface of the provider to initialize the provider. Typically, this string includes the name of the database file, the name of a database server, or a name that the provider understands for locating the database or databases.

-Data source can be file path `C:\SAMPLES\Northwind.mdb'` for `Microsoft.Jet.OLEDB.4.0` provider, or connection string `Server=Seattle1;Trusted_Connection=yes;` for `SQLNCLI` provider.
+Data source can be file path `C:\SAMPLES\Northwind.mdb'` for `Microsoft.Jet.OLEDB.4.0` provider, or connection string `Server=Seattle1;Trusted_Connection=yes;` for `MSOLEDBSQL` provider.

 #### '*user_id*'

@@ -127,7 +127,7 @@ For more information, see [SQL Server Native Client (OLE DB) Interfaces](../../r
 ```sql
 SELECT a.*
 FROM OPENROWSET(
-'SQLNCLI',
+'MSOLEDBSQL',
 'Server=Seattle1;Trusted_Connection=yes;',
 'SELECT TOP 10 GroupName, Name FROM AdventureWorks2022.HumanResources.Department'
 ) AS a;
@@ -162,12 +162,12 @@ This section provides general examples to demonstrate how to use OPENROWSET.

 ### A. Use OPENROWSET with SELECT and the SQL Server Native Client OLE DB Provider

-The following example uses the [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)] Native Client OLE DB provider to access the `HumanResources.Department` table in the [!INCLUDE [ssSampleDBobject](../../includes/sssampledbobject-md.md)] database on the remote server `Seattle1`. (Use SQLNCLI and [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)] will redirect to the latest version of [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)] Native Client OLE DB Provider.) A `SELECT` statement is used to define the row set returned. The provider string contains the `Server` and `Trusted_Connection` keywords. These keywords are recognized by the [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)] Native Client OLE DB provider.
+The following example uses the [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)] Native Client OLE DB provider to access the `HumanResources.Department` table in the [!INCLUDE [ssSampleDBobject](../../includes/sssampledbobject-md.md)] database on the remote server `Seattle1`. (Use `MSOLEDBSQL` for the modern Microsoft SQL Server OLE DB Data Provider that replaced `SQLNCLI`.) A `SELECT` statement is used to define the row set returned. The provider string contains the `Server` and `Trusted_Connection` keywords. These keywords are recognized by the [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)] Native Client OLE DB provider.

 ```sql
 SELECT a.*
 FROM OPENROWSET(
-'SQLNCLI', 'Server=Seattle1;Trusted_Connection=yes;',
+'MSOLEDBSQL', 'Server=Seattle1;Trusted_Connection=yes;',
 'SELECT GroupName, Name, DepartmentID
 FROM AdventureWorks2022.HumanResources.Department
 ORDER BY GroupName, Name'

docs/t-sql/statements/create-external-data-source-transact-sql.md

Lines changed: 8 additions & 8 deletions
@@ -555,7 +555,7 @@ WITH (

 ### E. Create an external data source for bulk operations retrieving data from Azure Storage

-**Applies to:** [!INCLUDE [sssql17-md](../../includes/sssql17-md.md)] and later.
+**Applies to:** [!INCLUDE [sssql17-md](../../includes/sssql17-md.md)] and later versions.

 Use the following data source for bulk operations using [BULK INSERT](bulk-insert-transact-sql.md) or [OPENROWSET](../functions/openrowset-bulk-transact-sql.md). The credential must set `SHARED ACCESS SIGNATURE` as the identity, mustn't have the leading `?` in the SAS token, must have at least read permission on the file that should be loaded (for example `srt=o&sp=r`), and the expiration period should be valid (all dates are in UTC time). For more information on shared access signatures, see [Using Shared Access Signatures (SAS)](/azure/storage/common/storage-sas-overview).

@@ -684,7 +684,7 @@ Additional notes and guidance when setting the location:

 <!-- See also docs\t-sql\statements\create-external-data-source-connection-options.md -->

-Specified for [!INCLUDE [sssql19-md](../../includes/sssql19-md.md)] and later. Specifies additional options when connecting over `ODBC` to an external data source. To use multiple connection options, separate them by a semi-colon.
+Specified for [!INCLUDE [sssql19-md](../../includes/sssql19-md.md)] and later versions. Specifies additional options when connecting over `ODBC` to an external data source. To use multiple connection options, separate them by a semi-colon.

 Applies to generic `ODBC` connections, as well as built-in `ODBC` connectors for [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)], Oracle, Teradata, MongoDB, and Azure Cosmos DB API for MongoDB.

@@ -1138,11 +1138,11 @@ WITH (

 ::: moniker-end

-::: moniker range=">=sql-server-ver16||=sql-server-linux-ver16"
+::: moniker range="=sql-server-ver16||=sql-server-linux-ver16"

 ## Overview: SQL Server 2022

-**Applies to**: [!INCLUDE [sssql22-md](../../includes/sssql22-md.md)]
+**Applies to**: [!INCLUDE [sssql22-md](../../includes/sssql22-md.md)] and later versions

 Creates an external data source for PolyBase queries. External data sources are used to establish connectivity and support these primary use cases:

@@ -1154,7 +1154,7 @@ Creates an external data source for PolyBase queries. External data sources are

 ## <a id="syntax"></a> Syntax for SQL Server 2022

-## Syntax for SQL Server 2022 and later
+## Syntax for SQL Server 2022 and later versions

 ```syntaxsql
 CREATE EXTERNAL DATA SOURCE <data_source_name>
@@ -1226,7 +1226,7 @@ Additional notes and guidance when setting the location:

 <!-- See also docs\t-sql\statements\create-external-data-source-connection-options.md -->

-Specified for [!INCLUDE [sssql19-md](../../includes/sssql19-md.md)] and later. Specifies additional options when connecting over `ODBC` to an external data source. To use multiple connection options, separate them by a semi-colon.
+Specified for [!INCLUDE [sssql19-md](../../includes/sssql19-md.md)] and later versions. Specifies additional options when connecting over `ODBC` to an external data source. To use multiple connection options, separate them by a semi-colon.

 Applies to generic `ODBC` connections, as well as built-in `ODBC` connectors for [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)], Oracle, Teradata, MongoDB, and Azure Cosmos DB API for MongoDB.

@@ -1241,7 +1241,7 @@ Starting in [!INCLUDE [sssql22-md](../../includes/sssql22-md.md)] Cumulative Upd

 #### PUSHDOWN = ON | OFF

-**Applies to: [!INCLUDE [sssql19-md](../../includes/sssql19-md.md)] and later.** States whether computation can be pushed down to the external data source. It is on by default.
+**Applies to: [!INCLUDE [sssql19-md](../../includes/sssql19-md.md)] and later versions.** States whether computation can be pushed down to the external data source. It is on by default.

 `PUSHDOWN` is supported when connecting to [!INCLUDE [ssNoVersion](../../includes/ssnoversion-md.md)], Oracle, Teradata, MongoDB, the Azure Cosmos DB API for MongoDB, or ODBC at the external data source level.

@@ -1678,7 +1678,7 @@ For a more detailed example on how to access delta files stored on Azure Data L

 ### H. Create an external data source for bulk operations retrieving data from Azure Storage

-**Applies to:** [!INCLUDE [sssql22-md](../../includes/sssql22-md.md)] and later.
+**Applies to:** [!INCLUDE [sssql22-md](../../includes/sssql22-md.md)] and later versions.

 Use the following data source for bulk operations using [BULK INSERT (Transact-SQL)](bulk-insert-transact-sql.md) or [OPENROWSET (Transact-SQL)](../functions/openrowset-bulk-transact-sql.md). The credential must set `SHARED ACCESS SIGNATURE` as the identity, mustn't have the leading `?` in the SAS token, must have at least read permission on the file that should be loaded (for example `srt=o&sp=r`), and the expiration period should be valid (all dates are in UTC time). For more information on shared access signatures, see [Using Shared Access Signatures (SAS)](/azure/storage/common/storage-sas-overview).
