Commit 0cda5f1

Merge pull request #208227 from Rodrigossz/main
Update on Datatypes
2 parents d9e3b73 + f0d5633 commit 0cda5f1

1 file changed

+60
-30
lines changed

articles/cosmos-db/analytical-store-introduction.md

Lines changed: 60 additions & 30 deletions
Original file line number | Diff line number | Diff line change
@@ -211,6 +211,8 @@ df = spark.read\
211211
* Spark pools in Azure Synapse will represent these columns as `string`.
212212
* SQL serverless pools in Azure Synapse will represent these columns as `varchar(8000)`.
213213

214+
* Properties with `UNIQUEIDENTIFIER (guid)` types are represented as `string` in analytical store and should be converted to `VARCHAR` in **SQL** or to `string` in **Spark** for correct visualization.
215+
214216
* SQL serverless pools in Azure Synapse support result sets with up to 1000 columns, and exposing nested columns also counts towards that limit. Please consider this information when designing your data architecture and modeling your transactional data.
215217

216218
* If you rename a property, in one or many documents, it will be considered a new column. If you execute the same rename in all documents in the collection, all data will be migrated to the new column and the old column will be represented with `NULL` values.
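The rename behavior in the last bullet can be pictured with a toy model. The sketch below is an assumption-laden illustration, not how the analytical store is implemented: `project` is a hypothetical helper, the property names `code` and `productCode` are made up, and Python's `None` stands in for `NULL`.

```python
def project(known_columns, documents):
    """Project documents onto every column seen so far (toy model only)."""
    columns = list(known_columns)
    for doc in documents:
        for name in doc:
            # A property name that was never seen before becomes a new column.
            if name not in columns:
                columns.append(name)
    # Properties missing from a document are shown as None (standing in for NULL).
    rows = [[doc.get(name) for name in columns] for doc in documents]
    return columns, rows

# Documents before the rename establish the `code` column.
columns, _ = project([], [{"id": "1", "code": "A"}, {"id": "2", "code": "B"}])

# The same rename is applied to every document: `code` becomes `productCode`.
renamed = [{"id": "1", "productCode": "A"}, {"id": "2", "productCode": "B"}]
columns, rows = project(columns, renamed)

print(columns)  # ['id', 'code', 'productCode']
print(rows)     # [['1', None, 'A'], ['2', None, 'B']]
```

In this model the old `code` column remains in the schema but holds only empty values, while all data appears under `productCode`.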
@@ -222,35 +224,6 @@ There are two types of schema representation in the analytical store. These type
222224
* Well-defined schema representation, default option for SQL (CORE) API accounts.
223225
* Full fidelity schema representation, default option for Azure Cosmos DB API for MongoDB accounts.
224226

225-
#### Full fidelity schema for SQL API accounts
226-
227-
It's possible to use full fidelity Schema for SQL (Core) API accounts, instead of the default option, by setting the schema type when enabling Synapse Link on a Cosmos DB account for the first time. Here are the considerations about changing the default schema representation type:
228-
229-
* This option is only valid for accounts that **don't** have Synapse Link already enabled.
230-
* It isn't possible to reset the schema representation type, from well-defined to full fidelity or vice-versa.
231-
* Currently Azure Cosmos DB API for MongoDB isn't compatible with this possibility of changing the schema representation. All MongoDB accounts will always have full fidelity schema representation type.
232-
* Currently this change can't be made through the Azure portal. All database accounts that have Synapse Link enabled by the Azure portal will have the default schema representation type, well-defined schema.
233-
234-
The schema representation type decision must be made at the same time that Synapse Link is enabled on the account, using Azure CLI or PowerShell.
235-
236-
With the Azure CLI:
237-
```cli
238-
az cosmosdb create --name MyCosmosDBDatabaseAccount --resource-group MyResourceGroup --subscription MySubscription --analytical-storage-schema-type "FullFidelity" --enable-analytical-storage true
239-
```
240-
241-
> [!NOTE]
242-
> In the command above, replace `create` with `update` for existing accounts.
243-
244-
With the PowerShell:
245-
```
246-
New-AzCosmosDBAccount -ResourceGroupName MyResourceGroup -Name MyCosmosDBDatabaseAccount -EnableAnalyticalStorage true -AnalyticalStorageSchemaType "FullFidelity"
247-
```
248-
249-
> [!NOTE]
250-
> In the command above, replace `New-AzCosmosDBAccount` with `Update-AzCosmosDBAccount` for existing accounts.
251-
252-
253-
254227
#### Well-defined schema representation
255228

256229
The well-defined schema representation creates a simple tabular representation of the schema-agnostic data in the transactional store. The well-defined schema representation has the following considerations:
@@ -325,7 +298,7 @@ salary: 1000000
325298

326299
The leaf property `streetNo` within the nested object `address` will be represented in the analytical store schema as a column `address.object.streetNo.int32`. The datatype is added as a suffix to the column. This way, if another document is added to the transactional store where the value of leaf property `streetNo` is "123" (note it's a string), the schema of the analytical store automatically evolves without altering the type of a previously written column. A new column is added to the analytical store as `address.object.streetNo.string`, where this value of "123" is stored.
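The naming rule described above can be sketched in a few lines of Python. This is an illustration only, not the analytical store's implementation: `column_name` and `SUFFIXES` are hypothetical names, only the `int32`, `string`, and `object` suffixes mentioned in the paragraph are modeled, and the sketch assumes that every enclosing nested object contributes one `.object` segment.

```python
# Hypothetical lookup: only the suffixes named in the paragraph above.
# Simplification: every Python int is treated as int32.
SUFFIXES = {int: "int32", str: "string"}

def column_name(path, value):
    """Build a full fidelity column name for a leaf value.

    `path` lists the property names from the root to the leaf,
    for example ["address", "streetNo"].
    """
    parts = []
    for name in path[:-1]:
        # Assumed: each enclosing nested object adds an `.object` segment.
        parts.extend([name, "object"])
    # The leaf name is followed by the suffix for the value's data type.
    parts.extend([path[-1], SUFFIXES[type(value)]])
    return ".".join(parts)

print(column_name(["address", "streetNo"], 123))    # address.object.streetNo.int32
print(column_name(["address", "streetNo"], "123"))  # address.object.streetNo.string
```

Because the type is part of the column name, the integer and string values of `streetNo` land in two separate columns instead of forcing a type change on an existing one.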
327300

328-
**Data type to suffix map**
301+
##### Data type to suffix map
329302

330303
Here's a map of all the property data types and their suffix representations in the analytical store:
331304

@@ -352,6 +325,63 @@ Here's a map of all the property data types and their suffix representations in
352325
* Spark pools in Azure Synapse will represent these columns as `undefined`.
353326
* SQL serverless pools in Azure Synapse will represent these columns as `NULL`.
354327

328+
##### Working with the MongoDB `_id` field
329+
330+
The MongoDB `_id` field is fundamental to every collection in MongoDB and originally has a hexadecimal representation. As you can see in the table above, the full fidelity schema preserves its characteristics, which makes it challenging to visualize in Azure Synapse Analytics. For correct visualization, you must convert the `_id` datatype as shown below:
331+
332+
###### Spark
333+
334+
```Python
from pyspark.sql.types import BinaryType, StringType, StructField, StructType

# Read `_id.objectId` as binary and `id` as a string.
simpleSchema = StructType([
    StructField("_id", StructType([StructField("objectId", BinaryType(), True)]), True),
    StructField("id", StringType(), True)
])

# `spark` is the session that the Synapse notebook provides.
df = spark.read.format("cosmos.olap")\
    .option("spark.synapse.linkedService", "<enter linked service name>")\
    .option("spark.cosmos.container", "<enter container name>")\
    .schema(simpleSchema)\
    .load()

df.select("id", "_id.objectId").show()
```
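Once the `objectId` value is read as binary, it can be turned back into its familiar hexadecimal text. The following plain-Python sketch shows only that conversion step; `object_id_to_hex` is a hypothetical helper and the sample value is made up. It's an assumption that you would wrap a function like this in a Spark user-defined function to apply it to a DataFrame column.

```python
def object_id_to_hex(raw):
    """Return the hexadecimal text form of a binary ObjectId value."""
    if raw is None:
        # Keep missing values missing instead of failing.
        return None
    # bytes() also accepts a bytearray, which binary columns may be read as.
    return bytes(raw).hex()

# Made-up 12-byte value, used only for illustration.
sample = bytes.fromhex("5f3d2c1b0a99887766554433")
print(object_id_to_hex(sample))  # 5f3d2c1b0a99887766554433
```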
349+
###### SQL
350+
351+
```SQL
352+
SELECT TOP 100 id=CAST(_id as VARBINARY(1000))
353+
FROM OPENROWSET('CosmosDB',
354+
                'Account=your-account;Database=your-database;Key=your-key',
355+
                HTAP) WITH (_id VARCHAR(1000)) as HTAP
356+
```
357+
358+
#### Full fidelity schema for SQL API accounts
359+
360+
It's possible to use the full fidelity schema for SQL (Core) API accounts, instead of the default option, by setting the schema type when enabling Synapse Link on a Cosmos DB account for the first time. Here are the considerations about changing the default schema representation type:
361+
362+
* This option is only valid for accounts that **don't** have Synapse Link already enabled.
363+
* It isn't possible to reset the schema representation type, from well-defined to full fidelity or vice-versa.
364+
* Currently Azure Cosmos DB API for MongoDB isn't compatible with this possibility of changing the schema representation. All MongoDB accounts will always have full fidelity schema representation type.
365+
* Currently this change can't be made through the Azure portal. All database accounts that have Synapse Link enabled by the Azure portal will have the default schema representation type, well-defined schema.
366+
367+
The schema representation type decision must be made at the same time that Synapse Link is enabled on the account, using Azure CLI or PowerShell.
368+
369+
With the Azure CLI:
370+
```cli
371+
az cosmosdb create --name MyCosmosDBDatabaseAccount --resource-group MyResourceGroup --subscription MySubscription --analytical-storage-schema-type "FullFidelity" --enable-analytical-storage true
372+
```
373+
374+
> [!NOTE]
375+
> In the command above, replace `create` with `update` for existing accounts.
376+
377+
With PowerShell:
378+
```
379+
New-AzCosmosDBAccount -ResourceGroupName MyResourceGroup -Name MyCosmosDBDatabaseAccount -EnableAnalyticalStorage $true -AnalyticalStorageSchemaType "FullFidelity"
380+
```
381+
382+
> [!NOTE]
383+
> In the command above, replace `New-AzCosmosDBAccount` with `Update-AzCosmosDBAccount` for existing accounts.
384+
>
355385
## <a id="analytical-ttl"></a> Analytical Time-to-Live (TTL)
356386
357387
Analytical TTL (ATTL) indicates how long data should be retained in your analytical store, for a container.
