Commit ee94601

Update analytical-store-introduction.md
1 parent aa5b16f commit ee94601

1 file changed, +54 -30 lines changed


articles/cosmos-db/analytical-store-introduction.md

Lines changed: 54 additions & 30 deletions
@@ -222,35 +222,6 @@ There are two types of schema representation in the analytical store. These type
* Well-defined schema representation, default option for SQL (Core) API accounts.
* Full fidelity schema representation, default option for Azure Cosmos DB API for MongoDB accounts.

#### Well-defined schema representation

The well-defined schema representation creates a simple tabular representation of the schema-agnostic data in the transactional store. The well-defined schema representation has the following considerations:
@@ -325,7 +296,7 @@ salary: 1000000
The leaf property `streetNo` within the nested object `address` is represented in the analytical store schema as a column `address.object.streetNo.int32`. The data type is added to the column name as a suffix. This way, if another document is added to the transactional store where the value of the leaf property `streetNo` is "123" (note that it's a string), the schema of the analytical store automatically evolves without altering the type of a previously written column: a new column, `address.object.streetNo.string`, is added to the analytical store, and the value "123" is stored there.
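
The naming rule described above can be sketched in a few lines of Scala. This is a minimal illustration under stated assumptions, not the analytical store's actual implementation: the `FullFidelityNaming` object, the `Value` model, and its `Int32`, `Str`, and `Obj` cases are hypothetical, and only the `int32` and `string` suffixes from this example are modeled. It shows how a leaf property's path picks up `.object.` for each nested object and a type suffix at the end, and why a document that changes a property's type only adds a column.

```scala
object FullFidelityNaming {
  // Hypothetical, simplified model of a document value; only the types used
  // in the example above are covered (32-bit integers, strings, nested objects).
  sealed trait Value
  final case class Int32(v: Int) extends Value
  final case class Str(v: String) extends Value
  final case class Obj(fields: Seq[(String, Value)]) extends Value

  // Column names for one property: nested properties are reached through
  // "<name>.object.<child>", and the leaf's data type is appended as a suffix.
  def columns(path: String, value: Value): Seq[String] = value match {
    case Int32(_)    => Seq(s"$path.int32")
    case Str(_)      => Seq(s"$path.string")
    case Obj(fields) =>
      fields.flatMap { case (name, child) => columns(s"$path.object.$name", child) }
  }

  // Leaf columns produced by one top-level document.
  def documentColumns(doc: Seq[(String, Value)]): Seq[String] =
    doc.flatMap { case (name, value) => columns(name, value) }

  // Schema evolution: documents only ever add columns and never retype an
  // existing one, so the schema is the ordered union of all leaf columns.
  def evolve(docs: Seq[Seq[(String, Value)]]): Seq[String] =
    docs.flatMap(documentColumns).distinct
}
```

For example, a first document with `streetNo: 15` and a second with `streetNo: "123"` yield the two columns `address.object.streetNo.int32` and `address.object.streetNo.string`, and the second document leaves the first column untouched.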

##### Data type to suffix map

Here's a map of all the property data types and their suffix representations in the analytical store:

@@ -352,6 +323,59 @@ Here's a map of all the property data types and their suffix representations in
* Spark pools in Azure Synapse will represent these columns as `undefined`.
* SQL serverless pools in Azure Synapse will represent these columns as `NULL`.

##### Working with the MongoDB `_id` field

The MongoDB `_id` field is fundamental to every collection in MongoDB and originally has a hexadecimal representation. As you can see in the table above, the full fidelity schema preserves its characteristics, which makes it harder to visualize in Azure Synapse Analytics. To visualize it correctly, convert the `_id` data type as shown below:

###### Spark

```scala
import org.apache.spark.sql.types._

// Read `_id` as a struct whose `objectId` field holds the raw binary value.
val simpleSchema = StructType(Array(
  StructField("_id", StructType(Array(StructField("objectId", BinaryType, true))), true),
  StructField("id", StringType, true)
))

// Load the container's analytical store through the Synapse linked service.
val df = spark.read.format("cosmos.olap")
  .option("spark.synapse.linkedService", "CosmosDbMongoDbApi2")
  .option("spark.cosmos.container", "HTAP")
  .schema(simpleSchema)
  .load()

df.select("id", "_id.objectId").show()
```

![image](https://user-images.githubusercontent.com/11827523/185008672-e6a98513-2a1d-410b-aeb5-de67ec4e984f.png)
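
If you collect the binary `_id.objectId` values from the DataFrame above, you may want to render them in the familiar hexadecimal form. The following Scala sketch assumes that each value is the standard 12-byte MongoDB ObjectId; the `ObjectIdHex` object and its `toHex` method are hypothetical helpers, not part of the Azure Cosmos DB or Spark APIs. Inside Spark, a similar result can be obtained on the column itself with the built-in `org.apache.spark.sql.functions.hex` function, which returns uppercase hex digits.

```scala
object ObjectIdHex {
  // Assumption: `_id.objectId` holds the standard 12-byte MongoDB ObjectId.
  private val ObjectIdLength = 12

  // Renders the raw ObjectId bytes as the 24-character lowercase hex string
  // that MongoDB tools usually display.
  def toHex(bytes: Array[Byte]): String = {
    require(bytes.length == ObjectIdLength,
      s"expected $ObjectIdLength bytes, got ${bytes.length}")
    // JVM bytes are signed, so `b & 0xff` maps each one to 0..255 before formatting.
    bytes.map(b => f"${b & 0xff}%02x").mkString
  }
}
```

Applying `toHex` to each collected `objectId` value produces a 24-character string; values of any other length are rejected rather than silently misformatted.
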
###### SQL
344+
345+
```SQL
346+
SELECT TOP 100 id=CAST(_id as VARBINARY(1000))
347+
FROM OPENROWSET('CosmosDB',
348+
                'Account=your-account;Database=your-database;Key=your-key',
349+
                HTAP) WITH (_id VARCHAR(1000)) as HTAP
350+
```

#### Full fidelity schema for SQL API accounts

It's possible to use the full fidelity schema for SQL (Core) API accounts, instead of the default option, by setting the schema type when you enable Synapse Link on an Azure Cosmos DB account for the first time. Here are the considerations about changing the default schema representation type:

* This option is only valid for accounts that **don't** already have Synapse Link enabled.
* It isn't possible to change the schema representation type afterward, from well-defined to full fidelity or vice versa.
* Currently, Azure Cosmos DB API for MongoDB doesn't support changing the schema representation. All MongoDB accounts always have the full fidelity schema representation type.
* Currently, this change can't be made through the Azure portal. All database accounts that have Synapse Link enabled through the Azure portal have the default schema representation type, well-defined schema.

The schema representation type must be chosen at the same time that Synapse Link is enabled on the account, by using the Azure CLI or PowerShell.

With the Azure CLI:
```cli
az cosmosdb create --name MyCosmosDBDatabaseAccount --resource-group MyResourceGroup --subscription MySubscription --analytical-storage-schema-type "FullFidelity" --enable-analytical-storage true
```

> [!NOTE]
> In the command above, replace `create` with `update` for existing accounts.

With PowerShell:
```powershell
New-AzCosmosDBAccount -ResourceGroupName MyResourceGroup -Name MyCosmosDBDatabaseAccount -EnableAnalyticalStorage $true -AnalyticalStorageSchemaType "FullFidelity"
```

> [!NOTE]
> In the command above, replace `New-AzCosmosDBAccount` with `Update-AzCosmosDBAccount` for existing accounts.

## <a id="analytical-ttl"></a> Analytical Time-to-Live (TTL)

Analytical TTL (ATTL) indicates how long data should be retained in your analytical store, for a container.
