articles/cosmos-db/analytical-store-introduction.md
60 additions & 30 deletions
@@ -211,6 +211,8 @@ df = spark.read\
211
211
* Spark pools in Azure Synapse will represent these columns as `string`.
212
212
* SQL serverless pools in Azure Synapse will represent these columns as `varchar(8000)`.
213
213
214
+
* Properties with `UNIQUEIDENTIFIER (guid)` types are represented as `string` in analytical store and should be converted to `VARCHAR` in **SQL** or to `string` in **Spark** for correct visualization.
215
+
214
216
* SQL serverless pools in Azure Synapse support result sets with up to 1000 columns, and exposing nested columns also counts towards that limit. Please consider this information when designing your data architecture and modeling your transactional data.
215
217
216
218
* If you rename a property, in one or many documents, it will be considered a new column. If you execute the same rename in all documents in the collection, all data will be migrated to the new column and the old column will be represented with `NULL` values.
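The rename behavior described in the last bullet can be pictured with a toy in-memory model. This is only a sketch under stated assumptions: the class `AnalyticalSketch`, its methods, and the sample documents are hypothetical and are not part of any Azure SDK. The one rule it encodes is the one from the text: a column, once seen, stays in the schema.

```python
class AnalyticalSketch:
    """Toy model (hypothetical): the schema only grows, columns are never dropped."""

    def __init__(self):
        self.columns = []  # column names in order of first appearance
        self.docs = {}     # latest version of each document, keyed by id

    def upsert(self, doc_id, doc):
        # Any property name not seen before becomes a new column.
        for key in doc:
            if key not in self.columns:
                self.columns.append(key)
        self.docs[doc_id] = doc

    def rows(self):
        # Properties missing from a document surface as None (NULL).
        return [[doc.get(col) for col in self.columns] for doc in self.docs.values()]


store = AnalyticalSketch()
store.upsert(1, {"id": 1, "city": "Lisbon"})
store.upsert(2, {"id": 2, "city": "Porto"})

# Rename "city" to "town" in every document.
store.upsert(1, {"id": 1, "town": "Lisbon"})
store.upsert(2, {"id": 2, "town": "Porto"})

print(store.columns)  # ['id', 'city', 'town']
print(store.rows())   # [[1, None, 'Lisbon'], [2, None, 'Porto']]
```

After the rename, the old `city` column is still present but holds only `None`, which mirrors the `NULL` values mentioned above.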
@@ -222,35 +224,6 @@ There are two types of schema representation in the analytical store. These type
222
224
* Well-defined schema representation, default option for SQL (CORE) API accounts.
223
225
* Full fidelity schema representation, default option for Azure Cosmos DB API for MongoDB accounts.
224
226
225
-
#### Full fidelity schema for SQL API accounts
226
-
227
-
It's possible to use full fidelity Schema for SQL (Core) API accounts, instead of the default option, by setting the schema type when enabling Synapse Link on a Cosmos DB account for the first time. Here are the considerations about changing the default schema representation type:
228
-
229
-
* This option is only valid for accounts that **don't** have Synapse Link already enabled.
230
-
* It isn't possible to reset the schema representation type, from well-defined to full fidelity or vice-versa.
231
-
* Currently Azure Cosmos DB API for MongoDB isn't compatible with this possibility of changing the schema representation. All MongoDB accounts will always have full fidelity schema representation type.
232
-
* Currently this change can't be made through the Azure portal. All database accounts that have Synapse Link enabled by the Azure portal will have the default schema representation type, well-defined schema.
233
-
234
-
The schema representation type decision must be made at the same time that Synapse Link is enabled on the account, using Azure CLI or PowerShell.
> In the command above, replace `New-AzCosmosDBAccount` with `Update-AzCosmosDBAccount` for existing accounts.
251
-
252
-
253
-
254
227
#### Well-defined schema representation
255
228
256
229
The well-defined schema representation creates a simple tabular representation of the schema-agnostic data in the transactional store. The well-defined schema representation has the following considerations:
@@ -325,7 +298,7 @@ salary: 1000000
325
298
326
299
The leaf property `streetNo` within the nested object `address` will be represented in the analytical store schema as a column `address.object.streetNo.int32`. The datatype is added as a suffix to the column. This way, if another document is added to the transactional store where the value of leaf property `streetNo` is "123" (note it's a string), the schema of the analytical store automatically evolves without altering the type of a previously written column. A new column, `address.object.streetNo.string`, is added to the analytical store, and the value "123" is stored there.
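The naming rule in the paragraph above can be sketched in a few lines of plain Python. The helper `full_fidelity_columns` and the sample values are hypothetical, and the sketch covers only the three suffixes this paragraph names (`object`, `int32`, `string`); it is an illustration of the naming scheme, not the service's implementation.

```python
def full_fidelity_columns(doc, prefix=""):
    """Flatten a document into {column_name: value} using type suffixes.

    Assumption: Python ints stand in for int32 values and str for string;
    any other type is rejected because this sketch does not model it.
    """
    suffixes = {int: "int32", str: "string"}
    columns = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            # Nested objects contribute an ".object." segment to the path.
            columns.update(full_fidelity_columns(value, f"{prefix}{key}.object."))
        elif type(value) in suffixes:
            columns[f"{prefix}{key}.{suffixes[type(value)]}"] = value
        else:
            raise TypeError(f"type {type(value).__name__} is not covered by this sketch")
    return columns


first = full_fidelity_columns({"address": {"streetNo": 15850}})
second = full_fidelity_columns({"address": {"streetNo": "123"}})

# The combined schema keeps both columns, so neither value changes type.
schema = sorted(set(first) | set(second))
print(schema)
# ['address.object.streetNo.int32', 'address.object.streetNo.string']
```

Because the type is part of the column name, the string value lands in its own column instead of forcing a type change on the existing one.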
327
300
328
-
**Data type to suffix map**
301
+
##### Data type to suffix map
329
302
330
303
Here's a map of all the property data types and their suffix representations in the analytical store:
331
304
@@ -352,6 +325,63 @@ Here's a map of all the property data types and their suffix representations in
352
325
* Spark pools in Azure Synapse will represent these columns as `undefined`.
353
326
* SQL serverless pools in Azure Synapse will represent these columns as `NULL`.
354
327
328
+
##### Working with the MongoDB `_id` field
329
+
330
+
The MongoDB `_id` field is fundamental to every collection in MongoDB and originally has a hexadecimal representation. As you can see in the table above, `Full Fidelity Schema` will preserve its characteristics, creating a challenge for its visualization in Azure Synapse Analytics. For correct visualization, you must convert the `_id` datatype as below:
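As a plain-Python illustration of the idea behind that conversion (not a Synapse snippet), the sketch below assumes the `_id` value arrives as the raw 12 bytes of a MongoDB ObjectId and renders it as the familiar 24-character hexadecimal string. The sample value and variable names are hypothetical.

```python
# Hypothetical raw ObjectId value: 12 bytes, as MongoDB defines the type.
raw_id = bytes.fromhex("5f3c2a1b9d4e8f0012345678")

# Render the bytes as a lowercase hexadecimal string for display.
hex_id = raw_id.hex()

print(len(raw_id))  # 12
print(hex_id)       # 5f3c2a1b9d4e8f0012345678
```

In a Synapse Spark or SQL pool the equivalent step would be done with that engine's own binary-to-hex conversion; the exact calls are not assumed here.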
It's possible to use full fidelity Schema for SQL (Core) API accounts, instead of the default option, by setting the schema type when enabling Synapse Link on a Cosmos DB account for the first time. Here are the considerations about changing the default schema representation type:
361
+
362
+
* This option is only valid for accounts that **don't** have Synapse Link already enabled.
363
+
* It isn't possible to reset the schema representation type, from well-defined to full fidelity or vice-versa.
364
+
* Currently Azure Cosmos DB API for MongoDB isn't compatible with this possibility of changing the schema representation. All MongoDB accounts will always have full fidelity schema representation type.
365
+
* Currently this change can't be made through the Azure portal. All database accounts that have Synapse Link enabled by the Azure portal will have the default schema representation type, well-defined schema.
366
+
367
+
The schema representation type decision must be made at the same time that Synapse Link is enabled on the account, using Azure CLI or PowerShell.