Commit 9082389

Update analytical-store-introduction.md
1 parent c2b4298 commit 9082389

File tree

1 file changed: +15 −15 lines

articles/cosmos-db/analytical-store-introduction.md

Lines changed: 15 additions & 15 deletions
@@ -43,13 +43,13 @@ When you enable analytical store on an Azure Cosmos DB container, a new column-s
 
 ## Column store for analytical workloads on operational data
 
-Analytical workloads typically involve aggregations and sequential scans of selected fields. By storing the data in a column-major order, the analytical store allows a group of values for each field to be serialized together. This format reduces the IOPS required to scan or compute statistics over specific fields. It dramatically improves the query response times for scans over large data sets.
+Analytical workloads typically involve aggregations and sequential scans of selected fields. The analytical store holds data in column-major order, allowing the values of each field to be serialized together, where applicable. This format reduces the IOPS required to scan or compute statistics over specific fields. It dramatically improves the query response times for scans over large data sets.
 
 For example, if your operational tables are in the following format:
 
 :::image type="content" source="./media/analytical-store-introduction/sample-operational-data-table.png" alt-text="Example operational table" border="false":::
 
-The row store persists the above data in a serialized format, per row, on the disk. This format allows for faster transactional reads, writes, and operational queries, such as, "Return information about Product1". However, as the dataset grows large and if you want to run complex analytical queries on the data it can be expensive. For example, if you want to get "the sales trends for a product under the category named 'Equipment' across different business units and months", you need to run a complex query. Large scans on this dataset can get expensive in terms of provisioned throughput and can also impact the performance of the transactional workloads powering your real-time applications and services.
+The row store persists the above data in a serialized format, per row, on disk. This format allows for faster transactional reads, writes, and operational queries, such as "Return information about Product 1". However, as the dataset grows large, complex analytical queries on the data can become expensive. For example, to get "the sales trends for a product under the category named 'Equipment' across different business units and months", you need to run a complex query. Large scans on this dataset can get expensive in terms of provisioned throughput and can also impact the performance of the transactional workloads powering your real-time applications and services.
 
 Analytical store, which is a column store, is better suited for such queries because it serializes similar fields of data together and reduces the disk IOPS.
 
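To make the row-major vs. column-major trade-off in this hunk concrete, here is a small Python sketch (not part of the article; the table and field names are illustrative) showing that an aggregate over one field touches a single contiguous array in a columnar layout instead of every record:

```python
# Illustrative only: the same "operational table" stored row-major vs column-major.
rows = [
    {"id": "Product 1", "category": "Equipment", "units_sold": 10},
    {"id": "Product 2", "category": "Equipment", "units_sold": 25},
    {"id": "Product 3", "category": "Clothing",  "units_sold": 7},
]

# Row store: each record is serialized whole, so scanning one field
# still reads every record.
total_row_scan = sum(r["units_sold"] for r in rows)

# Column store: the values of each field are serialized together, so an
# aggregate over "units_sold" reads only that one array.
columns = {field: [r[field] for r in rows] for field in rows[0]}
total_column_scan = sum(columns["units_sold"])

assert total_row_scan == total_column_scan == 42
```

Both layouts hold the same data; only the bytes that a per-field scan must read differ, which is the IOPS reduction the paragraph describes.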

@@ -79,7 +79,7 @@ At the end of each execution of the automatic sync process, your transactional d
 
 ## Scalability & elasticity
 
-By using horizontal partitioning, Azure Cosmos DB transactional store can elastically scale the storage and throughput without any downtime. Horizontal partitioning in the transactional store provides scalability & elasticity in auto-sync to ensure data is synced to the analytical store in near real time. The data sync happens regardless of the transactional traffic throughput, whether it's 1000 operations/sec or 1 million operations/sec, and it doesn't impact the provisioned throughput in the transactional store.
+Azure Cosmos DB transactional store uses horizontal partitioning to elastically scale the storage and throughput without any downtime. Horizontal partitioning in the transactional store provides scalability & elasticity in auto-sync to ensure data is synced to the analytical store in near real time. The data sync happens regardless of the transactional traffic throughput, whether it's 1000 operations/sec or 1 million operations/sec, and it doesn't impact the provisioned throughput in the transactional store.
 
 ## <a id="analytical-schema"></a>Automatically handle schema updates
 
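As a rough intuition for why horizontal partitioning lets auto-sync scale with traffic, each document maps to exactly one physical partition via its partition key, so partitions can be synced independently. A minimal Python sketch (Azure Cosmos DB's real partitioning uses its own internal hashing; the key names and partition count here are illustrative):

```python
import hashlib

def partition_for(partition_key: str, partition_count: int) -> int:
    # Hash the partition key and map it to one of the physical partitions.
    # Illustrative only; not Cosmos DB's actual hash function.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count

docs = [{"pk": f"tenant-{i}"} for i in range(1000)]

buckets: dict[int, list] = {}
for doc in docs:
    buckets.setdefault(partition_for(doc["pk"], 4), []).append(doc)

# Every document lands in exactly one of the 4 partitions, so each
# partition's changes can be picked up and synced independently.
assert sum(len(b) for b in buckets.values()) == 1000
```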

@@ -409,7 +409,7 @@ sql_results.show()
 
 ##### Using full fidelity schema with SQL
 
-Considering the same documents of the Spark example above, customers can use the following syntax example:
+You can use the following syntax example with the same documents of the Spark example above:
 
 ```SQL
 SELECT rating,timestamp_string,timestamp_utc
@@ -425,7 +425,7 @@ timestamp_utc float '$.timestamp.float64'
 WHERE timestamp is not null or timestamp_utc is not null
 ```
 
-Starting from the query above, customers can implement transformations using `cast`, `convert` or any other T-SQL function to manipulate your data. Customers can also hide complex datatype structures by using views.
+Starting from the query above, you can implement transformations using `cast`, `convert` or any other T-SQL function to manipulate your data. You can also hide complex datatype structures by using views.
 
 ```SQL
 create view MyView as
@@ -453,11 +453,11 @@ WHERE timestamp_string is not null
 ```
 
 
-##### Working with the MongoDB `_id` field
+##### Working with MongoDB `_id` field
 
-the MongoDB `_id` field is fundamental to every collection in MongoDB and originally has a hexadecimal representation. As you can see in the table above, full fidelity schema will preserve its characteristics, creating a challenge for its visualization in Azure Synapse Analytics. For correct visualization, you must convert the `_id` datatype as below:
+The MongoDB `_id` field is fundamental to every collection in MongoDB and originally has a hexadecimal representation. As you can see in the table above, full fidelity schema preserves its characteristics, creating a challenge for its visualization in Azure Synapse Analytics. For correct visualization, you must convert the `_id` datatype as below:
 
-###### Working with the MongoDB `_id` field in Spark
+###### Working with MongoDB `_id` field in Spark
 
 The example below works on Spark 2.x and 3.x versions:
 
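Outside Spark, the conversion these examples perform amounts to turning the ObjectId's 12 raw bytes into a readable hex string (and, if needed, reading its leading 4-byte creation timestamp). A plain-Python sketch for illustration only, using a sample ObjectId value:

```python
import datetime

# A MongoDB ObjectId is 12 bytes; full fidelity schema surfaces it as raw binary.
raw_object_id = bytes.fromhex("507f1f77bcf86cd799439011")  # sample value

# Readable hexadecimal form, matching what the Spark/SQL conversions produce.
as_hex = raw_object_id.hex()

# Per the BSON ObjectId layout, the first 4 bytes are a Unix creation timestamp.
created = datetime.datetime.fromtimestamp(
    int.from_bytes(raw_object_id[:4], "big"), tz=datetime.timezone.utc
)
```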
@@ -478,7 +478,7 @@ val dfConverted = df.withColumn("objectId", col("_id.objectId")).withColumn("con
 display(dfConverted)
 ```
 
-###### Working with the MongoDB `_id` field in SQL
+###### Working with MongoDB `_id` field in SQL
 
 ```SQL
 SELECT TOP 100 id=CAST(_id as VARBINARY(1000))
@@ -494,7 +494,7 @@ It's possible to use full fidelity Schema for API for NoSQL accounts, instead of
 * Currently, if you enable Synapse Link in your NoSQL API account using the Azure portal, it will be enabled as well-defined schema.
 * Currently, if you want to use full fidelity schema with NoSQL or Gremlin API accounts, you have to set it at account level in the same CLI or PowerShell command that will enable Synapse Link at account level.
 * Currently Azure Cosmos DB for MongoDB isn't compatible with this possibility of changing the schema representation. All MongoDB accounts have full fidelity schema representation type.
-* Full Fidelity schema data types map mentioned above isn't valid for NoSQL API accounts, that use JSON datatypes. As an example, `float` and `integer` values are represented as `num` in analytical store.
+* The full fidelity schema data types map mentioned above isn't valid for NoSQL API accounts, which use JSON datatypes. As an example, `float` and `integer` values are represented as `num` in analytical store.
 * It's not possible to reset the schema representation type, from well-defined to full fidelity or vice-versa.
 * Currently, containers schemas in analytical store are defined when the container is created, even if Synapse Link has not been enabled in the database account.
 * Containers or graphs created before Synapse Link was enabled with full fidelity schema at account level will have well-defined schema.
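For the account-level CLI setting the bullets above mention, a hedged sketch follows (the account and resource-group names are placeholders, and the `--enable-analytical-storage` / `--analytical-storage-schema-type` flags are my assumption of the relevant options; verify them against the current `az cosmosdb create` reference):

```shell
# Sketch only: placeholder names; verify flags against the az cosmosdb docs.
az cosmosdb create \
    --name my-cosmos-account \
    --resource-group my-resource-group \
    --enable-analytical-storage true \
    --analytical-storage-schema-type FullFidelity
```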
@@ -556,7 +556,7 @@ Data tiering refers to the separation of data between storage infrastructures op
 After the analytical store is enabled, based on the data retention needs of the transactional workloads, you can configure `transactional TTL` property to have records automatically deleted from the transactional store after a certain time period. Similarly, the `analytical TTL` allows you to manage the lifecycle of data retained in the analytical store, independent from the transactional store. By enabling analytical store and configuring transactional and analytical `TTL` properties, you can seamlessly tier and define the data retention period for the two stores.
 
 > [!NOTE]
-> When `analytical TTL` is bigger than `transactional TTL`, your container will have data that only exists in analytical store. This data is read only and currently we don't support document level `TTL` in analytical store. If your container data may need an update or a delete at some point in time in the future, don't use `analytical TTL` bigger than `transactional TTL`. This capability is recommended for data that won't need updates or deletes in the future.
+> When `analytical TTL` is set to a value larger than the `transactional TTL` value, your container will have data that only exists in analytical store. This data is read-only, and document-level `TTL` isn't currently supported in analytical store. If your container data may need an update or a delete at some point in the future, don't use an `analytical TTL` bigger than the `transactional TTL`. This capability is recommended for data that won't need updates or deletes in the future.
 
 > [!NOTE]
 > If your scenario doesn't demand physical deletes, you can adopt a logical delete/update approach. Insert in transactional store another version of the same document that only exists in analytical store but needs a logical delete/update. Maybe with a flag indicating that it's a delete or an update of an expired document. Both versions of the same document will co-exist in analytical store, and your application should only consider the last one.
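As a simplified illustration of the tiering rule in this hunk (real expiry also depends on updates and the sync process), a record is retained in each store while its age is below that store's TTL, with `-1` meaning "never expire":

```python
# Simplified sketch of TTL-based tiering, not Cosmos DB's actual expiry logic.
def stores_holding(age_seconds: int, transactional_ttl: int, analytical_ttl: int):
    """Return which stores still hold a record of the given age; -1 = no expiry."""
    stores = []
    if transactional_ttl == -1 or age_seconds < transactional_ttl:
        stores.append("transactional")
    if analytical_ttl == -1 or age_seconds < analytical_ttl:
        stores.append("analytical")
    return stores

day = 86400
# Transactional TTL = 30 days, analytical TTL = -1: a 90-day-old record
# exists only in the analytical store (the read-only case the note warns about).
assert stores_holding(90 * day, 30 * day, -1) == ["analytical"]
assert stores_holding(10 * day, 30 * day, -1) == ["transactional", "analytical"]
```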
@@ -567,9 +567,9 @@ After the analytical store is enabled, based on the data retention needs of the
 Analytical store relies on Azure Storage and offers the following protection against physical failure:
 
 * By default, Azure Cosmos DB database accounts allocate analytical store in Locally Redundant Storage (LRS) accounts. LRS provides at least 99.999999999% (11 nines) durability of objects over a given year.
-* If any geo-region of the database account is configured for zone-redundancy, it is allocated in Zone-redundant Storage (ZRS) accounts. Customers need to enable Availability Zones on a region of their Azure Cosmos DB database account to have analytical data of that region stored in Zone-redundant Storage. ZRS offers durability for storage resources of at least 99.9999999999% (12 9's) over a given year.
+* If any geo-region of the database account is configured for zone-redundancy, it is allocated in Zone-redundant Storage (ZRS) accounts. You need to enable Availability Zones on a region of your Azure Cosmos DB database account to have the analytical data of that region stored in Zone-redundant Storage. ZRS offers durability for storage resources of at least 99.9999999999% (12 nines) over a given year.
 
-For more information about Azure Storage durability, click [here](/azure/storage/common/storage-redundancy).
+For more information about Azure Storage durability, see [Azure Storage redundancy](/azure/storage/common/storage-redundancy).
 
 ## Backup
 
@@ -582,7 +582,7 @@ Synapse Link, and analytical store by consequence, has different compatibility l
 
 * Periodic backup mode is fully compatible with Synapse Link and these 2 features can be used in the same database account.
 * Synapse Link for database accounts using continuous backup mode is GA.
-* Continuous backup mode for Synapse Link enabled accounts is in public preview. Currently, customers that disabled Synapse Link from containers can't migrate to continuous backup.
+* Continuous backup mode for Synapse Link enabled accounts is in public preview. Currently, you can't migrate to continuous backup if you disabled Synapse Link on any of your collections in a Cosmos DB account.
 
 ### Backup policies
 
@@ -645,7 +645,7 @@ Analytical store partitioning is completely independent of partitioning in
 
 The analytical store is optimized to provide scalability, elasticity, and performance for analytical workloads without any dependency on the compute run-times. The storage technology is self-managed to optimize your analytics workloads without manual efforts.
 
-By decoupling the analytical storage system from the analytical compute system, data in Azure Cosmos DB analytical store can be queried simultaneously from the different analytics runtimes supported by Azure Synapse Analytics. As of today, Azure Synapse Analytics supports Apache Spark and serverless SQL pool with Azure Cosmos DB analytical store.
+Data in Azure Cosmos DB analytical store can be queried simultaneously from the different analytics runtimes supported by Azure Synapse Analytics. Azure Synapse Analytics supports Apache Spark and serverless SQL pool with Azure Cosmos DB analytical store.
 
 > [!NOTE]
 > You can only read from analytical store using Azure Synapse Analytics runtimes. And the opposite is also true, Azure Synapse Analytics runtimes can only read from analytical store. Only the auto-sync process can change data in analytical store. You can write data back to Azure Cosmos DB transactional store using Azure Synapse Analytics Spark pool, using the built-in Azure Cosmos DB OLTP SDK.
