articles/cosmos-db/nosql/how-to-model-partition-example.md
If you usually work with relational databases, you have probably built habits and intuitions on how to design a data model. Because of the specific constraints, but also the unique strengths of Azure Cosmos DB, most of these best practices don't translate well and may drag you into suboptimal solutions. The goal of this article is to guide you through the complete process of modeling a real-world use case on Azure Cosmos DB, from item modeling to entity colocation and container partitioning.
[Download or view community-generated source code](https://github.com/jwidmer/AzureCosmosDbBlogExample) that illustrates the concepts from this article.
> [!IMPORTANT]
> This code sample was contributed by a community contributor, and the Azure Cosmos DB team doesn't support its maintenance.
## The scenario
To start, we give some structure to our initial specification by identifying our solution's access patterns. When designing a data model for Azure Cosmos DB, it's important to understand which requests our model has to serve to make sure that the model serves those requests efficiently.
To make the overall process easier to follow, we categorize those different requests as either commands or queries, borrowing some vocabulary from [CQRS](https://en.wikipedia.org/wiki/Command%E2%80%93query_separation#Command_query_responsibility_segregation). In CQRS, commands are write requests (that is, intents to update the system) and queries are read-only requests.
Here's the list of requests that our platform exposes:
- **[C1]** Create/edit a user
- **[Q1]** Retrieve a user
- **[C2]** Create/edit a post
- **[Q2]** Retrieve a post
- **[Q3]** List a user's posts in short form
- **[C3]** Create a comment
- **[Q4]** List a post's comments
- **[C4]** Like a post
- **[Q5]** List a post's likes
- **[Q6]** List the *x* most recent posts created in short form (feed)
At this stage, we haven't thought about the details of what each entity (user, post, and so on) contains. This step is usually among the first ones tackled when designing against a relational store, because we have to figure out how those entities translate into tables, columns, foreign keys, and so on. It's much less of a concern with a document database that doesn't enforce any schema at write.
The main reason why it's important to identify our access patterns from the beginning is that this list of requests is going to be our test suite. Every time we iterate over our data model, we go through each of the requests and check its performance and scalability. We calculate the request units consumed in each model and optimize them. All these models use the default indexing policy; you can override it by indexing specific properties, which can further improve RU consumption and latency.
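The "access patterns as a test suite" idea can be sketched as a tiny harness. The RU charges below are stubs borrowed from the results table later in this article; in a real suite, each `run` function would issue the request through the Azure Cosmos DB SDK and read the response's `requestCharge`. All names here are illustrative:

```javascript
// Hypothetical harness: each entry names an access pattern and a
// function that runs it and reports the request units (RUs) consumed.
// The charges are stubbed here for illustration.
const accessPatterns = [
  { id: "C1", description: "Create/edit a user", run: () => 5.71 },
  { id: "Q1", description: "Retrieve a user", run: () => 1.0 },
];

// Every iteration of the data model re-runs the whole suite so we can
// compare RU consumption across model versions.
function benchmark(patterns) {
  return patterns.map((p) => ({ id: p.id, requestCharge: p.run() }));
}
```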
### Users container

This container only stores user items.
We partition this container by `id`, which means that each logical partition within that container only contains one item.
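As a sketch, a user item and its partition key could look like this (the `username` field is an assumption for illustration; only `id` matters for partitioning):

```javascript
// Hypothetical user item for the `users` container.
const user = {
  id: "user-1",
  username: "alice",
};

// The container is partitioned by `id`, so an item's partition key is
// its own `id` — which is why each logical partition holds one item.
function partitionKeyOf(item) {
  return item.id;
}
```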
### Posts container
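As described later in this article, the `posts` container is partitioned by `postId`, which colocates each post with its comments and likes. A sketch of such colocated items (the `type` and content fields are illustrative assumptions):

```javascript
// Illustrative items sitting in the same logical partition of the
// `posts` container: they all share the same `postId` partition key.
const post = { id: "post-1", postId: "post-1", type: "post", userId: "user-1", title: "Hello" };
const comment = { id: "comment-1", postId: "post-1", type: "comment", userId: "user-2", content: "Nice post!" };
const like = { id: "like-1", postId: "post-1", type: "like", userId: "user-3" };

// Colocation means a single partition-scoped query can fetch them all.
const samePartition = [post, comment, like].every((i) => i.postId === post.postId);
```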
### [C1] Create/edit a user
This request is straightforward to implement as we just create or update an item in the `users` container. The requests nicely spread across all partitions thanks to the `id` partition key.
:::image type="content" source="./media/how-to-model-partition-example/V1-C1.png" alt-text="Diagram of writing a single item to the users' container." border="false":::
### [Q2] Retrieve a post
We start by retrieving the corresponding document from the `posts` container. But that's not enough: as per our specification, we also have to aggregate the username of the post's author and the counts of the post's comments and likes, which requires three more SQL queries to be issued.
:::image type="content" source="./media/how-to-model-partition-example/V1-Q2.png" alt-text="Diagram of retrieving a post and aggregating additional data." border="false":::
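The V1 read path for **[Q2]** can be sketched as follows; the three callbacks stand in for the three extra SQL queries and are purely illustrative:

```javascript
// V1 sketch: after reading the post item itself, three more queries
// are needed to assemble the response. Each callback stands in for a
// separate query against the data set.
function assemblePostV1(post, getUsername, countComments, countLikes) {
  return {
    ...post,
    userUsername: getUsername(post.userId), // query 1: author's username
    commentCount: countComments(post.id),   // query 2: count of comments
    likeCount: countLikes(post.id),         // query 3: count of likes
  };
}
```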
## V2: Introducing denormalization to optimize read queries
The reason why we have to issue more requests in some cases is because the results of the initial request don't contain all the data we need to return. When working with a non-relational data store like Azure Cosmos DB, denormalizing data across the data set commonly solves this kind of issue.
In our example, we modify post items to add the username of the post's author, the count of comments, and the count of likes.
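A denormalized post item might look like this sketch (the exact field names, apart from `id`, are assumptions for illustration):

```javascript
// Hypothetical shape of a post item after denormalization: the
// author's username and both counts now live on the post itself, so
// [Q2] no longer needs extra queries.
const denormalizedPost = {
  id: "post-1",
  postId: "post-1",      // partition key of the `posts` container
  userId: "user-1",
  userUsername: "alice", // denormalized from the user item
  commentCount: 2,       // denormalized count of comment items
  likeCount: 5,          // denormalized count of like items
  title: "Hello",
};
```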
### Denormalizing comment and like counts
What we want to achieve is that every time we add a comment or a like, we also increment the `commentCount` or the `likeCount` in the corresponding post. Because our `posts` container is partitioned by `postId`, the new item (comment or like) and its corresponding post sit in the same logical partition. As a result, we can use a [stored procedure](stored-procedures-triggers-udfs.md) to perform that operation.
When creating a comment (**[C3]**), instead of just adding a new item in the `posts` container, we call the following stored procedure on that container:
```javascript
function createComment(postId, comment) {
  var collection = getContext().getCollection();

  collection.readDocument(
    `${collection.getAltLink()}/docs/${postId}`,
    function (err, post) {
      if (err) throw err;

      post.commentCount++;
      collection.replaceDocument(
        post._self,
        post,
        function (err) {
          if (err) throw err;

          comment.postId = postId;
          collection.createDocument(collection.getSelfLink(), comment);
        }
      );
    });
}
```

This stored procedure takes the ID of the post and the body of the new comment as parameters, then:

- retrieves the post
- increments the `commentCount`
- replaces the post
- adds the new comment
As stored procedures are executed as atomic transactions, the value of `commentCount` and the actual number of comments always stay in sync.
We obviously call a similar stored procedure when adding new likes to increment the `likeCount`.
## V3: Making sure all requests are scalable
Looking at our overall performance improvements, there are still two requests that we haven't fully optimized: **[Q3]** and **[Q6]**. They're the requests involving queries that don't filter on the partition key of the containers they target.
### [Q3] List a user's posts in short form
The way to think about this situation is simple:
1. This request *has* to filter on the `userId` because we want to fetch all posts for a particular user.
1. It doesn't perform well because it's executed against the `posts` container, which isn't partitioned by `userId`.
1. Stating the obvious, we would solve our performance problem by executing this request against a container that *is* partitioned by `userId`.
1. It turns out that we already have such a container: the `users` container!
So we introduce a second level of denormalization by duplicating entire posts to the `users` container. By doing that, we effectively get a copy of our posts, only partitioned along a different dimension, making them way more efficient to retrieve by their `userId`.
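As a sketch, the two kinds of items the `users` container now holds might look like this (field names beyond `id`, `userId`, and `type` are illustrative assumptions):

```javascript
// The `users` container is now partitioned by `userId` and holds both
// user items and truncated post items, distinguished by a `type` field.
const userItem = {
  id: "user-1",
  userId: "user-1", // partition key; redundant with `id` for user items
  type: "user",
  username: "alice",
};

const postItem = {
  id: "post-1",
  userId: "user-1", // the author's ID, so the post lands in the author's partition
  type: "post",
  title: "Hello",
};
```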
In this example:
- We've introduced a `type` field in the user item to distinguish users from posts.
- We've also added a `userId` field in the user item, which is redundant with the `id` field but is required because the `users` container is now partitioned by `userId` (and not `id` as previously).
To achieve that denormalization, we once again use the change feed. This time, we react on the change feed of the `posts` container to dispatch any new or updated post to the `users` container. And because listing posts doesn't require returning their full content, we can truncate them in the process.
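Such a change-feed handler could be sketched like this; the truncation length and field names are illustrative assumptions:

```javascript
// Illustrative projection applied by the change-feed handler: for each
// new or updated post, emit a truncated copy keyed by the author's
// `userId`, ready to be written to the `users` container.
function toShortFormPost(post, maxContentLength = 200) {
  return {
    id: post.id,
    userId: post.userId, // partition key in the `users` container
    type: "post",        // distinguishes posts from user items
    title: post.title,
    content: post.content.slice(0, maxContentLength),
  };
}
```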
### [Q6] List the *x* most recent posts created in short form (feed)
:::image type="content" source="./media/how-to-model-partition-example/V2-Q6.png" alt-text="Diagram that shows the query to list the x most recent posts created in short form." border="false":::
Following the same approach, maximizing this request's performance and scalability requires that it only hits one partition. Limiting the request to a single partition is conceivable because we only have to return a limited number of items: to populate our blogging platform's home page, we just need the 100 most recent posts, without paginating through the entire data set.
So to optimize this last request, we introduce a third container to our design, entirely dedicated to serving this request. We denormalize our posts to that new `feed` container.
This container is partitioned by `type`, which is always `post` in our items. Partitioning by `type` ensures that all the items in this container sit in the same partition.
To achieve the denormalization, we just have to hook into the change feed pipeline we previously introduced to dispatch the posts to that new container. One important thing to bear in mind is that we need to make sure that we only store the 100 most recent posts; otherwise, the content of the container may grow beyond the maximum size of a partition. We can enforce this limit by calling a [post-trigger](stored-procedures-triggers-udfs.md#triggers) every time a document is added in the container:
:::image type="content" source="./media/how-to-model-partition-example/denormalization-3.png" alt-text="Diagram of denormalizing posts into the feed container." border="false":::
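The core of such a post-trigger can be sketched as a pure function; a real trigger would read and delete documents through the server-side JavaScript API, and the names below are illustrative:

```javascript
// Illustrative version of the trigger's core logic: keep only the
// `maxLength` most recent posts, sorted by creation date descending.
function trimFeed(posts, maxLength = 100) {
  return [...posts]
    .sort((a, b) => b.creationDate.localeCompare(a.creationDate))
    .slice(0, maxLength);
}
```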
Let's have a look at the overall performance and scalability improvements we've achieved:
| | V1 | V2 | V3 |
| --- | --- | --- | --- |
| **[C1]** | `7` ms / `5.71` RU | `7` ms / `5.71` RU | `7` ms / `5.71` RU |
| **[Q1]** | `2` ms / `1` RU | `2` ms / `1` RU | `2` ms / `1` RU |
| **[C2]** | `9` ms / `8.76` RU | `9` ms / `8.76` RU | `9` ms / `8.76` RU |
| **[Q2]** | `9` ms / `19.54` RU | `2` ms / `1` RU | `2` ms / `1` RU |
| **[Q3]** | `130` ms / `619.41` RU | `28` ms / `201.54` RU | `4` ms / `6.46` RU |
| **[C3]** | `7` ms / `8.57` RU | `7` ms / `15.27` RU | `7` ms / `15.27` RU |
| **[Q4]** | `23` ms / `27.72` RU | `4` ms / `7.72` RU | `4` ms / `7.72` RU |
| **[C4]** | `6` ms / `7.05` RU | `7` ms / `14.67` RU | `7` ms / `14.67` RU |
| **[Q5]** | `59` ms / `58.92` RU | `4` ms / `8.92` RU | `4` ms / `8.92` RU |
| **[Q6]** | `306` ms / `2063.54` RU | `83` ms / `532.33` RU | `9` ms / `16.97` RU |
### We've optimized a read-heavy scenario
You may have noticed that we've concentrated our efforts towards improving the performance of read requests (queries) at the expense of write requests (commands). In many cases, write operations now trigger subsequent denormalization through change feeds, which makes them more computationally expensive and longer to materialize.
This focus on read performance is justified by the fact that a blogging platform (like most social apps) is read-heavy, which means that the number of read requests it has to serve is usually orders of magnitude higher than the number of write requests. So it makes sense to make write requests more expensive to execute in order to let read requests be cheaper and better performing.
If we look at the most extreme optimization we've done, **[Q6]** went from 2000+ RUs to just 17 RUs; we've achieved that by denormalizing posts at a cost of around 10 RUs per item. As we would serve a lot more feed requests than creation or updates of posts, the cost of this denormalization is negligible considering the overall savings.
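A back-of-the-envelope calculation makes the trade-off concrete. The RU figures below come from the results table above ([C2] write cost, [Q6] read cost, and the ~10 RU denormalization per item); the read/write ratio of 100 is a made-up assumption:

```javascript
// Total RU cost of one post write plus `reads` feed reads:
// write cost, plus optional denormalization cost, plus read costs.
function totalRu(writeRu, denormRu, readRu, reads) {
  return writeRu + denormRu + reads * readRu;
}

// V1: no denormalization, but each [Q6] read costs ~2063.54 RU.
const v1 = totalRu(8.76, 0, 2063.54, 100);
// V3: ~10 RU of denormalization per write, each read costs ~16.97 RU.
const v3 = totalRu(8.76, 10, 16.97, 100);
```

Even at far lower read/write ratios, the denormalized design wins by a wide margin.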
### Denormalization can be applied incrementally
The scalability improvements we've explored in this article involve denormalization and duplication of data across the data set. It should be noted that these optimizations don't have to be put in place at day 1. Queries that filter on partition keys perform better at scale, but cross-partition queries can be acceptable if they're called rarely or against a limited data set. If you're just building a prototype, or launching a product with a small and controlled user base, you can probably spare those improvements for later. What's important then is to [monitor](../use-metrics.md) your model's performance so you can decide if and when it's time to bring them in.
The change feed that we use to distribute updates to other containers stores all those updates persistently. This persistence makes it possible to request all updates since the creation of the container and bootstrap denormalized views as a one-time catch-up operation, even if your system already has a lot of data.