
Commit 1c878f6

Merge pull request #216392 from jonels-msft/cosmospg-quickstart-single-node
Note single-node use case in quickstart
2 parents 46840b6 + 54ee73a commit 1c878f6


4 files changed: +91, -2 lines changed


articles/cosmos-db/postgresql/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -166,6 +166,8 @@
  href: howto-scale-initial.md
- name: Scale cluster
  href: howto-scale-grow.md
+ - name: Choose shard count
+   href: howto-shard-count.md
- name: Rebalance shards
  href: howto-scale-rebalance.md
- name: Change compute quotas

articles/cosmos-db/postgresql/howto-shard-count.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
---
title: Choose shard count - Azure Cosmos DB for PostgreSQL
description: Pick the right shard count for distributed tables
ms.author: jonels
author: jonels-msft
ms.service: cosmos-db
ms.subservice: postgresql
ms.topic: how-to
ms.date: 11/01/2022
---

# Choose shard count

[!INCLUDE [PostgreSQL](../includes/appliesto-postgresql.md)]

Choosing the shard count for each distributed table is a balance between the
flexibility of having more shards and the overhead of planning and executing
queries across them. If you decide to change the shard count of a table after
distributing it, you can use the
[alter_distributed_table](reference-functions.md#alter_distributed_table)
function.
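
For illustration, a minimal sketch of changing the shard count later, reusing the `github_events` example table from the function reference (an assumed name, shown only for illustration):

```sql
-- Rewrite github_events into 64 shards; cascade_to_colocated keeps any
-- colocated tables aligned with the new shard count.
SELECT alter_distributed_table('github_events',
                               shard_count => 64,
                               cascade_to_colocated => true);
```

Changing the shard count rewrites the table's data into new shards, so it can take a while on large tables.
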
## Multi-tenant SaaS use case

The optimal choice varies depending on your access patterns for the data. For
instance, in the multi-tenant SaaS database use case, we recommend choosing
between **32 and 128** shards. For smaller workloads, say under 100 GB, you
could start with 32 shards, and for larger workloads you could choose 64 or
128. This choice gives you the leeway to scale from 32 to 128 worker machines.
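
As a sketch under assumed names (a hypothetical `companies` tenant table keyed by `company_id`, plus a related `campaigns` table), distributing with an explicit shard count might look like:

```sql
-- Distribute the main tenant table across 32 shards. An explicit shard_count
-- requires colocate_with => 'none'.
SELECT create_distributed_table('companies', 'company_id',
                                shard_count => 32,
                                colocate_with => 'none');

-- Related tables colocated with it inherit the same shard count.
SELECT create_distributed_table('campaigns', 'company_id',
                                colocate_with => 'companies');
```
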
## Real-time analytics use case

In the real-time analytics use case, shard count should be related to the total
number of cores on the workers. To ensure maximum parallelism, you should create
enough shards on each node such that there is at least one shard per CPU core.
We typically recommend creating a high number of initial shards, for example,
**2x or 4x the number of current CPU cores**. Having more shards allows for
future scaling if you add more workers and CPU cores.

Keep in mind that, for each query, Azure Cosmos DB for PostgreSQL opens one
database connection per shard, and that these connections are limited. Be
careful to keep the shard count small enough that distributed queries won't
often have to wait for a connection. Put another way, the connections needed,
`(max concurrent queries * shard count)`, shouldn't exceed the total
connections possible in the system, `(number of workers * max_connections per
worker)`.
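
To sanity-check that math on a running cluster, a rough sketch (the Citus metadata table `pg_dist_shard` lists shards; the capacity figures in the final comment are assumptions for illustration):

```sql
-- Shards per distributed table, as seen from the coordinator.
SELECT logicalrelid::regclass AS table_name, count(*) AS shard_count
  FROM pg_dist_shard
 GROUP BY logicalrelid;

-- Connection ceiling on the node where this runs (check the workers too).
SHOW max_connections;

-- Example: 4 workers * 300 max_connections = 1200 worker connections;
-- with 32 shards per query, about 1200 / 32 = 37 concurrent multi-shard
-- queries fit before queries start waiting for connections.
```
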
## Next steps

- Learn more about cluster [performance options](resources-compute.md).
- [Scale a cluster](howto-scale-grow.md) up or out
- [Rebalance shards](howto-scale-rebalance.md)

articles/cosmos-db/postgresql/quickstart-distribute-tables.md

Lines changed: 8 additions & 1 deletion
@@ -8,7 +8,7 @@ ms.service: cosmos-db
ms.subservice: postgresql
ms.custom: mvc, mode-ui, ignite-2022
ms.topic: quickstart
- ms.date: 10/14/2022
+ ms.date: 11/01/2022
---

# Create and distribute tables
@@ -71,6 +71,13 @@ provides to distribute tables and use resources across multiple machines. The
function decomposes tables into shards, which can be spread across nodes for
increased storage and compute performance.

+ > [!NOTE]
+ >
+ > In real applications, when your workload fits in 64 vCores, 256 GB RAM, and 2 TB
+ > storage, you can use a single-node cluster. In this case, distributing tables
+ > is optional. Later, you can distribute tables as needed using
+ > [create_distributed_table_concurrently](reference-functions.md#create_distributed_table_concurrently).
+
Let's distribute the tables:

```sql

articles/cosmos-db/postgresql/reference-functions.md

Lines changed: 29 additions & 1 deletion
@@ -6,7 +6,7 @@ author: jonels-msft
ms.service: cosmos-db
ms.subservice: postgresql
ms.topic: reference
- ms.date: 02/24/2022
+ ms.date: 11/01/2022
---

# Azure Cosmos DB for PostgreSQL functions
@@ -67,6 +67,15 @@ table shards will be moved together unnecessarily in a "cascade."
If a new distributed table isn't related to other tables, it's best to
specify `colocate_with => 'none'`.

+ **shard\_count:** (Optional) the number of shards to create for the new
+ distributed table. When specifying `shard_count`, you can't specify a value of
+ `colocate_with` other than none. To change the shard count of an existing table
+ or colocation group, use the [alter_distributed_table](#alter_distributed_table)
+ function.
+
+ Possible values for `shard_count` are between 1 and 64000. For guidance on
+ choosing the optimal value, see [Shard Count](howto-shard-count.md).
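
For illustration, a sketch of passing the new parameter, reusing the `github_events` example table from elsewhere on this page:

```sql
-- An explicit shard_count requires colocate_with => 'none'.
SELECT create_distributed_table('github_events', 'repo_id',
                                shard_count => 64,
                                colocate_with => 'none');
```
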
#### Return Value

N/A
@@ -84,6 +93,25 @@ SELECT create_distributed_table('github_events', 'repo_id',
    colocate_with => 'github_repo');
```

+ ### create\_distributed\_table\_concurrently
+
+ This function has the same interface and purpose as
+ [create_distributed_table](#create_distributed_table), but doesn't block
+ writes during table distribution.
+
+ However, `create_distributed_table_concurrently` has a few limitations:
+
+ * You can't use the function in a transaction block, which means you can only
+   distribute one table at a time. (You *can* use the function on
+   time-partitioned tables, though.)
+ * You can't use `create_distributed_table_concurrently` when the table is
+   referenced by a foreign key, or references another local table. However,
+   foreign keys to reference tables work, and you can create foreign keys to other
+   distributed tables after table distribution completes.
+ * If you don't have a primary key or replica identity on your table, then
+   update and delete commands will fail during table distribution due to
+   limitations on logical replication.
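
As a usage sketch (again assuming the `github_events` example table, with no foreign keys that would block the operation), a non-blocking distribution might look like:

```sql
-- Distribute without blocking writes; run outside a transaction block.
SELECT create_distributed_table_concurrently('github_events', 'repo_id');
```
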
### truncate\_local\_data\_after\_distributing\_table

Truncate all local rows after distributing a table, and prevent constraints
