
Commit 1c878f6

Merge pull request #216392 from jonels-msft/cosmospg-quickstart-single-node
Note single-node use case in quickstart
2 parents 46840b6 + 54ee73a commit 1c878f6


4 files changed: +91, -2 lines changed


articles/cosmos-db/postgresql/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -166,6 +166,8 @@
  href: howto-scale-initial.md
- name: Scale cluster
  href: howto-scale-grow.md
+ - name: Choose shard count
+   href: howto-shard-count.md
- name: Rebalance shards
  href: howto-scale-rebalance.md
- name: Change compute quotas

articles/cosmos-db/postgresql/howto-shard-count.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
---
title: Choose shard count - Azure Cosmos DB for PostgreSQL
description: Pick the right shard count for distributed tables
ms.author: jonels
author: jonels-msft
ms.service: cosmos-db
ms.subservice: postgresql
ms.topic: how-to
ms.date: 11/01/2022
---

# Choose shard count

[!INCLUDE [PostgreSQL](../includes/appliesto-postgresql.md)]

Choosing the shard count for each distributed table is a balance between the
flexibility of having more shards and the overhead of planning and executing
queries across them. If you decide to change the shard count of a table after
distributing it, you can use the
[alter_distributed_table](reference-functions.md#alter_distributed_table)
function.
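
For illustration, a minimal sketch of changing the shard count later, reusing the `github_events` example table from the function reference (an assumed name, shown only for illustration):

```sql
-- Rewrite github_events into 64 shards; cascade_to_colocated keeps any
-- colocated tables aligned with the new shard count.
SELECT alter_distributed_table('github_events',
                               shard_count => 64,
                               cascade_to_colocated => true);
```

Changing the shard count rewrites the table's data into new shards, so it can take a while on large tables.
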
## Multi-tenant SaaS use case

The optimal choice varies depending on your access patterns for the data. For
instance, in the multi-tenant SaaS database use case, we recommend choosing
between **32 and 128** shards. For smaller workloads, say under 100 GB, you
could start with 32 shards, and for larger workloads you could choose 64 or
128. This choice gives you the leeway to scale from 32 to 128 worker machines.
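
As a sketch under assumed names (a hypothetical `companies` tenant table keyed by `company_id`, plus a related `campaigns` table), distributing with an explicit shard count might look like:

```sql
-- Distribute the main tenant table across 32 shards. An explicit shard_count
-- requires colocate_with => 'none'.
SELECT create_distributed_table('companies', 'company_id',
                                shard_count => 32,
                                colocate_with => 'none');

-- Related tables colocated with it inherit the same shard count.
SELECT create_distributed_table('campaigns', 'company_id',
                                colocate_with => 'companies');
```
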
## Real-time analytics use case

In the real-time analytics use case, shard count should be related to the total
number of cores on the workers. To ensure maximum parallelism, you should create
enough shards on each node such that there is at least one shard per CPU core.
We typically recommend creating a high number of initial shards, for example,
**2x or 4x the number of current CPU cores**. Having more shards allows for
future scaling if you add more workers and CPU cores.

Keep in mind that, for each query, Azure Cosmos DB for PostgreSQL opens one
database connection per shard, and that these connections are limited. Be
careful to keep the shard count small enough that distributed queries won't
often have to wait for a connection. Put another way, the connections needed,
`(max concurrent queries * shard count)`, shouldn't exceed the total
connections possible in the system, `(number of workers * max_connections per
worker)`.
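
To sanity-check that math on a running cluster, a rough sketch (the Citus metadata table `pg_dist_shard` lists shards; the capacity figures in the final comment are assumptions for illustration):

```sql
-- Shards per distributed table, as seen from the coordinator.
SELECT logicalrelid::regclass AS table_name, count(*) AS shard_count
  FROM pg_dist_shard
 GROUP BY logicalrelid;

-- Connection ceiling on the node where this runs (check the workers too).
SHOW max_connections;

-- Example: 4 workers * 300 max_connections = 1200 worker connections;
-- with 32 shards per query, about 1200 / 32 = 37 concurrent multi-shard
-- queries fit before queries start waiting for connections.
```
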
## Next steps

- Learn more about cluster [performance options](resources-compute.md).
- [Scale a cluster](howto-scale-grow.md) up or out
- [Rebalance shards](howto-scale-rebalance.md)

articles/cosmos-db/postgresql/quickstart-distribute-tables.md

Lines changed: 8 additions & 1 deletion
@@ -8,7 +8,7 @@ ms.service: cosmos-db
ms.subservice: postgresql
ms.custom: mvc, mode-ui, ignite-2022
ms.topic: quickstart
- ms.date: 10/14/2022
+ ms.date: 11/01/2022
---

# Create and distribute tables
@@ -71,6 +71,13 @@ provides to distribute tables and use resources across multiple machines. The
function decomposes tables into shards, which can be spread across nodes for
increased storage and compute performance.

+ > [!NOTE]
+ >
+ > In real applications, when your workload fits in 64 vCores, 256 GB RAM, and 2 TB
+ > storage, you can use a single-node cluster. In this case, distributing tables
+ > is optional. Later, you can distribute tables as needed using
+ > [create_distributed_table_concurrently](reference-functions.md#create_distributed_table_concurrently).
+
Let's distribute the tables:

```sql

articles/cosmos-db/postgresql/reference-functions.md

Lines changed: 29 additions & 1 deletion
@@ -6,7 +6,7 @@ author: jonels-msft
ms.service: cosmos-db
ms.subservice: postgresql
ms.topic: reference
- ms.date: 02/24/2022
+ ms.date: 11/01/2022
---

# Azure Cosmos DB for PostgreSQL functions
@@ -67,6 +67,15 @@ table shards will be moved together unnecessarily in a "cascade."
If a new distributed table isn't related to other tables, it's best to
specify `colocate_with => 'none'`.

+ **shard\_count:** (Optional) the number of shards to create for the new
+ distributed table. When specifying `shard_count`, you can't specify a value of
+ `colocate_with` other than none. To change the shard count of an existing table
+ or colocation group, use the [alter_distributed_table](#alter_distributed_table)
+ function.
+
+ Possible values for `shard_count` are between 1 and 64000. For guidance on
+ choosing the optimal value, see [Shard Count](howto-shard-count.md).
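
For illustration, a sketch of passing the new parameter, reusing the `github_events` example table from elsewhere on this page:

```sql
-- An explicit shard_count requires colocate_with => 'none'.
SELECT create_distributed_table('github_events', 'repo_id',
                                shard_count => 64,
                                colocate_with => 'none');
```
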
#### Return Value

N/A
@@ -84,6 +93,25 @@ SELECT create_distributed_table('github_events', 'repo_id',
    colocate_with => 'github_repo');
```

+ ### create\_distributed\_table\_concurrently
+
+ This function has the same interface and purpose as
+ [create_distributed_table](#create_distributed_table), but doesn't block
+ writes during table distribution.
+
+ However, `create_distributed_table_concurrently` has a few limitations:
+
+ * You can't use the function in a transaction block, which means you can only
+   distribute one table at a time. (You *can* use the function on
+   time-partitioned tables, though.)
+ * You can't use `create_distributed_table_concurrently` when the table is
+   referenced by a foreign key, or references another local table. However,
+   foreign keys to reference tables work, and you can create foreign keys to other
+   distributed tables after table distribution completes.
+ * If you don't have a primary key or replica identity on your table, then
+   update and delete commands will fail during table distribution due to
+   limitations on logical replication.
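
As a usage sketch (again assuming the `github_events` example table, with no foreign keys that would block the operation), a non-blocking distribution might look like:

```sql
-- Distribute without blocking writes; run outside a transaction block.
SELECT create_distributed_table_concurrently('github_events', 'repo_id');
```
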
### truncate\_local\_data\_after\_distributing\_table

Truncate all local rows after distributing a table, and prevent constraints
