---
title: Fundamental concepts for scaling - Hyperscale (Citus) - Azure Database for PostgreSQL
description: Ideas you need to know to build relational apps that scale
ms.author: jonels
author: jonels-msft
ms.service: postgresql
ms.subservice: hyperscale-citus
ms.topic: how-to
ms.date: 04/28/2022
---

# Fundamental concepts for scaling

Before we investigate the steps of building a new app, it's helpful to see a
quick overview of the terms and concepts involved.

## Architectural overview

Hyperscale (Citus) gives you the power to distribute tables across multiple
machines in a server group and transparently query them the same way you query
plain PostgreSQL:

In the Hyperscale (Citus) architecture, there are multiple kinds of nodes:

* The **coordinator** node stores distributed table metadata and is responsible
  for distributed planning.
* By contrast, the **worker** nodes store the actual data and do the computation.
* Both the coordinator and workers are plain PostgreSQL databases, with the
  `citus` extension loaded.

To distribute a normal PostgreSQL table, like `campaigns` in the diagram above,
run a command called `create_distributed_table()`. Once you run this
command, Hyperscale (Citus) transparently creates shards for the table across
worker nodes. In the diagram, shards are represented as blue boxes.

> [!NOTE]
>
> On the basic tier, shards of distributed tables are on the coordinator node,
> not worker nodes.

Shards are plain (but specially named) PostgreSQL tables that hold slices of
your data. In our example, because we distributed `campaigns` by `company_id`,
each shard holds a subset of the campaigns, with the campaigns of different
companies assigned to different shards.
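
For instance, distributing the `campaigns` table from the diagram looks roughly
like the sketch below. The column list is an assumption made up for
illustration; only the `create_distributed_table()` call itself comes from this
article.

```postgresql
-- Hypothetical schema for the campaigns example in the diagram.
CREATE TABLE campaigns (
  company_id  bigint NOT NULL,
  campaign_id bigint NOT NULL,
  name        text,
  PRIMARY KEY (company_id, campaign_id)
);

-- Distribute by company_id; Hyperscale (Citus) creates shards for the table
-- across the worker nodes (or on the coordinator, on the basic tier).
SELECT create_distributed_table('campaigns', 'company_id');
```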

## Distribution column (also known as shard key)

`create_distributed_table()` is the magic function that Hyperscale (Citus)
provides to distribute tables and use resources across multiple machines.

```postgresql
SELECT create_distributed_table(
  'table_name',
  'distribution_column');
```

The second argument above picks a column from the table as a **distribution
column**. It can be any column with a native PostgreSQL type (with integer and
text being most common). The value of the distribution column determines which
rows go into which shards, which is why the distribution column is also called
the **shard key**.

Hyperscale (Citus) decides how to run queries based on their use of the shard
key:

| Query involves | Where it runs |
|----------------|---------------|
| just one shard key | on the worker node that holds its shard |
| multiple shard keys | parallelized across multiple nodes |

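For example, with the hypothetical `campaigns` table above, the two routing
behaviors look roughly like this (a sketch, not output from the service):

```postgresql
-- Filters on a single shard key value, so it's routed to the one worker
-- node that holds company 42's shard.
SELECT count(*) FROM campaigns WHERE company_id = 42;

-- Touches many shard key values, so it runs in parallel across the shards
-- on multiple worker nodes.
SELECT company_id, count(*)
FROM campaigns
GROUP BY company_id;
```
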
The choice of shard key dictates the performance and scalability of your
applications.

* Uneven data distribution across shard key values (also known as *data skew*)
  isn't optimal for performance. For example, don't choose a column for which a
  single value represents 50% of the data. (A query for spotting such skew is
  sketched after this list.)
* Shard keys with low cardinality can affect scalability. You can use only as
  many shards as there are distinct key values. Choose a key with cardinality
  in the hundreds to thousands.
* Joining two large tables with different shard keys can be slow. Choose a
  common shard key across large tables. Learn more in
  [colocation](#colocation).

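As a quick, hypothetical skew check, you can count rows per shard key value in
your own tables; the table and column names here are only illustrative:

```postgresql
-- Look for a single company that dominates the table. If one value accounts
-- for a large share of the rows, that column may be a poor shard key.
SELECT company_id, count(*) AS row_count
FROM campaigns
GROUP BY company_id
ORDER BY row_count DESC
LIMIT 10;
```
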
## Colocation

Another concept closely related to shard key is *colocation*. Tables sharded by
the same distribution column values are colocated, meaning the shards of
colocated tables are stored together on the same workers.

Below are two tables sharded by the same key, `site_id`. They're colocated.

Hyperscale (Citus) ensures that rows with a matching `site_id` value in both
tables are stored on the same worker node. You can see that, for both tables,
rows with `site_id=1` are stored on worker 1. Similarly for other site IDs.

Colocation helps optimize JOINs across these tables. If you join the two tables
on `site_id`, Hyperscale (Citus) can perform the join locally on worker nodes
without shuffling data between nodes.

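A minimal sketch of colocated tables and a colocated join, assuming two
hypothetical tables distributed by `site_id` and the default colocation
settings:

```postgresql
-- Hypothetical tables, both distributed by site_id.
CREATE TABLE sites (
  site_id bigint PRIMARY KEY,
  url     text
);

CREATE TABLE page_views (
  site_id   bigint NOT NULL,
  view_id   bigint NOT NULL,
  viewed_at timestamptz,
  PRIMARY KEY (site_id, view_id)
);

SELECT create_distributed_table('sites', 'site_id');
SELECT create_distributed_table('page_views', 'site_id');

-- Because the tables share a shard key and are colocated, each worker can
-- join its local shards without moving data between nodes.
SELECT s.url, count(*) AS views
FROM sites s
JOIN page_views v USING (site_id)
GROUP BY s.url;
```
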
## Next steps

> [!div class="nextstepaction"]
> [Classify application workload >](howto-build-scalable-apps-classify.md)