Commit 3d51701

Merge pull request #187100 from jonels-msft/hsc-quicker-quick
Make hyperscale quickstart even easier
2 parents 5fb3638 + bfbcabd commit 3d51701

14 files changed

+263
-108
lines changed

articles/postgresql/TOC.yml

Lines changed: 3 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -611,18 +611,11 @@
611611
- name: Quickstart
612612
items:
613613
- name: Create server group
614-
items:
615-
- name: Azure portal
616-
href: hyperscale/quickstart-create-portal.md
617-
displayName: portal, create hyperscale
614+
href: hyperscale/quickstart-create-portal.md
618615
- name: Connect
619-
items:
620-
- name: psql
621-
href: hyperscale/quickstart-connect-psql.md
616+
href: hyperscale/quickstart-connect-psql.md
622617
- name: Model and load data
623-
items:
624-
- name: Create and distribute tables
625-
href: hyperscale/quickstart-distribute-tables.md
618+
href: hyperscale/quickstart-distribute-tables.md
626619
- name: Run queries
627620
href: hyperscale/quickstart-run-queries.md
628621
- name: Tutorials

articles/postgresql/hyperscale/index.yml

Lines changed: 11 additions & 5 deletions
@@ -5,7 +5,7 @@ metadata:
55
description: "Azure Database for PostgreSQL is a relational database service in the Microsoft cloud that is built for developers based on the open-source PostgreSQL database engine."
66
author: jonels-msft
77
ms.author: jonels
8-
ms.date: 01/03/2022
8+
ms.date: 02/02/2022
99
ms.service: postgresql
1010
ms.subservice: hyperscale-citus
1111
ms.topic: landing-page
@@ -20,6 +20,16 @@ landingContent:
2020
links:
2121
- text: What is Hyperscale (Citus)?
2222
url: overview.md
23+
- linkListType: quickstart
24+
links:
25+
- text: Create a server group
26+
url: quickstart-create-portal.md
27+
- text: Connect with psql
28+
url: quickstart-connect-psql.md
29+
- text: Model and load data
30+
url: quickstart-distribute-tables.md
31+
- text: Run queries
32+
url: quickstart-run-queries.md
2333
- linkListType: concept
2434
links:
2535
- text: Nodes and distributed tables
@@ -32,10 +42,6 @@ landingContent:
3242
url: howto-scale-initial.md
3343
- text: Regions and resources
3444
url: concepts-configuration-options.md
35-
- linkListType: quickstart
36-
links:
37-
- text: Create a server group
38-
url: quickstart-create-basic-tier.md
3945
- linkListType: video
4046
links:
4147
- text: How Citus distributes PostgreSQL

articles/postgresql/hyperscale/quickstart-connect-psql.md

Lines changed: 38 additions & 14 deletions
@@ -7,7 +7,7 @@ ms.service: postgresql
77
ms.subservice: hyperscale-citus
88
ms.custom: mvc, mode-ui
99
ms.topic: quickstart
10-
ms.date: 01/24/2022
10+
ms.date: 02/09/2022
1111
---
1212

1313
# Connect to a Hyperscale (Citus) server group with psql
@@ -20,26 +20,50 @@ To follow this quickstart, you'll first need to:
2020

2121
## Connect
2222

23-
When you create your Azure Database for PostgreSQL server, a default database named **citus** is created. To connect to your database server, you need a connection string and the admin password.
23+
When you create your Hyperscale (Citus) server group, a default database named **citus** is created. To connect to your database server, you need a connection string and the admin password.
2424

25-
1. Obtain the connection string. In the server group page, select the **Connection strings** menu item. (It's under **Settings**.) Find the string marked **psql**. It will be of the form, `psql "host=hostname.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require"`
25+
1. Obtain the connection string. In the server group page, select the
26+
**Connection strings** menu item.
2627

27-
Copy the string. You’ll need to replace "{your\_password}" with the administrative password you chose earlier. The system doesn't store your plaintext password and so can't display it for you in the connection string.
28+
![get connection string](../media/quickstart-connect-psql/get-connection-string.png)
2829

29-
2. Open a terminal window on your local computer.
30+
Find the string marked **psql**. It will be of the form, `psql
31+
"host=c.servergroup.postgres.database.azure.com port=5432 dbname=citus
32+
user=citus password={your_password} sslmode=require"`
3033

31-
3. At the prompt, connect to your Azure Database for PostgreSQL server with the [psql](https://www.postgresql.org/docs/current/app-psql.html) utility. Pass your connection string in quotes, being sure it contains your password:
32-
```bash
33-
psql "host=..."
34-
```
34+
* Copy the string.
35+
* Replace "{your\_password}" with the administrative password you chose earlier.
36+
* Notice the hostname starts with a `c.`, for instance
37+
`c.demo.postgres.database.azure.com`. This prefix indicates the
38+
coordinator node of the server group.
39+
* The default dbname and username are both `citus` and can't be changed.
40+
41+
2. Open the Azure Cloud Shell. Select the **Cloud Shell** icon in the Azure portal.
42+
43+
![cloud shell icon](../media/quickstart-connect-psql/open-cloud-shell.png)
44+
45+
If prompted, choose an Azure subscription in which to store Cloud Shell data.
3546

36-
For example, the following command connects to the coordinator node of the server group **mydemoserver**:
47+
3. In the shell, paste the psql connection string, *substituting your password
48+
for the string `{your_password}`*, then press enter. For example:
49+
50+
![run psql in cloud
51+
shell](../media/quickstart-connect-psql/cloud-shell-run-psql.png)
52+
53+
When psql successfully connects to the database, you'll see a new prompt:
3754

38-
```bash
39-
psql "host=mydemoserver-c.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require"
55+
```
56+
psql (13.0 (Debian 13.0-1.pgdg100+1), server 13.5)
57+
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
58+
Type "help" for help.
59+
60+
citus=>
4061
```
4162

4263
## Next steps
4364

44-
* [Troubleshoot connection problems](howto-troubleshoot-common-connection-issues.md).
45-
* Learn to [create and distribute tables](quickstart-distribute-tables.md).
65+
Now that you've connected to the server group, the next step is to create
66+
tables and shard them for horizontal scaling.
67+
68+
> [!div class="nextstepaction"]
69+
> [Create and distribute tables](quickstart-distribute-tables.md)
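
The connection string format described in this file's diff can be sketched as a small helper. This is a minimal illustration, not part of the commit: the function name `build_psql_command` is hypothetical, and it assumes the password contains no spaces, quotes, or backslashes (those would need escaping in a real connection string).

```python
def build_psql_command(server_group: str, password: str) -> str:
    """Return a psql command line for the coordinator node of a server group.

    Hypothetical helper for illustration; the name is not from the docs.
    """
    # The `c.` prefix addresses the coordinator node, as the quickstart notes.
    host = f"c.{server_group}.postgres.database.azure.com"
    # dbname and user are both fixed to `citus`; sslmode=require forces TLS.
    conninfo = (
        f"host={host} port=5432 dbname=citus user=citus "
        f"password={password} sslmode=require"
    )
    return f'psql "{conninfo}"'


# Placeholder password, exactly as the portal displays the string.
print(build_psql_command("demo", "{your_password}"))
```

In practice, copying the string from the **Connection strings** page is simpler; the sketch only makes explicit which parts vary (server group name, password) and which are fixed.
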

articles/postgresql/hyperscale/quickstart-create-portal.md

Lines changed: 54 additions & 12 deletions
@@ -7,26 +7,68 @@ ms.service: postgresql
77
ms.subservice: hyperscale-citus
88
ms.custom: mvc, mode-ui
99
ms.topic: quickstart
10-
ms.date: 01/24/2022
10+
ms.date: 02/09/2022
1111
#Customer intent: As a developer, I want to provision a hyperscale server group so that I can run queries quickly on large datasets.
1212
---
1313

1414
# Create a Hyperscale (Citus) server group in the Azure portal
1515

16-
Azure Database for PostgreSQL is a managed service that you use to run, manage, and scale highly available PostgreSQL databases in the cloud. This Quickstart shows you how to create an Azure Database for PostgreSQL - Hyperscale (Citus) server group using the Azure portal. You'll explore distributed data: sharding tables across nodes, ingesting sample data, and running queries that execute on multiple nodes.
16+
Azure Database for PostgreSQL - Hyperscale (Citus) is a managed service that
17+
you use to run horizontally scalable PostgreSQL databases in the cloud. This
18+
Quickstart shows you how to create a Hyperscale (Citus) server group using the
19+
Azure portal. You'll explore distributed data: sharding tables across nodes,
20+
generating sample data, and running queries that execute on multiple nodes.
1721

22+
## Prerequisites
1823

19-
Azure Database for PostgreSQL - Hyperscale (Citus) is a managed service that
20-
you use to run, manage, and scale highly available PostgreSQL databases in the
21-
cloud. Its [basic tier](concepts-server-group.md#tiers) is a convenient
22-
deployment option for initial development and testing.
24+
To follow this quickstart, you'll first need to:
25+
26+
* Create a [free account](https://azure.microsoft.com/free/) (if you don't have
27+
an Azure subscription).
28+
* Sign in to the [Azure portal](https://portal.azure.com).
29+
30+
## Create server group
31+
32+
1. Select **Create a resource** (+) in the upper-left corner of the portal.
33+
2. Select **Databases** > **Azure Database for PostgreSQL**.
34+
![create a resource menu](../media/quickstart-hyperscale-create-portal/database-service.png)
35+
3. Select the **Hyperscale (Citus) server group** deployment option.
36+
![deployment options](../media/quickstart-hyperscale-create-portal/deployment-option.png)
37+
4. Fill out the **Basics** form with the following information:
38+
![basic info form](../media/quickstart-hyperscale-create-portal/basics.png)
39+
40+
| Setting | Description |
41+
|-------------------|-------------------|
42+
| Subscription | The Azure subscription that you want to use for your server. If you have multiple subscriptions, choose the subscription in which you'd like to be billed for the resource. |
43+
| Resource group | A new resource group name or an existing one from your subscription. |
44+
| Server group name | A unique name that identifies your Hyperscale server group. The domain name postgres.database.azure.com is appended to the server group name you provide. The name can contain only lowercase letters, numbers, and the hyphen (-) character. It must contain fewer than 40 characters. |
45+
| Location | The location that is closest to you. |
46+
| Admin username | Currently required to be the value `citus`, and can't be changed. |
47+
| Password | A new password for the server admin account. It must contain between 8 and 128 characters. Your password must contain characters from three of the following categories: English uppercase letters, English lowercase letters, numbers (0 through 9), and non-alphanumeric characters (!, $, #, %, etc.). |
48+
| Version | The latest PostgreSQL major version, unless you have specific requirements. |
49+
| Compute + storage | The compute, storage, and Tier configurations for your new server. Select **Configure server group**. |
50+
51+
![compute and storage](../media/quickstart-hyperscale-create-portal/compute.png)
52+
53+
5. For this quickstart, you can accept the default value of **Basic** for
54+
**Tiers**. The other option, standard tier, creates worker nodes for
55+
greater total data capacity and query parallelism. See
56+
[tiers](concepts-server-group.md#tiers) for a more in-depth comparison.
57+
6. Select **Next : Networking >** at the bottom of the screen.
58+
7. In the **Networking** tab, select **Allow public access from Azure services
59+
and resources within Azure to this server group**.
60+
61+
![networking configuration](../media/quickstart-hyperscale-create-portal/networking.png)
2362

24-
This quickstart shows you how to create a Hyperscale (Citus) basic tier server
25-
group using the Azure portal. You'll create the server group and verify that
26-
you can connect to it to run queries.
63+
8. Select **Review + create** and then **Create** to create the server.
64+
Provisioning takes a few minutes.
65+
9. The page will redirect to monitor deployment. When the live status changes
66+
from **Deployment is in progress** to **Your deployment is complete**,
67+
select **Go to resource**.
2768

28-
[!INCLUDE [azure-postgresql-hyperscale-create-db](../../../includes/azure-postgresql-hyperscale-create-db.md)]
69+
## Next steps
2970

30-
**Next steps**
71+
With your server group created, it's time to connect with a SQL client.
3172

32-
* [Connect to your server group](quickstart-connect-psql.md) with psql.
73+
> [!div class="nextstepaction"]
74+
> [Connect to your server group](quickstart-connect-psql.md)
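
The naming and password rules listed in the **Basics** table above can be checked locally before filling in the form. The sketch below encodes only the rules stated in that table; the function names are hypothetical, and the portal may enforce additional constraints that are not listed there.

```python
import re


def valid_server_group_name(name: str) -> bool:
    """Check the stated rule: lowercase letters, digits, hyphens, under 40 chars."""
    return re.fullmatch(r"[a-z0-9-]{1,39}", name) is not None


def valid_admin_password(password: str) -> bool:
    """Check the stated rule: 8-128 chars, drawn from at least three categories."""
    if not 8 <= len(password) <= 128:
        return False
    categories = [
        any("A" <= ch <= "Z" for ch in password),  # English uppercase letters
        any("a" <= ch <= "z" for ch in password),  # English lowercase letters
        any("0" <= ch <= "9" for ch in password),  # digits 0 through 9
        any(not ch.isalnum() for ch in password),  # non-alphanumeric characters
    ]
    return sum(categories) >= 3
```

A name such as `demo-1` passes, while `Demo` fails because of the uppercase letter. Treat a passing result as necessary rather than sufficient, since the portal performs its own validation.
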

articles/postgresql/hyperscale/quickstart-distribute-tables.md

Lines changed: 76 additions & 45 deletions
@@ -7,25 +7,24 @@ ms.service: postgresql
77
ms.subservice: hyperscale-citus
88
ms.custom: mvc, mode-ui
99
ms.topic: quickstart
10-
ms.date: 01/24/2022
10+
ms.date: 02/09/2022
1111
---
1212

13-
# Create and distribute tables
13+
# Model and load data
1414

15-
Within Hyperscale (Citus) servers there are three types of tables:
15+
Within Hyperscale (Citus) servers, there are three types of tables:
1616

1717
* **Distributed Tables** - Distributed across worker nodes (scaled out).
18-
Generally large tables should be distributed tables to improve performance.
18+
Large tables should be distributed tables to improve performance.
1919
* **Reference tables** - Replicated to all nodes. Enables joins with
2020
distributed tables. Typically used for small tables like countries or product
2121
categories.
2222
* **Local tables** - Tables that reside on coordinator node. Administration
2323
tables are good examples of local tables.
2424

25-
In this quickstart, we'll primarily focus on distributed tables, and getting
26-
familiar with them.
27-
28-
The data model we're going to work with is simple: user and event data from GitHub. Events include fork creation, git commits related to an organization, and more.
25+
In this quickstart, we'll focus on distributed tables, and get familiar with
26+
them. The data model we're going to work with is simple: user and event data
27+
from GitHub, sharded by user.
2928

3029
## Prerequisites
3130

@@ -37,73 +36,105 @@ To follow this quickstart, you'll first need to:
3736

3837
## Create tables
3938

40-
Once you've connected via psql, let's create our tables. In the psql console run:
39+
Once you've connected via psql, let's create our tables. Copy and paste the
40+
following commands into the psql terminal window, and hit enter to run:
4141

4242
```sql
43-
CREATE TABLE github_events
43+
CREATE TABLE github_users
4444
(
45-
event_id bigint,
46-
event_type text,
47-
event_public boolean,
48-
repo_id bigint,
49-
payload jsonb,
50-
repo jsonb,
51-
user_id bigint,
52-
org jsonb,
53-
created_at timestamp
45+
user_id bigint,
46+
url text,
47+
login text,
48+
avatar_url text,
49+
gravatar_id text,
50+
display_login text
5451
);
5552

56-
CREATE TABLE github_users
53+
CREATE TABLE github_events
5754
(
58-
user_id bigint,
59-
url text,
60-
login text,
61-
avatar_url text,
62-
gravatar_id text,
63-
display_login text
55+
event_id bigint,
56+
event_type text,
57+
event_public boolean,
58+
repo_id bigint,
59+
payload jsonb,
60+
repo jsonb,
61+
user_id bigint,
62+
org jsonb,
63+
created_at timestamp
6464
);
65-
```
66-
67-
The `payload` field of `github_events` has a JSONB datatype. JSONB is the JSON datatype in binary form in Postgres. The datatype makes it easy to store a flexible schema in a single column.
6865

69-
Postgres can create a `GIN` index on this type, which will index every key and value within it. With an index, it becomes fast and easy to query the payload with various conditions. Let's go ahead and create a couple of indexes before we load our data. In psql:
70-
71-
```sql
7266
CREATE INDEX event_type_index ON github_events (event_type);
7367
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);
7468
```
7569

7670
## Shard tables across worker nodes
7771

78-
Next we’ll take those Postgres tables on the coordinator node and tell Hyperscale (Citus) to shard them across the workers. To do so, we’ll run a query for each table specifying the key to shard it on. In the current example we’ll shard both the events and users table on `user_id`:
72+
Next, we’ll tell Hyperscale (Citus) to shard the tables. If your server group
73+
is running on the Standard Tier (meaning it has worker nodes), then the table
74+
shards will be created on workers. If the server group is running on the Basic
75+
Tier, then the shards will all be stored on the coordinator node.
76+
77+
To shard and distribute the tables, call `create_distributed_table()` and
78+
specify the table and key to shard it on.
7979

8080
```sql
81-
SELECT create_distributed_table('github_events', 'user_id');
8281
SELECT create_distributed_table('github_users', 'user_id');
82+
SELECT create_distributed_table('github_events', 'user_id');
8383
```
8484

8585
[!INCLUDE [azure-postgresql-hyperscale-dist-alert](../../../includes/azure-postgresql-hyperscale-dist-alert.md)]
8686

87+
By default, `create_distributed_table()` splits tables into 32 shards. We can
88+
verify using the `citus_shards` view:
89+
90+
```sql
91+
SELECT table_name, count(*)
92+
FROM citus_shards
93+
GROUP BY 1;
94+
```
95+
96+
```
97+
table_name | count
98+
---------------+-------
99+
github_events | 32
100+
github_users | 32
101+
(2 rows)
102+
```
103+
87104
## Load data into distributed tables
88105

89-
We're ready to load data. In psql still, shell out to download the files:
106+
We're ready to fill the tables with sample data. For this quickstart, we'll use
107+
a dataset previously captured from the GitHub API.
90108

91-
```sql
92-
\! curl -O https://examples.citusdata.com/users.csv
93-
\! curl -O https://examples.citusdata.com/events.csv
109+
```
110+
\COPY github_users FROM PROGRAM 'curl https://examples.citusdata.com/users.csv' WITH (FORMAT CSV)
111+
\COPY github_events FROM PROGRAM 'curl https://examples.citusdata.com/events.csv' WITH (FORMAT CSV)
94112
```
95113

96-
Next, load the data from the files into the distributed tables:
114+
We can confirm the shards now hold data:
97115

98116
```sql
99-
SET CLIENT_ENCODING TO 'utf8';
117+
SELECT table_name, pg_size_pretty(sum(shard_size))
118+
FROM citus_shards
119+
GROUP BY 1;
120+
```
100121

101-
\copy github_events from 'events.csv' WITH CSV
102-
\copy github_users from 'users.csv' WITH CSV
103122
```
123+
table_name | pg_size_pretty
124+
---------------+----------------
125+
github_users | 38 MB
126+
github_events | 95 MB
127+
(2 rows)
128+
```
129+
130+
If you created your server group in the Basic Tier, all shards are stored on
131+
one node, the coordinator. Otherwise, if the server group is in the Standard
132+
Tier, it has multiple worker nodes that store the shards.
104133

105134
## Next steps
106135

107-
* [Run queries](quickstart-run-queries.md) on the distributed tables you
108-
created in this quickstart.
109-
* Learn more about [sharding data](tutorial-shard.md).
136+
Now we have tables sharded and loaded with data. Next, let's try running
137+
queries across the data in these shards.
138+
139+
> [!div class="nextstepaction"]
140+
> [Run distributed queries](quickstart-run-queries.md)
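
To make the sharding step in this file's diff more concrete, here is a toy model of hash distribution: each row's `user_id` is hashed, and the 32-bit hash space is split into equal ranges, one per shard. This is a sketch under stated assumptions, not Citus's implementation. CRC32 is a stand-in hash chosen for illustration (Citus relies on PostgreSQL's own hash functions), and the name `shard_index` is hypothetical.

```python
import zlib

SHARD_COUNT = 32  # default shard count mentioned in the quickstart


def shard_index(user_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution key to a shard number in [0, shard_count)."""
    # Stand-in hash (assumption): CRC32 of the key's decimal text,
    # which yields an unsigned 32-bit integer.
    hashed = zlib.crc32(str(user_id).encode("ascii"))
    # Divide the 32-bit hash space into equal contiguous ranges.
    range_size = 2**32 // shard_count
    # Clamp so the top of the hash space lands in the last shard
    # when shard_count does not divide 2**32 evenly.
    return min(hashed // range_size, shard_count - 1)


# Rows from both tables with the same user_id map to the same shard number.
for uid in (1, 42, 9001):
    print(uid, shard_index(uid))
```

The point of the model is that the mapping depends only on the key: because `github_users` and `github_events` are both distributed on `user_id`, rows for a given user fall in matching shards, which is what lets joins on `user_id` stay local to one node.
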
