title: 'Quickstart: connect to a server group with psql - Hyperscale (Citus) - Azure Database for PostgreSQL'
3
+
description: Quickstart to connect psql to Azure Database for PostgreSQL - Hyperscale (Citus).
4
+
author: jonels-msft
5
+
ms.author: jonels
6
+
ms.service: postgresql
7
+
ms.subservice: hyperscale-citus
8
+
ms.custom: mvc, mode-ui
9
+
ms.topic: quickstart
10
+
ms.date: 01/24/2022
11
+
---
12
+
13
+
# Connect to a Hyperscale (Citus) server group with psql
14
+
15
+
## Prerequisites
16
+
17
+
To follow this quickstart, you'll first need to:
18
+
19
+
* [Create a server group](quickstart-create-portal.md) in the Azure portal.
20
+
21
+
## Connect
22
+
23
+
When you create a Hyperscale (Citus) server group, a default database named **citus** is created. To connect to it, you need a connection string and the admin password.
24
+
25
+
1. Obtain the connection string. On the server group page, select the **Connection strings** menu item (under **Settings**). Find the string marked **psql**. It has the form `psql "host=hostname.postgres.database.azure.com port=5432 dbname=citus user=citus password={your_password} sslmode=require"`.
26
+
27
+
Copy the string and replace `{your_password}` with the administrative password you chose earlier. The system doesn't store your plaintext password, so it can't display it in the connection string.
28
+
29
+
2. Open a terminal window on your local computer.
30
+
31
+
3. At the prompt, connect to your Azure Database for PostgreSQL server with the [psql](https://www.postgresql.org/docs/current/app-psql.html) utility. Pass your connection string in quotes, being sure it contains your password:
32
+
```bash
psql "host=..."
```
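As an alternative to pasting the password into the string, libpq (and therefore psql) reads the `PGPASSWORD` environment variable when the connection string has no `password` field. The sketch below only assembles such a string; `build_conninfo` is a hypothetical helper name, the host is the placeholder from the portal string, and the actual `psql` call is left commented out.

```bash
#!/usr/bin/env bash
# Sketch: assemble a psql connection string without embedding the password.
# The host name is the portal's placeholder; replace it with your
# coordinator's host name.
build_conninfo() {
  local host="$1"
  printf 'host=%s port=5432 dbname=citus user=citus sslmode=require' "$host"
}

CONNINFO="$(build_conninfo "hostname.postgres.database.azure.com")"
echo "$CONNINFO"

# psql falls back to PGPASSWORD when the string carries no password.
# Uncomment to read the password without echoing it, then connect:
# read -rs -p "Admin password: " PGPASSWORD && export PGPASSWORD
# psql "$CONNINFO"
```

Keeping the password out of the command line also keeps it out of your shell history.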
35
+
36
+
For example, the following command connects to the coordinator node of the server group **mydemoserver**:
#Customer intent: As a developer, I want to provision a hyperscale server group so that I can run queries quickly on large datasets.
12
12
---
13
13
14
-
# Quickstart: create a Hyperscale (Citus) server group in the Azure portal
14
+
# Create a Hyperscale (Citus) server group in the Azure portal
15
15
16
16
Azure Database for PostgreSQL is a managed service that you use to run, manage, and scale highly available PostgreSQL databases in the cloud. This Quickstart shows you how to create an Azure Database for PostgreSQL - Hyperscale (Citus) server group using the Azure portal. You'll explore distributed data: sharding tables across nodes, ingesting sample data, and running queries that execute on multiple nodes.
Once connected to the hyperscale coordinator node using psql, you can complete some basic tasks.
23
-
24
-
Within Hyperscale (Citus) servers there are three types of tables:
25
-
26
-
- Distributed or sharded tables (spread out to help scaling for performance and parallelization)
27
-
- Reference tables (multiple copies maintained)
28
-
- Local tables (often used for internal admin tables)
29
-
30
-
In this quickstart, we'll primarily focus on distributed tables and getting familiar with them.
31
-
32
-
The data model we're going to work with is simple: user and event data from GitHub. Events include fork creation, git commits related to an organization, and more.
33
-
34
-
Once you've connected via psql, let's create our tables. In the psql console run:
35
-
36
-
```sql
CREATE TABLE github_events
(
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    user_id bigint,
    org jsonb,
    created_at timestamp
);

CREATE TABLE github_users
(
    user_id bigint,
    url text,
    login text,
    avatar_url text,
    gravatar_id text,
    display_login text
);
```
60
-
61
-
The `payload` field of `github_events` has a JSONB datatype. JSONB is the JSON datatype in binary form in Postgres. The datatype makes it easy to store a flexible schema in a single column.
62
-
63
-
Postgres can create a `GIN` index on this type, which will index every key and value within it. With an index, it becomes fast and easy to query the payload with various conditions. Let's go ahead and create a couple of indexes before we load our data. In psql:
```sql
CREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);
```
69
-
70
-
Next we’ll take those Postgres tables on the coordinator node and tell Hyperscale (Citus) to shard them across the workers. To do so, we’ll run a query for each table specifying the key to shard it on. In the current example we’ll shard both the events and users tables on `user_id`:
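The statements for this step fall outside the lines shown here. As a hedged sketch only: Citus provides a `create_distributed_table` function that takes a table name and a distribution column, and a small shell helper could emit one call per table. `distribute_sql` is a hypothetical helper name and `CONNINFO` is an assumed variable holding your connection string.

```bash
#!/usr/bin/env bash
# Sketch: emit the Citus call that shards a table on a given column.
# Table and column names are the ones used in this quickstart.
distribute_sql() {
  local table="$1" column="$2"
  printf "SELECT create_distributed_table('%s', '%s');\n" "$table" "$column"
}

# Print one statement per table, both sharded on user_id.
for t in github_events github_users; do
  distribute_sql "$t" user_id
done

# To run a statement against the coordinator (CONNINFO is assumed):
# psql "$CONNINFO" -c "$(distribute_sql github_events user_id)"
```

Sharding both tables on the same column is what later lets rows with matching user IDs be joined on the same node.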
Next, load the data from the files into the distributed tables:
87
-
88
-
```sql
SET CLIENT_ENCODING TO 'utf8';

\copy github_events from 'events.csv' WITH CSV
\copy github_users from 'users.csv' WITH CSV
```
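The same load can be scripted from the shell, since `psql -c` accepts a single backslash command. The following is a sketch under assumptions: `copy_command` is a hypothetical helper, `CONNINFO` is an assumed variable holding your connection string, and the CSV files are expected in the directory where psql runs, because `\copy` reads files on the client side.

```bash
#!/usr/bin/env bash
# Sketch: build the \copy meta-command for a table and a local CSV file.
copy_command() {
  local table="$1" file="$2"
  # '\\' in the printf format prints one literal backslash.
  printf '\\copy %s from %s WITH CSV\n' "$table" "'$file'"
}

copy_command github_events events.csv
copy_command github_users users.csv

# To run the loads non-interactively (CONNINFO is assumed):
# psql "$CONNINFO" -c "$(copy_command github_events events.csv)"
# psql "$CONNINFO" -c "$(copy_command github_users users.csv)"
```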
94
-
95
-
## Run queries
96
-
97
-
Now it's time for the fun part: actually running some queries. Let's start with a simple `count(*)` to see how much data we loaded:
98
-
99
-
```sql
SELECT count(*) from github_events;
```
102
-
103
-
That worked nicely. We'll come back to that sort of aggregation in a bit, but for now let’s look at a few other queries. Within the JSONB `payload` column there's a good bit of data, but it varies based on event type. `PushEvent` payloads include a `distinct_size` value, the number of distinct commits in the push. We can use it to find the total number of commits per hour:
104
-
105
-
```sql
SELECT date_trunc('hour', created_at) AS hour,
       sum((payload->>'distinct_size')::int) AS num_commits
FROM github_events
WHERE event_type = 'PushEvent'
GROUP BY hour
ORDER BY hour;
```
113
-
114
-
So far the queries have involved `github_events` exclusively, but we can combine this information with `github_users`. Since we sharded both users and events on the same identifier (`user_id`), the rows of both tables with matching user IDs will be [colocated](concepts-colocation.md) on the same database nodes and can easily be joined.
115
-
116
-
If we join on `user_id`, Hyperscale (Citus) can push the join down into the shards, where it runs in parallel on the worker nodes. For example, let's find the users who created the greatest number of repositories:
117
-
118
-
```sql
SELECT gu.login, count(*)
  FROM github_events ge
  JOIN github_users gu
    ON ge.user_id = gu.user_id
 WHERE ge.event_type = 'CreateEvent'
   AND ge.payload @> '{"ref_type": "repository"}'
 GROUP BY gu.login
 ORDER BY count(*) DESC;
```
128
-
129
-
## Clean up resources
130
-
131
-
In the preceding steps, you created Azure resources in a server group. If you don't expect to need these resources in the future, delete the server group. Press the **Delete** button in the **Overview** page for your server group. When prompted on a pop-up page, confirm the name of the server group and click the final **Delete** button.
In this quickstart, you learned how to provision a Hyperscale (Citus) server group. You connected to it with psql, created a schema, and distributed data.
30
+
**Next steps**
136
31
137
-
- Follow a tutorial to [build scalable multi-tenant