
Commit 52b1d9b

add Alibaba Cloud Hologres

1 parent a1389f4 commit 52b1d9b

19 files changed: +715 −0 lines changed

hologres/README.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
Hologres is an all-in-one real-time data warehouse engine that is compatible with PostgreSQL. It supports online analytical processing (OLAP) and ad hoc analysis of PB-scale data. Hologres also supports online data serving at high concurrency and low latency.

To evaluate the performance of Hologres, follow these guidelines to set up and run the benchmark.

### 1. Create an Alibaba Cloud Account and Provide Your UID

First, create an Alibaba Cloud account. After registration, provide us with your **UID** (Account ID), which you can find by:

- Clicking your profile icon in the top-right corner → **Account Center**

We will issue you an **Alibaba Cloud coupon** to support your testing, so please share your UID with us.

---

### 2. Purchase Alibaba Cloud Hologres and ECS Instances

Refer to the [Alibaba Cloud Hologres TPC-H Testing Documentation](https://www.alibabacloud.com/help/en/hologres/user-guide/test-plan?spm=a2c63.p38356.help-menu-113622.d_2_14_0_0.54e14f70oTAEXO) for details on purchasing Hologres and ECS instances. Both instances must be purchased in the same region and zone.

#### 2.1 When creating the Hologres instance, use the following configuration:

- **Region**: `China (Beijing)`
  *(The new version is in gray-scale release in China (Beijing). Choosing this region ensures you can access the latest features.)*
- **Specifications**: ✅ **Compute Group Type**
- **Zone**: `Zone L`
- **Gateway Nodes**: `2 Pieces`
- **Reserved Computing Resources of Virtual Warehouse**: `32 CU`
  *(This is the actual compute unit (CU) value used in the JSON result files.)*
- **Allocate to Initial Virtual Warehouse**: `Yes`
- **Enable Serverless Computing**: ✅ **True (Enabled)**
- **Storage Redundancy Type**: `LRS`
- **VPC & vSwitch**:
  - You need to **create a new VPC**:
    - Region: `China (Beijing)`
    - Name: any name you prefer
    - IPv4 CIDR Block: select "Manually enter" and use one of the recommended values
    - IPv6 CIDR Block: `Do Not Assign`
  - During VPC creation, you'll also create a **vSwitch**:
    - Name: any name
    - Zone: `Beijing Zone L`
    - IPv4 CIDR: automatically filled based on the VPC CIDR

  > 💡 A **VPC (Virtual Private Cloud)** is a private network in the cloud. A **vSwitch** is a subnet within the VPC. Both the Hologres and ECS instances must be in the same VPC for fast internal communication.
- **Instance Name**: choose any name
- **Service-linked Role**: click **Create**

Once everything is configured and you've received the coupon, click **Buy Now** to proceed.

#### 2.2 When creating the ECS instance, use the following configuration:

- **Billing Method**: `Pay-as-you-go` (you can release the instance after testing)
- **Region**: `China (Beijing)`
- **Network & Security Group**:
  - VPC: select the one you just created
  - vSwitch: automatically populated
- **Instance Type**:
  - Series: `Compute Optimized c9i`
  - Instance: `ecs.c9i.4xlarge` (16 vCPUs, 32 GiB RAM)
    *(This is not performance-critical; it only runs the client script.)*
- **Image**: `Alibaba Cloud Linux` → `Alibaba Cloud Linux 3.2104 LTS 64-bit`
- **System Disk**:
  - Size: `2048 GiB`
  - Performance: `PL3`
    *(A larger, faster disk improves import speed since we're loading ~70 GB of TSV data; disk I/O on the ECS can be a bottleneck.)*
- **Public IP Address**: ✅ Assign Public IPv4 Address
- **Management Settings**:
  - Logon Credential: `Custom Password`
  - Username: `root`
  - Set a secure password

Click **Create Order** to launch the instance.

---

### 3. Connect to the ECS and Run the Benchmark

After the ECS instance is ready:

1. SSH into the ECS instance.
2. Install Git and clone the repo:
   ```bash
   yum -y install git
   git clone https://github.com/ClickHouse/JSONBench.git
   cd JSONBench/hologres
   ```
3. Run the benchmark script:
   ```bash
   export PG_USER={AccessKeyID}; export PG_PASSWORD={AccessKeySecret}; export PG_HOSTNAME={Host}; export PG_PORT={Port}
   ./main.sh 5 {your_bluesky_data_dir}
   ```

- **AccessKeyID & AccessKeySecret**:
  Go to the Alibaba Cloud Console → Profile Icon → **AccessKey** → create one if needed.
  Alternatively, create a Hologres user (click your instance to open the instance detail page → **Account Management** → **Create Custom User** → choose **Superuser**) and use that username and password for `PG_USER` and `PG_PASSWORD`.
- **Host & Port**:
  In the Hologres console, click your instance ID and copy the **VPC Endpoint** (e.g., `hgxxx-cn-beijing-vpc.hologres.aliyuncs.com:xxxx`).
  - `Host` = the domain without the port (e.g., `hgxxx-cn-beijing-vpc.hologres.aliyuncs.com`)
  - `Port` = the number after `:`

---
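The Host/Port split can be done directly with shell parameter expansion. A small sketch; the endpoint value below is a placeholder, substitute the one copied from your console:

```shell
# Placeholder endpoint copied from the Hologres console; replace with yours.
ENDPOINT="hgxxx-cn-beijing-vpc.hologres.aliyuncs.com:80"

# Strip the shortest ':'-suffix for the host, the longest ':'-prefix for the port.
export PG_HOSTNAME="${ENDPOINT%:*}"
export PG_PORT="${ENDPOINT##*:}"

echo "$PG_HOSTNAME"   # hgxxx-cn-beijing-vpc.hologres.aliyuncs.com
echo "$PG_PORT"       # 80
```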

hologres/benchmark.sh

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
#!/bin/bash

# Check if the required arguments are provided
if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <DB_NAME> [RESULT_FILE]"
    exit 1
fi

# Arguments
DB_NAME="$1"
RESULT_FILE="${2:-}"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

# Construct the query log file name using $DB_NAME
# (OUTPUT_PREFIX is expected to be exported by the caller, e.g. main.sh)
# QUERY_LOG_FILE="${OUTPUT_PREFIX}_query_log_${DB_NAME}.txt"
QUERY_LOG_FILE="${OUTPUT_PREFIX}_${DB_NAME}.query_log"

# Print the database name
echo "Running queries on database: $DB_NAME"

# Run queries and log the output
./run_queries.sh "$DB_NAME" 2>&1 | tee "$QUERY_LOG_FILE"

# Extract the query timings (in ms) from the log, convert them to seconds,
# and group every three consecutive runs into one bracketed triple
RESULT=$(grep -oP 'Time: \d+\.\d+ ms' "$QUERY_LOG_FILE" | sed -r -e 's/Time: ([0-9]+\.[0-9]+) ms/\1/' | \
    awk '{ if (i % 3 == 0) { printf "[" }; printf $1 / 1000; if (i % 3 != 2) { printf "," } else { print "]," }; ++i; }')

# Output the result
if [[ -n "$RESULT_FILE" ]]; then
    echo "$RESULT" > "$RESULT_FILE"
    echo "Result written to $RESULT_FILE"
else
    echo "$RESULT"
fi

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"
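The grep/sed/awk pipeline in benchmark.sh can be exercised on a fabricated log to see its output shape. A sketch assuming GNU grep (for `-P`); the `Time: ... ms` lines mimic psql `\timing` output:

```shell
# Run the timing-extraction pipeline from benchmark.sh on fake input:
# six timings become two triples of seconds, one per query.
printf 'Time: 100.0 ms\nTime: 200.0 ms\nTime: 300.0 ms\nTime: 1500.0 ms\nTime: 1000.0 ms\nTime: 500.0 ms\n' |
    grep -oP 'Time: \d+\.\d+ ms' |
    sed -r -e 's/Time: ([0-9]+\.[0-9]+) ms/\1/' |
    awk '{ if (i % 3 == 0) { printf "[" }; printf $1 / 1000; if (i % 3 != 2) { printf "," } else { print "]," }; ++i; }'
# → [0.1,0.2,0.3],
# → [1.5,1,0.5],
```

Each query is run three times by run_queries.sh, which is why the timings are grouped in threes.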

hologres/count.sh

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
#!/bin/bash

# Check if the required arguments are provided
if [[ $# -lt 2 ]]; then
    echo "Usage: $0 <DB_NAME> <TABLE_NAME>"
    exit 1
fi

# Arguments
DB_NAME="$1"
TABLE_NAME="$2"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

# Count the rows in the table
$HOLOGRES_PSQL -d "$DB_NAME" -t -c "SELECT count(*) FROM $TABLE_NAME"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"
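These scripts rely on a `$HOLOGRES_PSQL` command that is defined outside the files shown here. A hedged sketch of how it might be assembled from the README's environment variables; the variable values and the exact flags are assumptions, not the repo's actual definition:

```shell
# Assumed sketch: build a psql invocation from the README's env vars.
# All values below are placeholders for illustration.
PG_USER="benchmark_user"; PG_PASSWORD="secret"
PG_HOSTNAME="hgxxx-cn-beijing-vpc.hologres.aliyuncs.com"; PG_PORT="80"

export PGPASSWORD="$PG_PASSWORD"   # psql reads the password from PGPASSWORD
HOLOGRES_PSQL="psql -h $PG_HOSTNAME -p $PG_PORT -U $PG_USER"

echo "$HOLOGRES_PSQL"
```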

hologres/create_and_load.sh

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
#!/bin/bash

# set -e

# Check if the required arguments are provided
if [[ $# -lt 7 ]]; then
    echo "Usage: $0 <DB_NAME> <TABLE_NAME> <DDL_FILE> <DATA_DIRECTORY> <NUM_FILES> <SUCCESS_LOG> <ERROR_LOG>"
    exit 1
fi

# Arguments
DB_NAME="$1"
TABLE_NAME="$2"
DDL_FILE="$3"
DATA_DIRECTORY="$4"
NUM_FILES="$5"
SUCCESS_LOG="$6"
ERROR_LOG="$7"

# Validate arguments
[[ ! -f "$DDL_FILE" ]] && { echo "Error: DDL file '$DDL_FILE' does not exist."; exit 1; }
[[ ! -d "$DATA_DIRECTORY" ]] && { echo "Error: Data directory '$DATA_DIRECTORY' does not exist."; exit 1; }
[[ ! "$NUM_FILES" =~ ^[0-9]+$ ]] && { echo "Error: NUM_FILES must be a positive integer."; exit 1; }

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

echo "Drop and create database"
$HOLOGRES_PSQL -c "DROP DATABASE IF EXISTS $DB_NAME" -c "CREATE DATABASE $DB_NAME"
echo "Disable result cache."
$HOLOGRES_PSQL -c "ALTER DATABASE $DB_NAME SET hg_experimental_enable_result_cache TO off;"

echo "Execute DDL"
$HOLOGRES_PSQL -d "$DB_NAME" -t < "$DDL_FILE"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Load data"
./load_data.sh "$DATA_DIRECTORY" "$DB_NAME" "$TABLE_NAME" "$NUM_FILES" "$SUCCESS_LOG" "$ERROR_LOG"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Vacuum, analyze, and compact the table"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "VACUUM $TABLE_NAME"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "ANALYZE $TABLE_NAME"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "SELECT hologres.hg_full_compact_table('$TABLE_NAME')"
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"
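The one-line guard idiom used for validation above (`[[ ... ]] && { echo ...; exit 1; }`) hinges on bash's `=~` regex test. A standalone sketch of the numeric check, with a hypothetical `check_num` helper added purely for illustration:

```shell
# Same regex test as create_and_load.sh's NUM_FILES guard, in isolation.
# Returns 0 for a non-negative integer, non-zero otherwise.
check_num() {
    [[ "$1" =~ ^[0-9]+$ ]]
}

check_num 12  && echo "12 accepted"    # digits only: passes
check_num 12x || echo "12x rejected"   # trailing letter: fails the anchor-to-anchor match
```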

hologres/ddl.sql

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
set hg_experimental_enable_nullable_clustering_key = true;
CREATE TABLE bluesky (
    data JSONB NOT NULL,
    sort_key TEXT GENERATED ALWAYS AS (
        -- col1: kind
        CASE
            WHEN data ->> 'kind' IS NULL THEN '[NULL]'
            ELSE '[VAL]' || (data ->> 'kind')
        END || '|__COL1__|' ||

        -- col2: operation
        CASE
            WHEN data -> 'commit' ->> 'operation' IS NULL THEN '[NULL]'
            ELSE '[VAL]' || (data -> 'commit' ->> 'operation')
        END || '|__COL2__|' ||

        -- col3: collection
        CASE
            WHEN data -> 'commit' ->> 'collection' IS NULL THEN '[NULL]'
            ELSE '[VAL]' || (data -> 'commit' ->> 'collection')
        END || '|__COL3__|' ||

        -- col4: did
        CASE
            WHEN data ->> 'did' IS NULL THEN '[NULL]'
            ELSE '[VAL]' || (data ->> 'did')
        END
    ) STORED
) WITH (clustering_key='sort_key');

ALTER TABLE bluesky ALTER COLUMN data SET (enable_columnar_type = ON);
CALL set_table_property('bluesky', 'dictionary_encoding_columns', 'data:auto');
CALL set_table_property('bluesky', 'bitmap_columns', 'data:auto');
hologres/drop_tables.sh

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
#!/bin/bash

# Check if the required arguments are provided
if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <DB_NAME>"
    exit 1
fi

# Arguments
DB_NAME="$1"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

# Drop the database
$HOLOGRES_PSQL -c "DROP DATABASE $DB_NAME"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"

hologres/index_usage.sh

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
#!/bin/bash

# Check if the required arguments are provided
if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <DB_NAME> [EXPLAIN_CMD]"
    exit 1
fi

# Arguments
DB_NAME="$1"
EXPLAIN_CMD="$2"

QUERY_NUM=1

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

while read -r query; do

    # Print the query number
    echo "------------------------------------------------------------------------------------------------------------------------"
    echo "Index usage for query Q$QUERY_NUM:"
    echo

    $HOLOGRES_PSQL -d "$DB_NAME" -t -c "$EXPLAIN_CMD $query"

    # Increment the query number
    QUERY_NUM=$((QUERY_NUM + 1))

done < queries.sql

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"

hologres/install.sh

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
#!/bin/bash

# https://www.postgresql.org/download/linux/ubuntu/

sudo apt-get update
sudo apt-get install -y postgresql-common postgresql-16
