add Alibaba Cloud Hologres #87
Hologres is an all-in-one real-time data warehouse engine that is compatible with PostgreSQL. It supports online analytical processing (OLAP) and ad hoc analysis of PB-scale data, as well as online data serving at high concurrency and low latency.

To evaluate the performance of Hologres, follow these guidelines to set up and execute the benchmark tests.

### 1. Create an Alibaba Cloud Account and Provide Your UID
First create an Alibaba Cloud account. After registration, send us your **UID** (Account ID), which you can find by:
- Clicking your profile icon in the top-right corner → **Account Center**

We will issue you an **Alibaba Cloud coupon** to support your testing, so please share your UID with us.

---

### 2. Purchase Alibaba Cloud Hologres and ECS Instances
Refer to the [Alibaba Cloud Hologres TPC-H Testing Documentation](https://www.alibabacloud.com/help/en/hologres/user-guide/test-plan?spm=a2c63.p38356.help-menu-113622.d_2_14_0_0.54e14f70oTAEXO) for details on purchasing Hologres and ECS instances. Both instances must be purchased in the same region and zone.

#### 2.1 When creating the Hologres instance, use the following configuration:

- **Region**: `China (Beijing)`
  *(The new version is in gray-scale (phased) release in China (Beijing); choosing this region ensures you can access the latest features.)*
- **Specifications**: ✅ **Compute Group Type**
- **Zone**: `Zone L`
- **Gateway Nodes**: `2`
- **Reserved Computing Resources of Virtual Warehouse**: `32 CU`
  *(This is the actual compute unit (CU) value used in the JSON result files.)*
- **Allocate to Initial Virtual Warehouse**: `Yes`
- **Enable Serverless Computing**: ✅ **True (Enabled)**
- **Storage Redundancy Type**: `LRS`
- **VPC & vSwitch**:
  - You need to **create a new VPC**:
    - Region: `China (Beijing)`
    - Name: any name you prefer
    - IPv4 CIDR Block: select "Manually enter" and use one of the recommended values
    - IPv6 CIDR Block: `Do Not Assign`
  - During VPC creation, you will also create a **vSwitch**:
    - Name: any name
    - Zone: `Beijing Zone L`
    - IPv4 CIDR: automatically filled based on the VPC CIDR
  > 💡 A **VPC (Virtual Private Cloud)** is a private network in the cloud, and the **vSwitch** is a subnet within the VPC. Both the Hologres and ECS instances must be in the same VPC for fast internal communication.
- **Instance Name**: choose any name
- **Service-linked Role**: click **Create**

Once everything is configured and you’ve received the coupon, click **Buy Now** to proceed.

#### 2.2 When creating the ECS instance, use the following configuration:
- **Billing Method**: `Pay-as-you-go` (you can release the instance after testing)
- **Region**: `China (Beijing)`
- **Network & Security Group**:
  - VPC: select the one you just created
  - vSwitch: automatically populated
- **Instance Type**:
  - Series: `Compute Optimized c9i`
  - Instance: `ecs.c9i.4xlarge` (16 vCPUs, 32 GiB RAM)
    *(This is not performance-critical; it only runs the client script.)*
- **Image**:
  - `Alibaba Cloud Linux` → `Alibaba Cloud Linux 3.2104 LTS 64-bit`
- **System Disk**:
  - Size: `2048 GiB`
  - Performance: `PL3`
    *(A larger and faster disk improves import speed, since ~70 GB of TSV data is loaded and I/O on the ECS can be a bottleneck.)*
- **Public IP Address**: ✅ Assign Public IPv4 Address
- **Management Settings**:
  - Logon Credential: `Custom Password`
  - Username: `root`
  - Set a secure password

Click **Create Order** to launch the instance.

---

### 3. Connect to the ECS and Run the Benchmark

After the ECS instance is ready:

1. SSH into the ECS instance.
2. Install Git and clone the repo:
   ```bash
   yum -y install git
   git clone https://github.com/ClickHouse/JSONBench.git
   cd JSONBench/hologres
   ```
3. Run the benchmark script:
   ```bash
   export PG_USER={AccessKeyID}
   export PG_PASSWORD={AccessKeySecret}
   export PG_HOSTNAME={Host}
   export PG_PORT={Port}
   ./main.sh 5 {your_bluesky_data_dir}
   ```

- **AccessKeyID & AccessKeySecret**:
  Go to the Alibaba Cloud Console → profile icon → **AccessKey** → create one if needed.
  You can also create a Hologres user (click your instance to open the instance detail page → **Account Management** → **Create Custom User** → choose **Superuser**) and use that username and password for `PG_USER` and `PG_PASSWORD`.
- **Host & Port**:
  In the Hologres console, click your instance ID and copy the **VPC Endpoint** (e.g., `hgxxx-cn-beijing-vpc.hologres.aliyuncs.com:xxxx`).
  - `Host` = the domain without the port (e.g., `hgxxx-cn-beijing-vpc.hologres.aliyuncs.com`)
  - `Port` = the number after `:`
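
Before launching the full run, it can help to confirm that the ECS instance can actually reach the Hologres endpoint. The following is only a sketch: it assumes a PostgreSQL client (`psql`) is available on the ECS instance, that the `PG_*` variables above are exported, and that a default maintenance database named `postgres` exists.

```bash
# Optional sanity check (sketch): verify connectivity from the ECS instance.
# Assumes PG_USER, PG_PASSWORD, PG_HOSTNAME and PG_PORT are exported as shown
# above; the target database "postgres" is an assumed default.
PGPASSWORD="$PG_PASSWORD" psql -h "$PG_HOSTNAME" -p "$PG_PORT" -U "$PG_USER" \
    -d postgres -c "SELECT version();"
```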

---
```bash
#!/bin/bash

# Check if the required arguments are provided
if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <DB_NAME> [RESULT_FILE]"
    exit 1
fi

# Arguments
DB_NAME="$1"
RESULT_FILE="${2:-}"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

# Construct the query log file name using $DB_NAME
# QUERY_LOG_FILE="${OUTPUT_PREFIX}_query_log_${DB_NAME}.txt"
QUERY_LOG_FILE="${OUTPUT_PREFIX}_${DB_NAME}.query_log"

# Print the database name
echo "Running queries on database: $DB_NAME"

# Run queries and log the output
./run_queries.sh "$DB_NAME" 2>&1 | tee "$QUERY_LOG_FILE"

# Process the query log and prepare the result
RESULT=$(cat "$QUERY_LOG_FILE" | grep -oP 'Time: \d+\.\d+ ms' | sed -r -e 's/Time: ([0-9]+\.[0-9]+) ms/\1/' | \
    awk '{ if (i % 3 == 0) { printf "[" }; printf $1 / 1000; if (i % 3 != 2) { printf "," } else { print "]," }; ++i; }')

# Output the result
if [[ -n "$RESULT_FILE" ]]; then
    echo "$RESULT" > "$RESULT_FILE"
    echo "Result written to $RESULT_FILE"
else
    echo "$RESULT"
fi

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"
```
```bash
#!/bin/bash

# Check if the required arguments are provided
if [[ $# -lt 2 ]]; then
    echo "Usage: $0 <DB_NAME> <TABLE_NAME>"
    exit 1
fi

# Arguments
DB_NAME="$1"
TABLE_NAME="$2"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

# Count the rows in the table
$HOLOGRES_PSQL -d "$DB_NAME" -t -c "SELECT count(*) FROM $TABLE_NAME"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"
```
```bash
#!/bin/bash

# set -e

# Check if the required arguments are provided
if [[ $# -lt 7 ]]; then
    echo "Usage: $0 <DB_NAME> <TABLE_NAME> <DDL_FILE> <DATA_DIRECTORY> <NUM_FILES> <SUCCESS_LOG> <ERROR_LOG>"
    exit 1
fi

# Arguments
DB_NAME="$1"
TABLE_NAME="$2"
DDL_FILE="$3"
DATA_DIRECTORY="$4"
NUM_FILES="$5"
SUCCESS_LOG="$6"
ERROR_LOG="$7"

# Validate arguments
[[ ! -f "$DDL_FILE" ]] && { echo "Error: DDL file '$DDL_FILE' does not exist."; exit 1; }
[[ ! -d "$DATA_DIRECTORY" ]] && { echo "Error: Data directory '$DATA_DIRECTORY' does not exist."; exit 1; }
[[ ! "$NUM_FILES" =~ ^[0-9]+$ ]] && { echo "Error: NUM_FILES must be a positive integer."; exit 1; }

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

echo "Drop and create database"
$HOLOGRES_PSQL -c "DROP DATABASE IF EXISTS $DB_NAME" -c "CREATE DATABASE $DB_NAME"
echo "Disable result cache."
$HOLOGRES_PSQL -c "ALTER DATABASE $DB_NAME SET hg_experimental_enable_result_cache TO off;"

echo "Execute DDL"
$HOLOGRES_PSQL -d "$DB_NAME" -t < "$DDL_FILE"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Load data"
./load_data.sh "$DATA_DIRECTORY" "$DB_NAME" "$TABLE_NAME" "$NUM_FILES" "$SUCCESS_LOG" "$ERROR_LOG"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Vacuum analyze the table"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "VACUUM $TABLE_NAME"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "ANALYZE $TABLE_NAME"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "SELECT hologres.hg_full_compact_table('$TABLE_NAME')"
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"
```
```sql
set hg_experimental_enable_nullable_clustering_key = true;

CREATE TABLE bluesky (
    data JSONB NOT NULL,
    sort_key TEXT GENERATED ALWAYS AS (
        -- col1: kind
        CASE
            WHEN data ->> 'kind' IS NULL THEN '[NULL]'
            ELSE '[VAL]' || (data ->> 'kind')
        END || '|__COL1__|' ||

        -- col2: operation
        CASE
            WHEN data -> 'commit' ->> 'operation' IS NULL THEN '[NULL]'
            ELSE '[VAL]' || (data -> 'commit' ->> 'operation')
        END || '|__COL2__|' ||

        -- col3: collection
        CASE
            WHEN data -> 'commit' ->> 'collection' IS NULL THEN '[NULL]'
            ELSE '[VAL]' || (data -> 'commit' ->> 'collection')
        END || '|__COL3__|' ||

        -- col4: did
        CASE
            WHEN data ->> 'did' IS NULL THEN '[NULL]'
            ELSE '[VAL]' || (data ->> 'did')
        END
    ) STORED
) WITH (clustering_key='sort_key');

ALTER TABLE bluesky ALTER COLUMN data SET (enable_columnar_type = ON);
CALL set_table_property('bluesky', 'dictionary_encoding_columns', 'data:auto');
CALL set_table_property('bluesky', 'bitmap_columns', 'data:auto');
```
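
To make the generated `sort_key` concrete, here is what it evaluates to for an invented commit event (the JSON values are not from the dataset, and the expression below omits the `CASE`/NULL handling since all fields are present). Rows are clustered in this order, so queries filtering on `kind`, `commit.operation`, or `commit.collection` scan contiguous ranges.

```sql
-- Illustration only: the sort_key value for a made-up commit event.
SELECT '[VAL]' || (j ->> 'kind') || '|__COL1__|' ||
       '[VAL]' || (j -> 'commit' ->> 'operation') || '|__COL2__|' ||
       '[VAL]' || (j -> 'commit' ->> 'collection') || '|__COL3__|' ||
       '[VAL]' || (j ->> 'did') AS sort_key_example
FROM (SELECT '{"kind":"commit","did":"did:plc:example","commit":{"operation":"create","collection":"app.bsky.feed.post"}}'::jsonb AS j) t;
-- Result: [VAL]commit|__COL1__|[VAL]create|__COL2__|[VAL]app.bsky.feed.post|__COL3__|[VAL]did:plc:example
```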
> **Review comment:** An extra index, such as […]
>
> **Reply:** The rule is very clear: "It is allowed to apply various indexing methods whenever appropriate." Bitmap is a very common indexing method.
>
> **Reply:** Clarified here: #95
---
```bash
#!/bin/bash

# Check if the required arguments are provided
if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <DB_NAME>"
    exit 1
fi

# Arguments
DB_NAME="$1"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

# Drop the database
$HOLOGRES_PSQL -c "DROP DATABASE $DB_NAME"

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") DONE"
```
---
```bash
#!/bin/bash

# Check if the required arguments are provided
if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <DB_NAME> [EXPLAIN_CMD]"
    exit 1
fi

# Arguments
DB_NAME="$1"
EXPLAIN_CMD="$2"

QUERY_NUM=1

echo "[$(date '+%Y-%m-%d %H:%M:%S')] $(basename "$0") START"

cat queries.sql | while read -r query; do

    # Print the query number
    echo "------------------------------------------------------------------------------------------------------------------------"
    echo "Index usage for query Q$QUERY_NUM:"
    echo

    $HOLOGRES_PSQL -d "$DB_NAME" -t -c "$EXPLAIN_CMD $query"

    # Increment the query number
    QUERY_NUM=$((QUERY_NUM + 1))

done
```
```bash
#!/bin/bash

# https://www.postgresql.org/download/linux/ubuntu/

sudo apt-get update
sudo apt-get install -y postgresql-common postgresql-16
```
> **Review comment:** I would suggest printing all settings of the DB after installation, so that everyone can reproduce the test result of this SaaS product.
>
> **Reply:** I don't understand what you are saying; this just installs the standard PostgreSQL client, it has nothing to do with settings. The scripts in this pull request already provide everything needed to reproduce the results.
> **Review comment:** Extra commands should not be used, such as VACUUM, ANALYZE, and compact.
>
> **Reply:** It is allowed; see https://github.com/ClickHouse/JSONBench/blob/main/postgresql/create_and_load.sh. Also, in ClickBench many Postgres-based systems use commands like "vacuum" and "analyze".
>
> **Reply:** By the way, hg_full_compact_table performs essentially the same function as VACUUM, with the added benefit of ensuring that all files are fully compacted and compressed with ZSTD. Without this step, some files might be compressed with ZSTD while others are not, which could lead to inconsistencies in performance stability and overall storage size. That said, if @rschu1ze strongly prefers that we remove it, we can do so; there is no significant impact on the results.