wangmj17

This pull request introduces benchmark results for Hologres.

Hologres is a commercial real-time data warehouse product developed by Alibaba Cloud. Unfortunately, we cannot provide deployment options on other machines. We apologize that verifying the performance requires purchasing a Hologres instance on Alibaba Cloud. Testing methods are described in README.md, and we can offer vouchers to assist with performance testing.

@CLAassistant

CLAassistant commented Sep 15, 2025

CLA assistant check
All committers have signed the CLA.

@TimothyDing

@rschu1ze Hello, could you help us review it?

echo "[$(date '+%Y-%m-%d %H:%M:%S')] Vacuum analyze the table"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "VACUUM $TABLE_NAME"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "ANALYZE $TABLE_NAME"
$HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "select hologres.hg_full_compact_table('$TABLE_NAME')"
Contributor

Extra commands such as VACUUM, ANALYZE, and compact should not be used.

Author

It is allowed; see https://github.com/ClickHouse/JSONBench/blob/main/postgresql/create_and_load.sh

Also, in ClickBench many Postgres-based systems use commands such as VACUUM and ANALYZE.

Author

By the way, hg_full_compact_table performs essentially the same function as VACUUM, with the added benefit of ensuring that all files are fully compacted and compressed using ZSTD. Without this step, some files might be compressed with ZSTD while others are not, which could lead to inconsistencies in performance stability and overall storage size. That said, if @rschu1ze strongly prefers that we remove it, we can do so—there is no significant impact on the results.
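One way to make the compaction step easy to drop is to put it behind a switch. The sketch below is a hypothetical illustration, not part of the pull request: `RUN_FULL_COMPACT`, `run_psql`, and `post_load_maintenance` are made-up names, and the dry-run branch exists only so the snippet can be exercised without a Hologres instance. The `HOLOGRES_PSQL`, `DB_NAME`, and `TABLE_NAME` variables and the three statements are taken from the diff above.

```shell
# Hypothetical wrapper around the psql invocation used in the diff.
# When HOLOGRES_PSQL is unset or empty, the statement is only printed (dry run).
run_psql() {
    if [ -n "${HOLOGRES_PSQL:-}" ]; then
        # HOLOGRES_PSQL is left unquoted on purpose: as in the diff, it is
        # expected to hold a command plus its connection arguments.
        $HOLOGRES_PSQL -d "$DB_NAME" -c '\timing' -c "$1"
    else
        printf 'dry run: %s\n' "$1"
    fi
}

# Runs VACUUM and ANALYZE, and the full compaction only when
# RUN_FULL_COMPACT (an assumed toggle, default 1) is set to 1.
post_load_maintenance() {
    run_psql "VACUUM $TABLE_NAME"
    run_psql "ANALYZE $TABLE_NAME"
    if [ "${RUN_FULL_COMPACT:-1}" = "1" ]; then
        run_psql "select hologres.hg_full_compact_table('$TABLE_NAME')"
    fi
}
```

With such a toggle, a run with `RUN_FULL_COMPACT=0` could be compared against the default to confirm that the compaction step has no significant impact on the results.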

# https://www.postgresql.org/download/linux/ubuntu/

sudo apt-get update
sudo apt-get install -y postgresql-common postgresql-16
Contributor

I would suggest printing all database settings after installation, so that everyone can reproduce the test results of this SaaS product.

Author

I don't understand what you mean. This step only installs the standard PostgreSQL client; it has nothing to do with settings. The scripts in the pull request already provide everything needed to reproduce the results.
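If a settings dump were wanted alongside the results, it could be taken from the client side once the instance is reachable. The following is a minimal sketch under assumptions: it relies on PostgreSQL's `SHOW ALL` statement, which has not been verified against Hologres here; `dump_settings` and the `PSQL` override are hypothetical names; the `postgres` default for `DB_NAME` is an assumption; and the `PG_*` variables are the ones used by the benchmark instructions in this thread.

```shell
# Hypothetical helper: write all server settings to a file via SHOW ALL.
# Assumes the service accepts PostgreSQL's SHOW ALL (unverified for Hologres).
# PSQL is an assumed override for the client binary; it defaults to psql.
dump_settings() {
    out_file="${1:-settings.txt}"
    # -A: unaligned output, -t: rows only, -c: run a single command.
    PGPASSWORD="$PG_PASSWORD" "${PSQL:-psql}" \
        -h "$PG_HOSTNAME" -p "$PG_PORT" -U "$PG_USER" \
        -d "${DB_NAME:-postgres}" -A -t -c 'SHOW ALL' > "$out_file"
}
```

Calling `dump_settings settings.txt` after exporting the connection variables would leave one setting per line in `settings.txt`, which could be attached to the result files.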


ALTER TABLE bluesky ALTER COLUMN data SET (enable_columnar_type = ON);
CALL set_table_property('bluesky', 'dictionary_encoding_columns', 'data:auto');
CALL set_table_property('bluesky', 'bitmap_columns', 'data:auto');
Contributor

An extra index, such as bitmap_columns, would be considered a form of manual tuning.

Author

The rule is very clear: "It is allowed to apply various indexing methods whenever appropriate." Bitmap is a very common indexing method.

Member

Clarified here: #95

@rschu1ze
Member

rschu1ze commented Oct 7, 2025

I will need some help to reproduce the measurements:
ClickHouse/ClickBench#626 (comment)

@wangmj17
Author

wangmj17 commented Oct 9, 2025

Sorry for the delay; we were away on a long vacation for our national holiday. My colleague has replied with the detailed steps in the ClickBench PR. The steps for JSONBench are similar; the main difference is choosing 32 CU when buying the Hologres instance.

1. Create an Alibaba Cloud Account and Provide Your UID

Please first create an Alibaba Cloud account. After registration, kindly provide us with your UID (Account ID), which you can find by:

  • Clicking on your profile icon in the top-right corner → Account Center
    We will issue you an Alibaba Cloud coupon to support your testing, so please share your UID with us.

2. Purchase an Alibaba Cloud Hologres Instance

When creating the Hologres instance, please use the following configuration:

  • Region: China (Beijing)
    (The new version is currently in gray-scale (phased) release in China (Beijing); choosing this region ensures you can access the latest features.)
  • Specifications: ✅ Compute Group Type
  • Zone: Zone L
  • Gateway Nodes: 2 Pieces
  • Reserved Computing Resources of Virtual Warehouse: 32 CU
    (This is the actual compute unit (CU) value used in the JSON result files.)
  • Allocate to Initial Virtual Warehouse: Yes
  • Enable Serverless Computing: ✅ True (Enabled)
  • Storage Redundancy Type: LRS
  • VPC & vSwitch:
    • You need to create a new VPC.
      • Region: China (Beijing)
      • Name: Any name you prefer
      • IPv4 CIDR Block: Select "Manually enter" and use one of the recommended values
      • IPv6 CIDR Block: Do Not Assign
    • During VPC creation, you’ll also create a vSwitch:
      • Name: Any name
      • Zone: Beijing Zone L
      • IPv4 CIDR: Automatically filled based on VPC CIDR

    💡 A VPC (Virtual Private Cloud) is a private network in the cloud. The vSwitch is a subnet within the VPC. We need both Hologres and ECS instances in the same VPC for fast internal communication.

  • Instance Name: Choose any name
  • Service-linked Role: Click Create

Once everything is configured and you’ve received the coupon, click Buy Now to proceed.


3. Purchase an ECS Instance (as Client Machine)

This ECS instance acts as a client to download data, run queries, and load data into Hologres.

  • Billing Method: Pay-as-you-go (you can release it after testing)
  • Region: China (Beijing)
  • Network & Security Group:
    • VPC: Select the one you just created
    • vSwitch: Automatically populated
  • Instance Type:
    • Series: Compute Optimized c9i
    • Instance: ecs.c9i.4xlarge (16 vCPUs, 32 GiB RAM)
      (This is not performance-critical — it only runs the client script.)
  • Image:
    • Alibaba Cloud Linux 3.2104 LTS 64-bit
  • System Disk:
    • Size: 2048 GiB
    • Performance: PL3
      (Larger and faster disk improves import speed since we’re loading ~70GB of TSV data. IO on the ECS can be a bottleneck.)
  • Public IP Address: ✅ Assign Public IPv4 Address
  • Management Settings:
    • Logon Credential: Custom Password
    • Username: root
    • Set a secure password

Click Create Order to launch the instance.


4. Connect to the ECS and Run the Benchmark

After the ECS instance is ready:

  1. SSH into the ECS instance.

  2. Install Git and clone the repo:

    yum -y install git
    git clone https://github.com/ClickHouse/JSONBench.git
    cd JSONBench/hologres

  3. Run the benchmark script:

    export PG_USER={AccessKeyID}
    export PG_PASSWORD={AccessKeySecret}
    export PG_HOSTNAME={Host}
    export PG_PORT={Port}
    ./main.sh 5 {your_bluesky_data_dir}

    • AccessKeyID & AccessKeySecret:
      Go to the Alibaba Cloud Console → Profile Icon → AccessKey → Create one if needed.
      Alternatively, create a Hologres user (open your instance's detail page → "Account Management" → "Create Custom User" → choose "Superuser") and use its username and password for PG_USER and PG_PASSWORD.
    • Host & Port:
      In the Hologres console, click your instance ID → Copy the VPC Endpoint (e.g., hgxxx-cn-beijing-vpc.hologres.aliyuncs.com:xxxx).
      • Host = domain without port (e.g., hgxxx-cn-beijing-vpc.hologres.aliyuncs.com)
      • Port = the number after :
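The host and port split described above can be done with shell parameter expansion, and the variables can be checked before the benchmark is started. This is a small sketch under assumptions: `split_endpoint` and `require_vars` are hypothetical helpers that are not part of the repository, and the endpoint is the placeholder from the instructions, so the resulting port is the literal text `xxxx` until a real endpoint is substituted.

```shell
# Hypothetical helper: split a "host:port" VPC endpoint into
# PG_HOSTNAME and PG_PORT, the variable names used in the steps above.
split_endpoint() {
    endpoint="$1"
    case "$endpoint" in
        *:*) ;;
        *) echo "expected host:port, got '$endpoint'" >&2; return 1 ;;
    esac
    PG_HOSTNAME="${endpoint%:*}"   # everything before the last colon
    PG_PORT="${endpoint##*:}"      # everything after the last colon
    export PG_HOSTNAME PG_PORT
}

# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder endpoint copied from the instructions; replace with the real one.
split_endpoint "hgxxx-cn-beijing-vpc.hologres.aliyuncs.com:xxxx"
echo "host=$PG_HOSTNAME port=$PG_PORT"
```

Running `require_vars PG_USER PG_PASSWORD PG_HOSTNAME PG_PORT && ./main.sh 5 {your_bluesky_data_dir}` would then start the benchmark only when all four variables are set.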

✅ That’s it! You’re all set to run the benchmark.
Let us know if you encounter any issues — we’re happy to help. Also, we’ll update the README.md shortly with these instructions for future users.

Thank you again for your valuable feedback and contribution!
