vortex-data
diff --git a/.github/workflows/generate-results.yml b/.github/workflows/generate-results.yml
Lines changed: 5 additions & 4 deletions
diff --git a/README.md b/README.md
Lines changed: 46 additions & 94 deletions
diff --git a/alloydb/README.md b/alloydb/README.md
Lines changed: 2 additions & 2 deletions
diff --git a/alloydb/benchmark.sh b/alloydb/benchmark.sh
old mode 100644
new mode 100755
diff --git a/alloydb/results/gcp.128GB_tuned.json b/alloydb/results/gcp.128GB_tuned.json
Lines changed: 46 additions & 43 deletions
@@ -1,7 +1,8 @@
-name: "Generate index.html"
+name: "Build the website"
 on:
+  workflow_dispatch:  # This allows manual trigger from the UI
   push:
-    branches:    
+    branches:
       - main
 
 permissions:
@@ -10,8 +11,8 @@ permissions:
 jobs:
   build:
     runs-on: ubuntu-latest
-    env: 
-      CI_COMMIT_MESSAGE: "[bot] update index.html"
+    env:
+      CI_COMMIT_MESSAGE: "[bot] Build the website"
       CI_COMMIT_AUTHOR: github
     steps:
     - uses: actions/checkout@v3
 
@@ -54,12 +54,12 @@ TLDR: *All Benchmarks Are ~~Bastards~~ Liars*.
 
 To introduce a new system, simply copy-paste one of the directories and edit the files accordingly:
 
-- `benchmark.sh`: this is the main script to run the benchmark on a fresh VM; Ubuntu 22.04 or newer should be used by default, or any other system if specified in the comments. The script may not necessarily run in a fully automated manner - it is recommended always to copy-paste the commands one by one and observe the results. For managed databases, if the setup requires clicking in the UI, write a `README.md` instead.
+- `benchmark.sh`: this is the main script to run the benchmark on a fresh VM; Ubuntu 24.04 or newer should be used by default. For databases that can be installed locally, the script should run in a fully automated manner, so that it can be used as a cloud-init script. It should output the results in the following format: (1) one or more lines `Load time: 1234` with the time in seconds; (2) a line `Data size: 1234567890` with the data size in bytes, including indexes and transaction logs if applicable; (3) 43 consecutive lines of the form `[1.234, 5.678, 9.012],` with the runtimes of every query. The output may include other log lines, which are not used for the report. For managed databases, if the setup requires clicking in the UI, write a `README.md` instead.
 - `README.md`: contains comments and observations if needed. For managed databases, it can describe the setup procedure to be used instead of a shell script.
 - `create.sql`: a CREATE TABLE statement. If it's a NoSQL system, another file like `wtf.json` can be presented.
 - `queries.sql`: contains 43 queries to run;
 - `run.sh`: a loop for running the queries; every query is run three times; if it's a database with local on-disk storage, the first query should be run after dropping the page cache;
-- `results`: put the .json files with the results for every hardware configuration there.
+- `results`: put the .json files with the results for every hardware configuration there. Please double-check that each file is valid JSON (e.g., no comma errors).
 
 To introduce a new result for an existing system on different hardware configurations, add a new file to `results`.
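
Review note: the output format required of `benchmark.sh` above can be checked mechanically. The following is a minimal sketch in Python, not part of this change; the function name `parse_benchmark_log` and the returned dictionary keys are hypothetical, and it assumes that a failed run may be reported as `null`. It collects every matching line and does not verify that the 43 runtime lines are consecutive.

```python
import json
import re

EXPECTED_QUERIES = 43  # the benchmark defines 43 queries

LOAD_RE = re.compile(r"Load time: (\d+(?:\.\d+)?)")
SIZE_RE = re.compile(r"Data size: (\d+)")
ROW_RE = re.compile(r"(\[.*\]),?")  # e.g. "[1.234, 5.678, 9.012],"


def parse_benchmark_log(text):
    """Extract load times, data size, and query runtimes from a log.

    Lines that match none of the three patterns are treated as ordinary
    log output and ignored.
    """
    load_times, data_size, runtimes = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if m := LOAD_RE.fullmatch(line):
            load_times.append(float(m.group(1)))
        elif m := SIZE_RE.fullmatch(line):
            data_size = int(m.group(1))
        elif m := ROW_RE.fullmatch(line):
            try:
                row = json.loads(m.group(1))
            except json.JSONDecodeError:
                continue  # a log line that merely looks like a row
            if isinstance(row, list) and len(row) == 3:
                runtimes.append(row)
    if len(runtimes) != EXPECTED_QUERIES:
        raise ValueError(
            f"expected {EXPECTED_QUERIES} runtime lines, found {len(runtimes)}"
        )
    return {"load_times": load_times, "data_size": data_size, "result": runtimes}
```

The load times are returned as a list rather than a single number, because the text does not say how several `Load time` lines should be combined.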
 
@@ -144,7 +144,7 @@ We allow but do not recommend creating scoreboards from this benchmark or saying
 
 There is a web page to navigate across benchmark results and present a summary report. It allows filtering out some systems, setups, or queries. For example, if you found some subset of the 43 queries are irrelevant, you can simply exclude them from the calculation and share the report without these queries.
 
-You can select the summary metric from one of the following: "Cold Run", "Hot Run", "Load Time", and "Data Size". If you select the "Load Time" or "Data Size", the entries will be simply ordered from best to worst, and additionally, the ratio to the best non-zero result will be shown (the number of times one system is worse than the best system in this metric). Load time can be zero for stateless query engines like `clickhouse-local` or `Amazon Athena`.
+You can select the summary metric from one of the following: "Cold Run", "Hot Run", "Load Time", "Data Size", and "Combined". If you select the "Load Time" or "Data Size", the entries will be simply ordered from best to worst, and additionally, the ratio to the best non-zero result will be shown (the number of times one system is worse than the best system in this metric). Load time can be zero for stateless query engines like `clickhouse-local` or `Amazon Athena`.
 
 If you select "Cold Run" or "Hot Run", the aggregation across the queries is performed in the following way:
 
@@ -170,6 +170,7 @@ For example, one system crashed while trying to run a query which can highlight
 
 Why geometric mean? The ratios can only be naturally averaged in this way. Imagine there are two queries and two systems. The first system ran the first query in 1s and the second query in 20s. The second system ran the first query in 2s and the second query in 10s. So, the first system is two times faster on the first query and two times slower on the second query and vice-versa. The final score should be identical for these systems.
 
+The "Combined" metric summarizes all the results as a weighted geometric mean with the following weights: load time: 10%, data size: 10%, cold runtime: 20%, hot runtime: 60%.
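
Review note: the geometric-mean argument and the "Combined" weights can be illustrated with a short Python sketch. This is not the code used by the website. The names `WEIGHTS` and `weighted_geometric_mean` are hypothetical, and the sketch assumes each component is already a positive relative score (for example, a ratio to the best system). It does not handle a zero load time, which the text says is possible for stateless engines.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def weighted_geometric_mean(scores, weights):
    """Weighted geometric mean over the keys of `scores`."""
    if any(v <= 0 for v in scores.values()):
        raise ValueError("all scores must be positive")
    total = sum(weights[k] for k in scores)
    return math.exp(sum(weights[k] * math.log(scores[k]) for k in scores) / total)


# Two-query example from the text: runtimes in seconds.
system_a = [1.0, 20.0]
system_b = [2.0, 10.0]

# Ratios of A relative to B, and of B relative to A.
a_over_b = [a / b for a, b in zip(system_a, system_b)]  # [0.5, 2.0]
b_over_a = [b / a for a, b in zip(system_a, system_b)]  # [2.0, 0.5]

# The arithmetic mean calls each system 25% slower than the other,
# while the geometric mean rates them as equal.
arithmetic = sum(a_over_b) / len(a_over_b)
geometric = geometric_mean(a_over_b)

# Weights quoted for the "Combined" metric.
WEIGHTS = {"load_time": 0.10, "data_size": 0.10, "cold_run": 0.20, "hot_run": 0.60}

# Hypothetical relative scores for one system (illustrative values only).
example_scores = {"load_time": 2.0, "data_size": 1.5, "cold_run": 3.0, "hot_run": 1.2}
combined = weighted_geometric_mean(example_scores, WEIGHTS)
```

Because hot runtime carries 60% of the weight, a system's combined score is dominated by its hot-run ratio in this sketch.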
 
 ## History and Motivation
 
@@ -201,109 +202,60 @@ We also introduced the [Hardware Benchmark](https://benchmark.clickhouse.com/har
 
 ## Systems Included
 
-- [x] ClickHouse
-- [x] ClickHouse on local Parquet files
-- [x] ClickHouse operating like "Athena" on remote Parquet files
-- [x] ClickHouse on a VFS over HTTPs on CDN
-- [x] MySQL InnoDB
-- [x] MySQL MyISAM
-- [x] MariaDB
-- [x] MariaDB ColumnStore
-- [x] MemSQL/SingleStore
-- [x] PostgreSQL
-- [x] Greenplum
-- [x] TimescaleDB
-- [x] Citus
-- [x] Vertica (without publishing)
-- [x] QuestDB
-- [x] chdb
-- [x] DuckDB
-- [x] DuckDB over local Parquet files
+ClickBench provides [publicly available benchmark results for over 60 database management systems](https://benchmark.clickhouse.com/).
+
+By default, all tests are run on c6a.4xlarge VM in AWS with 500 GB gp2.
+
+In addition, there are systems for which the code to run the benchmark is provided, but the results cannot be published.
+Currently, this includes:
+
+- Vertica
+
+Please help us add more systems and run the benchmarks on more types of VMs:
+
+- [ ] Actian Vector
+- [ ] Apache Ignite
+- [ ] Apache Kudu
+- [ ] Apache Kylin
+- [ ] Azure Synapse
+- [ ] Boilingdata
+- [ ] CockroachDB Serverless
+- [ ] Databricks
+- [ ] DolphinDB
+- [ ] Dremio (without publishing)
 - [ ] DuckDB operating like "Athena" on remote Parquet files
-- [x] MonetDB
-- [x] mapD/Omnisci/HeavyAI
-- [x] Databend
-- [x] DataFusion
-- [x] ByteHouse
-- [x] Doris/PALO
-- [x] SelectDB
-- [x] Druid
-- [x] Pinot
-- [x] CrateDB
-- [x] Spark SQL
-- [x] Starrocks
-- [ ] ShitholeDB
+- [ ] EventQL
+- [ ] Exasol
 - [ ] Hive
-- [x] Hydra
+- [ ] Hydrolix
 - [ ] Impala
-- [x] Hyper
-- [x] Umbra
-- [x] SQLite
-- [x] Redshift
-- [x] Redshift Serverless
-- [ ] Redshift Spectrum
-- [ ] Presto
-- [ ] Trino
-- [x] Amazon Athena
-- [x] Bigquery (without publishing)
-- [x] Snowflake
-- [ ] Rockset
-- [x] CockroachDB
-- [ ] CockroachDB Serverless
-- [ ] Databricks
-- [ ] Planetscale (without publishing)
-- [ ] TiDB (TiFlash)
-- [x] Amazon RDS Aurora for MySQL
-- [x] Amazon RDS Aurora for Postgres
 - [ ] InfluxDB
-- [ ] TDEngine
-- [x] MongoDB
-- [ ] Cassandra
-- [ ] ScyllaDB
-- [x] Elasticsearch
-- [ ] Apache Ignite
-- [x] Motherduck
-- [x] Infobright
-- [ ] Actian Vector
+- [ ] LocustDB
 - [ ] Manticore Search
-- [x] Vertica (without publishing)
-- [ ] Azure Synapse
-- [ ] Starburst Galaxy
 - [ ] MS SQL Server with Column Store Index (without publishing)
-- [ ] Dremio (without publishing)
-- [ ] Exasol
-- [ ] LocustDB
-- [ ] EventQL
-- [x] Apache Drill
-- [ ] Apache Kudu
-- [ ] Apache Kylin
-- [x] S3 select command in AWS
-- [x] Kinetica
-- [ ] YDB
 - [ ] OceanBase
-- [ ] Boilingdata
-- [x] Byteconity
-- [ ] DolphinDB
-- [x] Oxla
+- [ ] Planetscale (without publishing)
+- [ ] Presto
 - [ ] Quickwit
-- [x] AlloyDB
-- [x] ParadeDB
-- [x] GlareDB
+- [ ] Redshift Spectrum
+- [ ] Rockset
 - [ ] Seafowl
+- [ ] ShitholeDB
 - [ ] Sneller
-- [x] Tablespace
-- [x] Tembo
-- [x] Cloudberry
-- [x] Daft
-- [x] Pandas
-- [x] Polars
-- [x] OctoSQL
-- [x] VictoriaLogs
-- [x] Hologres
+- [ ] Starburst Galaxy
+- [ ] Trino
+- [ ] TDEngine
 
-By default, all tests are run on c6a.4xlarge VM in AWS with 500 GB gp2.
+The list above _may_ include systems that cannot run ClickBench because of various limitations.
+Systems with known limitations or issues that could not be benchmarked are:
 
-Please help us add more systems and run the benchmarks on more types of VMs.
+- Cassandra (see [discussion](https://github.com/ClickHouse/ClickBench/issues/384))
+- csvq (see [README](https://github.com/ClickHouse/ClickBench/tree/main/csvq))
+- dsq (see [README](https://github.com/ClickHouse/ClickBench/tree/main/dsq))
+- Hydrolix (see [README](https://github.com/ClickHouse/ClickBench/tree/main/hydrolix))
+- LocustDB (see [README](https://github.com/ClickHouse/ClickBench/tree/main/locustdb))
+- ScyllaDB (see [discussion](https://github.com/ClickHouse/ClickBench/issues/384))
+- S3 select command in AWS (see [README](https://github.com/ClickHouse/ClickBench/tree/main/s3select))
 
 ## Similar Projects
 
 
@@ -10,7 +10,7 @@ Note: As of current date, AlloyDB can only be accessed by setting up Alloy Auth
 2. Set up an EC2 instance with a 30 GB disk
 	a. SSH in and download Alloy Auth Proxy https://cloud.google.com/alloydb/docs/auth-proxy/overview
 	```bash
-	wget https://storage.googleapis.com/alloydb-auth-proxy/v1.5.0/alloydb-auth-proxy.linux.amd64 -O alloydb-auth-proxy
+	wget --continue --progress=dot:giga https://storage.googleapis.com/alloydb-auth-proxy/v1.5.0/alloydb-auth-proxy.linux.amd64 -O alloydb-auth-proxy
 
 	chmod +x alloydb-auth-proxy
 	```
@@ -26,7 +26,7 @@ Note: As of current date, AlloyDB can only be accessed by setting up Alloy Auth
 	  
 4. Download public dataset and required scripts
 ```bash
- wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
+wget --continue --progress=dot:giga 'https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz'
 ```
 Load scripts in this repo
 
 
@@ -4,54 +4,57 @@
     "machine": "16 vCPU 128GB",
     "cluster_size": "serverless",
     "comment": "",
+    "proprietary": "yes",
+    "tuned": "no",
 
     "tags": ["C", "column-oriented", "PostgreSQL compatible", "managed", "gcp"],
 
+    "load_time": 0,
     "data_size": 9941379875,
 
     "result": [
-        [3.58975, 3.57992, 3.54662],
-        [0.08625, 0.0838, 0.08574],
-        [3.10807, 2.97731, 2.9985],
-        [2.44647, 2.36909, 2.39708],
-        [41.17192, 39.8976, 41.06776],
-        [133.67873, 131.0997, 128.0839],
-        [4.12758, 4.0239, 4.08013],
-        [0.09695, 0.09319, 0.09563],
-        [36.86525, 35.32732, 35.25588],
-        [48.45764, 47.93144, 46.52182],
-        [2.97389, 2.89167, 2.95393],
-        [2.72543, 2.63553, 2.69429],
-        [23.60888, 23.60363, 23.46187],
-        [33.28765, 32.03713, 31.81454],
-        [12.95648, 12.47305, 12.33908],
-        [21.37397, 21.35981, 21.19967],
-        [46.84838, 46.70314, 46.10616],
-        [0.06719, 0.06616, 0.06635],
-        [49.55343, 49.13112, 48.82092],
-        [0.06027, 0.05733, 0.05961],
-        [7.33196, 7.22324, 7.31305],
-        [7.30749, 7.03247, 7.22055],
-        [2.55208, 2.53827, 2.4771],
-        [1.20815, 1.16618, 1.19975],
-        [0.76076, 0.73289, 0.75499],
-        [6.91366, 6.74895, 6.64693],
-        [1.02818, 1.0217, 0.98556],
-        [6.67123, 6.46139, 6.44516],
-        [73.94566, 71.68655, 73.74145],
-        [0.00645, 0.00614, 0.00639],
-        [11.49935, 11.35433, 10.95156],
-        [23.93414, 23.63846, 22.78162],
-        [204.59582, 195.91745, 203.92405],
-        [198.93847, 190.58213, 191.488],
-        [197.07735, 193.70621, 190.22602],
-        [21.8236, 21.72214, 21.08265],
-        [0.5763, 0.55371, 0.57517],
-        [0.15114, 0.14738, 0.14813],
-        [0.10535, 0.1045, 0.10124],
-        [61.08416, 59.11649, 60.05224],
-        [0.15439, 0.1529, 0.15313],
-        [0.10687, 0.10602, 0.10481],
-        [0.06382, 0.06367, 0.06166]
+        [3.589, 3.579, 3.546],
+        [0.086, 0.083, 0.085],
+        [3.108, 2.977, 2.998],
+        [2.446, 2.369, 2.397],
+        [41.171, 39.897, 41.067],
+        [133.678, 131.099, 128.083],
+        [4.127, 4.023, 4.080],
+        [0.096, 0.093, 0.095],
+        [36.865, 35.327, 35.255],
+        [48.457, 47.931, 46.521],
+        [2.973, 2.891, 2.953],
+        [2.725, 2.635, 2.694],
+        [23.608, 23.603, 23.461],
+        [33.287, 32.037, 31.814],
+        [12.956, 12.473, 12.339],
+        [21.373, 21.359, 21.199],
+        [46.848, 46.703, 46.106],
+        [0.067, 0.066, 0.066],
+        [49.553, 49.131, 48.820],
+        [0.060, 0.057, 0.059],
+        [7.331, 7.223, 7.313],
+        [7.307, 7.032, 7.220],
+        [2.552, 2.538, 2.477],
+        [1.208, 1.166, 1.199],
+        [0.760, 0.732, 0.754],
+        [6.913, 6.748, 6.646],
+        [1.028, 1.021, 0.985],
+        [6.671, 6.461, 6.445],
+        [73.945, 71.686, 73.741],
+        [0.006, 0.006, 0.006],
+        [11.499, 11.354, 10.951],
+        [23.934, 23.638, 22.781],
+        [204.595, 195.917, 203.924],
+        [198.938, 190.582, 191.488],
+        [197.077, 193.706, 190.226],
+        [21.823, 21.722, 21.082],
+        [0.576, 0.553, 0.575],
+        [0.151, 0.147, 0.148],
+        [0.105, 0.104, 0.101],
+        [61.084, 59.116, 60.052],
+        [0.154, 0.152, 0.153],
+        [0.106, 0.106, 0.104],
+        [0.063, 0.063, 0.061]
     ]
 }
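
Review note: the README change in this PR asks contributors to double-check that each results file is valid JSON. A minimal Python sketch of such a check follows; it is not part of this change. The function name `validate_result` is hypothetical, and the shape it checks (a `result` array of 43 rows with three runtimes each) is inferred from the file above. Allowing `null` for a failed run is an assumption.

```python
import json

EXPECTED_QUERIES = 43
RUNS_PER_QUERY = 3


def _is_runtime(value):
    """A runtime is a non-negative number, or None (assumed: a failed run)."""
    if value is None:
        return True
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value >= 0


def validate_result(text):
    """Return a list of problems found in a results JSON document."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]  # e.g. a trailing comma

    result = doc.get("result") if isinstance(doc, dict) else None
    if not isinstance(result, list):
        return ['missing "result" array']

    problems = []
    if len(result) != EXPECTED_QUERIES:
        problems.append(f"expected {EXPECTED_QUERIES} rows, found {len(result)}")
    for number, row in enumerate(result, start=1):
        if not isinstance(row, list) or len(row) != RUNS_PER_QUERY:
            problems.append(f"query {number}: expected {RUNS_PER_QUERY} runtimes")
        elif not all(_is_runtime(value) for value in row):
            problems.append(f"query {number}: runtimes must be numbers >= 0 or null")
    return problems
```

An empty list means no problems were found. One possible use is to read each file under `results` and print the returned messages.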