Commit 6927a8e

Clarify rules regarding indexes
1 parent 4f10784 commit 6927a8e

1 file changed, +25 -7 lines changed

README.md

Lines changed: 25 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -29,15 +29,33 @@ JSONBench tests various aspects of the hardware as well: some queries require hi
2929

3030
### Fairness
3131

32-
Best efforts should be taken to understand the details of every tested system for a fair comparison.
33-
It is allowed to apply various [indexing methods](https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql#some-json-paths-can-be-used-for-indexes-and-data-sorting) whenever appropriate.
32+
Databases must be benchmarked using their default settings.
33+
As an exception, it is okay to specify non-default settings if they are a prerequisite for running the benchmark (example: increasing the maximum JVM heap size).
34+
Non-mandatory settings, especially settings related to workload tuning, are not allowed.
3435

35-
It is [not allowed](https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql#no-query-results-cache) to use query results caching or flatten JSON into multiple non-JSON columns at insertion time.
36-
37-
Some databases do have a JSON data type but they flatten nested JSON documents at insertion time to a single level (typically using `.` as separator between levels).
36+
Some databases provide a native JSON data type that flattens nested JSON documents to a single level at insertion time, typically using `.` as the separator between levels.
3837
We consider this a grey zone.
39-
On the one hand, this removes the possibility to restore the original documents, on the other hand, flattening may in many practical situations be acceptable.
40-
The dashboard allows to filter out databases which do not retain the document structure (i.e. which flatten).
38+
On the one hand, flattening makes it impossible to restore the original documents.
39+
On the other hand, flattening is acceptable in many practical situations.
40+
The dashboard provides a toggle to show or hide databases that use flattening.
41+
In the scope of JSONBench, we generally discourage flattening.
42+
43+
Other forms of flattening, in particular flattening JSON into multiple non-JSON columns at insertion time, are disallowed.
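
To illustrate why single-level flattening is treated as a grey zone, here is a minimal Python sketch of the idea. It is not taken from any benchmarked database: the `flatten` helper and the field names (`commit`, `operation`) are hypothetical and only show how joining keys with `.` can lose the original document shape.

```python
from typing import Any


def flatten(doc: dict[str, Any], sep: str = ".", prefix: str = "") -> dict[str, Any]:
    """Collapse nested objects into a single level, joining keys with `sep`.

    Hypothetical helper for illustration only; real databases may treat
    arrays, empty objects, and key escaping differently.
    """
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects.
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value
    return flat


# Two structurally different documents (hypothetical field names) ...
nested = {"commit": {"operation": "create"}}
dotted = {"commit.operation": "create"}

# ... flatten to the same result, so the original shape cannot be recovered.
assert flatten(nested) == flatten(dotted) == {"commit.operation": "create"}
```

Because both inputs map to the same flattened form, a reader of the stored data cannot tell which document was originally inserted.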
44+
45+
It is allowed to index the data using clustered indexes (= specifying the table sort order) or non-clustered indexes (= additional data structures, e.g. B-trees).
46+
We recognize that there are pros and cons to this approach.
47+
48+
Pros:
49+
- The JSON documents in JSONBench expose a common and rather static structure. Many real-world use cases expose similar patterns. It is a widely used practice to create indexes based on the anticipated data structure.
50+
- The original [blog post](https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql#some-json-paths-can-be-used-for-indexes-and-data-sorting) made use of indexes. Disallowing clustered indexes entirely would invalidate the original measurements.
51+
52+
Cons:
53+
- There may be other real-world use cases where the JSON documents are highly dynamic (they share no common structure). In these cases, indexes are not useful.
54+
- Many databases use indexes to prune the set of scanned data ranges or retrieve result rows directly (no scan). As a result, the benchmark indirectly also measures the effectiveness of such access path optimization techniques.
55+
- Likewise, clustered indexes impact how well the data can be compressed. Again, this affects query runtimes indirectly.
56+
- In some databases, clustered indexes must be built on top of flattened (e.g. concatenated and materialized) JSON documents. This technically contradicts the previous statement that flattening is discouraged.
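
The difference between a full scan, a clustered index, and a non-clustered index can be sketched in plain Python. This is a toy model under stated assumptions, not how any particular database implements indexes: the rows, the `kind` key, and all function names are hypothetical, and "rows examined" stands in for the scanned data ranges mentioned above.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical rows: (kind, payload). Names and values are illustrative only.
rows = (
    [("commit", i) for i in range(5)]
    + [("identity", i) for i in range(3)]
    + [("account", i) for i in range(2)]
)


def full_scan(rows, kind):
    """No index: every row is examined."""
    matches = [r for r in rows if r[0] == kind]
    return matches, len(rows)


def clustered_lookup(sorted_rows, sorted_keys, kind):
    """Clustered index: rows are stored sorted by `kind`, so a binary
    search bounds the matching range and only that range is read."""
    lo = bisect_left(sorted_keys, kind)
    hi = bisect_right(sorted_keys, kind)
    return sorted_rows[lo:hi], hi - lo


def build_secondary_index(rows):
    """Non-clustered index: an additional structure mapping each key
    to the positions of its rows; the rows keep their original order."""
    index = defaultdict(list)
    for pos, (kind, _payload) in enumerate(rows):
        index[kind].append(pos)
    return index


sorted_rows = sorted(rows)
sorted_keys = [kind for kind, _payload in sorted_rows]
secondary = build_secondary_index(rows)

scan_matches, scan_examined = full_scan(rows, "identity")
clustered_matches, clustered_examined = clustered_lookup(sorted_rows, sorted_keys, "identity")
secondary_matches = [rows[pos] for pos in secondary["identity"]]

# All three access paths return the same rows but touch different amounts of data.
assert sorted(scan_matches) == clustered_matches == sorted(secondary_matches)
assert scan_examined == 10 and clustered_examined == 3
```

In this sketch the sort order plays the role of the table sort order, while the dictionary is the additional data structure. A benchmark that permits either one therefore also measures how well each system exploits such access paths.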
57+
58+
It is not allowed to cache query results (or, more generally, intermediate results at the end of the query processing pipeline) between hot runs.
4159

4260
## Goals
4361
