README.md
25 additions & 7 deletions
@@ -29,15 +29,33 @@ JSONBench tests various aspects of the hardware as well: some queries require hi
 
 ### Fairness
 
-Best efforts should be taken to understand the details of every tested system for a fair comparison.
-It is allowed to apply various [indexing methods](https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql#some-json-paths-can-be-used-for-indexes-and-data-sorting) whenever appropriate.
+Databases must be benchmarked using their default settings.
+As an exception, it is okay to specify non-default settings if they are a prerequisite for running the benchmark (example: increasing the maximum JVM heap size).
+Non-mandatory settings, especially settings related to workload tuning, are not allowed.
 
-It is [not allowed](https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql#no-query-results-cache) to use query results caching or flatten JSON into multiple non-JSON colums at insertion time.
-
-Some databases do have a JSON data type but they flatten nested JSON documents at insertion time to a single level (typically using `.` as separator between levels).
+Some databases provide a native JSON data type that flattens nested JSON documents at insertion time to a single level, typically using `.` as separator between levels.
 We consider this a grey zone.
-On the one hand, this removes the possibility to restore the original documents, on the other hand, flattening may in many practical situations be acceptable.
-The dashboard allows to filter out databases which do not retain the document structure (i.e. which flatten).
+On the one hand, flattening removes the possibility to restore the original documents.
+On the other hand, flattening is in many practical situations acceptable.
+The dashboard provides a toggle which allows to show or hide databases that use flattening.
+In the scope of JSONBench, we generally discourage flattening.
+
+Other forms of flattening, in particular flattening JSON into multiple non-JSON colums at insertion time, are disallowed.
+
+It is allowed to index the data using clustered indexes (= specifying the table sort order) or non-clustered indexes (= additional data structures, e.g. B-trees).
+We recognize that there are pros and cons of this approach.
+
+Pros:
+- The JSON documents in JSONBench expose a common and rather static structure. Many real-world use cases expose similar patterns. It is a widely used practice to create indexes based on the anticipated data structure.
+- The original [blog post](https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql#some-json-paths-can-be-used-for-indexes-and-data-sorting) made use of indexes. Disallowing clustered indexes entirely would invalidate the original measurements.
+
+Cons:
+- There may be other real-world use cases where the JSON documents are highly dynamic (they share no common structure). In these cases, indexes are not useful.
+- Many databases use indexes to prune the set of scanned data ranges or retrieve result rows directly (no scan). As a result, the benchmark indirectly also measures the effectiveness of such access path optimization techniques.
+- Likewise, clustered indexes impact how well the data can be compressed. Again, this affects query runtimes indirectly.
+- In some databases, clustered indexes must be build on top of flattened (e.g. concatenated and materialized) JSON documents. This technically contradicts the previous statement that flattening is discouraged.
+
+It is not allowed to use cache query results (or generally intermediate results at the end of the query processing pipeline) between hot runs.
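
For readers unfamiliar with the flattening the new text refers to, here is a minimal Python sketch of collapsing a nested JSON document to a single level with `.` as the separator. It is not taken from any tested database: the `flatten` helper and the sample documents are hypothetical, and the treatment of arrays and empty objects (kept as leaf values) is an assumption that real systems may handle differently. The last lines illustrate why flattening can make the original document unrecoverable: a nested document and a document whose keys already contain `.` end up identical after flattening.

```python
from typing import Any


def flatten(doc: dict[str, Any], sep: str = ".", prefix: str = "") -> dict[str, Any]:
    """Collapse nested objects into one level, joining key paths with `sep`.

    Hypothetical helper for illustration only; not any database's actual logic.
    """
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects.
            flat.update(flatten(value, sep, path))
        else:
            # Assumption: scalars, arrays and empty objects are stored as leaves.
            flat[path] = value
    return flat


# Hypothetical sample documents (not part of the JSONBench dataset).
nested = {"user": {"name": "alice", "geo": {"city": "Berlin"}}, "tags": ["a", "b"]}
dotted = {"user.name": "alice", "user.geo.city": "Berlin", "tags": ["a", "b"]}

flat_nested = flatten(nested)
flat_dotted = flatten(dotted)

print(flat_nested)
# Both inputs flatten to the same result, so the original nesting is lost.
print(flat_nested == flat_dotted)
```

Because two structurally different inputs map to the same flattened form, a system that stores only the flattened form cannot tell which one was inserted.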