Add Quickwit benchmark #104
@@ -0,0 +1,18 @@
#!/bin/bash

# The latest official release of Quickwit is too old; it does not support many tantivy queries.
So what stops us from using the latest and greatest Docker builds?
Quickwit hasn't released a new version in a long time, and many people actually use a nightly build. I used a prebuilt binary here to avoid running Docker, but we can request a new binary release from the Quickwit team before merging this PR.
EDIT: Using Docker is fine, see the starrocks and singlestore submissions in this repository.
It is fine to add |
Some more debugging would be nice, but we can mark Q2 again as |
I don't really understand what that means. Is performance slower than it could be?
That's good. As per the benchmark rules, as little tuning as possible should be applied (i.e. databases should run with their default settings).
@cometkim I'm interested in merging this - thanks for the PR. It seems more work is needed; please ping me when this is ready.
Notes:
Quickwit hasn't released a new version in a long time, and many people actually use a nightly build. I used a prebuilt binary here to avoid running Docker, but we can request a new binary release from the Quickwit team before merging this PR.
Quickwit does not support Q5. Testing this would require additional features, such as Elasticsearch's `bucket_script`.
The result for Q2 appears to be inconsistent with other engines. It's unclear whether this is a bug, a precision loss, or data corruption.
Quickwit's `terms` aggregation does not support unlimited buckets. There is no explicit "return all" option in aggregations, and even if I specify an arbitrarily large number, the maximum number is limited by the searcher's `aggregation_bucket_limit` settings.
I haven't tuned the settings for the instance size.
This may differ significantly from actual production results. It's more like benchmarking Tantivy. Since Quickwit is typically configured with S3 and a Postgres metastore, I suspect production deployments incur additional overhead from those components and from networking.
There were several errors while loading the 1000m data, but they weren't logged, so I don't know the exact cause. I need to run this at least once more to check the data quality.
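For reference, the bucket-limit note above can be illustrated with a minimal sketch of a `terms` aggregation request. Everything specific here is an assumption for illustration, not taken from the PR: the helper name `build_terms_agg_body`, the aggregation name `by_term`, the index id `bluesky`, the field `commit.collection`, and the local endpoint. The sketch only builds and prints the request body; the actual call is left commented out.

```shell
#!/bin/bash

# Hypothetical helper: print a search request body that runs a terms
# aggregation on FIELD and asks for up to SIZE buckets.
# "max_hits": 0 skips returning documents, so only the aggregation is computed.
build_terms_agg_body() {
  local field="$1" size="$2"
  printf '{"query":"*","max_hits":0,"aggs":{"by_term":{"terms":{"field":"%s","size":%d}}}}\n' \
    "$field" "$size"
}

# Placeholder values (assumptions): adjust to the actual node and index.
QW_URL="${QW_URL:-http://localhost:7280}"
INDEX_ID="${INDEX_ID:-bluesky}"

# Ask for an arbitrarily large number of buckets. As noted above, the number
# actually returned is still capped by the searcher's aggregation_bucket_limit.
body="$(build_terms_agg_body "commit.collection" 1000000)"
echo "$body"

# Uncomment to send the request to a running node:
# curl -s -XPOST "$QW_URL/api/v1/$INDEX_ID/search" \
#   -H 'Content-Type: application/json' -d "$body"
```

If the results of such a query look truncated compared with other engines, the searcher's bucket limit is one thing to check before assuming a data problem.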