[9.2] Adds DiskBBQ to the BBQ documentation #136371
base: main
Pinging @elastic/core-docs (Team:Docs)
ℹ️ Important: Docs version tagging
👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes.
When to use applies_to tags:
✅ At the page level to indicate which products/deployments the content applies to (mandatory)
What NOT to do:
❌ Don't remove or replace information that applies to an older version
overall looks good
Co-authored-by: Benjamin Trent <[email protected]>
Looking good! Having read the blog, I feel there's some additional context that would help users (we could also link to the blog in a TIP for interested readers):
On the technical details:
- Should we mention hierarchical k-means and SOAR?
- Should we mention the bulk scoring mechanism and off-heap operations that enable reading vectors directly from files for optimized performance?
We could probably mention perf characteristics like the significant indexing speed advantage (see the blog's benchmark):
- graceful performance degradation in low-memory scenarios
- competitive search latency even when entire index fits in memory
In the blog, the decision-making guidance is clearer:
- when to use hnsw instead: when you have abundant off-heap memory (or budget for it), and perform few index updates where indexing costs are low
- we could be more explicit here about cost-sensitivity as a key factor favoring diskbbq
Also there is an example in the blog that uses the visit_percentage query parameter for granular control over how many vectors the search considers
WDYT @benwtrent?
@leemthompo I am not sure what the goals of these particular documentation pages are. But yes, we could call out some of the design if we wanted to. I would focus on "why/when should I use this vs. other things", so low memory and cost should be the focus.
I mean the goal is to explain what the thing is, how to use it, and why to use it versus alternatives. The goal of a blog is to make the feature visible and give some additional context about how/why we built it. It shouldn't be the source of truth for users looking to actually use the feature long term, because the blog isn't maintained over time. Any relevant context that would help a user make the decision is pertinent, but maybe some of the low-level technical implementation details would be overkill. We should at the very least provide a link to the blog, where we go into more detail and provide more context. @kosabogi I'd make sure all of the information pertaining to memory savings from the blog is replicated here, and also be more explicit about the cost-saving factor; we just haven't said that out loud here. Also there's an additional example in the blog that uses the
This looks good to me, thanks for iterating! 🚀
This update introduces a new section describing the bbq_disk option for dense vector fields.
Related issue: elastic/docs-content#3008
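For readers following the thread, here is a minimal sketch of the two request bodies the discussion touches on: a dense_vector mapping that selects bbq_disk, and a knn query that sets visit_percentage. It is not taken from the PR or the blog. The index name, field name, dimension count, and similarity are hypothetical, and the 0 to 100 range check on visit_percentage is an assumption about the parameter. The code only builds JSON bodies and does not contact a cluster.

```python
import json

# Hypothetical names, chosen for illustration only (not from the PR).
INDEX_NAME = "my-vectors"
FIELD_NAME = "embedding"


def bbq_disk_mapping(dims: int) -> dict:
    """Build a create-index body whose dense_vector field uses bbq_disk."""
    return {
        "mappings": {
            "properties": {
                FIELD_NAME: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",  # assumed; pick what fits your model
                    "index_options": {"type": "bbq_disk"},
                }
            }
        }
    }


def knn_query(query_vector, k: int, visit_percentage: float) -> dict:
    """Build a knn query body that sets visit_percentage.

    The 0-100 bounds are an assumption about the parameter's valid range.
    """
    if not 0.0 <= visit_percentage <= 100.0:
        raise ValueError("visit_percentage is assumed to lie in [0, 100]")
    return {
        "query": {
            "knn": {
                "field": FIELD_NAME,
                "query_vector": list(query_vector),
                "k": k,
                "visit_percentage": visit_percentage,
            }
        }
    }


if __name__ == "__main__":
    # Print the bodies you would send as PUT /my-vectors and POST /my-vectors/_search.
    print(json.dumps(bbq_disk_mapping(dims=384), indent=2))
    print(json.dumps(knn_query([0.1, 0.2, 0.3, 0.4], k=10, visit_percentage=5.0), indent=2))
```

Presumably a lower visit_percentage trades recall for latency by considering fewer vectors, which is the "granular control" the review comment refers to; check the published documentation for the exact semantics and defaults.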