kosabogi
Contributor

This update introduces a new section describing the bbq_disk option for dense vector fields.

Related issue: elastic/docs-content#3008

@kosabogi added the >docs and Team:Docs labels on Oct 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/core-docs (Team:Docs)

Contributor

github-actions bot commented Oct 10, 2025

🔍 Preview links for changed docs

Contributor

ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version.

We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:

✅ At the page level to indicate which products/deployments the content applies to (mandatory)
✅ When features change state (e.g. preview, ga) in a specific version
✅ When availability differs across deployments and environments

What NOT to do:

❌ Don't remove or replace information that applies to an older version
❌ Don't add new information that applies to a specific version without an applies_to tag
❌ Don't forget that applies_to tags can be used at the page, section, and inline level

🤔 Need help?

@kosabogi requested review from benwtrent and removed the request for dan-rubinstein on October 10, 2025 11:53
Member

@benwtrent left a comment

overall looks good

Contributor

@leemthompo left a comment

Looking good! Reading the blog (we could also link to it in a TIP for interested readers), it feels like there's some additional context that would help users:

On the technical details:

  • Should we mention hierarchical k-means and SOAR?
  • Should we mention the bulk scoring mechanism and off-heap operations that enable reading vectors directly from files for optimized performance?

We could probably mention perf characteristics like the significant indexing speed advantage (see the blog's benchmark):

  • graceful performance degradation in low-memory scenarios
  • competitive search latency even when entire index fits in memory

The blog also makes the decision-making guidance clearer:

  • when to use hnsw instead: when you have abundant off-heap memory (or the budget for it) and perform few index updates, so indexing costs stay low
  • we could be more explicit here about cost-sensitivity as a key factor favoring diskbbq

Also, there is an example in the blog that uses the visit_percentage query parameter for granular control over how many vectors the search considers.

WDYT @benwtrent?

@benwtrent
Member

@leemthompo I am not sure of the goals of these particular documentation pages.

But yes, we could call out some of the design if we wanted to.

I would focus on "why/when should I use this vs. other things", so low memory usage and cost should be the emphasis.

@leemthompo
Contributor

> @leemthompo I am not sure of the goals of these particular documentation pages.

I mean the goal is to explain what the thing is, how to use it, and why to use it versus alternatives. The goal of a blog is to make the feature visible and give some additional context about how/why we built it. It shouldn't be the source of truth for users looking to actually use the feature long term, because the blog isn't maintained over time.

Any relevant context that would help a user make the decision is pertinent, but maybe some of the low-level technical implementation details would be overkill. We should at the very least provide a link to the blog where we go into more detail and provide more context.

@kosabogi I'd make sure all of the information about memory savings in the blog is replicated here, and I'd also be more explicit about the cost-saving factor; we just haven't said that out loud here. There's also an additional example in the blog that uses the visit_percentage query parameter, and that's definitely the kind of thing that should be in the docs.

Contributor

@leemthompo left a comment

This looks good to me, thanks for iterating! 🚀

Labels

>docs, Team:Docs, v9.2.0, v9.3.0

4 participants