Skip to content

Conversation

benwtrent
Copy link
Member

This adds some more counts for dense_vector field mapping stats. This allows for seeing the number of mappings with a given element type, similarity, or index type.

Copy link
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Sep 26, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Copy link
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

indexedVectorDimMax = UNSET;
}

DenseVectorFieldStats(StreamInput in) throws IOException {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this isn't actually used anywhere. I am going to double check that this is ok. But the underlying fieldstats serialization isn't via named writables & consequently the DenseVectorFieldStats is never actually written or read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to remove

…ub.com:benwtrent/elasticsearch into feature/add-dense-vector-details-cluster-stats
@benwtrent
Copy link
Member Author

@elasticmachine update branch

stats = fieldTypes.computeIfAbsent(type, DenseVectorFieldStats::new);
boolean indexed = fieldMapping.containsKey("index") ? (boolean) fieldMapping.get("index") : false;
DenseVectorFieldStats vStats = (DenseVectorFieldStats) stats;
if (fieldMapping.containsKey("similarity")) {
Copy link
Contributor

@mayya-sharipova mayya-sharipova Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought similarity can only be provided when indexed=true? Should we put this under indexed (line 101)?

DenseVectorFieldStats vStats = (DenseVectorFieldStats) stats;
Object indexOptions = fieldMapping.get("index_options");
// NOTE, while the default for `float` is now `int8_hnsw`, that is actually added to the mapping
// if the value is truly missing & we are indexed, we default to hnsw.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what condition this could happen that the value for index_options.type is missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova when we default to index: true, but the element type is byte. in that scenario we won't provide a value to the mapping and its instead hnsw.

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benwtrent Thanks, nice addition to vector stats. I left a couple of comments.

@benwtrent benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 1, 2024
Copy link
Member

@carlosdelest carlosdelest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@benwtrent
Copy link
Member Author

@elasticmachine update branch

Copy link
Contributor

@tteofili tteofili left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with just a minor comment

Comment on lines +26 to +28
Map<String, Integer> vectorIndexTypeCount; // count of mappings by index type
Map<String, Integer> vectorSimilarityTypeCount; // count of mappings by similarity
Map<String, Integer> vectorElementTypeCount; // count of mappings by element type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be concerned about these becoming huge (and hence OOMs & co)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tteofili I wouldn't think so. We have a static number of element types, index types, and similarity types. Since we are counting each separately, we won't have combinatoric explosion. These each will be hashmaps of < 10 elements each or so.

@elasticsearchmachine elasticsearchmachine merged commit 8ed0df4 into elastic:main Oct 1, 2024
16 checks passed
@benwtrent benwtrent deleted the feature/add-dense-vector-details-cluster-stats branch October 1, 2024 15:59
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Oct 1, 2024
…113607)

This adds some more counts for dense_vector field mapping stats. This
allows for seeing the number of mappings with a given element type,
similarity, or index type.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

elasticsearchmachine pushed a commit that referenced this pull request Oct 3, 2024
…#113884)

This adds some more counts for dense_vector field mapping stats. This
allows for seeing the number of mappings with a given element type,
similarity, or index type.

Co-authored-by: Elastic Machine <[email protected]>
matthewabbott pushed a commit to matthewabbott/elasticsearch that referenced this pull request Oct 4, 2024
…113607)

This adds some more counts for dense_vector field mapping stats. This
allows for seeing the number of mappings with a given element type,
similarity, or index type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement :Search Relevance/Search Catch all for Search Relevance Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.16.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants