@pmpailis
Contributor

Following recent benchmarking, this PR changes the default index options for float vector fields with >= 384 dimensions to bbq_hnsw. Otherwise, we keep the int8_hnsw default that was previously set in #106836.
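
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `DefaultIndexTypeSketch`, the constant `BBQ_MIN_DIMS`, and the method `defaultFloatIndexType` are hypothetical names, and the sketch only covers indexed float vector fields whose dimensions are already known.

```java
/** Hypothetical sketch of the default selection rule; not the Elasticsearch implementation. */
final class DefaultIndexTypeSketch {
    // Threshold taken from the PR description: float fields with >= 384 dims get bbq_hnsw.
    static final int BBQ_MIN_DIMS = 384;

    /** Returns the assumed default index_options type for an indexed float vector field. */
    static String defaultFloatIndexType(int dims) {
        if (dims <= 0) {
            throw new IllegalArgumentException("dims must be positive, got " + dims);
        }
        return dims >= BBQ_MIN_DIMS ? "bbq_hnsw" : "int8_hnsw";
    }

    public static void main(String[] args) {
        // Print the assumed default for a few sample dimension counts.
        for (int dims : new int[] {128, 383, 384, 1024}) {
            System.out.println(dims + " -> " + defaultFloatIndexType(dims));
        }
    }
}
```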

@pmpailis pmpailis added >enhancement >docs General docs changes :Search Foundations/Mapping Index mappings, including merging and defining field types :Search Relevance/Vectors Vector search labels Jun 23, 2025
@github-actions
Contributor

github-actions bot commented Jun 23, 2025

🔍 Preview links for changed docs:

🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.

@pmpailis pmpailis added the test-full-bwc Trigger full BWC version matrix tests label Jun 23, 2025
@pmpailis pmpailis marked this pull request as ready for review June 23, 2025 12:44
@elasticsearchmachine elasticsearchmachine added Team:Docs Meta label for docs team Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Jun 23, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@pmpailis
Contributor Author

Marking this as ready for review, but some test cases are failing in search.vectors/70_dense_vector_telemetry; looking into it.

@benwtrent benwtrent self-requested a review June 23, 2025 13:13
Member

@benwtrent benwtrent left a comment

We need some tests around default values, ensuring that dynamically set dims still get the bbq_hnsw default.

Comment on lines +244 to +246
// This is defined as updatable because it can be updated once, from [null] to a valid dim size,
// by a dynamic mapping update. Once it has been set, however, the value cannot be changed.
this.dims = new Parameter<>("dims", true, () -> null, (n, c, o) -> {
Member

I have a sneaking suspicion that if dims are not set, and we then index a vector of 384 dims so that dims gets set to 384, the index type will not be set to bbq_hnsw.

Can you write a test to confirm/deny this?

Contributor Author

Yep, your suspicion was right :) Updated to use the builder's dimension setter when reading the float array, so that we have access to the dims when setting up the default value.

Added tests in DynamicMappingIT and DynamicMappingTests.

Member

@pmpailis there is still a bug.

PUT vectors
{
	"mappings": {
		"properties": {
			"vector": {
				"type": "dense_vector"
			}
		}
	}
}

POST vectors/_doc/1
{
	"vector": [0.1, 0.2, 0.3, 0.4, 0.5,..., 0.385]
}

GET vectors

See that the mapped type is not bbq_hnsw.

It's possible to create a dense_vector mapping with null dims. Then later, when the doc is parsed, we update the mapping to apply that dims update.

I think in that same update, we should adjust the default index type to ensure it's applied.

Member

I guess the tricky part is knowing if index_options were set by the user (specifically to the defaults, or whatever), or not set at all.

Contributor Author

@pmpailis pmpailis Jun 23, 2025

Oh, that's a nice catch. I've tried restoring the default value to null if dims are not present, and resetting it once we have a document in place. So now:

  • if a user provides dims, then we take them into account from the beginning to set up a default value
  • if a user does not provide any dimensions, then the mappings do not initialize any information for the index_options either, and we instead delay this decision until we know the dims as well

For example:

PUT vectors
{
	"mappings": {
		"properties": {
			"vector": {
				"type": "dense_vector"
			}
		}
	}
}
GET _mappings

->

{
    "vectors": {
        "mappings": {
            "properties": {
                "vector": {
                    "type": "dense_vector",
                    "index": true,
                    "similarity": "cosine"
                }
            }
        }
    }
}
PUT vectors/_doc/1
{
  "vector": [1.2, 3.4, ..., 8.9] 
}
GET _mappings

->

{
    "vectors": {
        "mappings": {
            "properties": {
                "vector": {
                    "type": "dense_vector",
                    "dims": 1512,
                    "index": true,
                    "similarity": "cosine",
                    "index_options": {
                        "type": "bbq_hnsw",
                        "m": 16,
                        "ef_construction": 100,
                        "rescore_vector": {
                            "oversample": 3
                        }
                    }
                }
            }
        }
    }
}

I'm not overly familiar with mappings and how they're updated, so I might be missing something (including whether this is considered a breaking change). Does the above make sense?
Added a take on this in ea7c8a2 and a test for the above scenario in DynamicMappingIT#testBBQDynamicMappingWhenFirstIngestingDoc.
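
As a rough illustration of the deferred decision described in this comment, the sketch below models a single field whose default index type stays unset until dims are known, and which never overrides a user-supplied value. All names here (`DeferredIndexOptionsSketch`, `onVectorParsed`, `userConfigured`) are hypothetical and do not come from the PR; the actual change lives in DenseVectorFieldMapper and may be structured quite differently.

```java
import java.util.Optional;

/** Hypothetical model of one dense_vector field mapping; not Elasticsearch code. */
final class DeferredIndexOptionsSketch {
    private static final int BBQ_MIN_DIMS = 384; // threshold from the PR description

    private Integer dims;                 // null until provided or inferred from a document
    private String indexType;             // null until set by the user or resolved as a default
    private final boolean userConfigured; // true if the user supplied index_options explicitly

    DeferredIndexOptionsSketch(Integer dims, String userIndexType) {
        this.dims = dims;
        this.userConfigured = userIndexType != null;
        this.indexType = userIndexType;
        resolveDefaultIfPossible();
    }

    /** Called when a vector is parsed; infers dims from the first vector seen. */
    void onVectorParsed(float[] vector) {
        if (dims == null) {
            dims = vector.length;
            resolveDefaultIfPossible();
        } else if (vector.length != dims) {
            throw new IllegalArgumentException("expected " + dims + " dims, got " + vector.length);
        }
    }

    /** Picks a default only when the user set nothing and dims are known. */
    private void resolveDefaultIfPossible() {
        if (!userConfigured && indexType == null && dims != null) {
            indexType = dims >= BBQ_MIN_DIMS ? "bbq_hnsw" : "int8_hnsw";
        }
    }

    Optional<Integer> dims() {
        return Optional.ofNullable(dims);
    }

    Optional<String> indexType() {
        return Optional.ofNullable(indexType);
    }
}
```

Keeping an explicit `userConfigured` flag is one way to address the concern raised earlier in the thread about telling apart index_options the user set from index_options that were never set.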

Contributor Author

(there is some code duplication that we could potentially avoid if we are to go with this)

}

private DenseVectorIndexOptions defaultIndexOptions(boolean defaultInt8Hnsw, boolean defaultBBQHnsw) {
if (this.dims != null && this.dims.isConfigured() && elementType.getValue() == ElementType.FLOAT && this.indexed.getValue()) {
Contributor Author

@benwtrent updated this so if dims is not provided, we postpone deciding which IndexOptions to create until we actually know the dimensions (in DenseVectorFieldMapper#parse)

@pmpailis
Contributor Author

buildkite test this please

Contributor

@jimczi jimczi left a comment

LGTM

@pmpailis
Contributor Author

Thanks for the reviews @benwtrent, @jimczi and @tteofili!

@benwtrent I'll proceed with merging this PR; we can address any leftover comments you may have in a follow-up :)

@pmpailis pmpailis merged commit b855266 into elastic:main Jun 24, 2025
27 checks passed
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jul 3, 2025
In elastic#129825, we modified the dense_vector field type to delay setting index options until the field's dimensions are known. However, this introduced a discrepancy for indices created before that change, which would previously default to int8_hnsw even when dimensions were not set.

This discrepancy leads to an assertion failure in mixed-version clusters, where the serialized mappings differ between nodes:
```
[2025-07-02T20:37:29,852][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v9.0.4-2] fatal error in thread [elasticsearch[v9.0.4-2][clusterApplierService#updateTask][T#1]], exiting java.lang.AssertionError: provided source [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine"}}}}] differs from mapping [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine","index_options":{"type":"int8_hnsw","m":16,"ef_construction":100}}}}}]
```

This commit resolves the issue by ensuring that indices created before the change continue to default to int8_hnsw index options, even if dimensions remain unset.
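
The backward-compatibility rule in this commit message could be sketched along these lines. This is only an illustration under stated assumptions: `LegacyDefaultSketch` and `DEFERRED_DEFAULTS_VERSION` are hypothetical names, the version id is a placeholder rather than a real Elasticsearch index version, and only float vector fields are modeled.

```java
import java.util.Optional;

/** Hypothetical sketch of version-gated defaults; not the Elasticsearch implementation. */
final class LegacyDefaultSketch {
    // Placeholder id for the index version at which deferred defaults were introduced.
    static final int DEFERRED_DEFAULTS_VERSION = 2;
    private static final int BBQ_MIN_DIMS = 384; // threshold from the original PR description

    /**
     * Returns the assumed default index type for a float vector field, or empty when the
     * decision is deferred because dims are not yet known.
     */
    static Optional<String> defaultIndexType(int indexCreatedVersion, Integer dims) {
        if (indexCreatedVersion < DEFERRED_DEFAULTS_VERSION) {
            // Older indices keep the eager int8_hnsw default, even when dims are unset,
            // so serialized mappings match what older nodes produce.
            return Optional.of("int8_hnsw");
        }
        if (dims == null) {
            // Newer indices wait until dims are known before choosing a default.
            return Optional.empty();
        }
        return Optional.of(dims >= BBQ_MIN_DIMS ? "bbq_hnsw" : "int8_hnsw");
    }
}
```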
jimczi added a commit that referenced this pull request Jul 3, 2025
#130540)
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Jul 3, 2025
elastic#130540)
@shainaraskas shainaraskas added the docs-missing-applies-tags PRs that are missing docs applies_to tags for an upcoming release. label Jul 22, 2025

Labels

>docs General docs changes docs-missing-applies-tags PRs that are missing docs applies_to tags for an upcoming release. >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types :Search Relevance/Vectors Vector search Team:Docs Meta label for docs team Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch test-full-bwc Trigger full BWC version matrix tests v9.1.0

6 participants