@kderusso
Member

@kderusso kderusso commented Jan 10, 2025

Adds index_options support for semantic_text fields using dense models.

Example:

PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}

PUT my-semantic-index
{
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": "my-e5-model",
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw",
            "ef_construction": 100
          }
        }
      }
    }
  }
}
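
A rough sketch of how the mapping above might be exercised, not part of the original description. It assumes the `my-e5-model` endpoint and `my-semantic-index` index from the example already exist; the document text and query string are made up for illustration. Fetching the mapping is one way to check which `index_options` ended up on the field.

```console
# Inspect the mapping to see the index_options stored for inference_field
GET my-semantic-index/_mapping

# Index a sample document (illustrative text); refresh so it is searchable right away
POST my-semantic-index/_doc?refresh=true
{
  "inference_field": "Quantized vector indices trade a little recall for a smaller memory footprint."
}

# Query the field with the semantic query
GET my-semantic-index/_search
{
  "query": {
    "semantic": {
      "field": "inference_field",
      "query": "memory usage of vector search"
    }
  }
}
```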

@kderusso kderusso force-pushed the kderusso/semantic-text-index-options branch from e096a61 to 342d769 Compare January 10, 2025 16:17
@kderusso kderusso force-pushed the kderusso/semantic-text-index-options branch from 342d769 to d822301 Compare January 10, 2025 16:29
@kderusso kderusso added the >enhancement, auto-backport, :SearchOrg/Relevance, and v8.18.0 labels Jan 10, 2025
@elasticsearchmachine
Collaborator

Hi @kderusso, I've created a changelog YAML for you.

@kderusso kderusso added the :Search Relevance/Search label Jan 10, 2025
@kderusso kderusso marked this pull request as ready for review January 10, 2025 16:38
@kderusso kderusso requested review from a team, Mikep86 and jimczi January 10, 2025 16:39
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

Contributor

@Mikep86 Mikep86 left a comment

Good start to this! I have a bunch of comments, but they're mostly interrelated, so it's not as much as it seems.

@kderusso kderusso requested review from Mikep86 and jimczi June 16, 2025 20:16
Contributor

@jimczi jimczi left a comment

I left one comment regarding validation, but this feels very close, @kderusso!

@github-actions
Contributor

github-actions bot commented Jun 17, 2025

🔍 Preview links for changed docs:

🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.

Contributor

@Samiul-TheSoccerFan Samiul-TheSoccerFan left a comment

Nice implementation and tests, left a few nitpick comments.

Contributor

@jimczi jimczi left a comment

@kderusso kderusso merged commit 813814b into elastic:main Jun 17, 2025
27 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 119967`

kderusso added a commit to kderusso/elasticsearch that referenced this pull request Jun 17, 2025
* Add index_options parameter to semantic_text field mapping

* Cleanup & tests

* Update docs

* Update docs/changelog/119967.yaml

* Addressed some PR feedback

* Update yaml tests

* Refactoring

* Cleanup

* Fix some tests

* Hack in inferring text_embedding task type from index options

* [CI] Auto commit changes from spotless

* Fix error inferring model settings

* Update docs

* Update tests

* Update docs/reference/mapping/types/semantic-text.asciidoc

Co-authored-by: Mike Pellegrini <[email protected]>

* Address some minor PR feedback

* Remove partial model_settings with inferred task type

* Cleanup

* Remove unnecessary changes

* Fix errors from merge

* [CI] Auto commit changes from spotless

* Cleanup

* Checkpoint, saving changes before merge

* Update parsing

* [CI] Auto commit changes from spotless

* Stash changes

* Fix compile errors

* [CI] Auto commit changes from spotless

* Cleanup error

* fix test

* fix test

* Fix another test

* A bit of cleanup

* Fix tests

* Spotless

* Respect index options if set over defaults

* Cleanup

* [CI] Auto commit changes from spotless

* Support updating to compatible versions, add some cleanup and validation

* Remove test that can't be done here - needs to be unit test

* Add validation

* Cleanup

* Fix some yaml tests

* [CI] Auto commit changes from spotless

* Happy path early index validation works now; edge cases surrounding default BBQ remain

* Always emit index options, even when using defaults

* Minor cleanup

* Fix test compilation failures

* Fix some tests

* Continue to iterate on test failures

* Remove index options from inference field metadata as it is only needed at field creation time

* Fix some tests

* Remove transport version, no longer needed

* Fix yaml tests

* Add tests

* IndexOptions don't need to implement Writeable

* [CI] Auto commit changes from spotless

* Refactor - move SemanticTextIndexOptions

* Remove writeable

* Move index_options parsing to semantic text field mapper

* Cleanup

* Fix test compilation issue

* Cleanup

* Remove whitespace

* Remove writeables from index options

* Disable merging null options?

* Add docs

* [CI] Auto commit changes from spotless

* Revert "Disable merging null options?"

This reverts commit 2ef8b1d.

* Remove default serialization

* Include default index option type to defaults

* [CI] Auto commit changes from spotless

* Go back to allowing null updates

* Cleanup

* Fix validation error

* Revert "Include default index option type to defaults"

This reverts commit b08e2a1.

* Update tests

* Revert "Update tests"

This reverts commit aedfafe.

* Better fix for null inputs

* Remove redundant merge validation

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: Mike Pellegrini <[email protected]>
(cherry picked from commit 813814b)

# Conflicts:
#	docs/reference/elasticsearch/mapping-reference/semantic-text.md
#	server/src/main/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldMapper.java
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
@kderusso
Member Author

💚 All backports created successfully

Status Branch Result
8.19

Questions?

Please refer to the Backport tool documentation

@kderusso
Member Author

Had to create a manual backport: #129626

kderusso added a commit that referenced this pull request Jun 18, 2025
…129626)

* Add index_options to semantic_text field mappings (#119967)

* Fix errors in backport merge

Labels

auto-backport: Automatically create backport pull requests when merged
backport pending
>enhancement
:Search Relevance/Search: Catch all for Search Relevance
:SearchOrg/Relevance: Label for the Search (solution/org) Relevance team
Team:Search - Relevance: The Search organization Search Relevance team
Team:Search Relevance: Meta label for the Search Relevance team in Elasticsearch
Team:SearchOrg: Meta label for the Search Org (Enterprise Search)
v8.19.0
v9.1.0

6 participants