
@Samiul-TheSoccerFan
Contributor

This PR fixes the shard failure that occurs when a search requests _inference_fields on an index that uses the legacy semantic_text format.

Steps to reproduce:

PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  },
  "task_settings": {}
}

PUT my-index-1
{
  "settings": {
    "index.mapping.semantic_text.use_legacy_format": true
  },
  "mappings": {
    "properties": {
      "inference_field": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint"
      },
      "source_field": {
        "type": "text",
        "copy_to": "inference_field"
      }
    }
  }
}

PUT my-index-1/_doc/doc1
{
  "inference_field": "test value",
  "source_field": "source value"
}

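Then search with _inference_fields in the fields parameter. The request fails on all shards with the null_pointer_exception response shown after it: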

GET my-index-1/_search
{
  "query": {
    "match_all": {}
  },
  "fields": ["_inference_fields"]
}

{
    "error": {
        "root_cause": [
            {
                "type": "null_pointer_exception",
                "reason": "Cannot invoke \"org.elasticsearch.index.mapper.MappedFieldType.isStored()\" because the return value of \"org.elasticsearch.index.query.SearchExecutionContext.getFieldType(String)\" is null"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "my-index-1",
                "node": "EMYfiOmiSiCWfTU1KOHasQ",
                "reason": {
                    "type": "null_pointer_exception",
                    "reason": "Cannot invoke \"org.elasticsearch.index.mapper.MappedFieldType.isStored()\" because the return value of \"org.elasticsearch.index.query.SearchExecutionContext.getFieldType(String)\" is null"
                }
            }
        ],
        "caused_by": {
            "type": "null_pointer_exception",
            "reason": "Cannot invoke \"org.elasticsearch.index.mapper.MappedFieldType.isStored()\" because the return value of \"org.elasticsearch.index.query.SearchExecutionContext.getFieldType(String)\" is null",
            "caused_by": {
                "type": "null_pointer_exception",
                "reason": "Cannot invoke \"org.elasticsearch.index.mapper.MappedFieldType.isStored()\" because the return value of \"org.elasticsearch.index.query.SearchExecutionContext.getFieldType(String)\" is null"
            }
        }
    },
    "status": 500
}

@Samiul-TheSoccerFan added the >bug, auto-backport, :SearchOrg/Relevance, v9.0.0, v8.18.0, v8.18.1, and v9.0.1 labels on Feb 4, 2025
@elasticsearchmachine
Collaborator

Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you.

@Samiul-TheSoccerFan marked this pull request as ready for review on February 5, 2025, 18:50
@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@Mikep86 requested review from jimczi and removed the request for salvatore-campagna on February 5, 2025, 19:22
Member

@kderusso left a comment

Great start @Samiul-TheSoccerFan !

@Mikep86 has already given all of the feedback I would have. Once you address those comments, I think this will be in a good place.

jimczi

This comment was marked as outdated.

@jimczi
Contributor

jimczi commented Feb 5, 2025

Sorry, I misread the change; it is related to the inference fields.
There's something I don't understand here, though. The issue comes from this code:

searchExecutionContext.isMetadataField(fieldAndFormat.field) == false || searchExecutionContext.getFieldType(fieldAndFormat.field).isStored() == false

The inference fields must be detected as a non-metadata field since the legacy format is used, meaning that we should never reach the getFieldType part. The issue, it seems, is that isMetadataField returns true but the field type is then null. That should never happen since all metadata fields are mapped.
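
To make the failure mode concrete, here is a minimal, self-contained sketch of how a condition of that shape throws. It does not use Elasticsearch classes: FieldType, REGISTERED_METADATA, and LEGACY_INDEX_MAPPING are hypothetical stand-ins, and the assumption is only that the metadata check answers from a general registry while the field type lookup answers from the index's actual mapping.

```java
import java.util.Map;
import java.util.Set;

public class MetadataCheckSketch {
    // Hypothetical stand-in for a mapped field type; not the real MappedFieldType.
    static final class FieldType {
        private final boolean stored;

        FieldType(boolean stored) {
            this.stored = stored;
        }

        boolean isStored() {
            return stored;
        }
    }

    // Assumption: names that could be metadata fields in general (registry view).
    static final Set<String> REGISTERED_METADATA = Set.of("_id", "_source", "_inference_fields");

    // Assumption: what is actually mapped on a legacy-format index (no _inference_fields entry).
    static final Map<String, FieldType> LEGACY_INDEX_MAPPING = Map.of(
        "_id", new FieldType(true),
        "_source", new FieldType(true)
    );

    static boolean isMetadataField(String field) {
        return REGISTERED_METADATA.contains(field);
    }

    static FieldType getFieldType(String field) {
        return LEGACY_INDEX_MAPPING.get(field); // null when the field is not mapped
    }

    // Same shape as the condition quoted above: the right-hand side only runs
    // when isMetadataField(field) is true.
    static boolean condition(String field) {
        return isMetadataField(field) == false || getFieldType(field).isStored() == false;
    }

    public static void main(String[] args) {
        System.out.println("source_field: " + condition("source_field"));
        System.out.println("_id: " + condition("_id"));
        try {
            condition("_inference_fields");
        } catch (NullPointerException e) {
            System.out.println("_inference_fields: NullPointerException");
        }
    }
}
```

In this sketch the short circuit protects ordinary fields, and mapped metadata fields resolve normally. Only a name that the registry knows but the index does not map reaches the null dereference.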

@jimczi
Contributor

jimczi commented Feb 5, 2025

Ok I think I understand now. MapperService#isMetadataField returns whether the field might be a metadata field in the general context instead of checking the actual value of the field in the mapping. So instead of doing:

public boolean isMetadataField(String field) {
    return mapperRegistry.getMetadataMapperParsers(indexVersionCreated).containsKey(field);
}

We should do something like:

public boolean isMetadataField(String field) {
    var mapper = mappingLookup().getMapper(field);
    // instanceof is already false for null, so no separate null check is needed
    return mapper instanceof MetadataFieldMapper;
}

_inference_fields is a metadata field only on indices where index.mapping.semantic_text.use_legacy_format is false, and the current check in MapperService#isMetadataField doesn't account for that.
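
As a rough illustration of the difference between the two checks, the sketch below models a per-index mapping as a plain map. Everything in it is a hypothetical stand-in (the Mapper, MetadataFieldMapper, and FieldMapper types, the REGISTRY set, and the mappingFor helper), and it assumes that an index only maps _inference_fields when the legacy format is disabled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class IsMetadataFieldSketch {
    // Hypothetical stand-ins for the mapper hierarchy; not the Elasticsearch classes.
    interface Mapper {}

    static final class MetadataFieldMapper implements Mapper {}

    static final class FieldMapper implements Mapper {}

    // Assumption: the registry lists every name that may be a metadata field on some index.
    static final Set<String> REGISTRY = Set.of("_id", "_inference_fields");

    // Assumption: _inference_fields is only mapped when the legacy format is off.
    static Map<String, Mapper> mappingFor(boolean useLegacyFormat) {
        Map<String, Mapper> mappers = new HashMap<>();
        mappers.put("_id", new MetadataFieldMapper());
        mappers.put("source_field", new FieldMapper());
        if (useLegacyFormat == false) {
            mappers.put("_inference_fields", new MetadataFieldMapper());
        }
        return mappers;
    }

    // Registry-style check: the same answer for every index.
    static boolean isMetadataFieldByRegistry(String field) {
        return REGISTRY.contains(field);
    }

    // Mapping-style check: answers for the concrete index. A missing mapper
    // yields false because instanceof is false for null.
    static boolean isMetadataFieldByMapping(Map<String, Mapper> mappers, String field) {
        return mappers.get(field) instanceof MetadataFieldMapper;
    }

    public static void main(String[] args) {
        Map<String, Mapper> legacy = mappingFor(true);
        Map<String, Mapper> current = mappingFor(false);
        System.out.println("registry: " + isMetadataFieldByRegistry("_inference_fields"));
        System.out.println("legacy mapping: " + isMetadataFieldByMapping(legacy, "_inference_fields"));
        System.out.println("non-legacy mapping: " + isMetadataFieldByMapping(current, "_inference_fields"));
    }
}
```

Under these assumptions, the registry-style check reports _inference_fields as a metadata field on both indices, while the mapping-style check only does so on the non-legacy index, which is the distinction the suggested change relies on.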

Contributor

@Mikep86 left a comment

Looks much better 🙌 ! I left one comment that we should address as I think not_exists will return true if any component of the queried path does not exist, meaning that the test could pass if no hits are returned.
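
The concern can be modeled with a small sketch. This is not the YAML test runner's code: resolve and notExists are hypothetical helpers that assume a dotted path is walked through nested maps and lists, and that a missing component anywhere counts as "does not exist". The response shapes and the path used are illustrative only.

```java
import java.util.List;
import java.util.Map;

public class NotExistsSketch {
    // Walks a dotted path through nested maps and lists.
    // Returns null as soon as any component is missing.
    static Object resolve(Object root, String path) {
        Object current = root;
        for (String part : path.split("\\.")) {
            if (current instanceof Map<?, ?>) {
                current = ((Map<?, ?>) current).get(part);
            } else if (current instanceof List<?>) {
                List<?> list = (List<?>) current;
                int index = Integer.parseInt(part); // list components are assumed to be numeric
                current = index >= 0 && index < list.size() ? list.get(index) : null;
            } else {
                return null;
            }
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    static boolean notExists(Object root, String path) {
        return resolve(root, path) == null;
    }

    public static void main(String[] args) {
        // Hypothetical response with zero hits.
        Map<String, Object> noHits = Map.of("hits", Map.of("hits", List.of()));
        // Hypothetical response with one hit that lacks the field.
        Map<String, Object> oneHit = Map.of(
            "hits", Map.of("hits", List.of(Map.of("_source", Map.of("source_field", "source value"))))
        );
        String path = "hits.hits.0._source._inference_fields";
        System.out.println("no hits: " + notExists(noHits, path));
        System.out.println("one hit: " + notExists(oneHit, path));
    }
}
```

Both calls report that the path does not exist, so under this model the assertion alone cannot tell "the document came back without the field" from "no document came back". Asserting the hit count before the not_exists check removes that ambiguity.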

Contributor

@Mikep86 left a comment

LGTM!

Member

@kderusso left a comment

Nice work, thanks for making all of the iterations!

@Mikep86
Contributor

Mikep86 commented Feb 6, 2025

@Samiul-TheSoccerFan Looks like these changes broke testIsMetadataField, can you take a look?

Contributor

@Mikep86 left a comment

Let's fix the broken test before merging

Contributor

@Mikep86 left a comment

Good approach to fix the failing test. I added some comments about how we can clean it up.

Contributor

@Mikep86 left a comment

LGTM, thanks for the iterations!

@Samiul-TheSoccerFan
Contributor Author

@jimczi Can you please take a look at this PR again and let us know if you have any concerns or suggestions?

Contributor

@jimczi left a comment

LGTM

@Samiul-TheSoccerFan merged commit 43c841a into elastic:main on Feb 10, 2025
17 checks passed
Samiul-TheSoccerFan added a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Feb 10, 2025
…ard failure (elastic#121720)

* Adding condition to verify if the field belongs to an index

* Update docs/changelog/121720.yaml

* Remove unnecessary comma from yaml file

* remove duplicate inference endpoint creation

* updating isMetadata to return true if mapper has the correct type

* remove unnecessary index creation in yaml tests

* Adding check if the document has returned in the yaml test

* Updating test to skip time series check if index mode is standard

* Refactor tests to consider verifying every metafields with all index modes

* refactoring test to verify for all cases

* Adding assertFalse if not time_series and fields are from time_series

* updating test texts to have better description
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
9.0
8.18

elasticsearchmachine pushed a commit that referenced this pull request Feb 10, 2025
…ard failure (#121720) (#122177)

elasticsearchmachine pushed a commit that referenced this pull request Feb 10, 2025
…ard failure (#121720) (#122178)

@Samiul-TheSoccerFan
Contributor Author

💚 All backports created successfully

Status Branch Result
8.x

Questions?

Please refer to the Backport tool documentation

joegallo pushed a commit that referenced this pull request Feb 11, 2025

Labels

auto-backport, >bug, :SearchOrg/Relevance, v8.18.0, v8.18.1, v8.19.0, v9.0.0, v9.0.1, v9.1.0


5 participants