
[BUG] Highlighting issue with annotated highlighter and max_analyzer_offset option #20669

@anatoly21

Description


Describe the bug

Adding the max_analyzer_offset option to a highlight request that uses the annotated highlighter causes the following error:

    "root_cause": [
      {
        "type": "class_cast_exception",
        "reason": "class org.opensearch.search.fetch.subphase.highlight.UnifiedHighlighter$1 cannot be cast to class org.opensearch.index.mapper.annotatedtext.AnnotatedTextFieldMapper$AnnotatedHighlighterAnalyzer (org.opensearch.search.fetch.subphase.highlight.UnifiedHighlighter$1 is in unnamed module of loader 'app'; org.opensearch.index.mapper.annotatedtext.AnnotatedTextFieldMapper$AnnotatedHighlighterAnalyzer is in unnamed module of loader java.net.FactoryURLClassLoader @1248f83)"
      }
    ]

Related component

Search:Query Capabilities

To Reproduce

  1. Make sure the mapper-annotated-text plugin is installed.
  2. Create an index:
PUT /highlight-issue
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
        "contents_eng": {
          "type": "annotated_text",
          "analyzer": "english",
          "fields": {
            "exactmatch": {
              "type": "annotated_text",
              "analyzer": "standard"
            }
          }
        },
        "headline_eng": {
          "type": "annotated_text",
          "term_vector": "with_positions_offsets",
          "analyzer": "english",
          "fields": {
            "exactmatch": {
              "type": "annotated_text",
              "term_vector": "with_positions_offsets",
              "analyzer": "standard"
            }
          }
        }        
    }
  }
}
  3. Index a document:
PUT /highlight-issue/_doc/1
{
  "headline_eng": "Background of [Grumpy Cat](Tardar Sauce)",
  "contents_eng": "[Grumpy Cat](Tardar Sauce) gained fame after a photo of her was posted on Reddit in September 2012. Her unique appearance led to her becoming an internet meme"
}
  4. Run a query:
GET /highlight-issue/_search?request_cache=false
{
  "highlight": {
    "fields": {
      "contents_eng": {
        "boundary_max_scan": 20,
        "boundary_scanner": "sentence",
        "fragment_size": 0,
        "number_of_fragments": 0,
        "require_field_match": true
      }
    },
    "fragment_size": 75,
    "no_match_size": 0,
    "number_of_fragments": 2,
    "order": "score",
    "type": "annotated",
    "max_analyzer_offset": 1000
  },
  "query": {
    "match": {
      "contents_eng": {
        "query": "Tardar Sauce",
        "analyzer": "keyword"
      }
    }
  }
}
  5. Observe the class_cast_exception shown above.
  6. Run the same query without the max_analyzer_offset option and observe that the search completes successfully.
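The comparison in steps 4 to 6 can also be scripted. The following is a minimal Python sketch using only the standard library. It assumes an unsecured node at http://localhost:9200 (a hypothetical address) on which steps 1 to 3 have already been run, and it sends the step 4 search with POST, which OpenSearch accepts for _search just as it accepts GET. The helper names are illustrative and not part of any OpenSearch client.

```python
import json
import urllib.error
import urllib.request

# Assumption: a local, unsecured OpenSearch 2.19.4 node with the
# mapper-annotated-text plugin, where steps 1-3 have already been run.
BASE_URL = "http://localhost:9200"
INDEX = "highlight-issue"


def build_search_body(max_analyzer_offset=None):
    """Return the step 4 search body, optionally without max_analyzer_offset."""
    highlight = {
        "fields": {
            "contents_eng": {
                "boundary_max_scan": 20,
                "boundary_scanner": "sentence",
                "fragment_size": 0,
                "number_of_fragments": 0,
                "require_field_match": True,
            }
        },
        "fragment_size": 75,
        "no_match_size": 0,
        "number_of_fragments": 2,
        "order": "score",
        "type": "annotated",
    }
    if max_analyzer_offset is not None:
        highlight["max_analyzer_offset"] = max_analyzer_offset
    return {
        "highlight": highlight,
        "query": {
            "match": {
                "contents_eng": {"query": "Tardar Sauce", "analyzer": "keyword"}
            }
        },
    }


def is_annotated_cast_error(response):
    """True if an error response carries the ClassCastException from this report."""
    causes = response.get("error", {}).get("root_cause", [])
    return any(
        cause.get("type") == "class_cast_exception"
        and "AnnotatedHighlighterAnalyzer" in cause.get("reason", "")
        for cause in causes
    )


def search(body):
    """POST the body to _search and return (HTTP status, parsed JSON)."""
    req = urllib.request.Request(
        f"{BASE_URL}/{INDEX}/_search?request_cache=false",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # Error responses still carry a JSON body with "error.root_cause".
        return err.code, json.load(err)


def main():
    # First with the option (expected to fail per this report), then without it.
    for offset in (1000, None):
        try:
            status, payload = search(build_search_body(offset))
        except urllib.error.URLError as err:
            print(f"Could not reach {BASE_URL}: {err.reason}")
            return
        print(f"max_analyzer_offset={offset}: HTTP {status}, "
              f"annotated cast error: {is_annotated_cast_error(payload)}")


if __name__ == "__main__":
    main()
```

Per this report, on an affected 2.19.4 node the first request should report the cast error and the second should succeed; once the bug is fixed, both lines should show no cast error.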

Expected behavior

Since the "annotated" highlighter is based on the unified highlighter, it should support the max_analyzer_offset option, which sets the maximum number of characters of the non-annotated text to be analyzed by a highlight request.
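As a rough illustration of that expected behavior, here is a simplified Python sketch. It assumes, as described above, that the limit is counted against the plain text left after annotation markup of the form [text](value) is removed. The helpers plain_text and analyzed_span are hypothetical and not part of OpenSearch, and the regular expression ignores any escaping rules in annotation values.

```python
import re

# Simplified annotated-text markup: [visible text](annotation values).
# Assumption: no escaped brackets or parentheses inside the markup.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def plain_text(annotated: str) -> str:
    """Strip annotation markup, keeping only the visible text."""
    return ANNOTATION.sub(lambda m: m.group(1), annotated)


def analyzed_span(annotated: str, max_analyzer_offset: int) -> str:
    """Return the part of the plain text that falls within the character limit."""
    return plain_text(annotated)[:max_analyzer_offset]


doc = ("[Grumpy Cat](Tardar Sauce) gained fame after a photo of her was posted "
       "on Reddit in September 2012. Her unique appearance led to her becoming "
       "an internet meme")
print(len(plain_text(doc)), analyzed_span(doc, 10))
```

For the sample document, analyzed_span(doc, 1000) returns the entire plain text, so under this reading max_analyzer_offset: 1000 should leave the highlighting result unchanged compared with running the query without the option, rather than producing an error.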

Additional Details

OpenSearch version: 2.19.4

Plugins
mapper-annotated-text 2.19.4

Metadata


Assignees

Type

No type

Projects

Status

🆕 New

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
