Skip to content

Conversation

pstrsr
Copy link
Contributor

@pstrsr pstrsr commented Oct 24, 2024

Solves #97849.

Adds access to the two new flags no_sub_matches and no_overlapping_matches to the HyphenationCompoundWordTokenFilter.

Lucene issue: apache/lucene#9231
Lucene PR: apache/lucene#12437

Copy link
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.0.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 24, 2024
@pstrsr pstrsr force-pushed the 97849_provide_access_to_new_settings_for_hyphenation_compound_word_token_filter branch from 3996b21 to edb18b5 Compare October 24, 2024 20:15
@astefan astefan added :Search Relevance/Analysis How text is split into tokens and removed needs:triage Requires assignment of a team area label labels Oct 25, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Oct 25, 2024
@peter-strsr peter-strsr requested a review from thecoop October 29, 2024 08:19
@thecoop thecoop removed their request for review October 29, 2024 13:42
@john-wagster john-wagster self-requested a review November 17, 2024 16:08
@john-wagster john-wagster added v8.17.0 auto-backport Automatically create backport pull requests when merged labels Nov 17, 2024
}

/**
* Given a word list of: ["kaffee", "fee", "maschine"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that the word list being used or is it this: [fuss, fussball, ballpumpe, ball, pumpe, kaffee, fee, maschine]. I was thrown off by the comment but had trouble tracking that through in my head. Same thing on the comment on the subsequent test. The test result makes sense to me and looks good.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the wordlist contains ["fuss", "fussball", "ballpumpe", "ball", "pumpe", "kaffee", "fee", "maschine"], as defined in test1.json:43. The comment should highlight, that this parameter should solve this specific problem of preventing the match of "fee" (fairy) within "kaffee" (coffee).

I left in the same wordlist for all tests and input text to ensure that they are not any unintended side effect.

If it's clearer I could isolate the tests and only include the Kaffeemaschine related words in this test and only the Fussballpumpe in the other one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha I'm tracking now; this comment was a "for example". So I'll just nit (change it if you want). I'd just include before the comment something like "for example given a word list of: " ... that way it's clear that the test is validating more than just that word list.

Copy link
Contributor

@john-wagster john-wagster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a minor comment this LGTM.

@john-wagster
Copy link
Contributor

@elasticsearchmachine please test this

@john-wagster
Copy link
Contributor

@elasticmachine test this please

@john-wagster
Copy link
Contributor

@elasticmachine update branch

@john-wagster
Copy link
Contributor

@elasticmachine test this please

@peter-strsr peter-strsr merged commit c804953 into elastic:main Nov 18, 2024
17 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

john-wagster pushed a commit to john-wagster/elasticsearch that referenced this pull request Nov 18, 2024
…elastic#115585)

Allow the new flags added in Lucene in the HyphenationCompoundWordTokenFilter

Adds access to the two new flags no_sub_matches and no_overlapping_matches.

Lucene issue: apache/lucene#9231
elasticsearchmachine pushed a commit that referenced this pull request Nov 19, 2024
…#115585) (#116968)

Allow the new flags added in Lucene in the HyphenationCompoundWordTokenFilter

Adds access to the two new flags no_sub_matches and no_overlapping_matches.

Lucene issue: apache/lucene#9231

Co-authored-by: Peter Straßer <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Nov 20, 2024
…elastic#115585)

Allow the new flags added in Lucene in the HyphenationCompoundWordTokenFilter

Adds access to the two new flags no_sub_matches and no_overlapping_matches.

Lucene issue: apache/lucene#9231
alexey-ivanov-es pushed a commit to alexey-ivanov-es/elasticsearch that referenced this pull request Nov 28, 2024
…elastic#115585)

Allow the new flags added in Lucene in the HyphenationCompoundWordTokenFilter

Adds access to the two new flags no_sub_matches and no_overlapping_matches.

Lucene issue: apache/lucene#9231
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged external-contributor Pull request authored by a developer outside the Elasticsearch team >feature :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.17.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants