Provide access to new settings for HyphenationCompoundWordTokenFilter #115585
…enFilter Lucene issue: apache/lucene#9231
3996b21 to edb18b5
Pinging @elastic/es-search-relevance (Team:Search Relevance)
}

/**
 * Given a word list of: ["kaffee", "fee", "maschine"]
Is that the word list being used, or is it this: [fuss, fussball, ballpumpe, ball, pumpe, kaffee, fee, maschine]? I was thrown off by the comment and had trouble tracing it through in my head. The same goes for the comment on the subsequent test. The test result makes sense to me and looks good.
Technically, the word list contains ["fuss", "fussball", "ballpumpe", "ball", "pumpe", "kaffee", "fee", "maschine"], as defined in test1.json:43. The comment is meant to highlight the specific problem this parameter solves: preventing the match of "fee" (fairy) within "kaffee" (coffee).
I kept the same word list and input text for all tests to ensure there are no unintended side effects.
If it's clearer, I could isolate the tests and include only the Kaffeemaschine-related words in this test and only the Fussballpumpe-related words in the other one.
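To make the "fee" within "kaffee" case concrete, here is a minimal sketch in plain Java, assuming a brute-force dictionary lookup. It is not Lucene's HyphenationCompoundWordTokenFilter (no hyphenation patterns, no subword size limits), and the class SubMatchSketch, the method subwords and its noSubMatches parameter are hypothetical names. It finds every substring of the token that is in the test word list and, when noSubMatches is true, drops any match that lies entirely inside a longer match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Lucene's HyphenationCompoundWordTokenFilter:
// no hyphenation patterns and no subword size limits are applied.
final class SubMatchSketch {
    // Word list used by the tests (test1.json:43).
    static final Set<String> WORD_LIST = Set.of(
            "fuss", "fussball", "ballpumpe", "ball", "pumpe", "kaffee", "fee", "maschine");

    /** A word-list hit covering token[start, end). */
    record Match(int start, int end, String text) {
        // True if this match lies inside a strictly longer match.
        boolean enclosedBy(Match other) {
            return other.start() <= start && end <= other.end()
                    && other.end() - other.start() > end - start;
        }
    }

    // Brute force: check every substring of the token against the word list.
    static List<Match> findMatches(String token) {
        List<Match> matches = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int end = start + 1; end <= token.length(); end++) {
                String candidate = token.substring(start, end);
                if (WORD_LIST.contains(candidate)) {
                    matches.add(new Match(start, end, candidate));
                }
            }
        }
        return matches;
    }

    // noSubMatches is a hypothetical stand-in for no_sub_matches:
    // when true, matches enclosed by a longer match are dropped.
    static List<String> subwords(String token, boolean noSubMatches) {
        List<Match> matches = findMatches(token);
        List<String> result = new ArrayList<>();
        for (Match m : matches) {
            boolean enclosed = matches.stream().anyMatch(m::enclosedBy);
            if (!noSubMatches || !enclosed) {
                result.add(m.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(subwords("kaffeemaschine", false)); // [kaffee, fee, maschine]
        System.out.println(subwords("kaffeemaschine", true));  // [kaffee, maschine]
    }
}
```

With the full test word list, "fussballpumpe" yields [fussball, ballpumpe] under this rule, since "fuss", "ball" and "pumpe" are each enclosed by a longer match.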
Gotcha, I'm tracking now; this comment was a "for example". So I'll just leave it as a nit (change it if you want): I'd add something like "For example, given a word list of: ..." at the start of the comment, so it's clear that the test validates more than just that word list.
Other than a minor comment, this LGTM.
…example case that highlights the issue.
@elasticsearchmachine please test this
@elasticmachine test this please
@elasticmachine update branch
…henation_compound_word_token_filter
@elasticmachine test this please
💚 Backport successful
…elastic#115585) Allow the new flags added in Lucene in the HyphenationCompoundWordTokenFilter. Adds access to the two new flags no_sub_matches and no_overlapping_matches. Lucene issue: apache/lucene#9231
…#115585) (#116968) Allow the new flags added in Lucene in the HyphenationCompoundWordTokenFilter. Adds access to the two new flags no_sub_matches and no_overlapping_matches. Lucene issue: apache/lucene#9231
Co-authored-by: Peter Straßer <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
Solves #97849.
Adds access to the two new flags no_sub_matches and no_overlapping_matches to the HyphenationCompoundWordTokenFilter.
Lucene issue: apache/lucene#9231
Lucene PR: apache/lucene#12437
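A similar sketch for no_overlapping_matches, again in plain Java rather than the Lucene filter. The selection policy here is an assumption: matches are taken left to right, longest first at each start offset, and a match is kept only if it shares no characters with a match already kept. OverlapSketch and nonOverlapping are hypothetical names, and the real filter, which only splits at hyphenation points, may choose differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Lucene's HyphenationCompoundWordTokenFilter.
final class OverlapSketch {
    // Word list used by the tests (test1.json:43).
    static final Set<String> WORD_LIST = Set.of(
            "fuss", "fussball", "ballpumpe", "ball", "pumpe", "kaffee", "fee", "maschine");

    /** A word-list hit covering token[start, end). */
    record Match(int start, int end, String text) {
        // Half-open intervals overlap if each starts before the other ends.
        boolean overlaps(Match other) {
            return start < other.end() && other.start() < end;
        }
    }

    // Brute force: check every substring of the token against the word list.
    static List<Match> findMatches(String token) {
        List<Match> matches = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int end = start + 1; end <= token.length(); end++) {
                String candidate = token.substring(start, end);
                if (WORD_LIST.contains(candidate)) {
                    matches.add(new Match(start, end, candidate));
                }
            }
        }
        return matches;
    }

    // Stand-in for no_overlapping_matches under an assumed greedy policy:
    // sort by start offset (longest first on ties) and keep a match only
    // if it overlaps none of the matches kept so far.
    static List<String> nonOverlapping(String token) {
        List<Match> matches = findMatches(token);
        matches.sort((a, b) -> a.start() != b.start()
                ? Integer.compare(a.start(), b.start())
                : Integer.compare(b.end() - b.start(), a.end() - a.start()));
        List<Match> kept = new ArrayList<>();
        List<String> result = new ArrayList<>();
        for (Match m : matches) {
            if (kept.stream().noneMatch(m::overlaps)) {
                kept.add(m);
                result.add(m.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nonOverlapping("fussballpumpe"));  // [fussball, pumpe]
        System.out.println(nonOverlapping("kaffeemaschine")); // [kaffee, maschine]
    }
}
```

Under this policy, "ballpumpe" is dropped because it overlaps "fussball", while "pumpe" starts exactly where "fussball" ends and survives. A different tie-breaking order (for example, preferring the rightmost match) would give a different split, which is why the policy is stated as an assumption.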