[DOCS] Add docs for new Lucene's filters for Japanese text. #112356
New filters are:
- hiragana_uppercase
- katakana_uppercase

This is related to:
* elastic#106553
Pinging @elastic/es-docs (Team:Docs)
LGTM. I tested that this works as documented. I have one minor non-blocking question about wording.
Thanks for opening the PR and adding to the docs 🏅
[[analysis-kuromoji-hiragana-uppercase]]
==== `hiragana_uppercase` token filter

The `hiragana_uppercase` token filter normalizes small letters (捨て仮名) in hiragana into normal letters.
Is "normal letters" accepted phrasing?
Maybe "The `hiragana_uppercase` token filter normalizes small Hiragana letters (捨て仮名) into full-size Hiragana letters"?
Good point, maybe `standard` (or `regular`) would be better. The word "full-size" sounds like "full-width" (multi-byte), which is not the case here. Let me change "normal" to "standard".
Sounds good! Glad I was able to communicate that despite my ignorance of linguistic terms :)
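To make the wording question concrete, here is a minimal Python sketch of the kind of character mapping the description refers to. It is not the Lucene implementation: the mapping table and the `hiragana_uppercase` function name are assumptions for illustration only. It also shows why "full-width" would be misleading: Unicode width normalization (NFKC) leaves small kana unchanged.

```python
import unicodedata

# Assumed mapping for illustration: small hiragana (捨て仮名) and their
# standard-size counterparts, paired position by position.
SMALL_HIRAGANA = "ぁぃぅぇぉっゃゅょゎゕゖ"
STANDARD_HIRAGANA = "あいうえおつやゆよわかけ"
HIRAGANA_TABLE = str.maketrans(SMALL_HIRAGANA, STANDARD_HIRAGANA)


def hiragana_uppercase(text: str) -> str:
    """Replace each small hiragana letter with its standard-size form."""
    return text.translate(HIRAGANA_TABLE)


print(hiragana_uppercase("ちょっとまって"))  # → ちよつとまつて

# Width normalization is a different operation: NFKC does not touch small kana.
print(unicodedata.normalize("NFKC", "っ") == "っ")  # → True
```

Characters outside the table, including kanji and standard-size kana, pass through unchanged.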
[[analysis-kuromoji-katakana-uppercase]]
==== `katakana_uppercase` token filter

The `katakana_uppercase` token filter normalizes small letters (捨て仮名) in katakana into normal letters.
Same question as above.
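For the katakana case, a similar sketch applied token by token, since this is a token filter. The table below covers only the common small katakana and is an assumption for illustration; the actual Lucene filter may handle more characters. The function name `katakana_uppercase_tokens` is hypothetical.

```python
# Assumed subset for illustration: common small katakana and their
# standard-size counterparts, paired position by position.
SMALL_KATAKANA = "ァィゥェォッャュョヮヵヶ"
STANDARD_KATAKANA = "アイウエオツヤユヨワカケ"
KATAKANA_TABLE = str.maketrans(SMALL_KATAKANA, STANDARD_KATAKANA)


def katakana_uppercase_tokens(tokens: list[str]) -> list[str]:
    """Apply the small-to-standard mapping to every token in a stream."""
    return [token.translate(KATAKANA_TABLE) for token in tokens]


print(katakana_uppercase_tokens(["ストップウォッチ"]))  # → ['ストツプウオツチ']
```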
🚢
💚 All backports created successfully
Questions? Please refer to the Backport tool documentation.
…#112514) (cherry picked from commit 2982fc6) Co-authored-by: Dai Sugimori <[email protected]>
This PR adds documentation for Lucene's new filters for Japanese text under the analysis-kuromoji plugin.
New filters are:
- hiragana_uppercase
- katakana_uppercase

These filters were introduced in Lucene 9.11 and are available in Elasticsearch from 8.15.
This is related to:
* elastic#106553