Releases: meilisearch/charabia
Charabia v0.9.9
Changes
- Enhance fst segmenter (#361) @ManyTheFish
Thanks again to @ManyTheFish, @curquiza, and dependabot[bot]! 🎉
Charabia v0.9.8
Changes
- Update README.md (#351) @ManyTheFish
- Bump Dependencies (#355) @Kerollmops
- Fix: Prevent splitting of numbers and English words in Chinese text segmentation (#354) @JinheLin
- Disallow char split on German segmenter (#360) @ManyTheFish
Thanks again to @JinheLin, @Kerollmops, @ManyTheFish, @dependabot[bot], and @meili-bors[bot]! 🎉
Charabia v0.9.7
Charabia v0.9.6
Charabia v0.9.5
Changes
- Hotfix: update Lindera to `0.42.3`, removing the native TLS dependency
Thanks again to @ManyTheFish! 🎉
Charabia v0.9.4
Changes
- Upgrade to ubuntu-24.04 in workflows (#336) @ManyTheFish
- Fix a byte-character confusion in the Arabic segmenter (#337) @slatian
- fix: update Lindera to 0.41.0 (#334) @Nickersoft
- Bump Lindera to 0.42.1 (#340) @Kerollmops
- Secondary cut for Chinese (#341) @HDT3213
Thanks again to @HDT3213, @Kerollmops, @ManyTheFish, @Nickersoft, and @slatian! 🎉
Charabia v0.9.3
Changes
- Upgrade compatible dependencies (#323) @Kerollmops
- Update license for 2025 (#324)
- Armenian letters should be lowercased (#328) @NarHakobyan
- Update lindera to v0.32.3 (#329) @mosuka
Thanks again to @Kerollmops, @ManyTheFish, @NarHakobyan, @curquiza, @dependabot[bot], @meili-bors[bot], and @mosuka! 🎉
Charabia v0.9.2
Changes
- fix: Segment number into word instead of chars (#271) (#311) @dqkqd
- Update wana_kana to 4.0.0 (#312) @tats-u
- Latin camelcase wrong segmentation (#317) @PedroTurik
- Replace jemalloc(ator) with mimalloc, which covers wider platforms (#315) @tats-u
Thanks again to @ManyTheFish, @PedroTurik, @dependabot, @dependabot[bot], @dqkqd, @meili-bors[bot], and @tats-u! 🎉
Charabia v0.9.1
Changes
- Add Turkish normalizer (#305) @tkhshtsh0917
- feat: Adds German compound words decomposition with new segmenter (#303) @luflow
- German: Adds some more test cases and updates dictionary (#306) @luflow
Thanks again to @ManyTheFish, @luflow, @meili-bors[bot], and @tkhshtsh0917! 🎉
Charabia v0.9.0
Changes
- (BREAKING) Simplify lang detection (#299) @ManyTheFish
  - The Language `allow_list` changes from a `HashMap<Script, Vec<Language>>` to a slice of `Language`: `&[Language]`.
  - Add the `tokenize_with_allow_list` method to the `Tokenizer`, allowing a `Language` allow list to be passed dynamically without rebuilding the tokenizer.
- Add math symbols to default separators (#301) @phillitrOSU
  - Adds all math symbols from https://www.compart.com/en/unicode/category/Sm to the default separator list.
Thanks again to @ManyTheFish, @meili-bors[bot], and @phillitrOSU! 🎉
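For callers affected by the v0.9.0 breaking change, the move from a per-script map to a flat slice can be sketched as below. This is a minimal illustration, not charabia's own code: the `Script` and `Language` enums are local stand-ins defined so the snippet compiles without the crate, and `flatten_allow_list` and `is_allowed` are hypothetical helper names.

```rust
use std::collections::HashMap;

// Local stand-ins for charabia's `Script` and `Language` enums (assumption:
// the real enums have many more variants; these exist only for the sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Script {
    Latin,
    Cj,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum Language {
    Eng,
    Fra,
    Jpn,
    Deu,
}

/// Hypothetical migration helper: flattens an old-style allow list
/// (languages grouped by script) into a sorted, deduplicated `Vec<Language>`
/// that can be borrowed as the new-style `&[Language]`.
fn flatten_allow_list(old: &HashMap<Script, Vec<Language>>) -> Vec<Language> {
    let mut flat: Vec<Language> = old.values().flatten().copied().collect();
    flat.sort();
    flat.dedup();
    flat
}

/// Hypothetical consumer that accepts the new slice-based shape.
fn is_allowed(allow_list: &[Language], lang: Language) -> bool {
    allow_list.contains(&lang)
}
```

The resulting `Vec<Language>` can be borrowed as `&[Language]` wherever a slice-based allow list is expected. Sorting and deduplicating is a choice made for this sketch so the output is deterministic; nothing in the release notes says the library requires it.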