@@ -184,117 +184,36 @@ Creating Elasticsearch indices could require more memory than the JVM (Java Virt
 {{< /hint >}}
 
 ## Search optimization for other languages
+
+The default analyzer is tuned for English and other Western languages, and may not perform as well with others. This configuration can be modified for any language that Elasticsearch supports. Reviewing the [chewy index docs] may be useful to prepare for these changes.
+
+{{< hint style="warning" >}}
+Adding language support will require code changes and should only be attempted if you are comfortable modifying Ruby code and installing Elasticsearch extensions.
+{{< /hint >}}
+
+[chewy index docs]: https://github.com/toptal/chewy?tab=readme-ov-file#index-definition
+
 ### Chinese search optimization {#chinese-search-optimization}
 
-The standard analyzer is the default for Elasticsearch, but for some languages like Chinese it may not be the optimal choice. To enhance the search experience, consider installing a language-specific analyzer. Before creating indices in Elasticsearch, be sure to install the following extensions:
+Before creating indices in Elasticsearch, be sure to install the following extensions:
After those are installed, you need to modify the code definitions which generate the search indices. Within every index definition file (`app/chewy/*_index.rb`), make the following changes:
+
+- Replace all `tokenizer: 'VALUE'` occurrences (where `VALUE` is `whitespace`, `standard`, `keyword`, etc.) with `tokenizer: 'ik_max_word'`
+- In every index that has an `analyzer: { content: ... }` definition, between the `filter` and `analyzer` sections, add:
+
+```ruby
+char_filter: {
+  tsconvert: {
+    type: 'stconvert',
+    keep_both: false,
+    delimiter: '#',
+    convert_type: 't2s',
+  },
+},
+```
+
+- In those same files, in every `content: ...` section, add the option `char_filter: %w(tsconvert)` so the analyzer uses that filter
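
Taken together, the three changes listed in this hunk would leave an index's analysis settings looking roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the constant name `ANALYSIS_SETTINGS`, the `english_stop` filter, and the `lowercase` entry are placeholders, and the real `app/chewy/*_index.rb` files will contain different filters and analyzers. In a Chewy index class, a hash of this shape would typically be passed as the `analysis:` option of `settings`.

```ruby
# Hypothetical, simplified analysis settings after applying the three steps.
# Only `char_filter`, `tsconvert`, and `ik_max_word` come from the docs above;
# everything else is a stand-in for whatever the real index file defines.
ANALYSIS_SETTINGS = {
  filter: {
    english_stop: { type: 'stop', stopwords: '_english_' }, # placeholder filter
  },

  # Step 2: added between the `filter` and `analyzer` sections.
  # `t2s` asks the stconvert plugin to map Traditional to Simplified Chinese.
  char_filter: {
    tsconvert: {
      type: 'stconvert',
      keep_both: false,
      delimiter: '#',
      convert_type: 't2s',
    },
  },

  analyzer: {
    content: {
      tokenizer: 'ik_max_word',           # Step 1: replaces e.g. 'standard'
      filter: %w(lowercase english_stop), # placeholder filter chain
      char_filter: %w(tsconvert),         # Step 3: use the new char filter
    },
  },
}.freeze

# Sanity check: every char filter an analyzer references must be defined.
referenced = ANALYSIS_SETTINGS[:analyzer].values
                                         .flat_map { |a| a.fetch(:char_filter, []) }
                                         .map(&:to_sym)
UNDEFINED_CHAR_FILTERS = (referenced - ANALYSIS_SETTINGS[:char_filter].keys).freeze
```

An empty `UNDEFINED_CHAR_FILTERS` means each analyzer only references char filters that are declared in the same settings hash, which is a cheap check to run before recreating the indices.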