
Commit 9740b2a

Replace outdated inline code patch with instructions (#1836)
1 parent 4dd7c32 commit 9740b2a

1 file changed: +27 -108 lines changed

content/en/admin/elasticsearch.md

Lines changed: 27 additions & 108 deletions
@@ -184,117 +184,36 @@ Creating Elasticsearch indices could require more memory than the JVM (Java Virt
 {{< /hint >}}
 
 ## Search optimization for other languages
+
+The default analyzer is tuned for English and other western languages, and may not perform as well with others. This configuration can be modified for any language that Elasticsearch supports. Reviewing the [chewy index docs] may be useful to prepare for these changes.
+
+{{< hint style="warning" >}}
+Adding language support will require code changes and should only be attempted if you are comfortable modifying Ruby code and installing ES extensions.
+{{< /hint >}}
+
+[chewy index docs]: https://github.com/toptal/chewy?tab=readme-ov-file#index-definition
+
 ### Chinese search optimization {#chinese-search-optimization}
 
-The standard analyzer is the default for Elasticsearch, but for some languages like Chinese it may not be the optimal choice. To enhance the search experience, consider installing a language-specific analyzer. Before creating indices in Elasticsearch, be sure to install the following extensions:
+Before creating indices in Elasticsearch, be sure to install the following extensions:
 
 - [elasticsearch-analysis-ik](https://github.com/medcl/elasticsearch-analysis-ik)
 - [elasticsearch-analysis-stconvert](https://github.com/medcl/elasticsearch-analysis-stconvert)
 
-And then modify Mastodon's index definition as follows:
-
-```diff
-diff --git a/app/chewy/accounts_index.rb b/app/chewy/accounts_index.rb
---- a/app/chewy/accounts_index.rb
-+++ b/app/chewy/accounts_index.rb
-@@ -4,7 +4,7 @@ class AccountsIndex < Chewy::Index
-   settings index: { refresh_interval: '5m' }, analysis: {
-     analyzer: {
-       content: {
--        tokenizer: 'whitespace',
-+        tokenizer: 'ik_max_word',
-         filter: %w(lowercase asciifolding cjk_width),
-       },
- 
-diff --git a/app/chewy/public_statuses_index.rb b/app/chewy/public_statuses_index.rb
---- a/app/chewy/public_statuses_index.rb
-+++ b/app/chewy/public_statuses_index.rb
-@@ -19,6 +19,15 @@ class PublicStatusesIndex < Chewy::Index
-       },
-     },
- 
-+    char_filter: {
-+      tsconvert: {
-+        type: 'stconvert',
-+        keep_both: false,
-+        delimiter: '#',
-+        convert_type: 't2s',
-+      },
-+    },
-+
-     analyzer: {
-       verbatim: {
-         tokenizer: 'uax_url_email',
-@@ -26,7 +35,7 @@ class PublicStatusesIndex < Chewy::Index
-       },
- 
-       content: {
--        tokenizer: 'standard',
-+        tokenizer: 'ik_max_word',
-         filter: %w(
-           lowercase
-           asciifolding
-@@ -36,6 +45,7 @@ class PublicStatusesIndex < Chewy::Index
-           english_stop
-           english_stemmer
-         ),
-+        char_filter: %w(tsconvert),
-       },
- 
-       hashtag: {
- 
-diff --git a/app/chewy/statuses_index.rb b/app/chewy/statuses_index.rb
---- a/app/chewy/statuses_index.rb
-+++ b/app/chewy/statuses_index.rb
-@@ -16,9 +16,17 @@ class StatusesIndex < Chewy::Index
-         language: 'possessive_english',
-       },
-     },
-+    char_filter: {
-+      tsconvert: {
-+        type: 'stconvert',
-+        keep_both: false,
-+        delimiter: '#',
-+        convert_type: 't2s',
-+      },
-+    },
-     analyzer: {
-       content: {
--        tokenizer: 'uax_url_email',
-+        tokenizer: 'ik_max_word',
-         filter: %w(
-           english_possessive_stemmer
-           lowercase
-@@ -27,6 +35,7 @@ class StatusesIndex < Chewy::Index
-           english_stop
-           english_stemmer
-         ),
-+        char_filter: %w(tsconvert),
-       },
-     },
-   }
-diff --git a/app/chewy/tags_index.rb b/app/chewy/tags_index.rb
---- a/app/chewy/tags_index.rb
-+++ b/app/chewy/tags_index.rb
-@@ -2,10 +2,19 @@
- 
- class TagsIndex < Chewy::Index
-   settings index: { refresh_interval: '15m' }, analysis: {
-+    char_filter: {
-+      tsconvert: {
-+        type: 'stconvert',
-+        keep_both: false,
-+        delimiter: '#',
-+        convert_type: 't2s',
-+      },
-+    },
-     analyzer: {
-       content: {
--        tokenizer: 'keyword',
-+        tokenizer: 'ik_max_word',
-         filter: %w(lowercase asciifolding cjk_width),
-+        char_filter: %w(tsconvert),
-       },
- 
-       edge_ngram: {
-```
+After those are installed, you need to modify the code definitions which generate the search indices. Within every index definition file (`app/chewy/*_index.rb`), make the following changes:
+
+- Replace all `tokenizer: 'VALUE'` (whitespace, standard, keyword, etc) occurrences with `tokenizer: 'ik_max_word'`
+- In every index that has an `analyzer: { content: ... }` definition, between the `filter` and `analyzer` sections, add:
+
+```ruby
+char_filter: {
+  tsconvert: {
+    type: 'stconvert',
+    keep_both: false,
+    delimiter: '#',
+    convert_type: 't2s',
+  },
+},
+```
+
+- In those same files, in every `content: ...` section, add an option of `char_filter: %w(tsconvert)` to use that filter
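
Note on the added instructions: the three steps can be sketched in plain Ruby, without Chewy or Elasticsearch, to show roughly what the result might look like. The helper names `use_ik_tokenizer` and `undefined_char_filters` and the `ANALYSIS` constant are hypothetical and exist only for this illustration; the hash values are taken from the removed patch for `tags_index.rb`. This is a minimal sketch under those assumptions, not Mastodon's actual index code.

```ruby
# Illustration only: plain Ruby data and helpers, no Chewy or Elasticsearch required.

# Step 1 (hypothetical helper): replace every single-quoted tokenizer value
# in a piece of index source text with 'ik_max_word'.
def use_ik_tokenizer(source)
  source.gsub(/tokenizer: '[^']*'/, "tokenizer: 'ik_max_word'")
end

# Steps 2 and 3: roughly what an `analysis:` hash could look like afterwards.
# The char_filter definition and the `content` analyzer that references it
# sit side by side in the same hash.
ANALYSIS = {
  char_filter: {
    tsconvert: {
      type: 'stconvert',
      keep_both: false,
      delimiter: '#',
      convert_type: 't2s',
    },
  },
  analyzer: {
    content: {
      tokenizer: 'ik_max_word',
      filter: %w(lowercase asciifolding cjk_width),
      char_filter: %w(tsconvert),
    },
  },
}.freeze

# Hypothetical sanity check: list char filters that an analyzer references
# but that are not defined under the top-level `char_filter` key.
def undefined_char_filters(analysis)
  defined = (analysis[:char_filter] || {}).keys.map(&:to_s)
  used = (analysis[:analyzer] || {}).values.flat_map { |a| a[:char_filter] || [] }
  used.uniq - defined
end
```

A check like `undefined_char_filters` is one way to catch the case where step 3 was applied to a file but step 2 was skipped, since an analyzer would then name a `tsconvert` filter that the index never defines.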
