Replace outdated inline code patch with instructions (#1836)

mjankowski · web-flow · commit 9740b2a38626 · 2025-11-24T18:06:19.000Z
diff --git a/content/en/admin/elasticsearch.md b/content/en/admin/elasticsearch.md
@@ -184,117 +184,36 @@ Creating Elasticsearch indices could require more memory than the JVM (Java Virt
 {{< /hint >}}
 
 ## Search optimization for other languages
+
+The default analyzer is tuned for English and other western languages, and may not perform as well with others. This configuration can be modified for any language that Elasticsearch supports. Reviewing the [chewy index docs] may be useful to prepare for these changes.
+
+{{< hint style="warning" >}}
+Adding language support will require code changes and should only be attempted if you are comfortable modifying Ruby code and installing ES extensions.
+{{< /hint >}}
+
+[chewy index docs]: https://github.com/toptal/chewy?tab=readme-ov-file#index-definition
+
 ### Chinese search optimization {#chinese-search-optimization}
 
-The standard analyzer is the default for Elasticsearch, but for some languages like Chinese it may not be the optimal choice. To enhance the search experience, consider installing a language-specific analyzer. Before creating indices in Elasticsearch, be sure to install the following extensions:
+Before creating indices in Elasticsearch, be sure to install the following extensions:
 
 - [elasticsearch-analysis-ik](https://github.com/medcl/elasticsearch-analysis-ik)
 - [elasticsearch-analysis-stconvert](https://github.com/medcl/elasticsearch-analysis-stconvert)
 
-And then modify Mastodon's index definition as follows:
-
-```diff
-diff --git a/app/chewy/accounts_index.rb b/app/chewy/accounts_index.rb
---- a/app/chewy/accounts_index.rb
-+++ b/app/chewy/accounts_index.rb
-@@ -4,7 +4,7 @@ class AccountsIndex < Chewy::Index
-   settings index: { refresh_interval: '5m' }, analysis: {
-     analyzer: {
-       content: {
--        tokenizer: 'whitespace',
-+        tokenizer: 'ik_max_word',
-         filter: %w(lowercase asciifolding cjk_width),
-       },
-
-diff --git a/app/chewy/public_statuses_index.rb b/app/chewy/public_statuses_index.rb
---- a/app/chewy/public_statuses_index.rb
-+++ b/app/chewy/public_statuses_index.rb
-@@ -19,6 +19,15 @@ class PublicStatusesIndex < Chewy::Index
-       },
-     },
- 
-+    char_filter: {
-+      tsconvert: {
-+        type: 'stconvert',
-+        keep_both: false,
-+        delimiter: '#',
-+        convert_type: 't2s',
-+      },
-+    },
-+
-     analyzer: {
-       verbatim: {
-         tokenizer: 'uax_url_email',
-@@ -26,7 +35,7 @@ class PublicStatusesIndex < Chewy::Index
-       },
- 
-       content: {
--        tokenizer: 'standard',
-+        tokenizer: 'ik_max_word',
-         filter: %w(
-           lowercase
-           asciifolding
-@@ -36,6 +45,7 @@ class PublicStatusesIndex < Chewy::Index
-           english_stop
-           english_stemmer
-         ),
-+        char_filter: %w(tsconvert),
-       },
- 
-       hashtag: {
-
-diff --git a/app/chewy/statuses_index.rb b/app/chewy/statuses_index.rb
---- a/app/chewy/statuses_index.rb
-+++ b/app/chewy/statuses_index.rb
-@@ -16,9 +16,17 @@ class StatusesIndex < Chewy::Index
-         language: 'possessive_english',
-       },
-     },
-+    char_filter: {
-+      tsconvert: {
-+        type: 'stconvert',
-+        keep_both: false,
-+        delimiter: '#',
-+        convert_type: 't2s',
-+      },
-+    },
-     analyzer: {
-       content: {
--        tokenizer: 'uax_url_email',
-+        tokenizer: 'ik_max_word',
-         filter: %w(
-           english_possessive_stemmer
-           lowercase
-@@ -27,6 +35,7 @@ class StatusesIndex < Chewy::Index
-           english_stop
-           english_stemmer
-         ),
-+        char_filter: %w(tsconvert),
-       },
-     },
-   }
-diff --git a/app/chewy/tags_index.rb b/app/chewy/tags_index.rb
---- a/app/chewy/tags_index.rb
-+++ b/app/chewy/tags_index.rb
-@@ -2,10 +2,19 @@
-
- class TagsIndex < Chewy::Index
-   settings index: { refresh_interval: '15m' }, analysis: {
-+    char_filter: {
-+      tsconvert: {
-+        type: 'stconvert',
-+        keep_both: false,
-+        delimiter: '#',
-+        convert_type: 't2s',
-+      },
-+    },
-     analyzer: {
-       content: {
--        tokenizer: 'keyword',
-+        tokenizer: 'ik_max_word',
-         filter: %w(lowercase asciifolding cjk_width),
-+        char_filter: %w(tsconvert),
-       },
-
-       edge_ngram: {
-```
+After those are installed, you need to modify the code definitions which generate the search indices. Within every index definition file (`app/chewy/*_index.rb`), make the following changes:
+
+- Replace all `tokenizer: 'VALUE'` (whitespace, standard, keyword, etc) occurrences with `tokenizer: 'ik_max_word'`
+- In every index that has an `analyzer: { content: ... }` definition, between the `filter` and `analyzer` sections, add:
+
+  ```ruby
+    char_filter: {
+      tsconvert: {
+        type: 'stconvert',
+        keep_both: false,
+        delimiter: '#',
+        convert_type: 't2s',
+      },
+    },
+  ```
+
+- In those same files, in every `content: ...` section, add an option of `char_filter: %w(tsconvert)` to use that filter