Commit d8e3caa

vidarl, adriendupuis, and dabrt committed
IBX-8104: How to configure Elasticsearch with language analyzers (#2366)
Co-authored-by: Vidar Langseid
Co-authored-by: Adrien Dupuis
Co-authored-by: Tomasz Dąbrowski
1 parent 9601d7d commit d8e3caa

2 files changed, +185 -6 lines changed
Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
+ibexa_elasticsearch:
+    index_templates:
+        english:
+            patterns:
+                - '*_eng_gb*'
+            settings:
+                analysis:
+                    normalizer:
+                        lowercase_normalizer:
+                            type: custom
+                            char_filter: []
+                            filter:
+                                - lowercase
+                    analyzer:
+                        english_analyzer:
+                            type: custom
+                            tokenizer: lowercase
+                            filter:
+                                - lowercase
+                                - english_stop
+                                - english_keywords
+                                - english_stemmer
+                                - english_possessive_stemmer
+                        ibexa_spellcheck_analyzer:
+                            type: custom
+                            tokenizer: lowercase
+                            filter:
+                                - lowercase
+                                - ibexa_spellcheck_shingle_filter
+                        ibexa_spellcheck_raw_analyzer:
+                            type: custom
+                            tokenizer: standard
+                            filter:
+                                - lowercase
+                                - english_possessive_stemmer
+                    filter:
+                        ibexa_spellcheck_shingle_filter:
+                            type: shingle
+                            min_shingle_size: 2
+                            max_shingle_size: 3
+                        english_stop:
+                            type: stop
+                            stopwords: '_english_'
+                        english_keywords:
+                            type: keyword_marker
+                            keywords: []
+                        english_stemmer:
+                            type: stemmer
+                            language: light_english
+                        english_possessive_stemmer:
+                            type: stemmer
+                            language: possessive_english
+                refresh_interval: "-1"
+            mappings:
+                dynamic_templates:
+                    - ez_int:
+                        match: "*_i"
+                        mapping:
+                            type: integer
+                    - ez_mint:
+                        match: "*_mi"
+                        mapping:
+                            type: integer
+                    - ez_id:
+                        match: "*_id"
+                        mapping:
+                            type: keyword
+                    - ez_mid:
+                        match: "*_mid"
+                        mapping:
+                            type: keyword
+                    - ez_string:
+                        match: "*_s"
+                        mapping:
+                            type: keyword
+                            normalizer: lowercase_normalizer
+                    - ez_mstring:
+                        match: "*_ms"
+                        mapping:
+                            type: keyword
+                            normalizer: lowercase_normalizer
+                    - ez_long:
+                        match: "*_l"
+                        mapping:
+                            type: long
+                    - ez_mlong:
+                        match: "*_ml"
+                        mapping:
+                            type: long
+                    - ez_text:
+                        match: "*_t"
+                        mapping:
+                            type: text
+                            analyzer: english_analyzer
+                    - ez_text_fulltext:
+                        match: "*_fulltext"
+                        mapping:
+                            type: text
+                            analyzer: english_analyzer
+                    - ez_boolean:
+                        match: "*_b"
+                        mapping:
+                            type: boolean
+                    - ez_mboolean:
+                        match: "*_mb"
+                        mapping:
+                            type: boolean
+                    - ez_float:
+                        match: "*_f"
+                        mapping:
+                            type: float
+                    - ez_double:
+                        match: "*_d"
+                        mapping:
+                            type: double
+                    - ez_date:
+                        match: "*_dt"
+                        mapping:
+                            type: date
+                    - ez_geolocation:
+                        match: "*_gl"
+                        mapping:
+                            type: geo_point
+                    - ez_spellcheck:
+                        match: "*_spellcheck"
+                        mapping:
+                            type: text
+                            analyzer: ibexa_spellcheck_analyzer
+                            fields:
+                                raw:
+                                    type: text
+                                    analyzer: ibexa_spellcheck_raw_analyzer
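
As a sketch of how the English template above could be adapted to another language, the fragment below shows only the parts that would differ for French. The template name `french`, the `*_fre_fr*` pattern, and the `french_*` analyzer and filter names are assumptions chosen for illustration. The `_french_` stop word list, the `light_french` stemmer, and the `elision` filter follow the French analyzer described in the Elasticsearch language analyzer documentation. The normalizer, the spellcheck analyzers, `refresh_interval`, and the remaining dynamic templates are assumed to be copied from the English template and are omitted here.

```yaml
# Hypothetical French counterpart of the English template (all names are illustrative).
ibexa_elasticsearch:
    index_templates:
        french:
            patterns:
                - '*_fre_fr*'
            settings:
                analysis:
                    analyzer:
                        french_analyzer:
                            type: custom
                            tokenizer: standard
                            filter:
                                - french_elision
                                - lowercase
                                - french_stop
                                - french_keywords
                                - french_stemmer
                    filter:
                        french_elision:
                            type: elision
                            articles_case: true
                            # Quoted so that single letters are never read as booleans.
                            articles: ['l', 'm', 't', 'qu', 'n', 's', 'j', 'd', 'c', 'jusqu', 'quoiqu', 'lorsqu', 'puisqu']
                        french_stop:
                            type: stop
                            stopwords: '_french_'
                        french_keywords:
                            type: keyword_marker
                            keywords: []
                        french_stemmer:
                            type: stemmer
                            language: light_french
            mappings:
                dynamic_templates:
                    # Only the text mappings change: they point at the French analyzer.
                    - ez_text:
                        match: "*_t"
                        mapping:
                            type: text
                            analyzer: french_analyzer
                    - ez_text_fulltext:
                        match: "*_fulltext"
                        mapping:
                            type: text
                            analyzer: french_analyzer
```

In a complete French template, the English possessive stemmer in the spellcheck raw analyzer would probably need to be removed as well, since it is specific to English.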

docs/search/search_engines/elastic_search/configure_elastic_search.md

Lines changed: 53 additions & 6 deletions
@@ -475,19 +475,66 @@ ibexa_elasticsearch:
     # ...
 ```

+### Add language-specific analyzers
+
+You can configure Elasticsearch to perform language-specific analysis such as stemming.
+This way, searching for "cars" also returns content that contains the word "car".
+On a multilingual site, you can configure a different analyzer for each language, which is typically required because stemming rules are language-specific.
+
+#### Make a copy of the default template
+
+To enable a language-specific analyzer, first create a new template for each language in `config/packages/ibexa_elasticsearch.yaml`.
+Base this template on the `default` template found in `vendor/ibexa/elasticsearch/src/bundle/Resources/config/default-config.yaml`.
+The name of the new template should indicate the language it applies to, for example `eng_gb`, `nor_no` or `fre_fr`.
+
+#### Change match pattern for the new template
+
+The default template matches on `*_location_*` and `*_content_*`.
+These patterns are not language-specific, so you cannot use them if you plan to use different templates for different languages.
+In your copy of the default template, change the pattern as follows:
+
+```diff
+     patterns:
+-        - '*_location_*'
+-        - '*_content_*'
++        - "*_eng_gb*"
+```
+
+This pattern matches indexes for the English (`eng_gb`) language.
+For more information about specifying the pattern for your language, see [Define a template](#define-a-template).
+
+#### Create config for language-specific analyzer
+
+For information about configuring an analyzer for each specific language, see [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-lang-analyzer.html).
+
+An adaptation of the [English analyzer](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-lang-analyzer.html#english-analyzer) in [[= product_name =]] configuration looks like this:
+
+```yaml hl_lines="3-5 15-23 35 41-52 94 99"
+[[= include_file('code_samples/search/elasticsearch/config/packages/elasticsearch-en.yaml') =]]
+```
+
+Then, you must bind this language template to your Elasticsearch connection.
+
 ## Bind templates with connections

-Once you have created the field mapping template(s), you must establish a relationship between the templates and a connection. You do this by adding the "index_templates" key to a connection definition.
+After you create an index template (for example, for specific data types or linguistic analysis), you must link it to an Elasticsearch connection by adding the `index_templates` key to the connection definition.

 If your configuration file contains several connection definitions, you can reuse the same template for different connections.
 If you have several index templates, you can apply different combinations of templates to different connections.

 ``` yaml
-<connection_name>:
-    # ...
-    index_templates:
-        - default
-        - default_en_us
+ibexa_elasticsearch:
+    connections:
+        <connection_for_english_only_repository>:
+            # ...
+            index_templates:
+                - eng_gb
+        <connection_for_multilangual_repository>:
+            # ...
+            index_templates:
+                - eng_gb
+                - fre_fr
+                - ger_de
 ```

 For more information about how Elasticsearch handles settings and mappings from multiple templates that match the same index, see [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/7.x/indices-templates-v1.html#multiple-templates-v1).
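
To illustrate how a template definition and its binding relate, here is a minimal sketch that puts both in one file. It assumes that the names listed under a connection's `index_templates` refer to the keys defined under `ibexa_elasticsearch.index_templates`. The connection name `default` is a placeholder, and the elided parts (`# ...`) stand for settings that are not shown here.

```yaml
ibexa_elasticsearch:
    index_templates:
        # Template key: assumed to be the name that a connection refers to.
        eng_gb:
            patterns:
                - '*_eng_gb*'
            # ... settings and mappings copied from the default template
    connections:
        # `default` is a placeholder connection name.
        default:
            # ... other connection settings
            index_templates:
                - eng_gb
```

Under that assumption, the sample file added in this commit, which names its template `english`, would be bound as `english` unless the template is renamed to `eng_gb`.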