Commit 041bdb7

Clarify synonyms docs (#110822) (#111010)
1 parent 11bf50f commit 041bdb7

5 files changed, +220 -86 lines changed

docs/reference/analysis/tokenfilters/synonym-graph-tokenfilter.asciidoc

Lines changed: 94 additions & 41 deletions
@@ -85,45 +85,45 @@ Additional settings are:
8585
<<indices-reload-analyzers,reloading>> search analyzers to pick up
8686
changes to synonym files. Only to be used for search analyzers.
8787
* `expand` (defaults to `true`).
88-
* `lenient` (defaults to `false`). If `true` ignores exceptions while parsing the synonym configuration. It is important
89-
to note that only those synonym rules which cannot get parsed are ignored. For instance consider the following request:
90-
91-
[source,console]
92-
--------------------------------------------------
93-
PUT /test_index
94-
{
95-
"settings": {
96-
"index": {
97-
"analysis": {
98-
"analyzer": {
99-
"synonym": {
100-
"tokenizer": "standard",
101-
"filter": [ "my_stop", "synonym_graph" ]
102-
}
103-
},
104-
"filter": {
105-
"my_stop": {
106-
"type": "stop",
107-
"stopwords": [ "bar" ]
108-
},
109-
"synonym_graph": {
110-
"type": "synonym_graph",
111-
"lenient": true,
112-
"synonyms": [ "foo, bar => baz" ]
113-
}
114-
}
115-
}
116-
}
117-
}
118-
}
119-
--------------------------------------------------
88+
Expands definitions for equivalent synonym rules.
89+
See <<synonym-graph-tokenizer-expand-equivalent-synonyms,expand equivalent synonyms>>.
90+
* `lenient` (defaults to `false`).
91+
If `true` ignores errors while parsing the synonym configuration.
92+
It is important to note that only those synonym rules which cannot get parsed are ignored.
93+
See <<synonym-graph-tokenizer-stop-token-filter,synonyms and stop token filters>> for an example of `lenient` behaviour for invalid synonym rules.
94+
95+
[discrete]
96+
[[synonym-graph-tokenizer-expand-equivalent-synonyms]]
97+
===== `expand` equivalent synonym rules
98+
99+
The `expand` parameter controls whether to expand equivalent synonym rules.
100+
Consider a synonym defined like:
101+
102+
`foo, bar, baz`
103+
104+
Using `expand: true`, the synonym rule would be expanded into:
120105

121-
With the above request the word `bar` gets skipped but a mapping `foo => baz` is still added. However, if the mapping
122-
being added was `foo, baz => bar` nothing would get added to the synonym list. This is because the target word for the
123-
mapping is itself eliminated because it was a stop word. Similarly, if the mapping was "bar, foo, baz" and `expand` was
124-
set to `false` no mapping would get added as when `expand=false` the target mapping is the first word. However, if
125-
`expand=true` then the mappings added would be equivalent to `foo, baz => foo, baz` i.e, all mappings other than the
126-
stop word.
106+
```
107+
foo => foo
108+
foo => bar
109+
foo => baz
110+
bar => foo
111+
bar => bar
112+
bar => baz
113+
baz => foo
114+
baz => bar
115+
baz => baz
116+
```
117+
118+
When `expand` is set to `false`, the synonym rule is not expanded and the first synonym is treated as the canonical representation. The synonym would be equivalent to:
119+
120+
```
121+
foo => foo
122+
bar => foo
123+
baz => foo
124+
```
125+
126+
The `expand` parameter does not affect explicit synonym rules, like `foo, bar => baz`.
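
The expansion described above can be sketched in a few lines of Python. This is a toy model of the documented behaviour, not {es} code; the function name `expand_equivalent` is hypothetical and only single-word terms are considered.

```python
def expand_equivalent(terms, expand=True):
    """Return (input, output) pairs for an equivalent rule such as `foo, bar, baz`."""
    if expand:
        # expand: true -> every term maps to every term, including itself.
        return [(src, dst) for src in terms for dst in terms]
    # expand: false -> the first term is treated as the canonical representation.
    canonical = terms[0]
    return [(src, canonical) for src in terms]


for src, dst in expand_equivalent(["foo", "bar", "baz"], expand=False):
    print(f"{src} => {dst}")
```

With `expand=False` the loop prints the three canonical mappings listed above; with the default `expand=True` the function returns all nine pairs.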
127127

128128
[discrete]
129129
[[synonym-graph-tokenizer-ignore_case-deprecated]]
@@ -160,12 +160,65 @@ Text will be processed first through filters preceding the synonym filter before
160160
{es} will also use the token filters preceding the synonym filter in a tokenizer chain to parse the entries in a synonym file or synonym set.
161161
In the above example, the synonyms graph token filter is placed after a stemmer. The stemmer will also be applied to the synonym entries.
162162

163-
The synonym rules should not contain words that are removed by a filter that appears later in the chain (like a `stop` filter).
164-
Removing a term from a synonym rule means there will be no matching for it at query time.
165-
166163
Because entries in the synonym map cannot have stacked positions, some token filters may cause issues here.
167164
Token filters that produce multiple versions of a token may choose which version of the token to emit when parsing synonyms.
168165
For example, `asciifolding` will only produce the folded version of the token.
169166
Others, like `multiplexer`, `word_delimiter_graph` or `ngram` will throw an error.
170167

171168
If you need to build analyzers that include both multi-token filters and synonym filters, consider using the <<analysis-multiplexer-tokenfilter,multiplexer>> filter, with the multi-token filters in one branch and the synonym filter in the other.
169+
170+
[discrete]
171+
[[synonym-graph-tokenizer-stop-token-filter]]
172+
===== Synonyms and `stop` token filters
173+
174+
Synonyms and <<analysis-stop-tokenfilter,stop token filters>> interact with each other in the following ways:
175+
176+
[discrete]
177+
====== Stop token filter *before* synonym token filter
178+
179+
Stop words will be removed from the synonym rule definition.
180+
This can cause errors in the synonym rule.
181+
182+
[WARNING]
183+
====
184+
Invalid synonym rules can cause errors when applying analyzer changes.
185+
For reloadable analyzers, this prevents reloading and applying changes.
186+
You must correct errors in the synonym rules and reload the analyzer.
187+
188+
An index with invalid synonym rules cannot be reopened, making it inoperable when:
189+
190+
* A node containing the index starts
191+
* The index is opened from a closed state
192+
* A node restart occurs (which reopens the node assigned shards)
193+
====
194+
195+
For *explicit synonym rules* like `foo, bar => baz` with a stop filter that removes `bar`:
196+
197+
- If `lenient` is set to `false`, an error will be raised as `bar` would be removed from the left hand side of the synonym rule.
198+
- If `lenient` is set to `true`, the rule `foo => baz` will be added and `bar => baz` will be ignored.
199+
200+
If the stop filter removed `baz` instead:
201+
202+
- If `lenient` is set to `false`, an error will be raised as `baz` would be removed from the right hand side of the synonym rule.
203+
- If `lenient` is set to `true`, the synonym will have no effect as the target word is removed.
204+
205+
For *equivalent synonym rules* like `foo, bar, baz` and `expand: true`, with a stop filter that removes `bar`:
206+
207+
- If `lenient` is set to `false`, an error will be raised as `bar` would be removed from the synonym rule.
208+
- If `lenient` is set to `true`, the synonyms added would be equivalent to the following synonym rules, which do not contain the removed word:
209+
210+
```
211+
foo => foo
212+
foo => baz
213+
baz => foo
214+
baz => baz
215+
```
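
The cases listed above (stop filter before the synonym filter) can be summarised with a small Python model. It is a sketch under simplifying assumptions: single-word terms, `expand: true` for equivalent rules, and hypothetical names (`InvalidSynonymRule`, `rules_after_stop_filter`) that do not exist in {es}.

```python
class InvalidSynonymRule(ValueError):
    """Hypothetical error standing in for a synonym rule parsing failure."""


def rules_after_stop_filter(rule, stopwords, lenient=False):
    """Model which mappings survive when a stop filter runs before the synonym filter."""
    if "=>" in rule:
        # Explicit rule: inputs on the left hand side, outputs on the right hand side.
        left, right = rule.split("=>")
        inputs = [term.strip() for term in left.split(",")]
        outputs = [term.strip() for term in right.split(",")]
    else:
        # Equivalent rule with expand: true, so every term is both input and output.
        inputs = outputs = [term.strip() for term in rule.split(",")]

    removed = sorted({term for term in inputs + outputs if term in stopwords})
    if removed and not lenient:
        # lenient: false -> a removed term makes the rule invalid.
        raise InvalidSynonymRule(f"stop filter removed {removed} from rule {rule!r}")

    # lenient: true -> keep only mappings that do not involve a removed term.
    kept_inputs = [term for term in inputs if term not in stopwords]
    kept_outputs = [term for term in outputs if term not in stopwords]
    return [(src, dst) for src in kept_inputs for dst in kept_outputs]


print(rules_after_stop_filter("foo, bar => baz", {"bar"}, lenient=True))
print(rules_after_stop_filter("foo, bar => baz", {"baz"}, lenient=True))
print(rules_after_stop_filter("foo, bar, baz", {"bar"}, lenient=True))
```

The three calls correspond to the three lenient cases: one surviving mapping, no mappings at all because the target word is removed, and the four mappings that do not contain `bar`.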
216+
217+
[discrete]
218+
====== Stop token filter *after* synonym token filter
219+
220+
The stop filter will remove the terms from the resulting synonym expansion.
221+
222+
For example, a synonym rule like `foo, bar => baz` and a stop filter that removes `baz` will get no matches for `foo` or `bar`, as both would get expanded to `baz` which is removed by the stop filter.
223+
224+
If the stop filter removed `foo` instead, then searching for `foo` would get expanded to `baz`, which is not removed by the stop filter thus potentially providing matches for `baz`.
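
The ordering effect described above can be illustrated with a toy Python pipeline. This is only a sketch: the `analyze` function and the dictionary representation of the rule `foo, bar => baz` are assumptions made for illustration, not how {es} stores or applies synonyms.

```python
def analyze(tokens, synonym_rules, stopwords):
    """Apply explicit synonym rules first, then a stop filter, to a list of tokens."""
    expanded = []
    for token in tokens:
        # An explicit rule replaces the input term with its outputs.
        expanded.extend(synonym_rules.get(token, [token]))
    # The stop filter runs afterwards, on the result of the synonym expansion.
    return [token for token in expanded if token not in stopwords]


# Dictionary form of the explicit rule `foo, bar => baz`.
rules = {"foo": ["baz"], "bar": ["baz"]}

print(analyze(["foo"], rules, stopwords={"baz"}))
print(analyze(["foo"], rules, stopwords={"foo"}))
```

In the first call the expansion `baz` is removed by the stop filter, so nothing is left to match. In the second call `foo` has already been replaced by `baz` before the stop filter runs, so `baz` remains.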

docs/reference/analysis/tokenfilters/synonym-tokenfilter.asciidoc

Lines changed: 95 additions & 44 deletions
@@ -73,47 +73,45 @@ Additional settings are:
7373
<<indices-reload-analyzers,reloading>> search analyzers to pick up
7474
changes to synonym files. Only to be used for search analyzers.
7575
* `expand` (defaults to `true`).
76-
* `lenient` (defaults to `false`). If `true` ignores exceptions while parsing the synonym configuration. It is important
77-
to note that only those synonym rules which cannot get parsed are ignored. For instance consider the following request:
78-
79-
80-
[source,console]
81-
--------------------------------------------------
82-
PUT /test_index
83-
{
84-
"settings": {
85-
"index": {
86-
"analysis": {
87-
"analyzer": {
88-
"synonym": {
89-
"tokenizer": "standard",
90-
"filter": [ "my_stop", "synonym" ]
91-
}
92-
},
93-
"filter": {
94-
"my_stop": {
95-
"type": "stop",
96-
"stopwords": [ "bar" ]
97-
},
98-
"synonym": {
99-
"type": "synonym",
100-
"lenient": true,
101-
"synonyms": [ "foo, bar => baz" ]
102-
}
103-
}
104-
}
105-
}
106-
}
107-
}
108-
--------------------------------------------------
76+
Expands definitions for equivalent synonym rules.
77+
See <<synonym-tokenizer-expand-equivalent-synonyms,expand equivalent synonyms>>.
78+
* `lenient` (defaults to `false`).
79+
If `true` ignores errors while parsing the synonym configuration.
80+
It is important to note that only those synonym rules which cannot get parsed are ignored.
81+
See <<synonym-tokenizer-stop-token-filter,synonyms and stop token filters>> for an example of `lenient` behaviour for invalid synonym rules.
82+
83+
[discrete]
84+
[[synonym-tokenizer-expand-equivalent-synonyms]]
85+
===== `expand` equivalent synonym rules
86+
87+
The `expand` parameter controls whether to expand equivalent synonym rules.
88+
Consider a synonym defined like:
89+
90+
`foo, bar, baz`
91+
92+
Using `expand: true`, the synonym rule would be expanded into:
10993

110-
With the above request the word `bar` gets skipped but a mapping `foo => baz` is still added. However, if the mapping
111-
being added was `foo, baz => bar` nothing would get added to the synonym list. This is because the target word for the
112-
mapping is itself eliminated because it was a stop word. Similarly, if the mapping was "bar, foo, baz" and `expand` was
113-
set to `false` no mapping would get added as when `expand=false` the target mapping is the first word. However, if
114-
`expand=true` then the mappings added would be equivalent to `foo, baz => foo, baz` i.e, all mappings other than the
115-
stop word.
94+
```
95+
foo => foo
96+
foo => bar
97+
foo => baz
98+
bar => foo
99+
bar => bar
100+
bar => baz
101+
baz => foo
102+
baz => bar
103+
baz => baz
104+
```
116105

106+
When `expand` is set to `false`, the synonym rule is not expanded and the first synonym is treated as the canonical representation. The synonym would be equivalent to:
107+
108+
```
109+
foo => foo
110+
bar => foo
111+
baz => foo
112+
```
113+
114+
The `expand` parameter does not affect explicit synonym rules, like `foo, bar => baz`.
117115

118116
[discrete]
119117
[[synonym-tokenizer-ignore_case-deprecated]]
@@ -135,7 +133,7 @@ To apply synonyms, you will need to include a synonym token filters into an anal
135133
"my_analyzer": {
136134
"type": "custom",
137135
"tokenizer": "standard",
138-
"filter": ["stemmer", "synonym_graph"]
136+
"filter": ["stemmer", "synonym"]
139137
}
140138
}
141139
----
@@ -148,14 +146,67 @@ Order is important for your token filters.
148146
Text will be processed first through filters preceding the synonym filter before being processed by the synonym filter.
149147

150148
{es} will also use the token filters preceding the synonym filter in a tokenizer chain to parse the entries in a synonym file or synonym set.
151-
In the above example, the synonyms graph token filter is placed after a stemmer. The stemmer will also be applied to the synonym entries.
152-
153-
The synonym rules should not contain words that are removed by a filter that appears later in the chain (like a `stop` filter).
154-
Removing a term from a synonym rule means there will be no matching for it at query time.
149+
In the above example, the synonyms token filter is placed after a stemmer. The stemmer will also be applied to the synonym entries.
155150

156151
Because entries in the synonym map cannot have stacked positions, some token filters may cause issues here.
157152
Token filters that produce multiple versions of a token may choose which version of the token to emit when parsing synonyms.
158153
For example, `asciifolding` will only produce the folded version of the token.
159154
Others, like `multiplexer`, `word_delimiter_graph` or `ngram` will throw an error.
160155

161156
If you need to build analyzers that include both multi-token filters and synonym filters, consider using the <<analysis-multiplexer-tokenfilter,multiplexer>> filter, with the multi-token filters in one branch and the synonym filter in the other.
157+
158+
[discrete]
159+
[[synonym-tokenizer-stop-token-filter]]
160+
===== Synonyms and `stop` token filters
161+
162+
Synonyms and <<analysis-stop-tokenfilter,stop token filters>> interact with each other in the following ways:
163+
164+
[discrete]
165+
====== Stop token filter *before* synonym token filter
166+
167+
Stop words will be removed from the synonym rule definition.
168+
This can cause errors in the synonym rule.
169+
170+
[WARNING]
171+
====
172+
Invalid synonym rules can cause errors when applying analyzer changes.
173+
For reloadable analyzers, this prevents reloading and applying changes.
174+
You must correct errors in the synonym rules and reload the analyzer.
175+
176+
An index with invalid synonym rules cannot be reopened, making it inoperable when:
177+
178+
* A node containing the index starts
179+
* The index is opened from a closed state
180+
* A node restart occurs (which reopens the shards assigned to the node)
181+
====
182+
183+
For *explicit synonym rules* like `foo, bar => baz` with a stop filter that removes `bar`:
184+
185+
- If `lenient` is set to `false`, an error will be raised as `bar` would be removed from the left hand side of the synonym rule.
186+
- If `lenient` is set to `true`, the rule `foo => baz` will be added and `bar => baz` will be ignored.
187+
188+
If the stop filter removed `baz` instead:
189+
190+
- If `lenient` is set to `false`, an error will be raised as `baz` would be removed from the right hand side of the synonym rule.
191+
- If `lenient` is set to `true`, the synonym will have no effect as the target word is removed.
192+
193+
For *equivalent synonym rules* like `foo, bar, baz` and `expand: true`, with a stop filter that removes `bar`:
194+
195+
- If `lenient` is set to `false`, an error will be raised as `bar` would be removed from the synonym rule.
196+
- If `lenient` is set to `true`, the synonyms added would be equivalent to the following synonym rules, which do not contain the removed word:
197+
198+
```
199+
foo => foo
200+
foo => baz
201+
baz => foo
202+
baz => baz
203+
```
204+
205+
[discrete]
206+
====== Stop token filter *after* synonym token filter
207+
208+
The stop filter will remove the terms from the resulting synonym expansion.
209+
210+
For example, a synonym rule like `foo, bar => baz` and a stop filter that removes `baz` will get no matches for `foo` or `bar`, as both would get expanded to `baz` which is removed by the stop filter.
211+
212+
If the stop filter removed `foo` instead, then searching for `foo` would get expanded to `baz`, which is not removed by the stop filter thus potentially providing matches for `baz`.

docs/reference/analysis/tokenfilters/synonyms-format.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ This format uses two different definitions:
1515
ipod, i-pod, i pod
1616
computer, pc, laptop
1717
----
18-
* Explicit mappings: Matches a group of words to other words. Words on the left hand side of the rule definition are expanded into all the possibilities described on the right hand side. Example:
18+
* Explicit synonyms: Matches a group of words to other words. Words on the left hand side of the rule definition are expanded into all the possibilities described on the right hand side. Example:
1919
+
2020
[source,synonyms]
2121
----

docs/reference/search/search-your-data/search-with-synonyms.asciidoc

Lines changed: 13 additions & 0 deletions
@@ -82,6 +82,19 @@ If an index is created referencing a nonexistent synonyms set, the index will re
8282
The only way to recover from this scenario is to ensure the synonyms set exists then either delete and re-create the index, or close and re-open the index.
8383
======
8484

85+
[WARNING]
86+
====
87+
Invalid synonym rules can cause errors when applying analyzer changes.
88+
For reloadable analyzers, this prevents reloading and applying changes.
89+
You must correct errors in the synonym rules and reload the analyzer.
90+
91+
An index with invalid synonym rules cannot be reopened, making it inoperable when:
92+
93+
* A node containing the index starts
94+
* The index is opened from a closed state
95+
* A node restart occurs (which reopens the shards assigned to the node)
96+
====
97+
8598
{es} uses synonyms as part of the <<analysis-overview,analysis process>>.
8699
You can use two types of <<analysis-tokenfilters,token filter>> to include synonyms:
87100

docs/reference/synonyms/apis/synonyms-apis.asciidoc

Lines changed: 17 additions & 0 deletions
@@ -24,6 +24,23 @@ NOTE: Synonyms sets are limited to a maximum of 10,000 synonym rules per set.
2424
Synonym sets with more than 10,000 synonym rules will provide inconsistent search results.
2525
If you need to manage more synonym rules, you can create multiple synonyms sets.
2626

27+
WARNING: Synonyms sets must exist before they can be added to indices.
28+
If an index is created referencing a nonexistent synonyms set, the index will remain in a partially created and inoperable state.
29+
The only way to recover from this scenario is to ensure the synonyms set exists then either delete and re-create the index, or close and re-open the index.
30+
31+
[WARNING]
32+
====
33+
Invalid synonym rules can cause errors when applying analyzer changes.
34+
For reloadable analyzers, this prevents reloading and applying changes.
35+
You must correct errors in the synonym rules and reload the analyzer.
36+
37+
An index with invalid synonym rules cannot be reopened, making it inoperable when:
38+
39+
* A node containing the index starts
40+
* The index is opened from a closed state
41+
* A node restart occurs (which reopens the shards assigned to the node)
42+
====
43+
2744
[discrete]
2845
[[synonyms-sets-apis]]
2946
=== Synonyms sets APIs
