Commit 680ba25

Document Node Analysis Components API
This functionality was introduced in PR opensearch-project/OpenSearch#10296

Signed-off-by: Lukáš Vlček <[email protected]>
1 parent 9c8180e commit 680ba25

2 files changed, +313 -0 lines changed

_analyzers/index.md

Lines changed: 311 additions & 0 deletions
@@ -64,6 +64,317 @@ Analyzer | Analysis performed | Analyzer output
 
 If needed, you can combine tokenizers, token filters, and character filters to create a custom analyzer.
 
+Starting with OpenSearch 2.12.1, you can retrieve a complete list of the available text analysis components by using the [Nodes Info]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-info/) API. This is helpful when building custom analyzers, especially when you need to recall a component's name or identify which analysis plugin provides it.
+
+Introduced 2.12.1
+{: .label .label-purple }
+
+```json
+GET /_nodes/analysis_components?pretty=true&filter_path=nodes.*.analysis_components
+```
+{% include copy-curl.html %}
+
+The following is an example response from a node that includes the `analysis-common` module (a module that is present by default):
+
+<details open markdown="block">
+<summary>
+Response
+</summary>
+{: .text-delta}
+
+```json
+{
+  "nodes" : {
+    "cZidmv5kQbWQN8M8dz9f5g" : {
+      "analysis_components" : {
+        "analyzers" : [
+          "arabic",
+          "armenian",
+          "basque",
+          "bengali",
+          "brazilian",
+          "bulgarian",
+          "catalan",
+          "chinese",
+          "cjk",
+          "czech",
+          "danish",
+          "default",
+          "dutch",
+          "english",
+          "estonian",
+          "fingerprint",
+          "finnish",
+          "french",
+          "galician",
+          "german",
+          "greek",
+          "hindi",
+          "hungarian",
+          "indonesian",
+          "irish",
+          "italian",
+          "keyword",
+          "latvian",
+          "lithuanian",
+          "norwegian",
+          "pattern",
+          "persian",
+          "portuguese",
+          "romanian",
+          "russian",
+          "simple",
+          "snowball",
+          "sorani",
+          "spanish",
+          "standard",
+          "stop",
+          "swedish",
+          "thai",
+          "turkish",
+          "whitespace"
+        ],
+        "tokenizers" : [
+          "PathHierarchy",
+          "char_group",
+          "classic",
+          "edgeNGram",
+          "edge_ngram",
+          "keyword",
+          "letter",
+          "lowercase",
+          "nGram",
+          "ngram",
+          "path_hierarchy",
+          "pattern",
+          "simple_pattern",
+          "simple_pattern_split",
+          "standard",
+          "thai",
+          "uax_url_email",
+          "whitespace"
+        ],
+        "tokenFilters" : [
+          "apostrophe",
+          "arabic_normalization",
+          "arabic_stem",
+          "asciifolding",
+          "bengali_normalization",
+          "brazilian_stem",
+          "cjk_bigram",
+          "cjk_width",
+          "classic",
+          "common_grams",
+          "concatenate_graph",
+          "condition",
+          "czech_stem",
+          "decimal_digit",
+          "delimited_payload",
+          "delimited_term_freq",
+          "dictionary_decompounder",
+          "dutch_stem",
+          "edgeNGram",
+          "edge_ngram",
+          "elision",
+          "fingerprint",
+          "flatten_graph",
+          "french_stem",
+          "german_normalization",
+          "german_stem",
+          "hindi_normalization",
+          "hunspell",
+          "hyphenation_decompounder",
+          "indic_normalization",
+          "keep",
+          "keep_types",
+          "keyword_marker",
+          "kstem",
+          "length",
+          "limit",
+          "lowercase",
+          "min_hash",
+          "multiplexer",
+          "nGram",
+          "ngram",
+          "pattern_capture",
+          "pattern_replace",
+          "persian_normalization",
+          "porter_stem",
+          "predicate_token_filter",
+          "remove_duplicates",
+          "reverse",
+          "russian_stem",
+          "scandinavian_folding",
+          "scandinavian_normalization",
+          "serbian_normalization",
+          "shingle",
+          "snowball",
+          "sorani_normalization",
+          "standard",
+          "stemmer",
+          "stemmer_override",
+          "stop",
+          "synonym",
+          "synonym_graph",
+          "trim",
+          "truncate",
+          "unique",
+          "uppercase",
+          "word_delimiter",
+          "word_delimiter_graph"
+        ],
+        "charFilters" : [
+          "html_strip",
+          "mapping",
+          "pattern_replace"
+        ],
+        "normalizers" : [
+          "lowercase"
+        ],
+        "plugins" : [
+          {
+            "name" : "analysis-common",
+            "classname" : "org.opensearch.analysis.common.CommonAnalysisModulePlugin",
+            "analyzers" : [
+              "arabic",
+              "armenian",
+              "basque",
+              "bengali",
+              "brazilian",
+              "bulgarian",
+              "catalan",
+              "chinese",
+              "cjk",
+              "czech",
+              "danish",
+              "dutch",
+              "english",
+              "estonian",
+              "fingerprint",
+              "finnish",
+              "french",
+              "galician",
+              "german",
+              "greek",
+              "hindi",
+              "hungarian",
+              "indonesian",
+              "irish",
+              "italian",
+              "latvian",
+              "lithuanian",
+              "norwegian",
+              "pattern",
+              "persian",
+              "portuguese",
+              "romanian",
+              "russian",
+              "snowball",
+              "sorani",
+              "spanish",
+              "swedish",
+              "thai",
+              "turkish"
+            ],
+            "tokenizers" : [
+              "PathHierarchy",
+              "char_group",
+              "classic",
+              "edgeNGram",
+              "edge_ngram",
+              "keyword",
+              "letter",
+              "lowercase",
+              "nGram",
+              "ngram",
+              "path_hierarchy",
+              "pattern",
+              "simple_pattern",
+              "simple_pattern_split",
+              "thai",
+              "uax_url_email",
+              "whitespace"
+            ],
+            "tokenFilters" : [
+              "apostrophe",
+              "arabic_normalization",
+              "arabic_stem",
+              "asciifolding",
+              "bengali_normalization",
+              "brazilian_stem",
+              "cjk_bigram",
+              "cjk_width",
+              "classic",
+              "common_grams",
+              "concatenate_graph",
+              "condition",
+              "czech_stem",
+              "decimal_digit",
+              "delimited_payload",
+              "delimited_term_freq",
+              "dictionary_decompounder",
+              "dutch_stem",
+              "edgeNGram",
+              "edge_ngram",
+              "elision",
+              "fingerprint",
+              "flatten_graph",
+              "french_stem",
+              "german_normalization",
+              "german_stem",
+              "hindi_normalization",
+              "hyphenation_decompounder",
+              "indic_normalization",
+              "keep",
+              "keep_types",
+              "keyword_marker",
+              "kstem",
+              "length",
+              "limit",
+              "lowercase",
+              "min_hash",
+              "multiplexer",
+              "nGram",
+              "ngram",
+              "pattern_capture",
+              "pattern_replace",
+              "persian_normalization",
+              "porter_stem",
+              "predicate_token_filter",
+              "remove_duplicates",
+              "reverse",
+              "russian_stem",
+              "scandinavian_folding",
+              "scandinavian_normalization",
+              "serbian_normalization",
+              "snowball",
+              "sorani_normalization",
+              "stemmer",
+              "stemmer_override",
+              "synonym",
+              "synonym_graph",
+              "trim",
+              "truncate",
+              "unique",
+              "uppercase",
+              "word_delimiter",
+              "word_delimiter_graph"
+            ],
+            "charFilters" : [
+              "html_strip",
+              "mapping",
+              "pattern_replace"
+            ],
+            "hunspellDictionaries" : [ ]
+          }
+        ]
+      }
+    }
+  }
+}
+```
+</details>
+
 ## Text analysis at indexing time and query time
 
 OpenSearch performs text analysis on text fields when you index a document and when you send a search request. Depending on the time of text analysis, the analyzers used for it are classified as follows:
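
The `analysis_components` response documented in the `_analyzers/index.md` hunk above nests a `plugins` array next to the per-node component lists, so the "which plugin does this component belong to" lookup mentioned in the new paragraph can be done on the client side. The following is a minimal Python sketch under stated assumptions: the response has already been fetched and parsed into a dict, the helper names `providers_of` and `not_listed_by_plugins` are hypothetical (they are not part of OpenSearch or any client library), and `sample` is a heavily trimmed copy of the example response.

```python
from typing import Dict, List


def providers_of(nodes_info: dict, kind: str, name: str) -> Dict[str, List[str]]:
    """Per node, list the plugin entries that report `name` under `kind`.

    `kind` is one of the list keys from the example response, such as
    "analyzers", "tokenizers", "tokenFilters", or "charFilters".
    """
    found: Dict[str, List[str]] = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        plugins = node.get("analysis_components", {}).get("plugins", [])
        found[node_id] = [p["name"] for p in plugins if name in p.get(kind, [])]
    return found


def not_listed_by_plugins(nodes_info: dict, kind: str) -> Dict[str, List[str]]:
    """Per node, list components of `kind` that no plugin entry reports."""
    remaining: Dict[str, List[str]] = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        components = node.get("analysis_components", {})
        claimed = set()
        for plugin in components.get("plugins", []):
            claimed.update(plugin.get(kind, []))
        remaining[node_id] = [c for c in components.get(kind, []) if c not in claimed]
    return remaining


# Trimmed copy of the example response (assumption: same shape, fewer entries).
sample = {
    "nodes": {
        "cZidmv5kQbWQN8M8dz9f5g": {
            "analysis_components": {
                "analyzers": ["english", "standard"],
                "tokenizers": ["ngram", "standard"],
                "plugins": [
                    {
                        "name": "analysis-common",
                        "analyzers": ["english"],
                        "tokenizers": ["ngram"],
                    }
                ],
            }
        }
    }
}

print(providers_of(sample, "tokenizers", "ngram"))
print(not_listed_by_plugins(sample, "analyzers"))
```

In the full example response, the second helper would single out entries such as the `standard` analyzer, which appears in the node-level list but not under `analysis-common`.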

_api-reference/nodes-apis/nodes-info.md

Lines changed: 2 additions & 0 deletions
@@ -69,6 +69,7 @@ plugins | Information about installed plugins and modules.
 ingest | Information about ingest pipelines and available ingest processors.
 aggregations | Information about available [aggregations]({{site.url}}{{site.baseurl}}/opensearch/aggregations).
 indices | Static index settings configured at the node level.
+analysis_components | Information about available [text analysis]({{site.url}}{{site.baseurl}}/analyzers/) components.
 
 ## Query parameters
 
@@ -162,6 +163,7 @@ plugins | Information about the installed plugins, including name, version, Open
 modules | Information about the modules, including name, version, OpenSearch version, Java version, description, class name, custom folder name, a list of extended plugins, and `has_native_controller`, which specifies whether the plugin has a native controller process. Modules are different from plugins because modules are loaded into OpenSearch automatically, while plugins have to be installed manually.
 ingest | Information about ingest pipelines and processors.
 aggregations | Information about the available aggregation types.
+analysis_components | Information about available [text analysis]({{site.url}}{{site.baseurl}}/analyzers/) components.
 
 
 ## Required permissions

0 commit comments
