Commit fa661d8

chideat and Copilot authored
solutions for opensearch (#128)
* docs: add guide for IK Analyzer plugin installation with opensearch-operator
* docs: update IK Analyzer test cases with longer sentence and add storageClass config
* Apply suggestion from @Copilot
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent cfa09a1 commit fa661d8

1 file changed: 352 additions & 0 deletions
@@ -0,0 +1,352 @@
---
products:
  - Alauda Application Services
kind:
  - Solution
---

# How to Install the IK Analyzer Plugin for OpenSearch Using opensearch-operator

:::info
Applicable Versions: OpenSearch Operator ~= 2.8.x, OpenSearch ~= 2.19.3 / 3.3.1
:::

This document explains how to deploy an OpenSearch cluster with the [IK Analyzer](https://github.com/infinilabs/analysis-ik) plugin pre-installed using the opensearch-operator. IK Analyzer is a widely used Chinese text analysis plugin for OpenSearch and Elasticsearch. It provides two analyzers: `ik_smart` for coarse-grained tokenization and `ik_max_word` for maximum-granularity tokenization.

## How Plugin Installation Works

The opensearch-operator installs plugins by passing each entry in `pluginsList` to the `opensearch-plugin install` command during node startup. You need to configure `pluginsList` in two places:

| Field | Purpose |
| :--- | :--- |
| `spec.general.pluginsList` | Installs the plugin on all OpenSearch data/master nodes |
| `spec.bootstrap.pluginsList` | Installs the plugin on the bootstrap pod used for initial cluster formation |

Both must be configured. If the bootstrap pod is missing the plugin while `additionalConfig` references it, cluster initialization may fail.
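
Because the same entry has to appear in both lists, a quick pre-flight check on the manifest can catch a missing one before you apply it. The following is a rough, text-based sketch: `count_plugin_refs` and `plugin_listed_twice` are hypothetical helper names, and the check only counts lines that contain the URL without verifying which section each line belongs to.

```bash
# Hypothetical helper: print the number of lines in a manifest that contain the plugin URL.
count_plugin_refs() {
  manifest="$1"
  url="$2"
  # -c counts matching lines; -F treats the URL as a fixed string, not a regex
  grep -cF -- "$url" "$manifest"
}

# Succeeds only if the URL appears at least twice (once for general, once for bootstrap).
plugin_listed_twice() {
  [ "$(count_plugin_refs "$1" "$2")" -ge 2 ]
}
```

For example, `plugin_listed_twice cluster.yaml "<plugin-url>" || echo "pluginsList is missing in one section"` prints a warning when the manifest contains the URL fewer than two times.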

:::note
Adding or modifying `pluginsList` on a running cluster triggers a **rolling restart** of all nodes to install the new plugin.
:::

## IK Analyzer Plugin Download URLs

| OpenSearch Version | Plugin Download URL |
| :--- | :--- |
| **2.19.3** | `https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-2.19.3.zip` |
| **3.3.1** | `https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-3.3.1.zip` |

:::note
Before applying, verify that the plugin URL for your OpenSearch version is available. Check the [Infinilabs releases page](https://github.com/infinilabs/analysis-ik/releases) to confirm the file exists. If the URL returns a 404, the plugin installation fails and the cluster will not start.
:::
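
The availability check can also be scripted. The sketch below assumes that other versions follow the same file naming pattern as the two URLs in the table, and that the download server answers `HEAD` requests; `ik_plugin_url` and `plugin_url_available` are hypothetical helper names.

```bash
# Build the download URL for a given OpenSearch version.
# Assumption: the naming pattern matches the two URLs listed in the table above.
ik_plugin_url() {
  printf 'https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-%s.zip\n' "$1"
}

# Return success if the URL answers a HEAD request without an HTTP error.
# curl flags: -f fail on HTTP errors, -s silent, -I HEAD request, -L follow redirects.
plugin_url_available() {
  curl -fsIL "$(ik_plugin_url "$1")" > /dev/null
}
```

Run `plugin_url_available 2.19.3 && echo "available" || echo "not found"` from a machine with the same network access as your cluster nodes.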

:::warning Air-Gapped Environments
If your Kubernetes cluster does not have external network access, download the plugin zip files first and host them on an internal HTTP server (e.g., Nexus, Artifactory, or Nginx). Then replace the download URLs in the configurations below with your internally accessible URLs.
:::
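
If you keep the manifests from this guide in files, the URL replacement can be done with `sed`. This is a minimal sketch: `rewrite_plugin_urls` is a hypothetical helper, the mirror address in the usage line is a placeholder, and it assumes the zip files are hosted on the mirror under their original file names.

```bash
# Rewrite the public download prefix to an internal mirror.
# Reads a manifest on stdin and writes the rewritten manifest to stdout.
# Assumption: the mirror URL contains no '#' or '&' characters,
# which are special inside this sed expression.
rewrite_plugin_urls() {
  mirror="$1"
  sed "s#https://release\.infinilabs\.com/analysis-ik/stable#${mirror}#g"
}
```

For example, `rewrite_plugin_urls "http://nexus.internal.example/plugins" < cluster.yaml > cluster-internal.yaml` produces a copy of the manifest that points both `pluginsList` entries at the mirror.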

## Deploy OpenSearch with IK Analyzer

### For OpenSearch 2.19.3

```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: <namespace>
spec:
  general:
    serviceName: my-cluster
    version: 2.19.3
    setVMMaxMapCount: true
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-2.19.3.zip"
  bootstrap:
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-2.19.3.zip"
  security:
    tls:
      transport:
        generate: true
        perNode: true
      http:
        generate: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "30Gi"
      persistence:
        pvc:
          storageClass: "<your-storage-class>"
          accessModes:
            - ReadWriteOnce
      roles:
        - "cluster_manager"
        - "data"
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
  dashboards:
    enable: true
    version: 2.19.3
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
```

### For OpenSearch 3.3.1

```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: <namespace>
spec:
  general:
    serviceName: my-cluster
    version: 3.3.1
    setVMMaxMapCount: true
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-3.3.1.zip"
  bootstrap:
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-3.3.1.zip"
  security:
    tls:
      transport:
        generate: true
        perNode: true
      http:
        generate: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "30Gi"
      persistence:
        pvc:
          storageClass: "<your-storage-class>"
          accessModes:
            - ReadWriteOnce
      roles:
        - "cluster_manager"
        - "data"
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
  dashboards:
    enable: true
    version: 3.3.0 # Dashboards 3.3.0 is the latest release compatible with OpenSearch 3.3.1
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
```

Save the manifest for your version as `cluster.yaml` and apply it:

```bash
kubectl apply -f cluster.yaml
```

## Verify the Plugin Is Installed

After the cluster is running, verify the IK plugin is installed on a node:

```bash
kubectl -n <namespace> exec my-cluster-masters-0 -- bin/opensearch-plugin list
```

The output should include `analysis-ik`.
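
The command above checks a single node. To cover every node in the pool, the same check can be looped over the pods. This sketch assumes the pod names `my-cluster-masters-0` through `my-cluster-masters-2`, which follow from the cluster name, the `masters` component, and the three replicas used in the manifests above; `has_ik_plugin` and `check_all_nodes` are hypothetical helper names.

```bash
# Reads `opensearch-plugin list` output on stdin; succeeds if an IK entry is present.
has_ik_plugin() {
  grep -q 'analysis-ik'
}

# Print one status line per pod of the "masters" node pool.
# Assumption: three replicas named my-cluster-masters-0..2, as in the manifests above.
check_all_nodes() {
  ns="$1"
  for i in 0 1 2; do
    pod="my-cluster-masters-$i"
    if kubectl -n "$ns" exec "$pod" -- bin/opensearch-plugin list | has_ik_plugin; then
      echo "$pod: ok"
    else
      echo "$pod: missing"
    fi
  done
}
```

Call it as `check_all_nodes <namespace>`. A pod reported as `missing` either has not restarted yet or failed to install the plugin.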

## Test IK Analyzer

Port-forward the OpenSearch service in one terminal, then run a quick tokenization test from another terminal:

```bash
kubectl -n <namespace> port-forward svc/my-cluster 9200
```

**Test `ik_max_word` analyzer** (maximum granularity, splits text into all possible tokens):

```bash
# The operator generates a self-signed cert; -k skips local certificate validation
curl -k -u admin:admin -X POST "https://localhost:9200/_analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "analyzer": "ik_max_word",
    "text": "自然语言处理技术在人工智能领域的应用越来越广泛"
  }'
```

Expected output:

```json
{
  "tokens": [
    { "token": "自然语言", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "自然", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 1 },
    { "token": "语言", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 2 },
    { "token": "处理", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3 },
    { "token": "技术", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 4 },
    { "token": "在", "start_offset": 8, "end_offset": 9, "type": "CN_CHAR", "position": 5 },
    { "token": "人工智能", "start_offset": 9, "end_offset": 13, "type": "CN_WORD", "position": 6 },
    { "token": "人工", "start_offset": 9, "end_offset": 11, "type": "CN_WORD", "position": 7 },
    { "token": "智能", "start_offset": 11, "end_offset": 13, "type": "CN_WORD", "position": 8 },
    { "token": "领域", "start_offset": 13, "end_offset": 15, "type": "CN_WORD", "position": 9 },
    { "token": "的", "start_offset": 15, "end_offset": 16, "type": "CN_CHAR", "position": 10 },
    { "token": "应用", "start_offset": 16, "end_offset": 18, "type": "CN_WORD", "position": 11 },
    { "token": "越来越", "start_offset": 18, "end_offset": 21, "type": "CN_WORD", "position": 12 },
    { "token": "越来", "start_offset": 18, "end_offset": 20, "type": "CN_WORD", "position": 13 },
    { "token": "越", "start_offset": 20, "end_offset": 21, "type": "CN_CHAR", "position": 14 },
    { "token": "广泛", "start_offset": 21, "end_offset": 23, "type": "CN_WORD", "position": 15 }
  ]
}
```

**Test `ik_smart` analyzer** (coarse-grained, splits into the fewest tokens):

```bash
# The operator generates a self-signed cert; -k skips local certificate validation
curl -k -u admin:admin -X POST "https://localhost:9200/_analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "analyzer": "ik_smart",
    "text": "自然语言处理技术在人工智能领域的应用越来越广泛"
  }'
```

Expected output:

```json
{
  "tokens": [
    { "token": "自然语言", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 0 },
    { "token": "处理", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 1 },
    { "token": "技术", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 2 },
    { "token": "在", "start_offset": 8, "end_offset": 9, "type": "CN_CHAR", "position": 3 },
    { "token": "人工智能", "start_offset": 9, "end_offset": 13, "type": "CN_WORD", "position": 4 },
    { "token": "领域", "start_offset": 13, "end_offset": 15, "type": "CN_WORD", "position": 5 },
    { "token": "的", "start_offset": 15, "end_offset": 16, "type": "CN_CHAR", "position": 6 },
    { "token": "应用", "start_offset": 16, "end_offset": 18, "type": "CN_WORD", "position": 7 },
    { "token": "越来越", "start_offset": 18, "end_offset": 21, "type": "CN_WORD", "position": 8 },
    { "token": "广泛", "start_offset": 21, "end_offset": 23, "type": "CN_WORD", "position": 9 }
  ]
}
```
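
To compare the two analyzers side by side, it can help to reduce each response to a plain list of tokens. The sketch below is a text-based extraction that avoids extra tooling; `extract_tokens` is a hypothetical helper, and it assumes token values contain no escaped double quotes.

```bash
# Print one token per line from an _analyze response read on stdin.
# Works on both pretty-printed and compact JSON ("token": "x" or "token":"x").
extract_tokens() {
  grep -o '"token" *: *"[^"]*"' | sed 's/^"token" *: *"//; s/"$//'
}
```

Append `-s` to the `curl` options and pipe either request into it, for example `curl -k -s -u admin:admin ... | extract_tokens`, then diff the two token lists.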

## Use IK Analyzer in an Index Mapping

When creating an index, specify `ik_max_word` or `ik_smart` as the analyzer for Chinese text fields:

```bash
curl -k -u admin:admin -X PUT "https://localhost:9200/my-index" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "analysis": {
        "analyzer": {
          "ik_max_word_analyzer": {
            "type": "ik_max_word"
          },
          "ik_smart_analyzer": {
            "type": "ik_smart"
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }'
```

:::note
Using `ik_max_word` for indexing and `ik_smart` for search is a common pattern: fine-grained tokens at index time maximize recall, while coarse-grained tokens at query time keep matches precise.
:::

## (Optional) Mount a Custom Dictionary

The IK Analyzer supports custom word dictionaries and stop-word lists via `IKAnalyzer.cfg.xml`. To mount a custom dictionary into the cluster, use `additionalVolumes` with a ConfigMap.

### Step 1: Create the ConfigMap

Prepare your custom dictionary files and create a ConfigMap. The following example adds a custom word list:

```bash
# custom_dict.dic — one word per line
cat > custom_dict.dic << 'EOF'
云原生
容器编排
服务网格
EOF

# IKAnalyzer.cfg.xml — reference the custom dictionary
cat > IKAnalyzer.cfg.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer Extended Configuration</comment>
  <!-- Custom extended dictionary; separate multiple files with ; -->
  <entry key="ext_dict">custom_dict.dic</entry>
  <!-- Custom stop-word dictionary; separate multiple files with ; -->
  <entry key="ext_stopwords"></entry>
</properties>
EOF

kubectl -n <namespace> create configmap ik-custom-dict \
  --from-file=custom_dict.dic \
  --from-file=IKAnalyzer.cfg.xml
```

### Step 2: Mount the ConfigMap via additionalVolumes

Add the `additionalVolumes` section to `spec.general` in your `OpenSearchCluster` CR:

```yaml
spec:
  general:
    pluginsList:
      - "https://release.infinilabs.com/analysis-ik/stable/opensearch-analysis-ik-2.19.3.zip"
    additionalVolumes:
      - name: ik-custom-dict
        path: /usr/share/opensearch/plugins/analysis-ik/config
        restartPods: true # Restart pods when ConfigMap content changes
        configMap:
          name: ik-custom-dict
```

After you apply the change, the pods restart and load the new dictionary. To verify, run an `_analyze` request that contains your custom terms.
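
One way to script that verification is to check whether a custom term comes back as a single token. The sketch below is text-based; `has_token` is a hypothetical helper, and it assumes the term is a plain word with no regular-expression metacharacters or double quotes.

```bash
# Succeed if the _analyze response on stdin contains the given term as one token.
# Matches both pretty-printed and compact JSON ("token": "x" or "token":"x").
has_token() {
  term="$1"
  grep -q "\"token\" *: *\"${term}\""
}
```

For example, send `{"analyzer": "ik_smart", "text": "云原生"}` to `_analyze` with `curl -k -s -u admin:admin ...` and pipe the response into `has_token "云原生"`. If the dictionary is loaded, the term would be expected to appear as a single token rather than being split into smaller pieces.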

## References

- [opensearch-operator: Adding Plugins](https://github.com/opensearch-project/opensearch-k8s-operator/blob/v2.8.0/docs/userguide/main.md#adding-plugins)
- [opensearch-operator: Additional Volumes](https://github.com/opensearch-project/opensearch-k8s-operator/blob/v2.8.0/docs/userguide/main.md#additional-volumes)
- [IK Analyzer for OpenSearch (Infinilabs)](https://github.com/infinilabs/analysis-ik)
