Commit 3d47fa6

update to support es 2.0
1 parent a60059f commit 3d47fa6

18 files changed, with 309 additions and 190 deletions

README.md

Lines changed: 22 additions & 89 deletions
@@ -3,16 +3,15 @@ IK Analysis for ElasticSearch
33

44
The IK Analysis plugin integrates Lucene IK analyzer into elasticsearch, support customized dictionary.
55

6-
Tokenizer: `ik`
7-
8-
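Update: users who run an ES cluster with IK as the analysis plugin and often modify their custom dictionaries can load the dictionaries remotely; each update reloads the dictionary, with no need to restart the ES service.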
6+
Analyzer: `ik_smart` , `ik_max_word` , Tokenizer: `ik_smart` , `ik_max_word`
97

108
Versions
119
--------
1210

1311
IK version | ES version
1412
-----------|-----------
15-
master | 1.5.0 -> master
13+
master | 2.0.0 -> master
14+
1.5.0 | 2.0.0
1615
1.4.1 | 1.7.2
1716
1.4.0 | 1.6.0
1817
1.3.0 | 1.5.0
@@ -30,108 +29,42 @@ master | 1.5.0 -> master
3029
Install
3130
-------
3231

33-
you can download this plugin from RTF project(https://github.com/medcl/elasticsearch-rtf)
34-
https://github.com/medcl/elasticsearch-rtf/tree/master/plugins/analysis-ik
35-
https://github.com/medcl/elasticsearch-rtf/tree/master/config/ik
36-
37-
<del>also remember to download the dict files,unzip these dict file into your elasticsearch's config folder,such as: your-es-root/config/ik</del>
38-
39-
you need a service restart after that!
40-
41-
Configuration
42-
-------------
43-
44-
### Analysis Configuration
45-
46-
#### `elasticsearch.yml`
47-
48-
```yaml
49-
index:
50-
analysis:
51-
analyzer:
52-
ik:
53-
alias: [ik_analyzer]
54-
type: org.elasticsearch.index.analysis.IkAnalyzerProvider
55-
ik_max_word:
56-
type: ik
57-
use_smart: false
58-
ik_smart:
59-
type: ik
60-
use_smart: true
61-
```
62-
63-
Or
64-
65-
```yaml
66-
index.analysis.analyzer.ik.type : "ik"
67-
```
68-
69-
#### Differences between the two configuration styles above:
32+
1.compile
7033

71-
1. The second style defines only one analyzer, named ik, whose use_smart takes the default value false.
34+
`mvn package`
7235

73-
2. The first style defines three analyzers: ik, ik_max_word, and ik_smart. ik_max_word and ik_smart are based on the ik analyzer, and each explicitly sets a different use_smart value.
36+
copy and unzip `target/release/ik**.zip` to `your-es-root/plugins/ik`
7437

75-
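3. In fact, ik_max_word is equivalent to ik. ik_max_word splits text at the finest granularity: for example, it splits “中华人民共和国国歌” into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination. ik_smart splits text at the coarsest granularity: for example, it splits “中华人民共和国国歌” into “中华人民共和国,国歌”.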
38+
2.config files:
7639

77-
Therefore, when defining a mapping, it is recommended to use the ik analyzer so that documents match as many search terms as possible.
40+
download the dict files,unzip these dict file into your elasticsearch's config folder,such as: `your-es-root/config/ik`
7841

79-
However, if you want to use the /index_name/_analyze RESTful API as a tokenizer to extract the key terms from a piece of text, the ik_smart analyzer is recommended:
42+
3.restart elasticsearch
8043

81-
```
82-
POST /hailiang/_analyze?analyzer=ik_smart HTTP/1.1
83-
Host: localhost:9200
84-
Cache-Control: no-cache
85-
86-
中华人民共和国国歌
87-
```
88-
89-
Response:
90-
91-
```json
92-
{
93-
"tokens" : [ {
94-
"token" : "中华人民共和国",
95-
"start_offset" : 0,
96-
"end_offset" : 7,
97-
"type" : "CN_WORD",
98-
"position" : 1
99-
}, {
100-
"token" : "国歌",
101-
"start_offset" : 7,
102-
"end_offset" : 9,
103-
"type" : "CN_WORD",
104-
"position" : 2
105-
} ]
106-
}
107-
```
44+
Tips:
10845

109-
In addition, you can add the following line to elasticsearch.yml to make ik the default analyzer:
110-
111-
```yaml
112-
index.analysis.analyzer.default.type : "ik"
113-
```
46+
ik_max_word: splits text at the finest granularity; for example, it splits “中华人民共和国国歌” into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination;
11447

48+
ik_smart: splits text at the coarsest granularity; for example, it splits “中华人民共和国国歌” into “中华人民共和国,国歌”.
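
The difference between the two granularities can be illustrated with a toy sketch. This is not the IK segmentation algorithm: the class `GranularitySketch`, its method names, and the tiny dictionary (just the tokens quoted in the README example) are assumptions made for illustration. "Finest" lists every dictionary word found at every offset, while "coarsest" uses a simple greedy longest match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy illustration only: this is NOT how IK segments text internally. */
public class GranularitySketch {
    // Hypothetical dictionary: only the tokens quoted in the README example.
    static final Set<String> DICT = new HashSet<>(Arrays.asList(
        "中华人民共和国", "中华人民", "中华", "华人", "人民共和国", "人民",
        "人", "民", "共和国", "共和", "和", "国国", "国歌"));
    static final String TEXT = "中华人民共和国国歌";

    /** Finest granularity: every dictionary word found at every start offset. */
    static List<String> finest(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            // Try longer candidates first so output is ordered longest-first per offset.
            for (int end = text.length(); end > start; end--) {
                String candidate = text.substring(start, end);
                if (dict.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    /** Coarsest granularity: greedy longest match; unknown characters are skipped. */
    static List<String> coarsest(String text, Set<String> dict) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int matchedEnd = -1;
            for (int end = text.length(); end > start; end--) {
                if (dict.contains(text.substring(start, end))) {
                    matchedEnd = end;
                    break;
                }
            }
            if (matchedEnd < 0) {
                start++;            // no dictionary word starts here
                continue;
            }
            out.add(text.substring(start, matchedEnd));
            start = matchedEnd;     // continue after the matched word
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(finest(TEXT, DICT));
        System.out.println(coarsest(TEXT, DICT));
    }
}
```

With this dictionary, `finest` yields the 13 tokens listed for ik_max_word and `coarsest` yields the two tokens listed for ik_smart. The real analyzers use a much larger dictionary and their own disambiguation logic.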
11549

116-
### Mapping Configuration
11750

11851
#### Quick Example
11952

120-
1. create a index
53+
1.create a index
12154

12255
```bash
12356
curl -XPUT http://localhost:9200/index
12457
```
12558

126-
2. create a mapping
59+
2.create a mapping
12760

12861
```bash
12962
curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
13063
{
13164
"fulltext": {
13265
"_all": {
133-
"indexAnalyzer": "ik",
134-
"searchAnalyzer": "ik",
66+
"indexAnalyzer": "ik_max_word",
67+
"searchAnalyzer": "ik_max_word",
13568
"term_vector": "no",
13669
"store": "false"
13770
},
@@ -140,8 +73,8 @@ curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
14073
"type": "string",
14174
"store": "no",
14275
"term_vector": "with_positions_offsets",
143-
"indexAnalyzer": "ik",
144-
"searchAnalyzer": "ik",
76+
"indexAnalyzer": "ik_max_word",
77+
"searchAnalyzer": "ik_max_word",
14578
"include_in_all": "true",
14679
"boost": 8
14780
}
@@ -150,7 +83,7 @@ curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
15083
}'
15184
```
15285

153-
3. index some docs
86+
3.index some docs
15487

15588
```bash
15689
curl -XPOST http://localhost:9200/index/fulltext/1 -d'
@@ -176,7 +109,7 @@ curl -XPOST http://localhost:9200/index/fulltext/4 -d'
176109
'
177110
```
178111

179-
4. query with highlighting
112+
4.query with highlighting
180113

181114
```bash
182115
curl -XPOST http://localhost:9200/index/fulltext/_search -d'
@@ -193,7 +126,7 @@ curl -XPOST http://localhost:9200/index/fulltext/_search -d'
193126
'
194127
```
195128

196-
#### Result
129+
Result
197130

198131
```json
199132
{
@@ -257,7 +190,7 @@ curl -XPOST http://localhost:9200/index/fulltext/_search -d'
257190
<!-- Users can configure a remote extension dictionary here -->
258191
<entry key="remote_ext_dict">location</entry>
259192
<!-- Users can configure a remote extension stopword dictionary here -->
260-
<entry key="remote_ext_stopwords">location</entry>
193+
<entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
261194
</properties>
262195
```
263196

pom.xml

Lines changed: 13 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -6,17 +6,29 @@
66
<modelVersion>4.0.0</modelVersion>
77
<groupId>org.elasticsearch</groupId>
88
<artifactId>elasticsearch-analysis-ik</artifactId>
9-
<version>1.4.1</version>
9+
<version>1.5.0</version>
1010
<packaging>jar</packaging>
1111
<description>IK Analyzer for ElasticSearch</description>
1212
<inceptionYear>2009</inceptionYear>
13+
14+
<properties>
15+
<elasticsearch.version>2.0.0</elasticsearch.version>
16+
17+
<elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml</elasticsearch.assembly.descriptor>
18+
<elasticsearch.plugin.classname>org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin</elasticsearch.plugin.classname>
19+
<elasticsearch.plugin.jvm>true</elasticsearch.plugin.jvm>
20+
<tests.rest.load_packaged>false</tests.rest.load_packaged>
21+
<skip.unit.tests>true</skip.unit.tests>
22+
</properties>
23+
1324
<licenses>
1425
<license>
1526
<name>The Apache Software License, Version 2.0</name>
1627
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
1728
<distribution>repo</distribution>
1829
</license>
1930
</licenses>
31+
2032
<scm>
2133
<connection>scm:git:git@github.com:medcl/elasticsearch-analysis-ik.git</connection>
2234
<developerConnection>scm:git:git@github.com:medcl/elasticsearch-analysis-ik.git
@@ -30,10 +42,6 @@
3042
<version>7</version>
3143
</parent>
3244

33-
<properties>
34-
<elasticsearch.version>1.7.2</elasticsearch.version>
35-
</properties>
36-
3745
<repositories>
3846
<repository>
3947
<id>oss.sonatype.org</id>
@@ -84,11 +92,6 @@
8492
<version>4.10</version>
8593
<scope>test</scope>
8694
</dependency>
87-
<dependency>
88-
<groupId>org.apache.lucene</groupId>
89-
<artifactId>lucene-core</artifactId>
90-
<version>4.10.4</version>
91-
</dependency>
9295
</dependencies>
9396

9497
<build>
@@ -137,9 +140,6 @@
137140
<mainClass>fully.qualified.MainClass</mainClass>
138141
</manifest>
139142
</archive>
140-
<descriptorRefs>
141-
<descriptorRef>jar-with-dependencies</descriptorRef>
142-
</descriptorRefs>
143143
</configuration>
144144
<executions>
145145
<execution>

src/main/assemblies/plugin.xml

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,13 @@
55
<format>zip</format>
66
</formats>
77
<includeBaseDirectory>false</includeBaseDirectory>
8+
<files>
9+
<file>
10+
<source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
11+
<outputDirectory></outputDirectory>
12+
<filtered>true</filtered>
13+
</file>
14+
</files>
815
<dependencySets>
916
<dependencySet>
1017
<outputDirectory>/</outputDirectory>

src/main/config/ik.yaml

Whitespace-only changes.

src/main/java/org/elasticsearch/index/analysis/IkAnalysisBinderProcessor.java

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -3,20 +3,21 @@
33

44
public class IkAnalysisBinderProcessor extends AnalysisModule.AnalysisBinderProcessor {
55

6-
@Override public void processTokenFilters(TokenFiltersBindings tokenFiltersBindings) {
6+
7+
@Override
8+
public void processTokenFilters(TokenFiltersBindings tokenFiltersBindings) {
79

810
}
911

1012

11-
@Override public void processAnalyzers(AnalyzersBindings analyzersBindings) {
13+
@Override
14+
public void processAnalyzers(AnalyzersBindings analyzersBindings) {
1215
analyzersBindings.processAnalyzer("ik", IkAnalyzerProvider.class);
13-
super.processAnalyzers(analyzersBindings);
1416
}
1517

1618

1719
@Override
1820
public void processTokenizers(TokenizersBindings tokenizersBindings) {
19-
tokenizersBindings.processTokenizer("ik", IkTokenizerFactory.class);
20-
super.processTokenizers(tokenizersBindings);
21+
tokenizersBindings.processTokenizer("ik_tokenizer", IkTokenizerFactory.class);
2122
}
2223
}

src/main/java/org/elasticsearch/index/analysis/IkAnalyzerProvider.java

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,6 @@
11
package org.elasticsearch.index.analysis;
22

33
import org.elasticsearch.common.inject.Inject;
4-
import org.elasticsearch.common.inject.assistedinject.Assisted;
54
import org.elasticsearch.common.settings.Settings;
65
import org.elasticsearch.env.Environment;
76
import org.elasticsearch.index.Index;
@@ -12,12 +11,14 @@
1211

1312
public class IkAnalyzerProvider extends AbstractIndexAnalyzerProvider<IKAnalyzer> {
1413
private final IKAnalyzer analyzer;
14+
private boolean useSmart=false;
1515

1616
@Inject
17-
public IkAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
17+
public IkAnalyzerProvider(Index index, @IndexSettings Settings indexSettings,Environment env, String name, Settings settings) {
1818
super(index, indexSettings, name, settings);
1919
Dictionary.initial(new Configuration(env));
20-
analyzer=new IKAnalyzer(indexSettings, settings, env);
20+
useSmart = settings.get("use_smart", "false").equals("true");
21+
analyzer=new IKAnalyzer(useSmart);
2122
}
2223

2324
@Override public IKAnalyzer get() {
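
The `use_smart` parsing added in this hunk compares the setting against the literal string `"true"`, so only that exact lowercase value enables smart mode. A minimal sketch of that behavior follows. It assumes a plain `Map<String, String>` as a stand-in for Elasticsearch's `Settings`; the class name `UseSmartSketch` and the helper `useSmart` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in sketch: a Map replaces Elasticsearch's Settings for illustration. */
public class UseSmartSketch {
    /** Mirrors the hunk's expression: settings.get("use_smart", "false").equals("true"). */
    static boolean useSmart(Map<String, String> settings) {
        return settings.getOrDefault("use_smart", "false").equals("true");
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        System.out.println(useSmart(settings));   // setting absent: default "false"

        settings.put("use_smart", "true");
        System.out.println(useSmart(settings));   // exact match on "true"

        settings.put("use_smart", "True");
        System.out.println(useSmart(settings));   // comparison is case-sensitive
    }
}
```

Because the comparison is a case-sensitive string match, values such as `True` or `1` leave the analyzer in its default (non-smart) mode under this parsing.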

src/main/java/org/elasticsearch/index/analysis/IkTokenizerFactory.java

Lines changed: 7 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -11,23 +11,21 @@
1111
import org.wltea.analyzer.dic.Dictionary;
1212
import org.wltea.analyzer.lucene.IKTokenizer;
1313

14-
import java.io.Reader;
15-
1614
public class IkTokenizerFactory extends AbstractTokenizerFactory {
17-
private Environment environment;
18-
private Settings settings;
15+
private final Settings settings;
16+
private boolean useSmart=false;
1917

2018
@Inject
2119
public IkTokenizerFactory(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
2220
super(index, indexSettings, name, settings);
23-
this.environment = env;
24-
this.settings = settings;
21+
this.settings=settings;
2522
Dictionary.initial(new Configuration(env));
2623
}
2724

25+
2826
@Override
29-
public Tokenizer create(Reader reader) {
30-
return new IKTokenizer(reader, settings, environment);
31-
}
27+
public Tokenizer create() {
28+
this.useSmart = settings.get("use_smart", "false").equals("true");
3229

30+
return new IKTokenizer(useSmart); }
3331
}
