This repository was archived by the owner on Mar 9, 2023. It is now read-only.
SudachiPy is a Python version of [Sudachi](https://github.com/WorksApplications/Sudachi), a Japanese morphological analyzer.
8
8
9
-
Sudachi & SudachiPy are developed in [WAP Tokushima Laboratory of AI and NLP](http://nlp.worksap.co.jp/), an institute under [Works Applications](http://www.worksap.com/) that focuses on Natural Language Processing (NLP).
10
9
11
-
**Warning: some functions are still incompatible with Java Sudachi.**
12
-
13
-
## Easy Setup
14
-
15
-
### Step 1: Install SudachiPy
16
-
17
-
SudachiPy is distributed from PyPI. You can install SudachiPy by executing `pip install SudachiPy` from the command line.
10
+
## TL;DR
18
11
19
12
```bash
20
-
$ pip install SudachiPy
13
+
$ pip install sudachipy sudachidict_core
14
+
15
+
$ echo "高輪ゲートウェイ駅" | sudachipy
16
+
高輪ゲートウェイ駅 名詞,固有名詞,一般,*,*,* 高輪ゲートウェイ駅
17
+
EOS
18
+
19
+
$ echo "高輪ゲートウェイ駅" | sudachipy -m A
20
+
高輪 名詞,固有名詞,地名,一般,*,* 高輪
21
+
ゲートウェイ 名詞,普通名詞,一般,*,*,* ゲートウェー
22
+
駅 名詞,普通名詞,一般,*,*,* 駅
23
+
EOS
24
+
25
+
$ echo "空缶空罐空きカン" | sudachipy -a
26
+
空缶 名詞,普通名詞,一般,*,*,* 空き缶 空缶 アキカン 0
27
+
空罐 名詞,普通名詞,一般,*,*,* 空き缶 空罐 アキカン 0
28
+
空きカン 名詞,普通名詞,一般,*,*,* 空き缶 空きカン アキカン 0
29
+
EOS
21
30
```
22
31
23
-
SudachiPy (>=v0.3.0) refers to the `system.dic` of the SudachiDict_core package (not included in SudachiPy) by default.
24
-
Please proceed to Step 2 to install the dict package.
32
+
## Setup
25
33
26
-
### Step 2: Get The Dictionary
34
+
You need SudachiPy and a dictionary.
27
35
28
-
You can install a dictionary as a Python package. It may take a while to download the dictionary file (around 70MB for the `core` edition).
36
+
### Step 1. Install SudachiPy
29
37
30
38
```bash
31
-
$ pip install sudachidict_core
39
+
$ pip install sudachipy
32
40
```
33
41
34
-
Alternatively, you can choose other editions of the dictionary. There are three editions, namely, `small`, `core`, and `full`. See [WorksApplications/SudachiDict](https://github.com/WorksApplications/SudachiDict) for details.
42
+
### Step 2. Get a Dictionary
35
43
36
-
You need to specify the dictionary with the `link -t` command.
44
+
You can get a dictionary as a Python package. It may take a while to download the dictionary file (around 70MB for the `core` edition).
37
45
38
46
```bash
39
-
$ pip install sudachidict_small
40
-
$ sudachipy link -t small
47
+
$ pip install sudachidict_core
41
48
```
42
49
43
-
```bash
44
-
$ pip install sudachidict_full
45
-
$ sudachipy link -t full
46
-
```
50
+
Alternatively, you can choose other dictionary editions. See [this section](#dictionary-edition) for details.
47
51
48
-
## Usage
49
52
50
-
### As a command
53
+
## Usage: As a command
51
54
52
-
After installing SudachiPy, you may also use it in the terminal via command `sudachipy`.
55
+
There is a CLI command `sudachipy`.
53
56
54
-
You can execute `sudachipy` on standard input like this:
55
57
```bash
56
-
$ sudachipy
58
+
$ echo "外国人参政権" | sudachipy
59
+
外国人参政権 名詞,普通名詞,一般,*,*,* 外国人参政権
60
+
EOS
61
+
$ echo "外国人参政権" | sudachipy -m A
62
+
外国 名詞,普通名詞,一般,*,*,* 外国
63
+
人 接尾辞,名詞的,一般,*,*,* 人
64
+
参政 名詞,普通名詞,一般,*,*,* 参政
65
+
権 接尾辞,名詞的,一般,*,*,* 権
66
+
EOS
57
67
```
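If you consume this output from another program, the lines can be split back into fields. The following is a minimal Python sketch and not part of SudachiPy. It assumes that the fields (surface, part of speech, normalized form) are separated by tabs and that each sentence ends with an `EOS` line; `parse_sudachipy_output` is a hypothetical helper name.

```python
from typing import Dict, List


def parse_sudachipy_output(text: str) -> List[List[Dict[str, object]]]:
    """Group tokenizer output lines into sentences of token dicts.

    Assumption: fields are tab-separated and a sentence ends with "EOS".
    """
    sentences = []
    current = []
    for line in text.splitlines():
        if line == "EOS":
            # End of one input sentence: flush the collected tokens.
            sentences.append(current)
            current = []
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            # Skip blank or unexpected lines instead of failing.
            continue
        surface, pos, normalized = fields[:3]
        current.append({
            "surface": surface,
            "pos": pos.split(","),      # part-of-speech hierarchy as a list
            "normalized": normalized,
        })
    return sentences


# Sample text mirroring the `-m A` output, with tabs assumed between fields.
sample = (
    "外国\t名詞,普通名詞,一般,*,*,*\t外国\n"
    "人\t接尾辞,名詞的,一般,*,*,*\t人\n"
    "参政\t名詞,普通名詞,一般,*,*,*\t参政\n"
    "権\t接尾辞,名詞的,一般,*,*,*\t権\n"
    "EOS\n"
)
tokens = parse_sudachipy_output(sample)[0]
print([t["surface"] for t in tokens])
```

Extra columns, such as those printed with `-a`, are ignored by this sketch because only the first three fields are read.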
58
68
59
-
`sudachipy` has 4 subcommands (default: `tokenize`)
(With the `20200330` `core` dictionary. The results may change when you use other versions.)
175
172
176
-
You can download and install the built dictionaries from [Python packages · WorksApplications/SudachiDict](https://github.com/WorksApplications/SudachiDict#python-packages).
173
+
174
+
## Dictionary Edition
175
+
176
+
There are three editions of the Sudachi Dictionary, namely, `small`, `core`, and `full`. See [WorksApplications/SudachiDict](https://github.com/WorksApplications/SudachiDict) for details.
177
+
178
+
SudachiPy uses `sudachidict_core` by default. You can specify the dictionary with the `link -t` command.
177
179
178
180
```bash
179
-
$ pip install SudachiDict_full-20190718.tar.gz
181
+
$ pip install sudachidict_small
182
+
$ sudachipy link -t small
180
183
```
181
184
182
-
You can change the default dict package by executing link command.
183
-
184
185
```bash
186
+
$ pip install sudachidict_full
185
187
$ sudachipy link -t full
186
188
```
187
189
188
-
You can remove default dict setting.
190
+
You can remove the dictionary link with the `link -u` command.
189
191
190
192
```bash
191
193
$ sudachipy link -u
192
194
```
193
195
194
-
## Customized dictionary
196
+
Dictionaries are installed as Python packages `sudachidict_small`, `sudachidict_core`, and `sudachidict_full`. SudachiPy looks for a package named `sudachidict` to load a dictionary. The `link` subcommand creates *a symbolic link* named `sudachidict` that points to one of the `sudachidict_*` packages, which is how you switch between them.
place [sudachi.json](https://github.com/WorksApplications/Sudachi/blob/develop/src/main/resources/sudachi.json) anywhere you like,
198
-
and overwrite `systemDict` value with the relative path from `sudachi.json` to your `system.dic`.
202
+
The dictionary files are not in the package itself; they are downloaded upon installation.
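The symbolic-link switching described for the `link` subcommand can be illustrated with a toy example. This is a sketch under assumptions, not SudachiPy's actual implementation: it works on empty directories inside a temporary folder, and `link_package` is a hypothetical helper name.

```python
import os
import tempfile


def link_package(site_dir, edition):
    """Point <site_dir>/sudachidict at <site_dir>/sudachidict_<edition>."""
    target = os.path.join(site_dir, "sudachidict_" + edition)
    link = os.path.join(site_dir, "sudachidict")
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    if os.path.islink(link):
        # Remove the previous link before switching editions.
        os.unlink(link)
    os.symlink(target, link, target_is_directory=True)
    return link


with tempfile.TemporaryDirectory() as site_dir:
    # Stand-ins for installed dictionary packages (empty directories).
    for edition in ("small", "full"):
        os.mkdir(os.path.join(site_dir, "sudachidict_" + edition))
    link = link_package(site_dir, "small")
    print(os.path.basename(os.path.realpath(link)))
    link = link_package(site_dir, "full")
    print(os.path.basename(os.path.realpath(link)))
```

Creating symbolic links may need extra privileges on some platforms, so treat this only as an illustration of the idea.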
203
+
204
+
### Dictionary in The Setting File
205
+
206
+
Alternatively, if the dictionary file is specified in the setting file, `sudachi.json`, SudachiPy will use that file.
199
207
200
208
```
201
209
{
@@ -204,42 +212,109 @@ and overwrite `systemDict` value with the relative path from `sudachi.json` to y
204
212
}
205
213
```
206
214
207
-
Then you can specify `sudachi.json` with `-r` option.
215
+
The default setting file is [sudachipy/resources/sudachi.json](https://github.com/WorksApplications/SudachiPy/blob/develop/sudachipy/resources/sudachi.json). You can specify your `sudachi.json` with the `-r` option.
216
+
208
217
```bash
209
218
$ sudachipy -r path/to/sudachi.json
210
219
```
211
220
212
-
In the end, we would like to make a flow to get these resources via the code, like [NLTK](https://www.nltk.org/data.html) (e.g., `import nltk; nltk.download()`) or [spaCy](https://spacy.io/usage/models) (e.g., `$python -m spacy download en`).
213
221
214
-
## User defined Dictionary
222
+
## User Dictionary
215
223
216
-
If you need to apply customized user dictionary, `user.dic`,
217
-
place [sudachi.json](https://github.com/WorksApplications/Sudachi/blob/develop/src/main/resources/sudachi.json) anywhere you like,
218
-
and add `userDict` value with the relative path from `sudachi.json` to your `user.dic`.
224
+
To use a user dictionary, `user.dic`, place [sudachi.json](https://github.com/WorksApplications/SudachiPy/blob/develop/sudachipy/resources/sudachi.json) anywhere you like, and add a `userDict` value with the relative path from `sudachi.json` to your `user.dic`.
219
225
220
-
```
226
+
```js
221
227
{
222
228
"userDict": ["relative/path/to/user.dic"],
223
229
...
224
230
}
225
231
```
226
232
227
-
Also, you can build user dictionary with sub-command `ubuild`.
233
+
Then specify your `sudachi.json` with the `-r` option.
228
234
229
-
About file format, see [here](https://github.com/WorksApplications/Sudachi/blob/develop/docs/user_dict.md)
230
-
(written in Japanese, English document is unavailable now)
235
+
```bash
236
+
$ sudachipy -r path/to/sudachi.json
237
+
```
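Because `userDict` paths are relative to `sudachi.json`, a wrong path is easy to write. The following is a minimal Python sketch and not part of SudachiPy. It resolves each `userDict` entry against the directory of the setting file and lists the entries that do not exist; `missing_user_dicts` is a hypothetical helper name, and it assumes `userDict` is a list as in the snippet above.

```python
import json
import os
import tempfile


def missing_user_dicts(settings_path):
    """Return the userDict entries that do not resolve to an existing file."""
    with open(settings_path, encoding="utf-8") as f:
        settings = json.load(f)
    # Relative paths are resolved from the directory holding sudachi.json.
    base_dir = os.path.dirname(os.path.abspath(settings_path))
    missing = []
    for entry in settings.get("userDict", []):
        if not os.path.isfile(os.path.join(base_dir, entry)):
            missing.append(entry)
    return missing


with tempfile.TemporaryDirectory() as work_dir:
    # An empty placeholder file stands in for a real user.dic.
    open(os.path.join(work_dir, "user.dic"), "wb").close()
    settings_path = os.path.join(work_dir, "sudachi.json")
    with open(settings_path, "w", encoding="utf-8") as f:
        json.dump({"userDict": ["user.dic", "other/user.dic"]}, f)
    print(missing_user_dicts(settings_path))
```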
238
+
239
+
240
+
You can build a user dictionary with the subcommand `ubuild`.
-d string description comment to be embedded on dictionary
256
+
-o file output file (default: user.dic)
257
+
-s file system dictionary (default: linked system_dic, see link -h)
258
+
```
259
+
260
+
About the dictionary file format, please refer to [this document](https://github.com/WorksApplications/Sudachi/blob/develop/docs/user_dict.md) (written in Japanese, English version is not available yet).
231
261
232
-
## For developer
233
262
234
-
### Code format
263
+
## Customized System Dictionary
235
264
236
-
You can use `./scripts/format.sh` to check whether your code follows the rules. `flake8`, `flake8-import-order`, and `flake8-builtins` are required. See `requirements.txt`.
-d string description comment to be embedded on dictionary
278
+
279
+
required named arguments:
280
+
-m file connection matrix file with MeCab's matrix.def format
281
+
```
282
+
283
+
To use your customized `system.dic`, place [sudachi.json](https://github.com/WorksApplications/SudachiPy/blob/develop/sudachipy/resources/sudachi.json) anywhere you like, and overwrite the `systemDict` value with the relative path from `sudachi.json` to your `system.dic`.
284
+
285
+
```
286
+
{
287
+
"systemDict" : "relative/path/to/system.dic",
288
+
...
289
+
}
290
+
```
291
+
292
+
Then specify your `sudachi.json` with the `-r` option.
293
+
294
+
```bash
295
+
$ sudachipy -r path/to/sudachi.json
296
+
```
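The `systemDict` value has to be relative to the location of `sudachi.json`, which can be computed rather than typed by hand. The following is a minimal Python sketch and not part of SudachiPy. `write_settings` is a hypothetical helper name, and `base_settings` stands in for the contents of a copy of the default `sudachi.json`, whose other keys should be kept.

```python
import json
import os
import tempfile


def write_settings(settings_path, system_dic_path, base_settings=None):
    """Write a sudachi.json whose systemDict is relative to the file itself."""
    settings = dict(base_settings or {})
    settings_dir = os.path.dirname(os.path.abspath(settings_path))
    # Express the dictionary location relative to the setting file's directory.
    settings["systemDict"] = os.path.relpath(
        os.path.abspath(system_dic_path), settings_dir
    )
    os.makedirs(settings_dir, exist_ok=True)
    with open(settings_path, "w", encoding="utf-8") as f:
        json.dump(settings, f, ensure_ascii=False, indent=2)
    return settings


with tempfile.TemporaryDirectory() as work_dir:
    settings_path = os.path.join(work_dir, "conf", "sudachi.json")
    system_dic_path = os.path.join(work_dir, "dict", "system.dic")
    written = write_settings(settings_path, system_dic_path)
    print(written["systemDict"])
```

The resulting file can then be passed to the `-r` option as shown above.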
297
+
298
+
299
+
## For Developers
300
+
301
+
### Code Format
302
+
303
+
Run `scripts/format.sh` to check if your code is formatted correctly.
304
+
305
+
You need the packages `flake8`, `flake8-import-order`, and `flake8-builtins` (see `requirements.txt`).
237
306
238
307
### Test
239
308
240
-
You can use `./scripts/test.sh` to check that your changes do not cause regressions.
309
+
Run `scripts/test.sh` to run the tests.
310
+
241
311
242
312
## Contact
243
313
244
-
We have a Slack workspace for developers and users to ask questions and discuss a variety of topics.
245
-
- https://sudachi-dev.slack.com/ (Please take invitation from [here](https://join.slack.com/t/sudachi-dev/shared_invite/enQtMzg2NTI2NjYxNTUyLTMyYmNkZWQ0Y2E5NmQxMTI3ZGM3NDU0NzU4NGE1Y2UwYTVmNTViYjJmNDI0MWZiYTg4ODNmMzgxYTQ3ZmI2OWU))
314
+
Sudachi and SudachiPy are developed by [WAP Tokushima Laboratory of AI and NLP](http://nlp.worksap.co.jp/).
315
+
316
+
Open an issue, or come to our Slack workspace for questions and discussion.