Closed
Description
The Morpheme.split method seems to return an empty morpheme list when the morpheme has no split information.
In that case, it should return a list containing the morpheme itself.
To reproduce:
import sudachipy
tok = sudachipy.Dictionary().create()
ms = tok.tokenize("国会議事堂前駅で降りる")
print([m.split(mode="a") for m in ms])
Actual output:
[<MorphemeList[
<Morpheme(国会, 0:2, (0, 364210))>,
<Morpheme(議事, 2:4, (0, 686966))>,
<Morpheme(堂, 4:5, (0, 368464))>,
<Morpheme(前, 5:6, (0, 318425))>,
<Morpheme(駅, 6:7, (0, 755333))>,
]>, <MorphemeList[
]>, <MorphemeList[
]>]
Expected output:
[<MorphemeList[
<Morpheme(国会, 0:2, (0, 364210))>,
<Morpheme(議事, 2:4, (0, 686966))>,
<Morpheme(堂, 4:5, (0, 368464))>,
<Morpheme(前, 5:6, (0, 318425))>,
<Morpheme(駅, 6:7, (0, 755333))>,
]>, <MorphemeList[
<Morpheme(で, ...)>,
]>, <MorphemeList[
<Morpheme(降りる, ...)>,
]>]
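Until split behaves this way, the expected result can be approximated on the caller's side. The sketch below assumes only that a morpheme has a split(mode) method returning an iterable. The helper split_or_self and the stand-in class FakeMorpheme are hypothetical names for illustration, not part of SudachiPy. FakeMorpheme exists only so the example runs without a dictionary installed.

```python
from typing import Any, List, Protocol, Sequence


class Splittable(Protocol):
    """Assumed shape: any object with a split(mode) method returning a sequence."""

    def split(self, mode: Any) -> Sequence[Any]: ...


def split_or_self(morpheme: Splittable, mode: Any = "a") -> List[Any]:
    """Split a morpheme, falling back to [morpheme] when there is no split info.

    This mirrors the behavior requested in the issue: an unsplittable morpheme
    yields a one-element list containing itself instead of an empty list.
    """
    parts = list(morpheme.split(mode))
    return parts if parts else [morpheme]


class FakeMorpheme:
    """Hypothetical stand-in for a morpheme; it is NOT a SudachiPy class."""

    def __init__(self, surface: str, parts: Sequence[str] = ()):
        self.surface = surface
        self._parts = list(parts)

    def split(self, mode: Any) -> List["FakeMorpheme"]:
        # Mimics the reported behavior: no split info means an empty list.
        return [FakeMorpheme(p) for p in self._parts]

    def __repr__(self) -> str:
        return f"FakeMorpheme({self.surface})"


if __name__ == "__main__":
    compound = FakeMorpheme("国会議事堂前駅", ["国会", "議事", "堂", "前", "駅"])
    particle = FakeMorpheme("で")
    # The compound splits normally; the particle falls back to [itself].
    print([split_or_self(m) for m in (compound, particle)])
```

With a real tokenizer the call might look like [split_or_self(m, "a") for m in ms]. Note that the fallback branch returns a plain Python list rather than a MorphemeList, so code relying on MorphemeList-specific behavior would need adjusting.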