noun_chunks in zh(Chinese) #10695
Paul's description of how to develop a noun chunks method in that comment is still accurate (#7436 (comment)). Because Chinese uses UD dependency labels, I'd recommend starting from another language that also uses UD labels in spaCy (any language except English or German) and that has a noun chunk structure similar to Chinese. The method yields tuples that correspond to the noun chunk spans. There's one additional step: registering the noun chunks method in the language defaults, which looks like this in spacy/lang/en/__init__.py (line 17 at e075003).
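As a concrete illustration of the shape such a method takes, here is a minimal sketch modeled on the `syntax_iterators.py` files of spaCy's UD-based languages. To keep it self-contained it runs over a tiny stand-in `Token` class instead of a real spaCy `Doc`, and the dependency-label set below is illustrative only, not a vetted set for Chinese:

```python
# Sketch of a UD-style noun_chunks iterator, modeled on the shape of
# spaCy's syntax_iterators.py files. NOT spaCy code: Token here is a
# stand-in, and NP_DEPS is an illustrative label set, not one tuned
# for Chinese.
from dataclasses import dataclass
from typing import Iterator, List, Tuple

@dataclass
class Token:
    i: int             # position in the "doc"
    pos_: str          # coarse UD part-of-speech tag
    dep_: str          # UD dependency label
    left_edge_i: int   # leftmost token index of this token's subtree
    right_edge_i: int  # rightmost token index of this token's subtree

# Dependency labels under which a nominal counts as a chunk head
# (illustrative; compare against a real language's syntax_iterators.py).
NP_DEPS = {"nsubj", "obj", "iobj", "obl", "nmod", "appos", "ROOT"}

def noun_chunks(doc: List[Token]) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, label) per noun chunk; end is exclusive."""
    prev_end = -1
    for word in doc:
        if word.pos_ not in ("NOUN", "PROPN", "PRON"):
            continue
        # Skip tokens already covered by a previously yielded chunk.
        if word.left_edge_i <= prev_end:
            continue
        if word.dep_ in NP_DEPS:
            prev_end = word.right_edge_i
            # The span covers the head's whole subtree, not just the noun.
            yield word.left_edge_i, word.right_edge_i + 1, "NP"

# "The small cat chased a mouse" -> chunks "The small cat", "a mouse"
doc = [
    Token(0, "DET",  "det",   0, 0),
    Token(1, "ADJ",  "amod",  1, 1),
    Token(2, "NOUN", "nsubj", 0, 2),
    Token(3, "VERB", "ROOT",  0, 5),
    Token(4, "DET",  "det",   4, 4),
    Token(5, "NOUN", "obj",   4, 5),
]
print(list(noun_chunks(doc)))  # [(0, 3, 'NP'), (4, 6, 'NP')]
```

In spaCy itself the function receives a `Doc` or `Span` and is hooked up in the language's `Defaults` class via `syntax_iterators = {"noun_chunks": noun_chunks}`, which is the line in spacy/lang/en/__init__.py referred to above.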
Hi, I am from China and want to train sense2vec from scratch, but I ran into the same problem as in #7436 (comment).
What @polm did was "copy over the English version, prepare a bunch of example sentences, and then modify the function until the results were correct for all my example sentences. You could use a similar approach for Chinese."
I am new to spaCy. Could anyone please tell me how to modify the noun_chunks function? Should I "use the PhraseMatcher or the DependencyMatcher" to get the labels?
And what noun_chunks does is yield a tuple of (index of the noun, index + 1, np_label) for each chunk, right?
Thanks in advance.