Example from https://spacy.io/universe/project/neuralcoref doesn't work for Polish #13224

Zydnar · 2024-01-08T10:42:48Z

How to reproduce the behaviour

The example from https://spacy.io/universe/project/neuralcoref works with English models:

import spacy
import neuralcoref

nlp = spacy.load('en')
neuralcoref.add_to_pipe(nlp)
doc1 = nlp('My sister has a dog. She loves him.')
print(doc1._.coref_clusters)

doc2 = nlp('Angela lives in Boston. She is quite happy in that city.')
for ent in doc2.ents:
    print(ent._.coref_cluster)

Which outputs:

>> python .\spacy_alt.py
[My sister: [My sister, She], a dog: [a dog, him]]
Boston: [Boston, that city]

However, if I use either pl_core_news_lg or pl_core_news_sm like this:

import spacy
import neuralcoref
import pl_core_news_lg

#nlp = spacy.load('en_core_web_sm')
nlp = pl_core_news_lg.load()
neuralcoref.add_to_pipe(nlp)
doc1 = nlp('Moja siostra ma psa. Ona go kocha.')
#doc1 = nlp('My sister has a dog. She loves him.')
print(doc1._.coref_clusters)

doc2 = nlp(u'Anna żyje w Krakowie. Jest szczęśliwa w tym mieście.')
#doc2 = nlp('Angela lives in Boston. She is quite happy in that city.')
for ent in doc2.ents:
    print(ent._.coref_cluster)

I get the following output:

>> python .\spacy_alt.py
[]
None
None

I guessed it might be connected to the fact that the English model is _web_ and the Polish one is _news_. However, there is no Polish _web_ model to try:

>> python -m spacy download pl_core_web_sm 

✘ No compatible model found for 'pl_core_web_sm' (spaCy v2.3.7).

Your Environment

Operating System: Windows 10 x64
Python Version Used: Python 3.9.6
spaCy Version Used: v2.3.7
Environment Information: most likely irrelevant

Answered by svlandeg

svlandeg · 2024-01-08T15:14:00Z

Hi!

neuralcoref is a plugin originally developed by Hugging Face, and as stated in their README, the pretrained model only works for English, so I'm afraid this is expected behaviour. The trained coref model was only trained on English texts, so it has no way of resolving references in Polish sentences.
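
A minimal sketch of how one might guard against this, assuming the only thing that matters is the pipeline's language code. It relies on spaCy's `lang` attribute on a loaded pipeline; the helper name `check_coref_language` and the `SUPPORTED_LANGS` set are hypothetical and not part of spaCy or neuralcoref. Plain stand-in objects are used here so the snippet runs without any models installed.

```python
from types import SimpleNamespace

# Assumption: neuralcoref's bundled model is English-only (per its README),
# so only pipelines whose language code is "en" can produce clusters.
SUPPORTED_LANGS = {"en"}


def check_coref_language(nlp):
    """Return True if the pipeline's language is covered by the pretrained model.

    `nlp` is any object exposing a `lang` attribute, such as a loaded
    spaCy pipeline. This helper is hypothetical, for illustration only.
    """
    return getattr(nlp, "lang", None) in SUPPORTED_LANGS


# Stand-ins for loaded pipelines (not real spaCy objects).
english = SimpleNamespace(lang="en")
polish = SimpleNamespace(lang="pl")

print(check_coref_language(english))  # True
print(check_coref_language(polish))   # False
```

With a real pipeline, the same check could wrap the call from the original example, e.g. only calling `neuralcoref.add_to_pipe(nlp)` when `check_coref_language(nlp)` returns True, and reporting an unsupported language otherwise instead of silently getting `[]` and `None`.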

1 reply

Zydnar Jan 8, 2024
Author

I think this put me on the right track. Looking at the README:

comes with a pre-trained statistical model for English only

However, the documentation explains how to train and use your own model here: https://github.com/huggingface/neuralcoref/blob/master/neuralcoref/train/training.md

An alternative would be to use a completely different solution provided by StanfordNLP, which isn't bad either...

Zydnar · 2024-01-08T17:23:00Z

Author

I'm closing this thread, as the initial question has already been answered.
