Bug: Different punctuation at the end of a sentence leads to wrong analysis results. #7778

qingyun1988 · 2021-04-14T03:40:02Z

How to reproduce the behaviour

Take the sentence "Whose bike is broken?". The analysis result for it is wrong.
If I add a Chinese (full-width) question mark "？" to the end of the sentence instead, the analysis result is correct!
After multiple tests, I found that adding different punctuation to the end of the sentence changes the analysis results.
The endings I tested are "?", "？", ".", and no punctuation at all.
The details are in the attached files. Please have a look at them.
How can I resolve this bug? Thank you!
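
A minimal sketch of the comparison described above, assuming spaCy v3 with the `en_core_web_trf` pipeline installed. The helper names (`ending_variants`, `dependency_rows`, `compare_endings`) are made up for illustration and are not part of spaCy; only `spacy.load` and the token attributes `text`, `dep_`, and `head` come from spaCy itself.

```python
def ending_variants(sentence, endings=("?", "？", ".", "")):
    """Return the sentence with each candidate ending in place of its final punctuation."""
    stem = sentence.rstrip("?？.")
    return [stem + ending for ending in endings]


def dependency_rows(doc):
    """Summarise a parsed Doc as (token, dependency label, head token) tuples."""
    return [(token.text, token.dep_, token.head.text) for token in doc]


def compare_endings(nlp, sentence):
    """Parse every ending variant with the given pipeline and collect the rows."""
    return {text: dependency_rows(nlp(text)) for text in ending_variants(sentence)}


def main():
    # Assumption: spaCy and the en_core_web_trf pipeline are installed locally.
    import spacy

    nlp = spacy.load("en_core_web_trf")
    for text, rows in compare_endings(nlp, "Whose bike is broken?").items():
        print(repr(text))
        for token, dep, head in rows:
            print(f"  {token:<8} {dep:<10} {head}")
```

Calling `main()` prints one block per variant, so the labels and heads can be compared side by side.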

Your Environment

Operating System: Windows 10
Python Version Used: 3.7.8
spaCy Version Used: v3.0, with the model en_core_web_trf v3.0
Environment Information:

polm · 2021-04-14T06:25:36Z

First, note that the results of the small/medium/large models match the output you list as 3, so this is specific to the transformers model.

Second, note that errors on specific individual sentences are inevitable given the way statistical models work, so it doesn't make sense to consider this a bug. You can read more about that in #3052.

Third, I don't think either of these parses is wrong, though the Transformers one is better.

The parse with the passive construction reads the sentence as if it meant "Whose bike is being broken?". Under that reading the original sentence is grammatical English, but it's not normal; it sounds like Biblical speech or something.

The Transformers parse is more normal; compare it to the parse for "The book is blue".

https://explosion.ai/demos/displacy?text=The%20book%20is%20blue&model=en_core_web_sm&cpu=0&cph=0
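
The cross-model comparison from the first point can be sketched roughly as below. This is only an illustration: `group_by_parse` is a hypothetical helper, not a spaCy function, and the pipeline names in `main` assume those packages are installed.

```python
from collections import defaultdict


def group_by_parse(pipelines, text):
    """Group pipeline names by the dependency labels they assign to `text`.

    `pipelines` maps a name to a callable returning an iterable of tokens
    that expose `text` and `dep_` (a loaded spaCy pipeline qualifies).
    """
    groups = defaultdict(list)
    for name, nlp in pipelines.items():
        labels = tuple((token.text, token.dep_) for token in nlp(text))
        groups[labels].append(name)
    return dict(groups)


def main():
    # Assumption: all four English pipelines are installed locally.
    import spacy

    names = ["en_core_web_sm", "en_core_web_md", "en_core_web_lg", "en_core_web_trf"]
    pipelines = {name: spacy.load(name) for name in names}
    for labels, members in group_by_parse(pipelines, "Whose bike is broken?").items():
        print(members, labels)
```

Pipelines that end up in the same group produced identical labels for the sentence, which makes it easy to see whether a difference is specific to one model.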

2 replies

qingyun1988 Apr 14, 2021
Author

@

polm Apr 17, 2021

So on closer examination, I hadn't noticed the subj/attr difference at first. Using attr instead of subj for a noun before the verb is kind of weird. On the other hand, even so, it's better than the passive construction.
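
To make the readings discussed here easier to tell apart, a small check over the dependency labels might look like this. It assumes the label names used by spaCy's English pipelines (`nsubj`, `nsubjpass`, `auxpass`, `attr`); `describe_parse` is a hypothetical helper, and the label lists at the bottom are hand-written for illustration, not actual model output.

```python
PASSIVE_LABELS = {"nsubjpass", "csubjpass", "auxpass"}
SUBJECT_LABELS = {"nsubj", "csubj"}


def describe_parse(dep_labels):
    """Classify a sequence of dependency labels into a rough reading."""
    labels = set(dep_labels)
    if labels & PASSIVE_LABELS:
        return "passive"
    if labels & SUBJECT_LABELS:
        return "active with subject"
    if "attr" in labels:
        return "attr without subject"
    return "no subject found"


# Hand-written label lists for illustration only, not real model output.
print(describe_parse(["poss", "nsubjpass", "auxpass", "ROOT", "punct"]))  # passive
print(describe_parse(["poss", "attr", "ROOT", "acomp", "punct"]))  # attr without subject
```

With a parsed spaCy `Doc`, the same function could be called as `describe_parse(token.dep_ for token in doc)`.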

qingyun1988 · 2021-04-14T10:20:52Z

Thank you for your reply.

I think the sentence "Whose bike is broken?" is very normal and simple, so it isn't an unusual, isolated case.

However, the analysis result from the "en_core_web_trf v3.0" model lacks a "subj".

The "trf" model is considered more accurate, but its output on this sentence is not what I expected.

What's more, the analysis results vary only because of the different punctuation attached to the end of the same sentence. That seems very unstable and weird.

So I consider it a bug.

The example "The book is blue" that you gave is parsed correctly on "https://explosion.ai/demos/displacy", but that is because the demo uses the "sm" model, not the "trf" one. This just shows that there is a problem in the "trf" model or in spaCy itself.
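
The instability described above can be measured directly by comparing two parses token by token. A rough sketch, assuming each parse has already been reduced to `(token, label, head)` tuples; `strip_final_punct` and `diff_parses` are hypothetical helpers, and the handling of trailing punctuation is a simplification.

```python
def strip_final_punct(rows, punct=("?", "？", ".")):
    """Drop a trailing punctuation row so parses of different variants align."""
    if rows and rows[-1][0] in punct:
        return rows[:-1]
    return rows


def diff_parses(rows_a, rows_b):
    """Return (token, (label_a, head_a), (label_b, head_b)) for tokens that differ.

    Assumes both parses tokenize the shared words identically.
    """
    a, b = strip_final_punct(rows_a), strip_final_punct(rows_b)
    if [row[0] for row in a] != [row[0] for row in b]:
        raise ValueError("token sequences differ; cannot align parses")
    return [
        (token, (dep_a, head_a), (dep_b, head_b))
        for (token, dep_a, head_a), (_, dep_b, head_b) in zip(a, b)
        if (dep_a, head_a) != (dep_b, head_b)
    ]
```

An empty result means the two variants were parsed identically apart from the final punctuation token; any entries show exactly which words changed label or head.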

0 replies
