Transformer model fails to recognize simple dates. #10730

IavTavares · 2022-04-29T12:32:32Z

import spacy

# Load the transformer-based English pipeline
nlp = spacy.load("en_core_web_trf")
string = 'POLICY DETAILS Penod of Cover Your Policy gives you 18/11/21 (Noor) 18/11/22 Comprehensive Sections AB,C,'
doc = nlp(string)
# Keep only the entities labelled as dates
print([ent for ent in doc.ents if ent.label_ == "DATE"])

Running the code above results in an empty list, at least on my PC.
Why is that? I have a hard time believing that the transformer model cannot recognize the dates in this simple string...

Answered by koaning

koaning · 2022-05-02T08:26:45Z

The simple reason is probably that the sentence you've given the model is unlike the data the model was trained on. Your sentence in particular reads as if section titles or headers are mixed into it, which might explain what you're seeing.

Here's an alternative sample in which the model does detect a "simple date".

import spacy

nlp = spacy.load("en_core_web_trf")
string = 'My birthday is on 12/12/12'
doc = nlp(string)
# Show every entity the pipeline found, whatever its label
print([(ent.text, ent.label_) for ent in doc.ents])

Note that in this example, if I only read "My birthday is on HIDDENTOKEN" I'm able to guess that "HIDDENTOKEN" might be a date. In the example that you mentioned this is perhaps less obvious and might explain the behaviour.

In general, it's good to remember that statistical models, even transformer-based ones, aren't perfect. They will make the occasional error, especially on text that doesn't follow the same patterns as the training data. If you're certain your dataset contains many of these simple date patterns, I'd suggest adding a rule-based step, such as a regex or a pattern matcher.
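
As a rough illustration of the regex suggestion, here is a minimal sketch that uses only Python's standard `re` module. The pattern, the `find_dates` helper, and the assumption that dates look like `dd/mm/yy` or `dd/mm/yyyy` are illustrative choices, not part of spaCy. It checks the shape of the text only and does not validate that the day and month are real.

```python
import re

# Assumed date shape: one or two digits, a slash, one or two digits, a slash,
# then a two- or four-digit year. Adjust this to match your own documents.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b")


def find_dates(text):
    """Return (start, end, matched_text) for each date-like substring."""
    return [(m.start(), m.end(), m.group()) for m in DATE_PATTERN.finditer(text)]


string = 'POLICY DETAILS Penod of Cover Your Policy gives you 18/11/21 (Noor) 18/11/22 Comprehensive Sections AB,C,'
for start, end, value in find_dates(string):
    print(start, end, value)
```

If you want these matches alongside the model's entities, the character offsets could be passed to spaCy's `Doc.char_span` to build `DATE` spans, though how you merge them with `doc.ents` depends on your pipeline.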

1 reply

IavTavares May 2, 2022
Author

Hi @koaning, I tend to agree with your explanation.
However, I thought I might have botched the setup of the model or forgotten something important.
Also, in a comment on this question, someone runs the same code and gets a non-empty list containing a date.
I'm not sure how that's possible, since I wasn't retraining the model. Has the transformer model been updated recently?
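
One way to narrow down why two people get different results from the same code is to compare installed versions. The sketch below uses only the standard library. The `installed_version` helper is a hypothetical name, and it assumes the pipeline was installed as a pip package called `en_core_web_trf`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Predictions can differ between releases, so note both versions when comparing.
for name in ("spacy", "en_core_web_trf"):
    print(name, installed_version(name))
```

If the versions differ between the two machines, that alone might account for the different predictions.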
