Transformer model fails to recognize simple dates. #10730
Running the code above results in an empty list, at least on my PC.
Replies: 1 comment 1 reply
The simple reason is probably that the sentence you've given the model is unlike the data on which the model was trained. Your sentence in particular reads as if header titles are mixed into the sentence, which might explain what you're seeing. Here's an alternative sample that does detect a "simple date".

import spacy
# Load the transformer-based English pipeline
nlp = spacy.load("en_core_web_trf")
string = 'My birthday is on 12/12/12'
doc = nlp(string)
# List the entities the model found
[ent for ent in doc.ents]

Note that in this example, if I only read "My birthday is on HIDDENTOKEN" I'm able to guess that "HIDDENTOKEN" might be a date. In the example that you mentioned this is perhaps less obvious, which might explain the behaviour.

In general it's good to remember that statistical models, even those with a transformer, aren't perfect. They will make the occasional error, especially when they're looking at text that doesn't follow the same patterns as the training data. If you're certain your dataset has many of these simple date patterns, I might suggest adding a regex with a pattern matcher.
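
To make the regex suggestion concrete, here is a minimal sketch using only Python's standard `re` module. The pattern and the helper name `find_simple_dates` are assumptions for illustration, not part of spaCy. The pattern only covers numeric dates such as 12/12/12 or 01-02-2023 and does not check that the date is valid.

```python
import re

# Hypothetical pattern (an assumption, not a spaCy built-in):
# one or two digits, a separator (/ or -), one or two digits,
# the same separator again, then a two- or four-digit year.
# It does not validate the date, so 99/99/99 would also match.
DATE_PATTERN = re.compile(r"\b\d{1,2}([/-])\d{1,2}\1(?:\d{4}|\d{2})\b")


def find_simple_dates(text):
    """Return (start, end, matched_text) for each simple numeric date in text."""
    return [(m.start(), m.end(), m.group()) for m in DATE_PATTERN.finditer(text)]


matches = find_simple_dates("My birthday is on 12/12/12")
print(matches)
```

In spaCy, character offsets like these can be passed to `Doc.char_span` to build a labelled span, which returns `None` when the offsets don't line up with token boundaries. Whether that works for your data depends on how the tokenizer splits your date formats, so it's worth checking on a few real samples first.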