Data Augmentation for NER #7500
Replies: 2 comments 5 replies
-
So there are several questions here, but to address a few of them:
NER does not use
Since the token shape is already capturing this variation I wouldn't expect this to have a large effect.
You're trying to use the
I think those points address the overall thrust of your question, but if there's something I missed, let me know.
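To illustrate how much the shape feature already encodes (for example, whether a token consists only of digits), here is a rough pure-Python approximation of the word-shape logic as I understand it. The function names `approx_shape` and `looks_all_digits` are made up for this sketch; in spaCy itself the value is exposed as `token.shape_`, and the real implementation may differ in details.

```python
def approx_shape(text: str) -> str:
    """Approximate a word shape: letters -> x/X, digits -> d, other chars kept.

    Runs of the same shape character are truncated after four repeats,
    so very long numbers collapse to the same shape as four-digit ones.
    """
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:
            shape.append(shape_char)
    return "".join(shape)


def looks_all_digits(text: str) -> bool:
    """Hypothetical helper: an all-digit token has a shape made only of 'd'."""
    shape = approx_shape(text)
    return bool(shape) and set(shape) == {"d"}


for word in ["42", "12345678", "AB-12", "Hello"]:
    print(word, approx_shape(word), looks_all_digits(word))
```

Since an all-digit token always gets a shape made up only of `d`, a separate digit flag adds little information on top of the shape in this sketch.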
-
@architectures = "spacy.MultiHashEmbed.v1"
width = 96
attrs = ["NORM","PREFIX","SUFFIX","SHAPE","IS_DIGIT"]
rows = [5000,2500,2500,2500,100]
include_static_vectors = false
Because of how
Right now it's hard to use a feature like
-
I am training an NER model for digit-valued entities. I would like to understand the current mechanics and best practices for this case:
is_digit or pos properties as features, and if not, where should one look to enable it?
Now I need to add existing entities.
If I modify the Doc construction according to the documentation:
I'm getting:
ValueError: [E177] Ill-formed IOB input detected: B
I might not completely get what the documentation help(Doc) is saying:
what kind of Unicode strings? I wasn't able to find any example on the web.
As a workaround, I tried this custom cython function:
but when I try to run it on my entity, it breaks:
The actual values are:
It looks similar to this issue, but that one is locked and there seems to be no solution.
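On the E177 error above: my reading of the Doc documentation is that the entity argument expects one IOB string per token with the label attached, such as `B-NUMBER`, `I-NUMBER`, or `O`, so a bare `B` would count as ill-formed. That is an assumption on my part and not something confirmed in this thread. A minimal sketch of building such tags from token spans follows; the helper `spans_to_iob` and the labels `NUMBER` and `YEAR` are hypothetical.

```python
from typing import List, Tuple


def spans_to_iob(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end, label) token spans (end exclusive) into IOB tag strings."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span ({start}, {end}) is outside 0..{n_tokens}")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


words = ["Invoice", "no", "4", "711", "due", "2021"]
tags = spans_to_iob(len(words), [(2, 4, "NUMBER"), (5, 6, "YEAR")])
print(list(zip(words, tags)))

# Assumption: with spaCy installed, these tags could then be passed along
# with the words when constructing the Doc, e.g.
#   Doc(vocab, words=words, ents=tags)
```

The tag list has the same length as the word list, which as far as I can tell is also required.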