Addition of "entity_ruler" in spaCy 3.2 - Portuguese #10232
-
Hi, welcome to spaCy and to the forums! It would be helpful to us if you format your post with appropriate markdown formatting. It makes your question easier to understand, so we can help you quicker. As it's your first post, I've gone ahead and formatted your question myself, but please be mindful of this in the future.
I'd also like to point out that not everyone on the spaCy team (or in the community) is a man, so you might want to use more inclusive phrasing ;-) That said, we'll have a look at your question and get back to you!
-
@pmoniz7 Thanks for the question. I think the issue is that the document is being tokenized differently than you expect, and the …

To debug match rules, you can always print out the tokens and any attributes you want to check from the doc, for example:

    for token in doc:
        print(token.orth_, token.shape_)

If you do this, you'll see the …

I had to double-check this myself in the token attribute docs, but it is correct: since this is a single token, a run of "d" characters in the shape is truncated after 4 repeats, until the next different character.
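To see why a shape-based rule fails on a long run of digits, here is a stdlib-only sketch of the shape rule as described in spaCy's token attribute docs: letters become "X"/"x", digits become "d", other characters are kept, and a run of the same shape character stops after 4. The function name `word_shape` is an illustrative assumption and it ignores any special cases in spaCy's actual implementation; in a real pipeline you would read `token.shape_` directly, as in the loop above. Because a shape never contains more than four identical characters in a row, a pattern such as `{"SHAPE": "ddddddddddd"}` cannot match an 11-digit number that stays a single token.

```python
def word_shape(text: str) -> str:
    """Sketch of the Token.shape_ rule described in spaCy's docs.

    Letters become "X"/"x", digits become "d", other characters are kept,
    and a run of the same shape character is cut off after 4 repeats.
    """
    shape = []
    last = ""
    run_length = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char
        # Track how long the current run of identical shape characters is
        if shape_char == last:
            run_length += 1
        else:
            last = shape_char
            run_length = 1
        # Only the first 4 characters of each run are kept
        if run_length <= 4:
            shape.append(shape_char)
    return "".join(shape)


if __name__ == "__main__":
    for sample in ["12345678900", "123.456.789-00", "05/03/1990"]:
        print(sample, word_shape(sample))
```

For example, `word_shape("12345678900")` gives `"dddd"`, while a formatted `"123.456.789-00"` keeps its punctuation as `"ddd.ddd.ddd-dd"` (assuming the tokenizer leaves it as one token).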
-
Good morning, gentlemen!
I'd like your help. I'm new to spaCy, and I have millions of documents in Portuguese. Most of these documents contain several important pieces of metadata, for example "CPF", "TELEFONE", and DATE in dd/mm/yyyy format. These are strings made up of numbers, "-", and "/", and I would like to classify them as new entities so that I can capture them.
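For formats this regular, one alternative to token-based `entity_ruler` patterns (a different technique, not the code from this thread) is to run regular expressions over the raw text and map the character offsets back onto tokens, which sidesteps differences between the English and Portuguese tokenizers. The sketch below assumes Brazilian formats: CPF written as `ddd.ddd.ddd-dd`, phone numbers as `dddd-dddd` or `ddddd-dddd` with an optional `(dd) ` area code, and dates as `dd/mm/yyyy`. The regexes and the helper names `find_metadata_spans` and `set_metadata_ents` are hypothetical. Only the second helper needs spaCy v3; it uses `Doc.char_span(..., alignment_mode="expand")` and `spacy.util.filter_spans`.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for the formats described in the question.
METADATA_PATTERNS = {
    # CPF written as 123.456.789-00
    "CPF": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),
    # Phone: optional "(dd) " area code, then dddd-dddd or ddddd-dddd
    "TELEFONE": re.compile(r"(?:\(\d{2}\)\s?)?\b\d{4,5}-\d{4}\b"),
    # Date in dd/mm/yyyy format (not validated as a real calendar date)
    "DATE": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def find_metadata_spans(text: str) -> List[Tuple[int, int, str]]:
    """Return (start_char, end_char, label) for every match, sorted by start."""
    spans = []
    for label, pattern in METADATA_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def set_metadata_ents(doc):
    """Add the regex matches to a spaCy v3 Doc as entities."""
    from spacy.util import filter_spans  # imported lazily: only this helper needs spaCy

    new_spans = []
    for start, end, label in find_metadata_spans(doc.text):
        # "expand" widens the span to the covering tokens instead of returning None
        span = doc.char_span(start, end, label=label, alignment_mode="expand")
        if span is not None:
            new_spans.append(span)
    # filter_spans drops overlapping spans (keeping the longest) so doc.ents accepts them
    doc.ents = filter_spans(list(doc.ents) + new_spans)
    return doc


if __name__ == "__main__":
    text = "CPF 123.456.789-00, telefone (11) 98765-4321, nascido em 05/03/1990."
    for start, end, label in find_metadata_spans(text):
        print(label, text[start:end])
```

With spaCy installed, something like `doc = set_metadata_ents(spacy.blank("pt")(text))` would populate `doc.ents`. Note that with `alignment_mode="expand"` an entity can be wider than the regex match if a token boundary falls inside it.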
I tested it in English and it worked perfectly, but when I use Portuguese, spaCy doesn't recognize them.
Am I doing something wrong? As far as I can tell, I'm following the documentation.
Does anyone have any idea what I'm doing wrong?
Thanks in advance for the help.
Below is the code I am using: