Deleting a Vector so that it is recognized as an oov #11939
Replies: 1 comment
This is not the case - when you call
When you set a vector for a word, it's no longer OOV. For tokens, OOV is defined as not having a vector set, so an explicitly set zero vector still counts as a vector. The vocab isn't designed with removing words in mind. In this case the easiest thing is to not set the vector in the first place. More generally, removing vectors is going to be kind of complicated, so if you need to do that you'd be better off using a separate user attribute on the token, or a separate dictionary of "known unknowns".
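As a rough illustration of the "known unknowns" dictionary idea, here is a minimal sketch. The names `known_unknowns`, `is_unknown`, and `remember_replacement` are hypothetical and not part of spaCy; the only things assumed from spaCy are the `text` and `is_oov` attributes on a token. A `SimpleNamespace` stands in for a real token so the snippet runs without a pipeline.

```python
from types import SimpleNamespace

# Hypothetical registry: OOV surface form -> replacement chosen for it.
known_unknowns = {}


def is_unknown(token):
    """True if the token is OOV now or was recorded as OOV earlier."""
    return token.is_oov or token.text in known_unknowns


def remember_replacement(token, replacement):
    """Record an OOV token and its replacement without touching the vocab."""
    known_unknowns[token.text] = replacement


# Stand-in for a spaCy Token; a real Token exposes .text and .is_oov too.
first_seen = SimpleNamespace(text="frobnicate", is_oov=True)
if is_unknown(first_seen):
    remember_replacement(first_seen, "adjust")

# Mimics the situation in the question: a vector was set at some point,
# so is_oov is now False, but the registry still flags the word.
seen_again = SimpleNamespace(text="frobnicate", is_oov=False)
print(is_unknown(seen_again), known_unknowns[seen_again.text])
```

With a real pipeline you would pass the tokens of a `Doc` instead of the stand-ins. Because the bookkeeping lives outside the vocab, nothing ever has to be removed from the vectors table.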
Hello,
I am trying to identify an out-of-vocabulary (OOV) word, find the most similar word to it, and replace the old word with the most similar one. I use the "is_oov" attribute from spaCy. In order to find the most similar word, I first need to set a vector for the OOV word (correct me if I am wrong here). After finding the most similar word, I would like to reset that vector back to "zero" so the word is recognized as OOV if it is seen again. I try to do this manually via "nlp.vocab.set_vector(token.text, 0)". If I then print out the vector, it is a zero vector. However, for some reason the word is no longer recognized as OOV, even though its vector "looks" the same as when it was first processed as an OOV word.
Is there a way to reset a vector so that the word is recognized as OOV again once it has been given a vector representation?