Support for FastText word embeddings #2154
Replies: 6 comments
-
We do support changing the It wouldn't let us take advantage of the character features when using the vectors as features in models --- but then, neither would changing the |
-
Yes it would be possible to to use Changing the |
-
The ELMO vectors would naturally use the Perhaps we could support back-off logic within the Vocab's |
-
Indeed for ELMO vectors the If we implement the switch in Vocab's What I thought of was to split the
so we can in each case (word2vec-like or fasttext) use the We can possibly add a new method to return vectors in context (something like |
-
I think having this type of flexibility would be a massive boon considering the rich variety of embedding types which now exist. Flair embeddings come to mind, alongside BERT and ELMO.
-
@samhardyhey the more word embedding models the better.
-
Currently, Vectors.__getitem__ returns the vector associated with the given key (https://github.com/explosion/spaCy/blob/master/spacy/vectors.pyx#L88).
This is the expected behavior for most word vectors, but it makes integrating FastText word embeddings difficult: in FastText, the vector of a word is generated by summing the vectors associated with its character n-grams (see the gensim implementation for more details: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/keyedvectors.py#L1657).
This is convenient as we can generate an embedding for an OOV word if we have a vector for at least one of its char n-grams.
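To make the OOV case concrete, here is a minimal sketch of the n-gram summation in plain Python. It is only an illustration under some assumptions: the names `char_ngrams` and `ngram_vector` are hypothetical, the n-gram table is a plain dict keyed by the n-gram string (the real FastText format hashes n-grams into buckets), and the `<`/`>` boundary markers and 3 to 6 length range are used as illustrative defaults.

```python
from typing import Dict, List, Optional


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return the character n-grams of `word`, wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def ngram_vector(
    word: str, table: Dict[str, List[float]], dim: int
) -> Optional[List[float]]:
    """Sum the vectors of all known n-grams of `word`.

    Returns None when no n-gram of the word is in the table.
    `table` is a hypothetical dict from n-gram string to vector.
    """
    total = [0.0] * dim
    found = 0
    for gram in char_ngrams(word):
        vec = table.get(gram)
        if vec is None:
            continue  # unknown n-gram: skip it
        found += 1
        total = [a + b for a, b in zip(total, vec)]
    return total if found else None
```

With a table that only knows the n-grams `<ca` and `at>`, the word "cat" still gets a vector even though it has no entry of its own, while a word sharing no n-gram with the table gets `None`.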
One way to support FastText word embeddings would be to let __getitem__ behave differently given a backend attribute of the Vectors (which in this case would be fasttext).
If you think this feature is worth adding to spaCy, I can participate in the implementation.
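For discussion, the backend switch could look roughly like the sketch below. This is not spaCy's Vectors class: `SketchVectors`, its constructor arguments, and the string keys are all hypothetical (the real table is keyed differently), and falling back to n-grams only when the key is missing is one possible design choice among several.

```python
from typing import Dict, List, Optional

Vector = List[float]


class SketchVectors:
    """Toy stand-in for a vectors table with a `backend` attribute."""

    def __init__(
        self,
        data: Dict[str, Vector],
        backend: str = "default",
        ngrams: Optional[Dict[str, Vector]] = None,
        min_n: int = 3,
        max_n: int = 6,
    ) -> None:
        self.data = data            # whole-word vectors
        self.backend = backend      # "default" or "fasttext"
        self.ngrams = ngrams or {}  # hypothetical n-gram string -> vector table
        self.min_n = min_n
        self.max_n = max_n

    def _ngram_sum(self, key: str) -> Vector:
        """Sum the vectors of every known character n-gram of `key`."""
        wrapped = f"<{key}>"
        hits = [
            self.ngrams[wrapped[i:i + n]]
            for n in range(self.min_n, self.max_n + 1)
            for i in range(len(wrapped) - n + 1)
            if wrapped[i:i + n] in self.ngrams
        ]
        if not hits:
            raise KeyError(key)
        return [sum(column) for column in zip(*hits)]

    def __getitem__(self, key: str) -> Vector:
        if key in self.data:
            return self.data[key]  # same behavior for every backend
        if self.backend == "fasttext":
            return self._ngram_sum(key)  # OOV fallback via n-grams
        raise KeyError(key)
```

With `backend="default"` a missing key raises `KeyError` as before; with `backend="fasttext"` the lookup falls back to the n-gram sum and only raises when none of the word's n-grams are known.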