How to extend word vectors in Spacy? #6061
Is there a way to extend the existing word vectors? The document here describes how to load third-party word embeddings. If there's a way to export the spaCy word vectors to another format (e.g. GloVe), we could implement this extension with gensim. I want to keep the existing words, since I don't want to create a gap between the existing features and new ones built on spaCy. Please advise. Thanks!
Replies: 4 comments
The spaCy models only include the plain vectors, which isn't enough information to continue training the model. The provided vectors weren't trained with gensim, but it's similar to the difference between […]
You could potentially train a new model from scratch on your corpus and then align the new vector space with the existing vectors before adding the new words, but it's not something I've tried before, so I'm not really sure how well it would work in practice.
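The reply above doesn't name an alignment method. One common choice, assumed here purely for illustration, is orthogonal Procrustes: take the words that appear in both vocabularies, find the rotation that maps the new vectors onto the existing ones, and apply that rotation to the new words. The function name `procrustes_align` and the synthetic data are hypothetical, and the sketch assumes both vector sets have the same width.

```python
import numpy as np

def procrustes_align(new_vecs, ref_vecs):
    """Return the orthogonal matrix W that minimizes ||new_vecs @ W - ref_vecs||_F.

    Both inputs are (n_shared_words, dim) arrays whose rows refer to the
    same words in the same order.
    """
    new_vecs = np.asarray(new_vecs, dtype="float64")
    ref_vecs = np.asarray(ref_vecs, dtype="float64")
    if new_vecs.shape != ref_vecs.shape:
        raise ValueError("both arrays must have the same shape")
    # Closed-form solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(new_vecs.T @ ref_vecs)
    return u @ vt

# Synthetic check: rotate random vectors by a known orthogonal matrix,
# then see whether the alignment recovers that rotation.
rng = np.random.default_rng(0)
new_shared = rng.normal(size=(50, 5))             # stand-in for newly trained vectors
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
old_shared = new_shared @ rotation                # stand-in for existing vectors
W = procrustes_align(new_shared, old_shared)
print("max alignment error:", np.abs(new_shared @ W - old_shared).max())
```

With real data the fit won't be exact, since two independently trained spaces differ by more than a rotation, so it's worth checking nearest neighbours of a few aligned words before adding anything to the vocab.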
Thanks! Is there a way to export all the spaCy word vectors to the GloVe or fastText format?
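There's not a built-in method, but you can just iterate over the vectors to get the plain-text word2vec format:

    # Assumes `nlp` is an already loaded spaCy pipeline that includes word vectors.
    # Header line: number of keys and vector width.
    print(nlp.vocab.vectors.n_keys, nlp.vocab.vectors.shape[1])
    for word in nlp.vocab.vectors:  # iterating yields the hash keys
        # Map the hash back to its string, then print the vector values.
        print(nlp.vocab.strings[word], " ".join(str(x) for x in nlp.vocab.vectors[word]))

There are probably better or faster ways to do the vector printing/formatting with numpy, but this should work.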
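If you'd rather write straight to a file than redirect stdout, here's a minimal sketch of the same idea. The helper name `write_word2vec_text` is hypothetical, not a spaCy or gensim function. It takes plain lists so it can be tried without a model, and it skips empty or whitespace-containing entries because the text format is whitespace-delimited.

```python
import numpy as np

def write_word2vec_text(words, vectors, path):
    """Write words and vectors to `path` in plain-text word2vec format.

    Returns the number of entries actually written.
    """
    vectors = np.asarray(vectors, dtype="float32")
    if vectors.ndim != 2 or len(words) != vectors.shape[0]:
        raise ValueError("need one vector row per word")
    # Drop entries that would break a whitespace-delimited file.
    keep = [i for i, w in enumerate(words) if w and not any(c.isspace() for c in w)]
    with open(path, "w", encoding="utf8") as f:
        # Header line: entry count and vector width.
        f.write(f"{len(keep)} {vectors.shape[1]}\n")
        for i in keep:
            values = " ".join(f"{x:.6f}" for x in vectors[i].tolist())
            f.write(f"{words[i]} {values}\n")
    return len(keep)

# Possible use with a loaded pipeline (assumes `nlp` has vectors):
# keys = list(nlp.vocab.vectors.keys())
# words = [nlp.vocab.strings[k] for k in keys]
# rows = np.stack([nlp.vocab.vectors[k] for k in keys])
# write_word2vec_text(words, rows, "spacy_vectors.txt")
```

The six-decimal formatting is an arbitrary choice to keep the file small; use more digits if you need the values to round-trip more closely.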
That's really helpful! Thanks a lot.