Two issues here:

  1. I noticed recently that the vectors distributed with the German models are missing a lot of vocabulary, so I wouldn't recommend using the de_core_news_md vectors. Checking has_vector for each token in the example sentence shows the gaps:

    >>> for t in sentenceA:
    ...   print(t.text, t.has_vector)
    ... 
    Das True
    System True
    muss True
    10 False
    Bierkisten False
    transportieren True
    können True
    . False
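
To put a number on the gaps shown above, here is a minimal sketch that counts how many tokens lack a vector. The helper name `vector_coverage` and the stand-in `Tok` tuple are hypothetical, made up for illustration; the only things taken from the output above are the `text` and `has_vector` attributes, so a spaCy Doc should work in place of the stand-in list.

```python
from collections import namedtuple


def vector_coverage(tokens):
    """Return (share of tokens with a vector, texts of tokens without one)."""
    tokens = list(tokens)
    missing = [t.text for t in tokens if not t.has_vector]
    if not tokens:
        return 0.0, missing
    return (len(tokens) - len(missing)) / len(tokens), missing


# Hypothetical stand-in tokens that mirror the REPL output above.
# With spaCy installed, pass the parsed sentence (a Doc) instead.
Tok = namedtuple("Tok", ["text", "has_vector"])
sentence = [
    Tok("Das", True), Tok("System", True), Tok("muss", True),
    Tok("10", False), Tok("Bierkisten", False),
    Tok("transportieren", True), Tok("können", True), Tok(".", False),
]

coverage, missing = vector_coverage(sentence)
print(f"coverage: {coverage:.3f}, missing: {missing}")
```

Running the same check over a larger sample of your own text gives a better picture than a single sentence, since numbers and punctuation often lack vectors in any source.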
    

    Alternate vector sources can use a different tokenization, which can cause minor problems, but it's pretty easy to use spacy init-model to create a spaCy model from vectors in word2vec text format (not binary format!). As an example using fastText vectors (from https://fasttext.cc/docs/en/crawl-vectors.html / German: https://dl.fbaipublicfile…
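
For reference, the word2vec text format mentioned above is a plain-text file: a header line with the word count and the vector dimension, then one line per word with the word followed by its values. The sketch below is only an illustration of that layout, not what spacy init-model does internally; the function name `read_word2vec_text` and the two-word sample are made up for this example.

```python
import io


def read_word2vec_text(handle):
    """Parse word2vec text format into ({word: [floats]}, dimension)."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<word count> <dimension>"
    vectors = {}
    for line in handle:
        if not line.strip():
            continue
        # Split from the right so only the last `dim` fields are values.
        parts = line.rstrip().rsplit(" ", dim)
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {parts[0]!r}")
        vectors[parts[0]] = values
    return vectors, dim


# Hypothetical two-word sample in the same layout as a .vec file.
sample = io.StringIO("2 3\nBier 0.1 0.2 0.3\nKiste 0.4 0.5 0.6\n")
vectors, dim = read_word2vec_text(sample)
print(dim, sorted(vectors))
```

A binary-format file has no such readable header and lines, which is a quick way to tell whether a download is in the format init-model expects.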

Replies: 2 comments

Answer selected by ines
Labels
feat / vectors Feature: Word vectors and similarity
2 participants