Preparing text for generating word vectors with Floret #11285
orglce
started this conversation in
Help: Best practices
Replies: 1 comment
I would recommend only tokenizing. The static vectors currently look up a token's vector by the token text. (It's technically possible to have vectors for a token attribute other than
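To illustrate why extra normalization can hurt, here is a minimal sketch that assumes a plain dictionary keyed by exact token text. It is a toy stand-in, not spaCy's or floret's implementation, and `build_table` and `coverage` are hypothetical names. With floret's subword vectors a mismatch would probably not be an outright miss as it is here, but the token would still be looked up by a string the vectors were never trained on.

```python
# Toy stand-in for a static vector table keyed by exact token text.
# Hypothetical helpers; this is NOT spaCy's or floret's implementation.

def build_table(training_tokens):
    """Assign a dummy vector id to each distinct token text seen in training."""
    table = {}
    for tok in training_tokens:
        table.setdefault(tok, len(table))
    return table


def coverage(table, pipeline_tokens):
    """Fraction of pipeline tokens whose exact text has an entry in the table."""
    hits = sum(1 for tok in pipeline_tokens if tok in table)
    return hits / len(pipeline_tokens)


# Tokens as the pipeline's tokenizer would produce them (made-up example).
corpus = ["Apple", "is", "buying", "a", "U.K.", "startup"]
pipeline_tokens = ["Apple", "is", "buying", "a", "U.K.", "startup"]

# Vectors trained on tokenized-only text vs. on additionally lowercased text.
raw_table = build_table(corpus)
lowercased_table = build_table([tok.lower() for tok in corpus])

raw_coverage = coverage(raw_table, pipeline_tokens)
lowercased_coverage = coverage(lowercased_table, pipeline_tokens)

print(f"tokenized only: {raw_coverage:.2f}")
print(f"lowercased:     {lowercased_coverage:.2f}")
```

In this toy setup, the table built from lowercased text no longer has entries for "Apple" or "U.K.", because the pipeline still asks for the original token text.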
I have been trying to generate my own word vectors with Floret, and I was wondering whether there are any recommended preprocessing steps besides tokenization. Would it improve the downstream accuracy of the pipeline if the text would be
I reckon none of these things would prove beneficial as part of the whole pipeline (POS, NER, lemmatization...), but I don't know exactly how spaCy uses the vectors under the hood.