What is the best way to add new words to model? #9714
-
I'm using the es_core_news_lg model for a textcat task. What is the best way to vectorize new words? Thanks
Replies: 1 comment
-
Hi @info2000 ,
To add vectors for new words, you need to train the vectors yourself. That said, an out-of-vocabulary rate of 3% in your training data is still low. I'd first check why there are so many whitespace tokens in your dataset (note that the three most common "words" are all whitespace). Perhaps clean the data a bit more and see how it works :)
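If it helps, here is a rough sketch of the kind of check I mean: measure the out-of-vocabulary rate, look at the most frequent OOV "words", and see how much of that is just whitespace. It is plain Python on a toy example, not spaCy code; `oov_report`, `drop_whitespace`, and the tiny `vocab` set are hypothetical names made up for illustration, and `in_vocab` stands in for whatever lookup tells you a token has a vector.

```python
from collections import Counter


def oov_report(tokens, in_vocab, top_n=3):
    """Summarise out-of-vocabulary (OOV) tokens.

    tokens:   iterable of token strings
    in_vocab: callable returning True when a token is covered by the vectors
              (an assumption for this sketch, not a spaCy function)
    """
    tokens = list(tokens)
    oov = [t for t in tokens if not in_vocab(t)]
    whitespace_oov = [t for t in oov if t.isspace()]
    return {
        # share of all tokens that have no vector
        "oov_rate": len(oov) / len(tokens) if tokens else 0.0,
        # most frequent OOV tokens, useful for spotting junk like "\n"
        "top_oov": Counter(oov).most_common(top_n),
        # share of the OOV tokens that are pure whitespace
        "whitespace_share": len(whitespace_oov) / len(oov) if oov else 0.0,
    }


def drop_whitespace(tokens):
    """Remove empty and whitespace-only tokens."""
    return [t for t in tokens if t.strip()]


# Toy data: a made-up vocabulary and token stream with stray whitespace.
vocab = {"el", "gato", "come", "pescado"}
tokens = ["el", "gato", "\n", "come", "  ", "pescado", "\n", "xyzzy"]

report = oov_report(tokens, vocab.__contains__)
cleaned_report = oov_report(drop_whitespace(tokens), vocab.__contains__)

print(report)
print(cleaned_report)
```

With spaCy, the same counts can be collected from a processed `Doc` using the `token.is_oov` and `token.is_space` attributes instead of the hypothetical `in_vocab` lookup. If most of the OOV tokens turn out to be whitespace, cleaning the text will likely help more than training new vectors.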