Trying to increase accuracy with floret #10391
-
Thanks for the note about the warning, that was a mistake on our side. It's hard to say exactly why you're not seeing a lot of improvement. I would definitely train on more text than is used by default in the demo project, which was intended to run relatively quickly for demo purposes. I'd try the full OSCAR subset for Turkish if you have the space/time to process it, which might mean it takes a couple of hours to download/tokenize the corpus and closer to a day to train the vectors, depending largely on your CPU. Also consider testing with larger vector settings. We'd be interested in hearing about the results if you find some particularly good (or bad) settings!
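For scale, a rough sketch of how you could stream a larger slice of the OSCAR subset and write a tokenized corpus; the dataset config, output path, and text count below are placeholders to adjust, not the demo project's actual script:

```python
# Stream a larger slice of the Turkish OSCAR subset and write a
# whitespace-tokenized corpus, one text per line (sketch only).
import spacy
from datasets import load_dataset

nlp = spacy.blank("tr")  # tokenizer only, no trained components needed

dataset = load_dataset(
    "oscar", "unshuffled_deduplicated_tr", split="train", streaming=True
)

max_texts = 5_000_000  # well above the demo default; needs disk space, not RAM
with open("tr_oscar_tokenized.txt", "w", encoding="utf8") as out:
    for i, record in enumerate(dataset):
        if i >= max_texts:
            break
        doc = nlp.make_doc(record["text"])
        out.write(" ".join(t.text for t in doc if not t.is_space) + "\n")
```

Streaming keeps memory flat, so the corpus size is limited by disk and patience rather than RAM.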
-
Thanks @adrianeboyd , I will try with a larger text and adjust the parameters. I hope to have good results to report back here... By the way, I thought the text size was enough, since you got the results you stated here (https://spacy.io/usage/v3-2#vectors) with the demo project. Maybe the problem is the quality of the corpus...
-
Hello again, I produced two vector models, one for CBOW and one for skipgram, with the following parameters, and I got almost the same accuracy (91-92%) for these two trainings, which is again almost the same as the previous trainings. A much bigger corpus and different parameters did not give any meaningful change, which leads me to think I am doing something wrong.
By the way, the reason I could not produce a bigger corpus is that increasing max_texts does not change anything if you do not have a lot of RAM. I rented a machine with 16 GB / 8 cores, but after producing a 1 GB corpus the app crashed without an error, and every time I had to restart it again, skipping the first part like [dataset.skip(1280609 + 328750 + 411257)] (I have checked that I am not duplicating the corpus). Is it possible there is something wrong with the vector model? When I run ... Another thing I do not understand here: when I run ...
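To make the restart part concrete, this is roughly the pattern I mean: a sketch that appends to the corpus file and stores the number of processed records in a small progress file, so dataset.skip() can be fed automatically instead of adding the counts up by hand (the dataset config, paths, and checkpoint interval are assumptions):

```python
# Resume corpus writing after a crash without manual skip arithmetic.
# Same streaming dataset and tokenizer as in the sketch above.
import os
import spacy
from datasets import load_dataset

nlp = spacy.blank("tr")
dataset = load_dataset(
    "oscar", "unshuffled_deduplicated_tr", split="train", streaming=True
)

progress_file = "corpus.offset"
done = int(open(progress_file).read()) if os.path.exists(progress_file) else 0

with open("corpus.txt", "a", encoding="utf8") as out:  # append, don't overwrite
    for i, record in enumerate(dataset.skip(done), start=done):
        doc = nlp.make_doc(record["text"])
        out.write(" ".join(t.text for t in doc if not t.is_space) + "\n")
        if (i + 1) % 10_000 == 0:  # checkpoint periodically
            with open(progress_file, "w") as p:
                p.write(str(i + 1))
```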
-
Thanks @adrianeboyd for the advice and the explanation of floret. I guess there isn't much more to try. I will continue with the transformer model, though I can't get more than 92% with it either...
-
I wanted to give floret a try after examining results here: https://spacy.io/usage/v3-2#vectors
First I trained a model on this dataset without word vectors, with the following result:
https://github.com/UniversalDependencies/UD_Turkish-Kenet
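(For context, the baseline step was the standard CLI workflow; the sketch below reconstructs it with the usual UD file names and a config generated without vectors, so treat the exact paths as assumptions.)

```python
# Convert the UD_Turkish-Kenet treebank and train a pipeline without vectors.
# The spaCy CLI is called through subprocess so the commands stay visible;
# config.cfg is assumed to come from `spacy init config` with no vectors.
import subprocess

for split in ("train", "dev", "test"):
    subprocess.run(
        ["python", "-m", "spacy", "convert",
         f"UD_Turkish-Kenet/tr_kenet-ud-{split}.conllu", "corpus/",
         "--converter", "conllu"],
        check=True,
    )

subprocess.run(
    ["python", "-m", "spacy", "train", "config.cfg",
     "--output", "training_no_vectors",
     "--paths.train", "corpus/tr_kenet-ud-train.spacy",
     "--paths.dev", "corpus/tr_kenet-ud-dev.spacy"],
    check=True,
)
```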
Then I tried with floret vectors trained with the parameters from the example here: https://github.com/explosion/projects/tree/v3/pipelines/floret_fi_core_demo
The only difference I could find in the config (between my first try and the Finnish example) is the following, no major changes:
As you can see from the results, there is no meaningful change... I was expecting more, since I was trying with another agglutinative language...
Is there anything else I could try (parameters or data set) to see at least a 3-point increase like in the Finnish example?
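For reference, this is how I understand the vector-training and import steps from the Finnish demo project; the floret flags mirror what I saw there as far as I can tell, and the binary name, output file name, and paths are assumptions rather than a verified script:

```python
# Train floret vectors on the tokenized corpus, then import them into a
# spaCy vectors model that the training config can point at.
import subprocess

# Assumes the `floret` binary (built from the floret repo) is on PATH.
subprocess.run(
    ["floret", "cbow",                      # or "skipgram"
     "-input", "corpus.txt", "-output", "tr_vectors",
     "-mode", "floret", "-hashCount", "2", "-bucket", "50000",
     "-minn", "4", "-maxn", "5", "-minCount", "10",
     "-dim", "300", "-epoch", "5", "-neg", "10", "-thread", "8"],
    check=True,
)

# With -mode floret this should write a tr_vectors.floret table (assumption:
# check the actual output file name), which `spacy init vectors` can import.
subprocess.run(
    ["python", "-m", "spacy", "init", "vectors", "tr",
     "tr_vectors.floret", "tr_vectors_model", "--mode", "floret"],
    check=True,
)
```

The training config then picks the vectors up through the `[initialize.vectors]` setting (often exposed as `--paths.vectors` on the command line), which is the only difference I'd expect versus the no-vectors run.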
Edit: Another thing, maybe it is related: although I can get similarity results, there is a warning:
Thanks.
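(For anyone hitting the same warning: a minimal check like the one below, with a placeholder model path, should show whether the floret table is loaded and whether the vectors behind the similarity call are non-empty.)

```python
# Quick check that the trained pipeline really contains the floret table and
# that similarity is computed from non-empty vectors.
import spacy

nlp = spacy.load("training_with_vectors/model-best")  # hypothetical path
print(nlp.vocab.vectors.shape)      # (rows, dim) of the hashed floret table

doc1 = nlp("evlerinizden")          # inflected Turkish forms
doc2 = nlp("evleriniz")
print(doc1[0].vector_norm)          # should be non-zero with floret vectors
print(doc1.similarity(doc2))
```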