Commit 955a3c8: add performance (1 parent: d26e062)

1 file changed: +12, -0 lines


README.md

@@ -29,6 +29,18 @@ python setup.py install
### Performance
- [AWS P3 2xlarge instance](https://aws.amazon.com/ec2/instance-types/p3/) was used for the experiments (one Tesla V100 GPU with 8 vCPUs).
- Results can be reproduced by running `examples/example_w2v.py` and `examples/example_lda.py`.
- To evaluate the w2v model, we used the `evaluate_word_pairs` function ([ref link](https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html#evaluating)) in gensim. Note that better performance on the WS-353 test set does not mean the model will work better in applications, as described in the link. However, it is useful to measure quality quantitatively, and fast training time is at least a very objective measure of performance.
- I trained the W2V model on the quora-duplicate-questions dataset from the gensim downloader API with cusim and compared its performance with gensim.
- To evaluate the LDA model, I think there is no good way to measure the quality of training results quantitatively. However, we can inspect the model by looking at the top words of each topic, and we can still compare the training time.
- W2V (CBOW, negative sampling)
| attr          |   1 workers |   2 workers |   4 workers |   8 workers |      GPU |
|:--------------|------------:|------------:|------------:|------------:|---------:|
| training_time |     181.009 |     102.302 |     58.9811 |     47.7482 |  9.60324 |
| pearson       |    0.203882 |    0.207705 |    0.221758 |    0.198408 | 0.331749 |
| spearman      |     0.25208 |    0.254706 |    0.275231 |    0.238611 | 0.295346 |
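The training_time row makes the scaling easy to read off directly. A minimal sketch that computes the relative speedup of the GPU run over each CPU configuration (numbers copied verbatim from the table above; the unit of the reported times is assumed to be consistent across columns):

```python
# Training times copied from the W2V (CBOW, negative sampling) table above.
times = {
    "1 workers": 181.009,
    "2 workers": 102.302,
    "4 workers": 58.9811,
    "8 workers": 47.7482,
    "GPU": 9.60324,
}

gpu_time = times["GPU"]
for name, t in times.items():
    # How many times faster the GPU run is than this configuration.
    print(f"{name}: {t / gpu_time:.1f}x")
```

Under these numbers, the single V100 run is roughly 18.8x faster than 1 CPU worker and about 5x faster than 8 workers.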
### Future tasks
