Problem with input embeddings generated by another algorithm

Hi, I noticed in Section 6.1 of your paper that, because optimizing the likelihood function over both **Z** and **V** at once is inefficient, you split the process into two stages: first learn the word embeddings, then take them as input to the second stage.

I wonder whether it is OK to input embeddings generated by another algorithm (e.g. word2vec) instead of PSDvec.

I've tried it and got some weird results. My corpus consists of 10000 docs containing 3223788 validated words. The input embeddings were generated with word2vec.

In iteration 1 the log-likelihood is 1.3e11, in iteration 2 it is 0.7e11, and it keeps decreasing as the process continues. Hence the best result always occurs after the first iteration rather than the last one. The output looks quite reasonable judging by "Most relevant words", but the strange behaviour of the likelihood really bothers me.
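To make the symptom concrete, here is a minimal sketch of the check I have in mind, assuming the per-iteration log-likelihoods are collected into a plain list. The helper names `first_decrease` and `best_iteration` are hypothetical and not part of the repository's code; the only numbers used are the two values reported above.

```python
def first_decrease(loglikes):
    """Return the 1-based iteration at which the log-likelihood first drops, or None."""
    for i in range(1, len(loglikes)):
        if loglikes[i] < loglikes[i - 1]:
            return i + 1
    return None


def best_iteration(loglikes):
    """Return the 1-based iteration with the highest log-likelihood."""
    return max(range(len(loglikes)), key=loglikes.__getitem__) + 1


# Hypothetical container holding the two values reported in this issue.
loglikes = [1.3e11, 0.7e11]
print("first decrease at iteration:", first_decrease(loglikes))
print("best iteration:", best_iteration(loglikes))
```

With an optimizer that is expected to increase the likelihood at every step, `first_decrease` should return `None`; in my runs it fires at the second iteration.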

