Problem with input embeddings generated by another algorithm #2

@geekinglcq

Hi, I noticed in Section 6.1 of your paper that, because optimizing a likelihood function that includes both Z and V is inefficient, you split the process into two stages: first learn the word embeddings, then take them as input in the second stage.

I wonder whether it is OK to input embeddings generated by another algorithm (e.g. word2vec) instead of PSDvec.

I've tried it and got some weird results. My corpus consists of 10000 docs containing 3223788 validated words. The input embeddings were generated with w2v.

In iter 1 the loglike is 1.3e11, in iter 2 it is 0.7e11, and it keeps decreasing as the process continues. Hence the best result always occurs after the first iteration instead of the last round. The output looks quite reasonable judging by the "Most relevant words", but the strange behaviour of the likelihood really bothers me.
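
To make the symptom concrete, here is a minimal sketch of how the trend can be checked from the logged values. The helper name `summarize_loglike` is hypothetical and not part of this repo, and the trace holds only the two values quoted above, entered by hand.

```python
def summarize_loglike(trace):
    """Summarize a list of per-iteration log-likelihoods (iterations are 1-based).

    Returns (best_iter, decreasing_iters), where best_iter is the iteration with
    the highest log-likelihood and decreasing_iters lists every iteration whose
    value is lower than that of the iteration before it.
    """
    if not trace:
        raise ValueError("trace is empty")
    # Index of the largest value; ties resolve to the earliest iteration.
    best_iter = max(range(len(trace)), key=lambda i: trace[i]) + 1
    # Iterations at which the log-likelihood dropped relative to the previous one.
    decreasing_iters = [i + 1 for i in range(1, len(trace)) if trace[i] < trace[i - 1]]
    return best_iter, decreasing_iters


# Only the two values reported in this issue (iter 1 and iter 2), typed in by hand.
trace = [1.3e11, 0.7e11]
best_iter, decreasing_iters = summarize_loglike(trace)
print(best_iter, decreasing_iters)
```

With the full log appended to `trace`, this would show whether the first iteration stays the best one over the whole run.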
