LDA and ProdLDA #10

Consider an admixture model in which each mutation indicator $Y_{ng}\in \{0, 1\}$ (sample $n$, gene $g$) is generated from a "topic" $Z_{ng}\in \{H, 1, \dots, K\}$, where $H$ is a "healthy" topic with $P(Y_{ng}=1\mid Z_{ng}=H) \ll 1$.
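Marginalizing out $Z_{ng}$ gives the per-entry mutation probability. As a sketch, write $\theta_{nk} = P(Z_{ng}=k)$ for the topic proportions of sample $n$ (assumed shared across genes, as in LDA; $\theta$ is a symbol introduced here for illustration) and $\eta_{kg} = P(Y_{ng}=1\mid Z_{ng}=k)$:

$$P(Y_{ng}=1 \mid \theta_n, \eta) = \sum_{k\in\{H, 1, \dots, K\}} \theta_{nk}\,\eta_{kg} = \theta_{nH}\,\eta_{Hg} + \sum_{k=1}^{K} \theta_{nk}\,\eta_{kg}.$$

Because $Z_{ng}$ drops out, the discrete topic assignments need not be sampled during inference; the healthy term $\theta_{nH}\,\eta_{Hg}$ acts as a small background rate.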

We can then use an LDA-like model in which word positions are replaced by enumerated genes and the vocabulary at each position is $\{0, 1\}$, with $Y_{ng}$ drawn from a Bernoulli distribution. As in LDA, the mixing matrix $\eta_{kg} = P(Y_{ng}=1\mid Z_{ng}=k)$ is interpretable, since a prior such as $\mathrm{Beta}(0.1, 0.1)$, which concentrates mass near 0 and 1, pushes its entries towards sparse, near-binary values.
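The generative process above can be simulated in a few lines. This is a minimal NumPy sketch, not an implementation from the codebase: the function name `simulate_bernoulli_lda`, the $\mathrm{Dirichlet}(\alpha)$ prior on topic proportions, and the constant healthy rate of 0.01 are all assumptions made for illustration. Row 0 of `eta` plays the role of the healthy topic $H$.

```python
import numpy as np


def simulate_bernoulli_lda(n_samples, n_genes, n_topics, alpha=0.5,
                           healthy_rate=0.01, beta_a=0.1, beta_b=0.1, seed=0):
    """Simulate binary mutation data from an LDA-like Bernoulli admixture.

    Hypothetical helper. Row 0 of `eta` is the healthy topic H;
    rows 1..n_topics are the K mutational topics.
    """
    rng = np.random.default_rng(seed)

    # eta[k, g] = P(Y_ng = 1 | Z_ng = k)
    eta = np.empty((n_topics + 1, n_genes))
    eta[0] = healthy_rate  # healthy topic: the same small mutation rate at every gene
    eta[1:] = rng.beta(beta_a, beta_b, size=(n_topics, n_genes))

    # theta[n] = topic proportions of sample n over {H, 1, ..., K} (assumed Dirichlet prior)
    theta = rng.dirichlet(np.full(n_topics + 1, alpha), size=n_samples)

    # Z_ng ~ Categorical(theta[n]) by inverse-CDF sampling
    cdf = np.cumsum(theta, axis=1)
    cdf[:, -1] = 1.0  # guard against floating-point round-off in the last bin
    u = rng.random((n_samples, n_genes))
    z = (u[:, :, None] > cdf[:, None, :]).sum(axis=2)

    # Y_ng ~ Bernoulli(eta[Z_ng, g])
    p = eta[z, np.arange(n_genes)]
    y = (rng.random((n_samples, n_genes)) < p).astype(np.int8)
    return y, z, theta, eta
```

For example, `y, z, theta, eta = simulate_bernoulli_lda(n_samples=200, n_genes=50, n_topics=3)` returns a binary matrix `y` of shape `(200, 50)` together with the ground-truth `z`, `theta` and `eta`, which can later be compared against inferred values.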

Inference in LDA and the closely related [ProdLDA](https://num.pyro.ai/en/stable/examples/prodlda.html) model can be implemented, e.g., in [NumPyro](https://num.pyro.ai/en/stable/examples/prodlda.html).
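ProdLDA replaces LDA's mixture of topic distributions with a product of experts: topic weights are combined on the logit scale and then normalized, rather than averaged on the probability scale. The issue does not specify a binary version, so the sketch below is only one possible analogue and an assumption: a per-gene sigmoid $P(Y_{ng}=1) = \sigma(b + \sum_k \theta_{nk}\beta_{kg})$, where the function name, the Gaussian prior on $\beta$, and the shared offset $b$ (a hypothetical stand-in for the healthy topic) are all illustrative choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate_bernoulli_prodlda(n_samples, n_genes, n_topics, alpha=0.5,
                               offset=-4.0, weight_scale=3.0, seed=0):
    """Simulate binary data from a hypothetical Bernoulli analogue of ProdLDA.

    Topic effects add on the logit scale, i.e. they multiply on the odds scale
    (a product of experts), instead of being mixed as probabilities as in LDA.
    """
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_samples)  # (N, K) topic proportions
    beta = rng.normal(0.0, weight_scale, size=(n_topics, n_genes))   # (K, G) unnormalized weights
    # `offset` is a hypothetical shared baseline logit standing in for the healthy topic
    p = sigmoid(offset + theta @ beta)  # P(Y_ng = 1), shape (N, G)
    y = (rng.random((n_samples, n_genes)) < p).astype(np.int8)
    return y, theta, beta, p
```

One practical difference from the LDA sketch: there, $P(Y_{ng}=1)$ is a convex combination of the $\eta_{kg}$ and so can never exceed the largest topic's rate, whereas here a sample loading on several topics can have a higher mutation probability than any single topic implies.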

This task should be split into several smaller tasks, for example:
- Simulate data sets according to LDA and ProdLDA models.
- Experiment with the implementation provided and check whether it recovers the parameters used in the simulations.
- If the results are satisfactory, incorporate LDA and ProdLDA into the codebase.
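Checking whether inference recovers the simulated parameters needs one extra step: topic labels are only identified up to permutation (label switching), so inferred topics must be aligned with the true ones before comparing. A brute-force sketch is shown below; the helper name `match_topics` and the mean-absolute-error criterion are assumptions, and both inputs are assumed to have shape `(K, G)` (e.g. with the healthy topic excluded).

```python
import itertools

import numpy as np


def match_topics(eta_true, eta_est):
    """Align estimated topics to true topics by brute force over permutations.

    Hypothetical helper. Returns (perm, err), where eta_est[perm[i]] is matched
    to eta_true[i] and err is the mean absolute difference after alignment.
    """
    n_topics = eta_true.shape[0]
    best_perm, best_err = None, np.inf
    for perm in itertools.permutations(range(n_topics)):
        err = float(np.mean(np.abs(eta_true - eta_est[list(perm)])))
        if err < best_err:
            best_perm, best_err = perm, err
    return best_perm, best_err
```

Enumerating all $K!$ permutations is only practical for small $K$; for larger $K$, the same matching can be found with `scipy.optimize.linear_sum_assignment` applied to the $K \times K$ matrix of pairwise topic distances.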
