
Non-negative matrix factorization #44

@michalovadek

I have been using non-negative matrix factorization (NMF) for topic modelling (as an alternative to LDA) for a while now, but so far I have not been able to find a good R package for it. In my limited experience, the NMF package is a bit of a mess: it does not work properly because it is heavily laden with Bioconductor dependencies, and when I did manage to make it work, it seemed slow. The other two packages that can do NMF are NMFN and rNMF, and I have found both to be rather slow as well.

My solution so far has been to use reticulate:

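```r
library(reticulate)

use_condaenv("r-reticulate")

# Import scikit-learn's decomposition module directly
decomposition <- import("sklearn.decomposition")

# NMF with NNDSVD initialisation, 15 topics, fixed seed for reproducibility
model <- decomposition$NMF(init = "nndsvd", n_components = 15L,
                           random_state = 23L)

# your_matrix: a documents x terms matrix (e.g. word counts or tf-idf)
W <- model$fit_transform(your_matrix)  # document-topic weights
H <- model$components_                 # topic-term weights
```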
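Once the model is fitted, each row of `H` holds the term weights for one topic, so a topic's most characteristic terms are the columns with the largest values in that row. A minimal base-R sketch, assuming `H` is a topics x terms matrix and `vocab` is a character vector of term labels in the same column order as `your_matrix` (for a quanteda dfm, `featnames()` would supply this). The helper `top_terms` and the toy data below are hypothetical, for illustration only:

```r
# Hypothetical helper: return the n highest-weighted terms for each topic.
# H:     topics x terms matrix (e.g. model$components_ from reticulate)
# vocab: term labels, in the same order as the columns of H
top_terms <- function(H, vocab, n = 5) {
  stopifnot(ncol(H) == length(vocab))
  n <- min(n, length(vocab))
  lapply(seq_len(nrow(H)), function(k) {
    # Rank terms by their weight within topic k, largest first
    vocab[order(H[k, ], decreasing = TRUE)[seq_len(n)]]
  })
}

# Toy example: 2 topics over 4 terms (one topic per row)
H_toy <- matrix(c(0.9, 0.5, 0.0, 0.1,
                  0.0, 0.1, 0.8, 0.6),
                nrow = 2, byrow = TRUE)
vocab <- c("court", "judge", "trade", "tariff")

tops <- top_terms(H_toy, vocab, n = 2)
# tops[[1]] is c("court", "judge"); tops[[2]] is c("trade", "tariff")
```

The same idea applies to `W`: `max.col(W)` gives, for each document, the index of its highest-weighted topic.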

This works well, but native R support would obviously be better. I don't know how difficult it would be to port the Python implementation to R or to optimize the existing packages, but I thought I would raise this here in case you think it would be a worthy addition to the quanteda.textmodels family. I read your discussion about supporting LDA, but I think the way NMF works makes it somewhat more conducive to being supported directly here (plus, unlike for LDA, there are no good existing alternatives).
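To give a sense of what a native port involves, here is a minimal base-R sketch of NMF using the Lee and Seung multiplicative update rules for the squared Frobenius-norm objective. This is not what scikit-learn does by default (its `NMF` uses a coordinate-descent solver unless `solver = "mu"` is set), it starts from a random rather than an NNDSVD initialisation, and the function name `nmf_mu` and its arguments are hypothetical. A production version would also need sparse-matrix support and a convergence check.

```r
# Hypothetical helper, not a package API.
# Approximates a non-negative X (n x m) by W (n x k) %*% H (k x m),
# minimising ||X - WH||_F^2 with Lee-Seung multiplicative updates.
nmf_mu <- function(X, k, n_iter = 200, seed = 23, eps = 1e-10) {
  stopifnot(is.matrix(X), all(X >= 0), k >= 1)
  set.seed(seed)  # analogous to random_state in the sklearn call
  n <- nrow(X)
  m <- ncol(X)
  # Random non-negative initialisation (NNDSVD would usually start closer)
  W <- matrix(runif(n * k), n, k)
  H <- matrix(runif(k * m), k, m)
  for (i in seq_len(n_iter)) {
    # H <- H * (W'X) / (W'WH); eps guards against division by zero
    H <- H * crossprod(W, X) / (crossprod(W) %*% H + eps)
    # W <- W * (XH') / (WHH')
    W <- W * tcrossprod(X, H) / (W %*% tcrossprod(H) + eps)
  }
  list(W = W, H = H)
}

# Toy document-term count matrix: 4 documents x 4 terms
X_toy <- matrix(c(3, 2, 0, 0,
                  4, 3, 0, 1,
                  0, 0, 5, 4,
                  0, 1, 4, 3),
                nrow = 4, byrow = TRUE)

fit <- nmf_mu(X_toy, k = 2)
err <- norm(X_toy - fit$W %*% fit$H, type = "F")
```

Because every factor in the updates is non-negative, `W` and `H` stay non-negative throughout, which is what makes the multiplicative rules a convenient (if slowly converging) starting point.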

Greene and Cross (2017) take this a step further (and make the general case for NMF topic modelling), but for starters, a fast NMF decomposer that works reliably with text data would be nice.
