paper/paper.md (40 additions, 40 deletions)
@@ -15,7 +15,7 @@ affiliations:
index: 1
- name: Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Japan
index: 2
-date: 8 July 2025
+date: 3 December 2025
bibliography: paper.bib
---
@@ -27,7 +27,7 @@ Despite its broad applicability, NMF becomes computationally prohibitive for lar
# Statement of need
-NMF is a workhorse algorithm for most data science tasks. However, as the size of the data matrix increases, it often becomes too large to fit into memory. In such cases, an out-of-core (OOC) implementation — where only subsets of data stored on disk are loaded into memory for computation — is desirable. Additionally, representing the data in a sparse matrix format, where only non-zero values and their coordinates are stored, is computationally advantageous. Therefore, a NMF implementation that supports both OOC computation and sparse data handling is highly desirable.
+NMF is a workhorse algorithm for many data science tasks. However, as the size of the data matrix increases, it often becomes too large to fit into memory. In such cases, an out-of-core (OOC) implementation — where only subsets of the data stored on disk are loaded into memory for computation — is desirable. Additionally, representing the data in a sparse matrix format, where only the non-zero values and their coordinates are stored, is computationally advantageous. Therefore, an NMF implementation that supports both OOC computation and sparse data handling is highly desirable (Figure 1).
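As a toy illustration of the OOC idea above (sketched in Python for brevity; the file layout and names are assumptions, and OnlineNMF.jl's actual chunking strategy differs), a matrix stored one row per line on disk can be reduced while holding only a single row in memory at a time:

```python
import os
import tempfile

def ooc_colsums(path):
    """Stream a matrix stored as one comma-separated row per line,
    accumulating column sums with O(ncol) peak memory regardless of nrow."""
    acc = None
    with open(path) as f:
        for line in f:
            row = [float(v) for v in line.split(",")]
            if acc is None:
                acc = [0.0] * len(row)
            # Combine this row into the running statistic, then discard it.
            acc = [a + r for a, r in zip(acc, row)]
    return acc

# Demo on a tiny file standing in for a matrix too large for memory.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as f:
    f.write("1,2,3\n4,5,6\n")
    demo_path = f.name

sums = ooc_colsums(demo_path)  # [5.0, 7.0, 9.0]
os.remove(demo_path)
```

The same streaming pattern extends to the sufficient statistics needed by NMF update rules, which is what makes OOC factorization possible.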
Similar considerations arise in the context of Principal Component Analysis (PCA), for which we independently developed a Julia package, \texttt{OnlinePCA.jl} [@onlinepcajl]. \texttt{OnlineNMF.jl} is a spin-off of \texttt{OnlinePCA.jl} that implements NMF.
@@ -39,21 +39,19 @@ NMF can be easily reproduced on any machine where Julia is pre-installed by usin
## Installation
-First, install \texttt{OnlinePCA.jl} and \texttt{OnlineNMF.jl} from the official Julia package registry or directly from GitHub:
+First, install \texttt{OnlineNMF.jl} from the official Julia package registry or directly from GitHub:
```julia
-# Install OnlinePCA.jl and OnlineNMF.jl from Julia General
-Then, write a synthetic data as a CSV file, convert it to a compressed binary format using Zstandard, and prepare summary statistics for PCA. MM format is also supported for sparse matrices.
+Then, write synthetic data as a CSV file, convert it to a compressed binary format using Zstandard, and prepare summary statistics for PCA. Matrix Market (MM) format is also supported for sparse matrices.
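For readers unfamiliar with it, the MM coordinate format is plain text: a header line, a "rows cols nnz" size line, and one "row col value" triplet per non-zero entry. A minimal pure-Python writer (an illustration of the format only, not OnlineNMF.jl's converter; the Zstandard compression step is omitted here) looks like:

```python
def write_mm(path, nrow, ncol, triplets):
    """Write (i, j, v) triplets (1-based indices) as a MatrixMarket
    coordinate file; only the non-zero entries are stored."""
    with open(path, "w") as f:
        f.write("%%MatrixMarket matrix coordinate real general\n")
        f.write(f"{nrow} {ncol} {len(triplets)}\n")
        for i, j, v in triplets:
            f.write(f"{i} {j} {v}\n")

# A 3x3 matrix with three non-zeros: (1,1)=1.0, (2,3)=2.0, (3,2)=3.0.
write_mm("X.mtx", 3, 3, [(1, 1, 1.0), (2, 3, 2.0), (3, 2, 3.0)])
```

Because the format stores coordinates explicitly, it is language-agnostic: any tool that can parse text can exchange sparse matrices this way.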
-This example demonstrates NMF using the $\alpha$-divergence as the loss function. By setting alpha=2, the objective corresponds to the Pearson divergence. The input data is assumed to be a dense matrix compressed with Zstandard (.zst format).
+This example demonstrates NMF using the $\alpha$-divergence as the loss function (Figure 2). By setting `alpha=2`, the objective corresponds to the Pearson divergence. The input data is assumed to be a dense matrix compressed with Zstandard (.zst format).
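For reference, under one common parameterization (the notation here is generic and assumed; the package's internal form may differ), the $\alpha$-divergence between data $X$ and reconstruction $Y = WH$ is

$$
D_{\alpha}(X \,\|\, Y) = \frac{1}{\alpha(\alpha-1)} \sum_{ij} \left( x_{ij}^{\alpha}\, y_{ij}^{1-\alpha} - \alpha\, x_{ij} + (\alpha-1)\, y_{ij} \right),
$$

and substituting $\alpha=2$ yields $\frac{1}{2}\sum_{ij} (x_{ij}-y_{ij})^2 / y_{ij}$, i.e., the Pearson divergence.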
-This example performs NMF on a sparse matrix using the $\beta$-divergence. The input is a MM formatted sparse matrix file (.mtx.zst). When beta=1, the loss corresponds to the Kullback-Leibler divergence, and sparse-specific optimization is used internally.
+This example performs NMF on a sparse matrix using the $\beta$-divergence (Figure 3). The input is an MM-formatted sparse matrix file (.mtx.zst). When `beta=1`, the loss corresponds to the Kullback-Leibler divergence, and a sparse-specific optimization is used internally.
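One common form of the $\beta$-divergence (again, notation assumed for illustration) is

$$
D_{\beta}(X \,\|\, Y) = \sum_{ij} \frac{x_{ij}^{\beta} + (\beta-1)\, y_{ij}^{\beta} - \beta\, x_{ij}\, y_{ij}^{\beta-1}}{\beta(\beta-1)},
$$

whose limit as $\beta \to 1$ is $\sum_{ij} \left( x_{ij} \log \frac{x_{ij}}{y_{ij}} - x_{ij} + y_{ij} \right)$, the generalized Kullback-Leibler divergence, while $\beta=2$ recovers the squared Euclidean distance $\frac{1}{2}\sum_{ij}(x_{ij}-y_{ij})^2$.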
-There are various implementations of NMF [@nntensor; @sklearn; @nmfk] and some of them are OOC-type or sparse-type [@sklearn; @rcppplanc] but \texttt{OnlineNMF.jl} is the only tool that supports both OOC computation and sparse data formats (e.g., MM, BinCOO).
+There are various implementations of NMF [@nntensor; @sklearn; @nmfk], and some of them support OOC computation or sparse data formats [@sklearn; @rcppplanc]. While \texttt{RcppPlanc}/\texttt{PLANC} supports both OOC computation and R's internal sparse format (dgCMatrix), \texttt{OnlineNMF.jl} is designed to handle language-agnostic sparse formats such as MM and Binary COO (BinCOO), enabling seamless integration with external data pipelines.
| Function Name | Language | OOC | Sparse Format |
|:------ | :----: | :----: | :----: |
@@ -163,5 +162,6 @@ There are various implementations of NMF [@nntensor; @sklearn; @nmfk] and some o