
Commit 7337edf

Fix for JOSS review (docs and tests)
1 parent 5db7aa7 commit 7337edf

18 files changed (+723, -102 lines)

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ name = "OnlineNMF"
 uuid = "e0c94d91-d830-4516-8b46-9a113d37a394"
 license = "MIT"
 authors = ["kokitsuyuzaki <koki.tsuyuzaki@gmail.com>"]
-version = "0.99.6"
+version = "0.99.7"
 
 [deps]
 ArgParse = "c7e460c6-2fb9-53a9-8c5b-16f535851c63"

README.md

Lines changed: 2 additions & 2 deletions
@@ -10,6 +10,8 @@ Online Non-negative Matrix Factorization
 ## Description
 OnlineNMF.jl performs some online-NMF functions for extreamly large scale matrix.
 
+__Note: The input matrix is supposed to be a non-negative matrix.__
+
 ## Algorithms
 
 - Multiplicative Update (MU)
@@ -26,14 +28,12 @@ OnlineNMF.jl performs some online-NMF functions for extreamly large scale matrix
 ## Installation
 <!-- ```julia
 julia> using Pkg
-julia> Pkg.add(url="https://github.com/rikenbit/OnlinePCA.jl.git")
 julia> Pkg.add(url="https://github.com/rikenbit/OnlineNMF.jl.git")
 julia> Pkg.add("PlotlyJS")
 ```
 -->
 ```julia
 # push the key "]" and type the following command.
-(@julia) pkg> add https://github.com/rikenbit/OnlinePCA.jl
 (@julia) pkg> add https://github.com/rikenbit/OnlineNMF.jl
 (@julia) pkg> add PlotlyJS
 # After that, push Ctrl + C to leave from Pkg REPL mode
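The README now states that the input matrix is expected to be non-negative. A caller who wants to confirm this before writing a matrix to CSV or MM format could use a check like the following minimal sketch, written in plain Base Julia. `check_nonnegative` is a hypothetical helper for illustration; it is not part of OnlineNMF.jl, and this commit does not show whether the package validates matrix entries itself.

```julia
# Hypothetical helper (not part of OnlineNMF.jl): verify that every entry of
# a matrix is >= 0 before it is written out and passed to NMF.
function check_nonnegative(X::AbstractMatrix{<:Real})
    # findfirst with a predicate returns the CartesianIndex of the first
    # negative entry, or `nothing` if there is none.
    idx = findfirst(<(0), X)
    if idx !== nothing
        throw(ArgumentError("input matrix has a negative entry $(X[idx]) at $(Tuple(idx))"))
    end
    return true
end

X = abs.(randn(300, 50))   # synthetic non-negative data
check_nonnegative(X)       # returns true
```

For a `SparseMatrixCSC`, scanning only `nonzeros(X)` from the SparseArrays standard library would avoid visiting the implicit zeros.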

paper/paper.bib

Lines changed: 1 addition & 0 deletions
@@ -117,5 +117,6 @@ @misc{rcppplanc
 title = {{RcppPlanc}: R wrapper for the PLANC Nonnegative Matrix Factorization library},
 year = {2023},
 howpublished = {\url{https://github.com/welch-lab/RcppPlanc}},
+doi = {10.32614/CRAN.package.RcppPlanc},
 note = {Accessed: 2025-05-01}
 }

paper/paper.md

Lines changed: 40 additions & 40 deletions
@@ -15,7 +15,7 @@ affiliations:
 index: 1
 - name: Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Japan
 index: 2
-date: 8 July 2025
+date: 3 December 2025
 bibliography: paper.bib
 ---
 
@@ -27,7 +27,7 @@ Despite its broad applicability, NMF becomes computationally prohibitive for lar
 
 # Statement of need
 
-NMF is a workhorse algorithm for most data science tasks. However, as the size of the data matrix increases, it often becomes too large to fit into memory. In such cases, an out-of-core (OOC) implementation — where only subsets of data stored on disk are loaded into memory for computation — is desirable. Additionally, representing the data in a sparse matrix format, where only non-zero values and their coordinates are stored, is computationally advantageous. Therefore, a NMF implementation that supports both OOC computation and sparse data handling is highly desirable.
+NMF is a workhorse algorithm for most data science tasks. However, as the size of the data matrix increases, it often becomes too large to fit into memory. In such cases, an out-of-core (OOC) implementation — where only subsets of data stored on disk are loaded into memory for computation — is desirable. Additionally, representing the data in a sparse matrix format, where only non-zero values and their coordinates are stored, is computationally advantageous. Therefore, a NMF implementation that supports both OOC computation and sparse data handling is highly desirable (Figure 1).
 
 Similar discussions have been made in the context of Principal Component Analysis (PCA), and we have independently developed a Julia package, \texttt{OnlinePCA.jl} [@onlinepcajl]. \texttt{OnlineNMF.jl} is a spin-off version of \texttt{OnlinePCA.jl}, implementing NMF.
 
@@ -39,21 +39,19 @@ NMF can be easily reproduced on any machine where Julia is pre-installed by usin
 
 ## Installation
 
-First, install \texttt{OnlinePCA.jl} and \texttt{OnlineNMF.jl} from the official Julia package registry or directly from GitHub:
+First, install \texttt{OnlineNMF.jl} from the official Julia package registry or directly from GitHub:
 
 ```julia
-# Install OnlinePCA.jl and OnlineNMF.jl from Julia General
-julia> Pkg.add("OnlinePCA")
+# Install OnlineNMF.jl from Julia General
 julia> Pkg.add("OnlineNMF")
 
 # or GitHub for the latest version
-julia> Pkg.add(url="https://github.com/rikenbit/OnlinePCA.jl.git")
 julia> Pkg.add(url="https://github.com/rikenbit/OnlineNMF.jl.git")
 ```
 
 ## Preprocess of CSV
 
-Then, write a synthetic data as a CSV file, convert it to a compressed binary format using Zstandard, and prepare summary statistics for PCA. MM format is also supported for sparse matrices.
+Then, write a synthetic data as a CSV file, convert it to a compressed binary format using Zstandard, and prepare summary statistics for PCA. Matrix Market (MM) format is also supported for sparse matrices.
 
 ```julia
 using OnlinePCA
@@ -76,12 +74,10 @@ write_csv(joinpath(tmp, "Data.csv"), data)
 mmwrite(joinpath(tmp, "Data.mtx"), sparse(data))
 
 # Binarization (Zstandard)
-csv2bin(csvfile=joinpath(tmp, "Data.csv"),
-binfile=joinpath(tmp, "Data.zst"))
+csv2bin(csvfile=joinpath(tmp, "Data.csv"), binfile=joinpath(tmp, "Data.zst"))
 
 # Sparsification (Zstandard + MM format)
-mm2bin(mmfile=joinpath(tmp, "Data.mtx"),
-binfile=joinpath(tmp, "Data.mtx.zst"))
+mm2bin(mmfile=joinpath(tmp, "Data.mtx"), binfile=joinpath(tmp, "Data.mtx.zst"))
 ```
 
 ## Setting for plot
@@ -92,41 +88,44 @@ Define a helper function to visualize the results of NMF using the \texttt{Plotl
 using DataFrames
 using PlotlyJS
 
-function subplots(resnmf, group)
-# data frame
-data_left = DataFrame(pc1=resnmf[:,1], pc2=resnmf[:,2], group=group)
-data_right = DataFrame(pc2=resnmf[:,2], pc3=resnmf[:,3], group=group)
-# plot
-p_left = Plot(data_left, x=:nmf1, y=:nmf2, mode="markers",
+function subplots(out_nmf, group)
+# data frame
+data_left = DataFrame(nmf1=out_nmf[1][:,1], nmf2=out_nmf[1][:,2],
+group=group)
+data_right = DataFrame(nmf2=out_nmf[1][:,2], nmf3=out_nmf[1][:,3],
+group=group)
+# plot
+p_left = Plot(data_left, x=:nmf1, y=:nmf2, mode="markers",
 marker_size=10, group=:group)
-p_right = Plot(data_right, x=:nmf2, y=:nmf3, mode="markers",
+p_right = Plot(data_right, x=:nmf2, y=:nmf3, mode="markers",
 marker_size=10,
-group=:group, showlegend=false)
-p_left.data[1]["marker_color"] = "red"
-p_left.data[2]["marker_color"] = "blue"
-p_left.data[3]["marker_color"] = "green"
-p_right.data[1]["marker_color"] = "red"
-p_right.data[2]["marker_color"] = "blue"
-p_right.data[3]["marker_color"] = "green"
-p_left.data[1]["name"] = "group1"
-p_left.data[2]["name"] = "group2"
-p_left.data[3]["name"] = "group3"
-p_left.layout["title"] = "PC1 vs PC2"
-p_right.layout["title"] = "PC2 vs PC3"
-p_left.layout["xaxis_title"] = "pc1"
-p_left.layout["yaxis_title"] = "pc2"
-p_right.layout["xaxis_title"] = "pc2"
-p_right.layout["yaxis_title"] = "pc3"
-plot([p_left p_right])
+group=:group, showlegend=false)
+p_left.data[1]["marker_color"] = "red"
+p_left.data[2]["marker_color"] = "blue"
+p_left.data[3]["marker_color"] = "green"
+p_right.data[1]["marker_color"] = "red"
+p_right.data[2]["marker_color"] = "blue"
+p_right.data[3]["marker_color"] = "green"
+p_left.data[1]["name"] = "group1"
+p_left.data[2]["name"] = "group2"
+p_left.data[3]["name"] = "group3"
+p_left.layout["title"] = "Component 1 vs Component 2"
+p_right.layout["title"] = "Component 2 vs Component 3"
+p_left.layout["xaxis_title"] = "nmf-1"
+p_left.layout["yaxis_title"] = "nmf-2"
+p_right.layout["xaxis_title"] = "nmf-2"
+p_right.layout["yaxis_title"] = "nmf-3"
+plot([p_left p_right])
 end
 
-group=vcat(repeat(["group1"],inner=33), repeat(["group2"],inner=33),
-repeat(["group3"],inner=33))
+group=vcat(repeat(["group1"],inner=100),
+repeat(["group2"],inner=100),
+repeat(["group3"],inner=100))
 ```
 
 ## NMF based on Alpha-Divergence
 
-This example demonstrates NMF using the $\alpha$-divergence as the loss function. By setting alpha=2, the objective corresponds to the Pearson divergence. The input data is assumed to be a dense matrix compressed with Zstandard (.zst format).
+This example demonstrates NMF using the $\alpha$-divergence as the loss function (Figure 2). By setting alpha=2, the objective corresponds to the Pearson divergence. The input data is assumed to be a dense matrix compressed with Zstandard (.zst format).
 
 ```julia
 out_nmf_alpha = nmf(input=joinpath(tmp, "Data.zst"),
@@ -139,7 +138,7 @@ subplots(out_nmf_alpha, group)
 
 ## Sparse-NMF based on Beta-Divergence
 
-This example performs NMF on a sparse matrix using the $\beta$-divergence. The input is a MM formatted sparse matrix file (.mtx.zst). When beta=1, the loss corresponds to the Kullback-Leibler divergence, and sparse-specific optimization is used internally.
+This example performs NMF on a sparse matrix using the $\beta$-divergence (Figure 3). The input is a MM formatted sparse matrix file (.mtx.zst). When beta=1, the loss corresponds to the Kullback-Leibler divergence, and sparse-specific optimization is used internally.
 
 ```julia
 out_sparse_nmf_beta = sparse_nmf(input=joinpath(tmp, "Data.mtx.zst"),
@@ -153,7 +152,7 @@ subplots(out_sparse_nmf_beta, group)
 
 # Related work
 
-There are various implementations of NMF [@nntensor; @sklearn; @nmfk] and some of them are OOC-type or sparse-type [@sklearn; @rcppplanc] but \texttt{OnlineNMF.jl} is the only tool that supports both OOC computation and sparse data formats (e.g., MM, BinCOO).
+There are various implementations of NMF [@nntensor; @sklearn; @nmfk] and some of them support OOC computation or sparse data formats [@sklearn; @rcppplanc]. While \texttt{RcppPlanc/PLANC} supports both OOC and R's internal sparse format (dgCMatrix), \texttt{OnlineNMF.jl} is designed to handle language-agnostic sparse formats such as MM and Binary COO (BinCOO), enabling seamless integration with external data pipelines.
 
 | Function Name | Language | OOC | Sparse Format |
 |:------ | :----: | :----: | :----: |
@@ -163,5 +162,6 @@ There are various implementations of NMF [@nntensor; @sklearn; @nmfk] and some o
 | \texttt{NMF.MultUpdate} | Julia | No | - |
 | \texttt{sklearn.decomposition.MiniBatchNMF} | Python | Yes | - |
 | \texttt{RcppPlanc/PLANC} | R/C++ | Yes | dgCMatrix |
+| \texttt{OnlineNMF.jl} | Julia | Yes | MM/BinCOO |
 
 # References
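The revised paper text relies on two special cases of the loss: alpha=2 for the Pearson divergence and beta=1 for the Kullback-Leibler divergence. The sketch below evaluates both divergence families elementwise so these special cases can be checked numerically. It assumes the standard parameterizations written in the code comments; the commit does not show which form OnlineNMF.jl uses internally, so constant factors (such as the 1/2 in the Pearson case) may differ from the package's loss. `alpha_divergence` and `beta_divergence` are hypothetical names.

```julia
# Elementwise divergences between a data matrix X and a reconstruction Y.
# Assumed parameterizations (not taken from OnlineNMF.jl):
#   alpha: D(X||Y) = sum(x^a * y^(1-a) - a*x + (a-1)*y) / (a*(a-1))
#   beta:  D(X||Y) = sum(x^b + (b-1)*y^b - b*x*y^(b-1)) / (b*(b-1))
# Entries are assumed strictly positive wherever a logarithm appears.

function alpha_divergence(X::AbstractArray{<:Real}, Y::AbstractArray{<:Real}, a::Real)
    size(X) == size(Y) || throw(DimensionMismatch("X and Y must have the same size"))
    xs, ys = float.(X), float.(Y)
    if a == 1        # limit a -> 1: generalized Kullback-Leibler divergence
        return sum(x * log(x / y) - x + y for (x, y) in zip(xs, ys))
    elseif a == 0    # limit a -> 0: KL divergence with the arguments swapped
        return sum(y * log(y / x) - y + x for (x, y) in zip(xs, ys))
    end
    s = sum(x^a * y^(1 - a) - a * x + (a - 1) * y for (x, y) in zip(xs, ys))
    return s / (a * (a - 1))
end

function beta_divergence(X::AbstractArray{<:Real}, Y::AbstractArray{<:Real}, b::Real)
    size(X) == size(Y) || throw(DimensionMismatch("X and Y must have the same size"))
    xs, ys = float.(X), float.(Y)
    if b == 1        # generalized Kullback-Leibler divergence
        return sum(x * log(x / y) - x + y for (x, y) in zip(xs, ys))
    elseif b == 0    # Itakura-Saito divergence
        return sum(x / y - log(x / y) - 1 for (x, y) in zip(xs, ys))
    end
    s = sum(x^b + (b - 1) * y^b - b * x * y^(b - 1) for (x, y) in zip(xs, ys))
    return s / (b * (b - 1))
end

X = [1.0 2.0; 3.0 4.0]
Y = [2.0 1.0; 3.0 5.0]
# alpha = 2 reduces to half of the Pearson chi-square sum((x - y)^2 / y).
alpha_divergence(X, Y, 2) ≈ sum((X .- Y) .^ 2 ./ Y) / 2   # true
# beta = 1 and alpha = 1 both give the generalized KL divergence.
beta_divergence(X, Y, 1) ≈ alpha_divergence(X, Y, 1)      # true
```

Since x^2/y - 2x + y = (x - y)^2/y, the alpha=2 case is exactly a Pearson-type divergence, which is the correspondence the paper text states.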

paper/paper.pdf

38 Bytes
Binary file not shown.

src/bincoo_dnmf.jl

Lines changed: 17 additions & 1 deletion
@@ -341,13 +341,29 @@ function init_bincoo_dnmf(
 lower::Number,
 upper::Number
 )
-# Type Check
+# Initialization
 N, M = nm(input)
 binu = convert(Float32, binu)
 binv = convert(Float32, binv)
 teru = convert(Float32, teru)
 terv = convert(Float32, terv)
 graphv = convert(Float32, graphv)
+# Check non-negative parameters
+if binu < 0
+throw(ArgumentError("binu must be non-negative, got $binu"))
+end
+if binv < 0
+throw(ArgumentError("binv must be non-negative, got $binv"))
+end
+if teru < 0
+throw(ArgumentError("teru must be non-negative, got $teru"))
+end
+if terv < 0
+throw(ArgumentError("terv must be non-negative, got $terv"))
+end
+if graphv < 0
+throw(ArgumentError("graphv must be non-negative, got $graphv"))
+end
 # Initialization by BinCOO-NMF
 out_nmf = bincoo_nmf(
 input=input,
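The five `if ... < 0` blocks added to `init_bincoo_dnmf` (and, identically, to `init_dnmf`) differ only in the parameter name, so they could also be written as a loop over keyword arguments. This is a hypothetical refactoring sketch, not code from the commit: `check_nonnegative_params` is an invented name, and its error messages copy the format used in the diff.

```julia
# Hypothetical helper (not in OnlineNMF.jl): throw the same ArgumentError as
# the hand-written checks for the first keyword argument that is negative.
function check_nonnegative_params(; params...)
    for (name, value) in params   # iterates name => value pairs in call order
        if value < 0
            throw(ArgumentError("$name must be non-negative, got $value"))
        end
    end
    return nothing
end

# Usage mirroring init_bincoo_dnmf after the Float32 conversions:
binu, binv, teru, terv, graphv = 1.0f0, 1.0f0, 1.0f0, 1.0f0, 0.0f0
check_nonnegative_params(binu=binu, binv=binv, teru=teru, terv=terv, graphv=graphv)   # returns nothing
```

Because keyword arguments keep their call order, the first negative parameter reported is the same one the hand-written sequence would report.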

src/bincoo_nmf.jl

Lines changed: 44 additions & 1 deletion
@@ -479,7 +479,7 @@ function init_bincoo_nmf(
 lower::Number,
 upper::Number
 )
-# Type Check
+# Initialization
 N, M = nm(input)
 alpha = convert(Float32, alpha)
 beta = convert(Float32, beta)
@@ -493,6 +493,49 @@ function init_bincoo_nmf(
 chunksize = convert(Int64, chunksize)
 lower = convert(Float32, lower)
 upper = convert(Float32, upper)
+# Argument Check
+# Check matrix dimensions (single row/column is meaningless for NMF)
+if N == 1
+throw(ArgumentError("Input matrix has only 1 row. NMF requires at least 2 rows."))
+end
+if M == 1
+throw(ArgumentError("Input matrix has only 1 column. NMF requires at least 2 columns."))
+end
+# Check non-negative parameters
+if graphv < 0
+throw(ArgumentError("graphv must be non-negative, got $graphv"))
+end
+if l1u < 0
+throw(ArgumentError("l1u must be non-negative, got $l1u"))
+end
+if l1v < 0
+throw(ArgumentError("l1v must be non-negative, got $l1v"))
+end
+if l2u < 0
+throw(ArgumentError("l2u must be non-negative, got $l2u"))
+end
+if l2v < 0
+throw(ArgumentError("l2v must be non-negative, got $l2v"))
+end
+if dim < 1
+throw(ArgumentError("dim must be positive, got $dim"))
+end
+if numepoch < 1
+throw(ArgumentError("numepoch must be positive, got $numepoch"))
+end
+if chunksize < 1
+throw(ArgumentError("chunksize must be positive, got $chunksize"))
+end
+if lower < 0
+throw(ArgumentError("lower must be non-negative, got $lower"))
+end
+if upper < 0
+throw(ArgumentError("upper must be non-negative, got $upper"))
+end
+# Check dim vs matrix size
+if min(N, M) < dim
+throw(ArgumentError("dim ($dim) must be <= min(N, M) = $(min(N, M))"))
+end
 # Initialization of U and V
 U = load_or_random(initU, N, dim)
 V = load_or_random(initV, M, dim)
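The shape-related checks added to `init_bincoo_nmf` (and, identically, to `init_nmf`) reject degenerate inputs before `U` and `V` are allocated: a matrix with a single row or column, a non-positive `dim`, and a `dim` larger than `min(N, M)`. A compact standalone restatement, reusing only the messages shown in the diff, might look like the sketch below. `validate_nmf_shape` is a hypothetical name; in the package, `N` and `M` come from `nm(input)`, which is not reproduced here.

```julia
# Hypothetical restatement of the shape checks in init_nmf / init_bincoo_nmf.
# N, M: number of rows and columns of the input; dim: number of components.
function validate_nmf_shape(N::Integer, M::Integer, dim::Integer)
    N == 1 && throw(ArgumentError("Input matrix has only 1 row. NMF requires at least 2 rows."))
    M == 1 && throw(ArgumentError("Input matrix has only 1 column. NMF requires at least 2 columns."))
    dim < 1 && throw(ArgumentError("dim must be positive, got $dim"))
    # Requesting more components than the smaller matrix dimension is rejected.
    min(N, M) < dim && throw(ArgumentError("dim ($dim) must be <= min(N, M) = $(min(N, M))"))
    return nothing
end

validate_nmf_shape(300, 50, 3)   # passes: returns nothing
```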

src/dnmf.jl

Lines changed: 17 additions & 1 deletion
@@ -306,13 +306,29 @@ function init_dnmf(
 lower::Number,
 upper::Number
 )
-# Type Check
+# Initizalization
 N, M = nm(input)
 binu = convert(Float32, binu)
 binv = convert(Float32, binv)
 teru = convert(Float32, teru)
 terv = convert(Float32, terv)
 graphv = convert(Float32, graphv)
+# Check non-negative parameters
+if binu < 0
+throw(ArgumentError("binu must be non-negative, got $binu"))
+end
+if binv < 0
+throw(ArgumentError("binv must be non-negative, got $binv"))
+end
+if teru < 0
+throw(ArgumentError("teru must be non-negative, got $teru"))
+end
+if terv < 0
+throw(ArgumentError("terv must be non-negative, got $terv"))
+end
+if graphv < 0
+throw(ArgumentError("graphv must be non-negative, got $graphv"))
+end
 # Initialization by NMF
 out_nmf = nmf(
 input=input,

src/nmf.jl

Lines changed: 44 additions & 1 deletion
@@ -381,7 +381,7 @@ function init_nmf(
 lower::Number,
 upper::Number
 )
-# Type Check
+# Initialization
 N, M = nm(input)
 alpha = convert(Float32, alpha)
 beta = convert(Float32, beta)
@@ -395,6 +395,49 @@ function init_nmf(
 chunksize = convert(Int64, chunksize)
 lower = convert(Float32, lower)
 upper = convert(Float32, upper)
+# Argument Check
+# Check matrix dimensions (single row/column is meaningless for NMF)
+if N == 1
+throw(ArgumentError("Input matrix has only 1 row. NMF requires at least 2 rows."))
+end
+if M == 1
+throw(ArgumentError("Input matrix has only 1 column. NMF requires at least 2 columns."))
+end
+# Check non-negative parameters
+if graphv < 0
+throw(ArgumentError("graphv must be non-negative, got $graphv"))
+end
+if l1u < 0
+throw(ArgumentError("l1u must be non-negative, got $l1u"))
+end
+if l1v < 0
+throw(ArgumentError("l1v must be non-negative, got $l1v"))
+end
+if l2u < 0
+throw(ArgumentError("l2u must be non-negative, got $l2u"))
+end
+if l2v < 0
+throw(ArgumentError("l2v must be non-negative, got $l2v"))
+end
+if dim < 1
+throw(ArgumentError("dim must be positive, got $dim"))
+end
+if numepoch < 1
+throw(ArgumentError("numepoch must be positive, got $numepoch"))
+end
+if chunksize < 1
+throw(ArgumentError("chunksize must be positive, got $chunksize"))
+end
+if lower < 0
+throw(ArgumentError("lower must be non-negative, got $lower"))
+end
+if upper < 0
+throw(ArgumentError("upper must be non-negative, got $upper"))
+end
+# Check dim vs matrix size
+if min(N, M) < dim
+throw(ArgumentError("dim ($dim) must be <= min(N, M) = $(min(N, M))"))
+end
 # Initialization of U and V
 U = load_or_random(initU, N, dim)
 V = load_or_random(initV, M, dim)
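The commit title mentions tests, but the test files are not among the diffs shown here. As an illustration only, the remaining numeric checks from `init_nmf` (non-negative regularization weights and bounds, positive counts) can be exercised with `@test_throws` from Julia's `Test` standard library. `validate_nmf_args` below is a hypothetical stand-in with placeholder defaults, not the package's API; tests against OnlineNMF.jl itself would presumably call `nmf` with invalid keyword arguments instead.

```julia
using Test

# Hypothetical stand-in for the numeric checks added to init_nmf; the default
# values are placeholders, not OnlineNMF.jl's defaults.
function validate_nmf_args(; graphv=0.0, l1u=0.0, l1v=0.0, l2u=0.0, l2v=0.0,
                           dim=3, numepoch=5, chunksize=1, lower=0.0, upper=1.0)
    # Regularization weights and value bounds must be >= 0.
    for (name, v) in pairs((graphv=graphv, l1u=l1u, l1v=l1v, l2u=l2u, l2v=l2v,
                            lower=lower, upper=upper))
        v < 0 && throw(ArgumentError("$name must be non-negative, got $v"))
    end
    # Counts must be >= 1.
    for (name, v) in pairs((dim=dim, numepoch=numepoch, chunksize=chunksize))
        v < 1 && throw(ArgumentError("$name must be positive, got $v"))
    end
    return nothing
end

@testset "init_nmf-style argument checks (sketch)" begin
    @test validate_nmf_args() === nothing
    @test_throws ArgumentError validate_nmf_args(l1u=-1.0)
    @test_throws ArgumentError validate_nmf_args(upper=-0.5)
    @test_throws ArgumentError validate_nmf_args(dim=0)
    @test_throws ArgumentError validate_nmf_args(chunksize=0)
end
```

The sketch checks `lower` and `upper` before the counts, so when several arguments are invalid at once it may report a different one first than `init_nmf` does; each single invalid argument still raises an `ArgumentError`.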
