
Commit 3e410ac

Authored by youdongguo and timholy
Yg/add read me (#3)
* deal with Inf case, change jump
* add CI.yml and dependabot.yml
* update CI and Codecov badges
* add description for functions
* add demo to readme
* resize the figure in readme
* resize figure in readme
* finish readme, test file. finish everything
* fix demo dir issue
* fix test ci failing issue: issue better than gsvd
* change the api and reorganize the code, haven't update doc
* update the api and redesign the doc (Co-authored-by: Tim Holy <[email protected]>)
* change api move initial NMF step
* add Docstring in ? mode

---------

Co-authored-by: youdongguo <[email protected]>
Co-authored-by: Tim Holy <[email protected]>
1 parent e4ad263 commit 3e410ac

File tree

10 files changed: +411 −32 lines


.github/dependabot.yml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/" # Location of package manifests
    schedule:
      interval: "weekly"

.github/workflows/CI.yml

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
name: CI
on:
  push:
    branches:
      - main
    tags: ['*']
  pull_request:
  workflow_dispatch:
concurrency:
  # Skip intermediate builds: always.
  # Cancel intermediate builds: only if it is a pull request build.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }}
jobs:
  test:
    name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }}
    runs-on: ${{ matrix.os }}
    timeout-minutes: 60
    permissions: # needed to allow julia-actions/cache to proactively delete old caches that it has created
      actions: write
      contents: read
    strategy:
      fail-fast: false
      matrix:
        version:
          - '1.10'
          - '1'
        os:
          - ubuntu-latest
        arch:
          - x64
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.version }}
          arch: ${{ matrix.arch }}
      - uses: julia-actions/cache@v2
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
      - uses: julia-actions/julia-processcoverage@v1
      - uses: codecov/codecov-action@v4
        with:
          files: lcov.info
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: false
  # docs:
  #   name: Documentation
  #   runs-on: ubuntu-latest
  #   permissions:
  #     actions: write # needed to allow julia-actions/cache to proactively delete old caches that it has created
  #     contents: write
  #     statuses: write
  #   steps:
  #     - uses: actions/checkout@v4
  #     - uses: julia-actions/setup-julia@v2
  #       with:
  #         version: '1'
  #     - uses: julia-actions/cache@v2
  #     - name: Configure doc environment
  #       shell: julia --project=docs --color=yes {0}
  #       run: |
  #         using Pkg
  #         Pkg.develop(PackageSpec(path=pwd()))
  #         Pkg.instantiate()
  #     - uses: julia-actions/julia-buildpkg@v1
  #     - uses: julia-actions/julia-docdeploy@v1
  #       env:
  #         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  #         DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }}
  #     - name: Run doctests
  #       shell: julia --project=docs --color=yes {0}
  #       run: |
  #         using Documenter: DocMeta, doctest
  #         using GsvdInitialization
  #         DocMeta.setdocmeta!(GsvdInitialization, :DocTestSetup, :(using GsvdInitialization); recursive=true)
  #         doctest(GsvdInitialization)
Project.toml

Lines changed: 6 additions & 2 deletions
@@ -7,14 +7,18 @@ version = "0.1.0"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 NMF = "6ef6ca0d-6ad7-5ff6-b225-e928bfa0a386"
 NonNegLeastSquares = "b7351bd1-99d9-5c5d-8786-f205a815c4d7"
+TSVD = "9449cd9e-2762-5aa3-a617-5413e99d722e"

 [compat]
+LinearAlgebra = "1"
 NMF = "1"
 NonNegLeastSquares = "0.4"
-julia = "1"
+TSVD = "0.4"
+julia = "1.10"

 [extras]
+NMF = "6ef6ca0d-6ad7-5ff6-b225-e928bfa0a386"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

 [targets]
-test = ["Test"]
+test = ["NMF", "Test"]

README.md

Lines changed: 146 additions & 0 deletions
@@ -1 +1,147 @@
# GsvdInitialization

[![CI](https://github.com/HolyLab/GsvdInitialization.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/HolyLab/GsvdInitialization.jl/actions/workflows/CI.yml)
[![codecov](https://codecov.io/gh/HolyLab/GsvdInitialization.jl/graph/badge.svg?token=LxqRCsZIvn)](https://codecov.io/gh/HolyLab/GsvdInitialization.jl)

This package contains the code for the paper "GSVD-NMF: Recovering Missing Features in Non-negative Matrix Factorization".
It recovers non-negative matrix factorization (NMF) components from a low-dimensional space to a higher-dimensional space by exploiting the generalized singular value decomposition (GSVD) between an existing NMF result and the SVD of X.
This method allows incremental expansion of the number of components, which can be convenient and effective for interactive analysis of large-scale data.

See also [NMFMerge](https://github.com/HolyLab/NMFMerge.jl) for the converse operation. Together, the two yield a substantial improvement in the quality and consistency of NMF factorizations.

---------------------------

Demo:

To run this demo, NMF.jl and LinearAlgebra.jl are also required.

Install and load the packages:
```julia
julia> ] add GsvdInitialization;

julia> using GsvdInitialization, NMF, LinearAlgebra;
```

Generate the ground truth with 10 features:

```julia
julia> include("demo/generate_ground_truth.jl")

julia> W_GT, H_GT = generate_ground_truth();

julia> X = W_GT*H_GT;
```

<img src="demo/GroundTruth.png" alt="Sample Figure" width="400"/>

Run standard NMF (HALS) on X, using NNDSVD as initialization. Here, we're taking a couple of precautions to try to ensure the best possible result from NMF:
- we prevent premature convergence by setting `maxiter` to something that is practically infinite
- we use the full `svd`, rather than `rsvd`, for initializing NNDSVD, as `svd` gives higher-quality results than `rsvd`

Despite these precautions, we'll see that the NMF result leaves much to be desired:

```julia
julia> result_hals = nnmf(X, 10; init=:nndsvd, alg = :cd, tol = 1e-4, maxiter=10^12, initdata = svd(X));

julia> sum(abs2, X-result_hals.W*result_hals.H)/sum(abs2, X)
0.0999994991270576
```

The result is given by

<img src="demo/ResultHals.png" alt="Sample Figure" width="400"/>

This factorization is not perfect: two of the components are identical, and two features share a single component.
Now run GSVD-NMF on X (also using NNDSVD as initialization):

```julia
julia> Wgsvd, Hgsvd = gsvdnmf(X, 9=>10; alg = :cd, tol_final = 1e-4, tol_intermediate = 1e-2, maxiter = 10^12);

julia> sum(abs2, X-Wgsvd*Hgsvd)/sum(abs2, X)
1.2322603074132593e-10
```

Judging by both the relative fitting errors and the figures, GSVD-NMF factorizes the ground truth well.

<img src="demo/ResultGsvdNMF.png" alt="Sample Figure" width="400"/>

---------------------------

## Functions

W, H = **gsvdnmf**(X::AbstractMatrix, ncomponents::Pair{Int,Int}; tol_final=1e-4, tol_intermediate=1e-4, kwargs...)

This function performs GSVD-NMF on the 2D data matrix ``X``.

Arguments:

``X``: non-negative 2D data matrix

``ncomponents::Pair{Int,Int}``: in the form ``n1 => n2``, augments from ``n1`` components to ``n2`` components, where ``n1`` is the number of components for the initial NMF (under-complete NMF) and ``n2`` is the number of components for the final NMF.

Alternatively, ``ncomponents`` can be an integer denoting the number of components for the final NMF.
In this case, ``gsvdnmf`` defaults to augmenting the initial NMF solution by 1 component.

Keyword arguments:

``tol_final``: the tolerance of the final NMF, default: ``10^{-4}``

``tol_intermediate``: the tolerance of the initial NMF (under-complete NMF), default: ``tol_final``

Other keyword arguments are passed to ``NMF.nnmf``.
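A minimal usage sketch based on the signature above; here ``X`` is the data matrix from the demo, and the component counts and tolerances are illustrative choices rather than required values:

```julia
# Augment an 8-component NMF of X to 10 components, using a looser tolerance
# for the intermediate (under-complete) NMF:
W, H = gsvdnmf(X, 8 => 10; tol_final=1e-4, tol_intermediate=1e-2)

# Integer form: start from a 9-component initial NMF and augment by 1,
# ending with 10 components:
W, H = gsvdnmf(X, 10)
```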

-----

W, H = **gsvdnmf**(X::AbstractMatrix, W::AbstractMatrix, H::AbstractMatrix, f; n2 = size(first(f), 2), tol_nmf=1e-4, kwargs...)

This function augments components for ``W`` and ``H``, and subsequently polishes the new ``W`` and ``H`` by NMF.

Arguments:

``X``: non-negative 2D data matrix

``W``: solution of the initial (under-complete) NMF

``H``: solution of the initial (under-complete) NMF

``f``: SVD (or truncated SVD) of ``X``; ``f`` needs to be written explicitly in ``Tuple`` form.

Keyword arguments:

``n2``: the number of components in the augmented matrix

``tol_nmf``: the tolerance of the NMF polishing step, default: ``10^{-4}``

Other keyword arguments are passed to ``NMF.nnmf``.
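A minimal sketch of this method under stated assumptions: ``result9`` is an existing 9-component NMF result for ``X`` (for example from ``NMF.nnmf``), and the SVD is passed as the tuple ``(U, S, V)`` taken from ``LinearAlgebra.svd``. The exact ordering expected inside ``f`` is an assumption here, so consult the docstring (``?gsvdnmf``) if in doubt; the names ``result9``, ``F``, ``W10``, and ``H10`` are illustrative.

```julia
using GsvdInitialization, LinearAlgebra, NMF

result9 = nnmf(X, 9; init=:nndsvd, alg=:cd, tol=1e-4)  # under-complete NMF of X
F = svd(X)
f = (F.U, F.S, F.V)      # SVD passed explicitly as a Tuple; (U, S, V) ordering assumed
W10, H10 = gsvdnmf(X, result9.W, result9.H, f; n2=10, tol_nmf=1e-4)
```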

-----

Wadd, Hadd, S = **gsvdrecover**(X, W0, H0, kadd, f)

This function augments components for ``W0`` and ``H0`` without the final NMF polishing step.

Outputs:

``Wadd``: augmented NMF solution

``Hadd``: augmented NMF solution

``S``: the related generalized singular values

Arguments:

``X``: non-negative 2D data matrix

``W0``: NMF solution

``H0``: NMF solution

``kadd``: number of new components

``f``: SVD (or truncated SVD) of ``X``; ``f`` needs to be indexable.
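A minimal sketch based on the signature above, assuming ``W0`` and ``H0`` are an existing NMF solution for ``X`` (for instance ``result_hals.W`` and ``result_hals.H`` from the demo); a tuple is used for ``f`` because tuples are indexable, and the ``(U, S, V)`` ordering is an assumption:

```julia
F = svd(X)            # or a truncated SVD of X (e.g. TSVD.tsvd)
f = (F.U, F.S, F.V)   # assumed ordering; tuples satisfy the "indexable" requirement
# Add one component to the existing solution, skipping the NMF polishing step:
Wadd, Hadd, S = gsvdrecover(X, W0, H0, 1, f)
```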

-----

## Citation

You are welcome to use this code in your publications; please cite the GSVD-NMF paper referenced above.
demo/GroundTruth.png (72.1 KB)

demo/ResultGsvdNMF.png (73.2 KB)

demo/ResultHals.png (74.4 KB)

demo/generate_ground_truth.jl

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
# Build a synthetic ground-truth factorization X = W*H: each column of W is a
# Gaussian bump, and each row of H is a block of ones.
function generate_ground_truth()
    m, n, nfeatures = 150, 120, 10
    feature_sigmas = 2*ones(nfeatures)
    feature_centers = [i*10+10 for i in 1:nfeatures]
    feature_intensity = ones(nfeatures)
    W = W_init_gauss(m, feature_centers, feature_sigmas, feature_intensity)
    H = zeros(nfeatures, n)
    for r in 1:nfeatures
        h_start = r*10+7
        h_length = 10
        h_end = h_start+h_length-1
        H[r, h_start:h_end] .+= ones(1, h_length)'
    end
    return W, H
end

# Place a Gaussian bump of width sigmas[i] and height intensity[i] at centers[i]
# in the i-th column of W.
function W_init_gauss(n, centers, sigmas, intensity)
    W = []
    nc, ns, ni = length(centers), length(sigmas), length(intensity)
    nc == ns && ns == ni || throw(ArgumentError("centers, sigmas and intensity should have same length"))
    for i in 1:nc
        w = zeros(n)
        gauss_template = gaussiantemplate(sigmas[i])
        δ = Int(round((length(gauss_template)-1)/2))
        template_start = centers[i]-δ
        template_end = template_start + length(gauss_template)-1
        w[template_start:template_end] .+= intensity[i]*gauss_template
        push!(W, w)
    end
    return hcat(W...)
end

# Discretized Gaussian with standard deviation r, sampled on an odd-length grid.
function gaussiantemplate(T::Type, r::Real)
    len = round(Int, 8*r+1)
    w = (len - 1) ÷ 2
    template = Array{T}(undef, len)
    R2 = 2*r*r
    for x = -w:w
        template[x+w+1] = exp(-(x*x)/R2)
    end
    return template
end
gaussiantemplate(r::Real) = gaussiantemplate(Float64, r)
