-
Notifications
You must be signed in to change notification settings - Fork 15
Description
Probably a bit of a niche use case, but I want to construct my matrix as a BlockSkylineMatrix, and then (because actually each block is still very sparse, but not banded) convert the whole matrix to a SparseMatrixCSC using sparse()
. At present, sparse()
would fall back to the AbstractArray
interface, which is slow (horrendously slow for large matrices). For example, comparing to a dense Matrix
julia> using BenchmarkTools, BlockBandedMatrices, SparseArrays
julia> block_sizes = fill(20, 100);
julia> n = sum(block_sizes)
2000
julia> dm = rand(n, n);
julia> bm = BlockBandedMatrix(BlockBandedMatrices.Zeros(n, n), block_sizes, block_sizes, (2,2));
[ Warning: Zeros is defined in FillArrays and is not public in BlockBandedMatrices
julia> bm.data .= rand(length(bm.data));
julia> @benchmark sparse(dm)
BenchmarkTools.Trial: 264 samples with 1 evaluation per sample.
Range (min … max): 8.313 ms … 162.070 ms ┊ GC (min … max): 0.00% … 94.62%
Time (median): 17.742 ms ┊ GC (median): 3.04%
Time (mean ± σ): 18.910 ms ± 9.344 ms ┊ GC (mean ± σ): 13.49% ± 10.29%
▇█ ▆▃▁
▄▃▃▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▂▁▁▂▁▁▁▁▁▁███▇▅▃▁▁▂▄▇███▄▃▃▂▂▁▁▂▁▃▂▁▂▂▁▂▃▃ ▃
8.31 ms Histogram: frequency by time 25.4 ms <
Memory estimate: 61.05 MiB, allocs estimate: 10.
julia> @benchmark sparse(bm)
BenchmarkTools.Trial: 79 samples with 1 evaluation per sample.
Range (min … max): 62.598 ms … 69.781 ms ┊ GC (min … max): 0.00% … 9.16%
Time (median): 63.138 ms ┊ GC (median): 0.00%
Time (mean ± σ): 63.525 ms ± 1.077 ms ┊ GC (mean ± σ): 0.43% ± 1.11%
▄▆ ▂██▂▂▂ ▂
▄███▄██████▄█▄▆▆▆▆▄█▆▄▁▄▁▁▁▁▁▁▁▁▁▁▁▄▁▁▄▁▁▄▁▄▆▄▁▁▁▄▁▁▁▁▁▁▄▁▄ ▁
62.6 ms Histogram: frequency by time 66.1 ms <
Memory estimate: 8.00 MiB, allocs estimate: 44.
We can make this much better by writing an optimised version of sparse()
(see below):
julia> includet("optimized_sparse.jl")
julia> using BenchmarkTools, BlockBandedMatrices, SparseArrays
julia> block_sizes = fill(20, 100);
julia> n = sum(block_sizes)
2000
julia> dm = rand(n, n);
julia> bm = BlockBandedMatrix(BlockBandedMatrices.Zeros(n, n), block_sizes, block_sizes, (2,2));
julia> bm.data .= rand(length(bm.data));
julia> @benchmark sparse(dm)
BenchmarkTools.Trial: 258 samples with 1 evaluation per sample.
Range (min … max): 8.480 ms … 177.454 ms ┊ GC (min … max): 0.00% … 90.16%
Time (median): 17.825 ms ┊ GC (median): 5.62%
Time (mean ± σ): 19.435 ms ± 10.924 ms ┊ GC (mean ± σ): 15.43% ± 12.38%
▁██▃ ▁▂ ▁
▄▇▁▂▁▁▂▂▁▂▁▃▄▄▃▁▂▁▂▁▁▁▁▂▁▃▅████▃▃▂▁▁▃██▇▅▄▃▃▁▂▁▂▄▅▃▃▂▄█▄▄▂▂▂ ▃
8.48 ms Histogram: frequency by time 26.9 ms <
Memory estimate: 61.05 MiB, allocs estimate: 10.
julia> @benchmark sparse(bm)
BenchmarkTools.Trial: 4117 samples with 1 evaluation per sample.
Range (min … max): 828.653 μs … 8.365 ms ┊ GC (min … max): 0.00% … 86.64%
Time (median): 986.804 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.210 ms ± 438.267 μs ┊ GC (mean ± σ): 17.75% ± 19.14%
▂▅█▇▆▂
▂▄███████▇▅▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▂▂▂▃▃▃▄▄▅▅▅▅▅▅▄▄▄▃▃▃▂▂▂▂▂▂ ▃
829 μs Histogram: frequency by time 2.1 ms <
Memory estimate: 11.17 MiB, allocs estimate: 1363.
My optimised version is maybe a bit hacky, I suspect I might have missed some methods in BlockBandedMatrices.jl and/or BlockArrays.jl to simplify some of the indexing, but is good enough for me. Correctness is easy enough to check by doing something like bm == sparse(bm)
.
using BlockBandedMatrices
using BlockArrays
using SparseArrays
import SparseArrays: sparse
function sparse(m::BlockSkylineMatrix)
s = spzeros(size(m))
s_nzvals = SparseArrays.nonzeros(s)
s_rowvals = SparseArrays.rowvals(s)
s_colptr = SparseArrays.getcolptr(s)
# Number of blocks
N, M = blocksize(m)
l, u = BlockBandedMatrices.colblockbandwidths(m)
m_block_sizes = blocksizes(m)
column_widths = [bs[2] for bs ∈ m_block_sizes[1,:]]
column_starts = [0]
append!(column_starts, cumsum(@view(column_widths[1:end-1])))
@views column_starts .+= 1
row_sizes = [bs[1] for bs ∈ m_block_sizes]
row_offsets = [0]
append!(row_offsets, cumsum(row_sizes[1:end-1]))
for J ∈ 1:M
KR = max(1,J-u[J]):min(J+l[J],N)
if !isempty(KR)
# Don't use BlockArrays.jl interface for extracting blocks, because that would
# not be type-stable. We only select blocks that are a contiguous, dense array
# so can construct a `Matrix` using the block start and block size.
n_cols = column_widths[J]
n_rows = sum(row_sizes[KR,J])
n_block = n_cols * n_rows
this_start = BlockBandedMatrices.blockstart(m, KR[1], J)
this_block = reshape(@view(m.data[this_start:this_start+n_block-1]), n_rows, n_cols)
sparse_block = sparse(this_block)
this_nzvals = SparseArrays.nonzeros(sparse_block)
append!(s_nzvals, this_nzvals)
# The rows in the full matrix actually start at row_offsets[KR[1]]+1, so add
# the offset to all the rowvals.
this_rowvals = SparseArrays.rowvals(sparse_block) .+ row_offsets[KR[1]]
append!(s_rowvals, this_rowvals)
this_col_start = column_starts[J]
this_col_end = this_col_start + n_cols - 1
this_colptr = SparseArrays.getcolptr(sparse_block)
@. @views s_colptr[this_col_start+1:this_col_end+1] += this_colptr[2:end] - 1
@. @views s_colptr[this_col_end+2:end] += this_colptr[end] - 1
end
end
return s
end
Maybe it could be useful to have a SparseArrays extension for this package, and put something like this in it?