I have a big `BandedBlockBandedMatrix` and need to multiply it with a vector. However, this seems to be rather slow and requires a lot of allocation. Here's an MWE:
```julia
using BlockArrays: BlockRange
using ApproxFunOrthogonalPolynomials
using LinearAlgebra
using SparseArrays
using BenchmarkTools

# Making one of my matrices
CC = Chebyshev(-1..1) ⊗ Chebyshev(-1..1)
UC = Ultraspherical(1, -1..1) ⊗ Chebyshev(-1..1)
CU = Chebyshev(-1..1) ⊗ Ultraspherical(1, -1..1)
UU = Ultraspherical(1, -1..1) ⊗ Ultraspherical(1, -1..1)
degree = 500
Du = Derivative(CC, [1,0])[BlockRange(1:degree), BlockRange(1:degree+1)]
UCtoC2 = Conversion(UC, UU)[BlockRange(1:degree), BlockRange(1:degree)]
DutoC2 = UCtoC2 * Du

# Benchmarks
@benchmark mul!(out, $DutoC2, v) setup=begin
    out = Vector{Float64}(undef, size(DutoC2, 1))
    v = rand(size(DutoC2, 2))
end
# 3.2 ms with 257.62 KiB allocated

@benchmark mul!(out, $(sparse(DutoC2)), v) setup=begin
    out = Vector{Float64}(undef, size(DutoC2, 1))
    v = rand(size(DutoC2, 2))
end
# For comparison: 1.5 ms with 0 bytes allocated
```
I wonder why the `BandedBlockBandedMatrix` multiplication allocates so much and why it's slower than using a sparse matrix. Any ideas?