Performance of BandedBlockBandedMatrix Vector Multiplication #121

@Luapulu

I have a large BandedBlockBandedMatrix and need to multiply it by a vector. However, the multiplication seems rather slow and allocates a lot of memory. Here's an MWE:

using BlockArrays: BlockRange
using ApproxFunOrthogonalPolynomials
using LinearAlgebra
using SparseArrays

using BenchmarkTools

# Build one of the matrices I work with
CC = Chebyshev(-1..1)         ⊗ Chebyshev(-1..1)
UC = Ultraspherical(1, -1..1) ⊗ Chebyshev(-1..1)
CU = Chebyshev(-1..1)         ⊗ Ultraspherical(1, -1..1)
UU = Ultraspherical(1, -1..1) ⊗ Ultraspherical(1, -1..1)

degree = 500

Du = Derivative(CC, [1,0])[BlockRange(1:degree), BlockRange(1:degree+1)]
UCtoC2 = Conversion(UC, UU)[BlockRange(1:degree), BlockRange(1:degree)]
DutoC2 = UCtoC2 * Du

# Benchmarks

@benchmark mul!(out, $DutoC2, v) setup=begin
    out = Vector{Float64}(undef, size(DutoC2, 1))
    v = rand(size(DutoC2, 2))
end
# 3.2 ms with 257.62 KiB allocated

@benchmark mul!(out, $(sparse(DutoC2)), v) setup=begin
    out = Vector{Float64}(undef, size(DutoC2, 1))
    v = rand(size(DutoC2, 2))
end
# For comparison: 1.5 ms with 0 bytes allocated

I wonder why the BandedBlockBandedMatrix multiplication allocates so much and why it's slower than using a sparse matrix. Any ideas?
