Support unsorted spectra in TruncationKeepAbove/Below
#26
Conversation
In the grand scheme of things, I do wonder what the performance impact is of replacing slicing with a range by slicing with a general vector representing the same range.

```julia
julia> using BenchmarkTools
julia> a = randn(1000);
julia> r = 97:197;
julia> v = collect(r);
julia> @benchmark getindex(a, r)
BenchmarkTools.Trial: 10000 samples with 991 evaluations per sample.
Range (min … max): 42.970 ns … 659.687 ns ┊ GC (min … max): 0.00% … 87.48%
Time (median): 48.016 ns ┊ GC (median): 0.00%
Time (mean ± σ): 55.161 ns ± 33.581 ns ┊ GC (mean ± σ): 12.19% ± 14.92%
▄█▅ ▁▂ ▁
████▇▆▆▅▄▄▁▁▁▁▁▁▁▁▁▅▄▁▃▁▁▁▁▁▁▁▁▇█▇▆▄▄▄▁▃▁▃▁▄▃▁▃▄▁▄▁▁▅▄▄▃▆▇██ █
43 ns Histogram: log(frequency) by time 216 ns <
Memory estimate: 928 bytes, allocs estimate: 2.
julia> @benchmark getindex(a, v)
BenchmarkTools.Trial: 10000 samples with 960 evaluations per sample.
Range (min … max): 93.185 ns … 793.706 ns ┊ GC (min … max): 0.00% … 82.07%
Time (median): 96.181 ns ┊ GC (median): 0.00%
Time (mean ± σ): 105.955 ns ± 45.747 ns ┊ GC (mean ± σ): 8.11% ± 12.89%
█▆▂▁ ▂▂ ▁
████▆▆▄▁▃▁▆▅▅▅▄▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅███ █
93.2 ns Histogram: log(frequency) by time 319 ns <
Memory estimate: 928 bytes, allocs estimate: 2.
```

So it is twice as slow for this example, but we are talking about nanosecond operations; that is probably completely negligible in the kind of computations this is going to be used in. Maybe the type instability has a larger performance impact.
I wasn't so concerned about the conversion from range to vector, rather that afterward you use the range/vector to slice the matrices.
lkdvos left a comment
I'm not entirely sure I agree with this implementation. The type instability is definitely a problem, and in this case these strategies are really all just a filter with different truncation values, so they should probably be mapped to that.
I don't think it is really necessary to have the general cases handle all exceptions, though; non-sorted singular or eigenvalues are really the exception, and I would say it is actually better to document which methods assume sorted values than to make them work for arbitrary inputs.
On top of that, for the case in the referenced pull request we are immediately going to map it to a logical vector anyway, so we should really just implement this as values .> atol, which makes this generalization obsolete again. Given the code design here, it is actually very easy to specialize for the relevant special cases, so unless we can handle more cases without hurting performance I don't think this buys us a great deal.
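Concretely, I mean something like the following sketch (the function name and tolerance arguments here are illustrative, not the package's actual API):

```julia
using LinearAlgebra: norm

# Hypothetical sketch: every "keep above" strategy is just a filter over the
# values, regardless of whether they are sorted.
function keep_above_mask(values::AbstractVector, atol::Real, rtol::Real)
    tol = max(atol, rtol * norm(values))
    return values .> tol  # logical vector selecting the kept values
end
```

The resulting logical vector can then be used directly to slice the values and the factorization matrices.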
What I am testing with my benchmark is the actual slicing, not the conversion itself. So far that is only the slicing of the vector of values; maybe it makes a bigger difference for the matrices, so let me try:

```julia
julia> r = 97:197;
julia> v = collect(r);
julia> U = randn(1000,1000);
julia> V = randn(1000, 1000);
julia> S = randn(1000);
julia> @benchmark getindex($S, $r)
BenchmarkTools.Trial: 10000 samples with 991 evaluations per sample.
Range (min … max): 55.667 ns … 712.619 μs ┊ GC (min … max): 0.00% … 99.96%
Time (median): 92.205 ns ┊ GC (median): 0.00%
Time (mean ± σ): 184.328 ns ± 7.389 μs ┊ GC (mean ± σ): 49.27% ± 1.41%
▁▃▅▅▄▅▆█▄▁
▂▂▂▁▁▁▁▁▁▂▂▁▂▁▁▁▁▂▂▂▂▂▂▄██████████▇▅▄▄▃▃▃▃▃▃▃▂▂▃▂▃▂▂▂▂▂▂▂▂▂▂▂ ▃
55.7 ns Histogram: frequency by time 129 ns <
Memory estimate: 928 bytes, allocs estimate: 2.
julia> @benchmark getindex($S, $v)
BenchmarkTools.Trial: 10000 samples with 950 evaluations per sample.
Range (min … max): 104.957 ns … 784.318 μs ┊ GC (min … max): 0.00% … 99.98%
Time (median): 143.947 ns ┊ GC (median): 0.00%
Time (mean ± σ): 226.135 ns ± 7.842 μs ┊ GC (mean ± σ): 34.68% ± 1.00%
▁██
▂▂▂▂▂▂▂▁▁▁▂▂▂▂▁▁▁▁▁▁▁▁▁▁▂▂▄████▆▆▅▄▄▄▄▃▃▃▃▃▃▃▂▂▃▃▃▂▂▂▂▂▂▂▂▂▂▂ ▃
105 ns Histogram: frequency by time 184 ns <
Memory estimate: 928 bytes, allocs estimate: 2.
julia> @benchmark getindex($U, :, $r)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 35.667 μs … 108.709 ms ┊ GC (min … max): 0.00% … 99.88%
Time (median): 86.167 μs ┊ GC (median): 0.00%
Time (mean ± σ): 101.417 μs ± 1.180 ms ┊ GC (mean ± σ): 15.26% ± 1.41%
▁█▇█ ▃▃▂
▅▂▂▂▂▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃████▇▆▇████▇▇▇▆▅▄▄▄▄▃▃▃▂▃▂▂ ▃
35.7 μs Histogram: frequency by time 111 μs <
Memory estimate: 800.08 KiB, allocs estimate: 3.
julia> @benchmark getindex($U, :, $v)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 38.958 μs … 129.800 ms ┊ GC (min … max): 0.00% … 99.90%
Time (median): 89.333 μs ┊ GC (median): 0.00%
Time (mean ± σ): 107.037 μs ± 1.374 ms ┊ GC (mean ± σ): 16.35% ± 1.41%
▅▅█ ▁▂▁
▄▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▂▇███▆▅▇████▆▇▇▆▅▅▄▅▄▃▃▃▃▃▃▃▂ ▃
39 μs Histogram: frequency by time 114 μs <
Memory estimate: 800.08 KiB, allocs estimate: 3.
julia> @benchmark getindex($V, $r, :)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 37.375 μs … 146.763 ms ┊ GC (min … max): 0.00% … 99.82%
Time (median): 89.708 μs ┊ GC (median): 0.00%
Time (mean ± σ): 108.304 μs ± 1.527 ms ┊ GC (mean ± σ): 17.45% ± 1.41%
█ ▁
▂▃▂▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▃▅██▇▄▄▅██▇▇▆▆▆▅▄▄▃▃▃▃▃▂▂▂▂ ▃
37.4 μs Histogram: frequency by time 112 μs <
Memory estimate: 800.08 KiB, allocs estimate: 3.
julia> @benchmark getindex($V, $v, :)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 87.250 μs … 120.687 ms ┊ GC (min … max): 0.00% … 99.76%
Time (median): 134.458 μs ┊ GC (median): 0.00%
Time (mean ± σ): 150.218 μs ± 1.261 ms ┊ GC (mean ± σ): 10.47% ± 1.41%
▁█
▅▂▂▂▂▁▂▂▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▅███▅▄▄▄▆▆▆▅▅▅▅▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂ ▃
87.2 μs Histogram: frequency by time 163 μs <
Memory estimate: 800.08 KiB, allocs estimate: 3.
```

So the only case where this seems to make a significant difference is for slicing the rows of a matrix, which in hindsight is not so surprising. Even then, the question is how significant this is within the larger simulation.
I guess methods that know the values are sorted could immediately call
Sorry @Jutho, I misread your post. Thanks for analyzing that. I wonder if there is a more significant difference for GPU arrays. Also, is there some advantage in that you can more easily take views and preserve that they are strided, or even do certain truncations in-place if the indices are ranges (for example with resizing)?

I also doubt the overhead of type instability is significant here, since it is just a small union that should get handled with union splitting. The broader strategy I'm trying to take with this PR is to be more agnostic about where the spectrum came from and about what will be done with it after, though I'm open to considering alternatives to this design. One of my objections to the previous design was that some truncation strategies handled unsorted values while others didn't. Documenting that helps to some extent, but I think we should be more systematic about that distinction in some way. Some alternatives to the current design that I can think of are:

These approaches could complement each other and complement the current PR, but I would still lean towards having something like the current PR, since I think
The current truncate! implementation (MatrixAlgebraKit.jl/src/implementations/truncation.jl, lines 139 to 162 in c46119e) does not specialize on the factorisation arguments; we could also make that a point of specialization.

Regarding your other questions: yes, instead of slicing, a view with a range would be strided, whereas a view with a general vector would not, but I don't know if this is very relevant here. And you could in principle interpret throwing away the final columns of a matrix as a resize.
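To illustrate the strided-view point concretely (plain Julia, nothing package-specific):

```julia
U = randn(1000, 1000)

# A view indexed by a range keeps a strided memory layout, so BLAS routines
# can operate on it directly without copying...
view(U, :, 97:197) isa StridedArray           # true

# ...while a view indexed by a general index vector does not.
view(U, :, collect(97:197)) isa StridedArray  # false
```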
That's a good point, customizing at the level of
I like that proposal.
Those are fair points. My broader point was that I would like to be relatively agnostic about what users might want to do with the output of
Though on a related note, @lkdvos and I were discussing that it would be helpful for
I'm definitely fine trying to avoid the type instability in this case, but for posterity:

```julia
julia> f(x, n) = x > 0.5 ? g(n) : h(n)
f (generic function with 1 method)
julia> g(n) = 1:n
g (generic function with 1 method)
julia> h(n) = collect(1:n)
h (generic function with 1 method)
julia> @btime g(20);
1.716 ns (0 allocations: 0 bytes)
julia> @btime f(1, 20);
1.716 ns (0 allocations: 0 bytes)
julia> @btime h(20);
31.349 ns (2 allocations: 224 bytes)
julia> @btime f(0.2, 20);
32.018 ns (2 allocations: 224 bytes)
julia> v = randn(40);
julia> @btime $v[g(20)];
32.557 ns (2 allocations: 224 bytes)
julia> @btime $v[f(1, 20)];
35.749 ns (2 allocations: 224 bytes)
julia> @btime $v[h(20)];
79.180 ns (4 allocations: 448 bytes)
julia> @btime $v[f(0.2, 20)];
76.093 ns (4 allocations: 448 bytes)
```

I have been under the impression that when there are type instabilities involving small unions, the compiler can take advantage of union splitting to compile those cases into if-statements, so there is very little overhead.
Is there anything that needs to happen here? I am happy with the approach in the current state of the PR. Should I approve so that this can be merged?
Can I take a look at this tomorrow when I'm back in the office? |
I'm good with merging this; I don't have anything more to do here. Lukas is traveling, so maybe we want to wait for his feedback; it seemed like there was still some disagreement about the premise of this PR.
lkdvos left a comment
I left some small comments about the implementations, and given the discussions below, here are my revised opinions and suggestions:
I do like splitting the implementations between findtruncated_sorted and findtruncated_unsorted, to clearly distinguish what is being assumed.
However, there might still be some questions about the sorting here: when we say sorted, I think currently we are assuming real entries sorted in descending (reverse) order, which is what we are using now. This is really only the case for the singular values, so it might make sense to add a by and rev argument to findtruncated_sorted as well, to indicate that this is what we are assuming.
Additionally, I don't think the current implementations are being very consistent with handling complex values, and abs is only sometimes added. Maybe we can be slightly more careful about this, but this ties back a bit to my previous comment about what we are assuming for findtruncated_sorted.
Given the benchmarks above and the general implications of this, I don't think I really like the if issorted branch in the findtruncated implementation. Using the findtruncated_sorted implementation and converting to a Vector{Int} seems like a very hard-coded fix for the type instability, making assumptions about the output type of findtruncated_unsorted without making many restrictions on the input types.
It makes me wonder if it's not both easier, cleaner, more generic, and better to maintain in the future if we simply use findtruncated = findtruncated_unsorted, i.e. we assume unsorted by default. If the implementation tries to sort an input which is already sorted, this is very cheap anyway, so I don't think there are that many drawbacks to this.
In general, when I first wrote this part of the code I wasn't necessarily thinking that findtruncated would be the specialization point, and this was more a way to avoid code duplication than the best point for customization.
As @mtfishman already mentioned, at this point in the algorithms we really are missing some information about what kind of spectrum is being provided, so writing very generic implementations/specializations is not that straightforward.
Given the comments in this PR, I would maybe even argue to get rid of findtruncated altogether, and simply keep findtruncated_sorted and findtruncated_unsorted instead.
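To sketch what that split could look like (the names follow this thread, but the signatures and bodies are my own guesses, not the package's actual code):

```julia
# Hypothetical: assumes `values` is real and sorted in descending order (the
# SVD convention); returns a UnitRange so downstream slices stay strided.
function findtruncated_sorted(values::AbstractVector{<:Real}, atol::Real)
    return 1:searchsortedlast(values, atol; rev=true)  # keep everything ≥ atol
end

# Hypothetical: no sortedness assumption; returns a Vector{Int}.
findtruncated_unsorted(values::AbstractVector, atol::Real; by=abs) =
    findall(x -> by(x) > atol, values)
```

The sorted variant returning a unit range is what keeps the downstream slices strided and cheap.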
Yes, that's a good point. I can add by and rev arguments to findtruncated_sorted.

I can go through and make the default of

I was also wondering about the

I would bias towards keeping findtruncated.
I agree with the remarks of Lukas, but if the reasoning is that sorting an already sorted array is very cheap, then we can maybe ditch the sorted implementation altogether.

Yet another strategy is to always compute the
For the sake of this PR, I would be fine with just having
I'm still not entirely sure this is the right place for all of this, given how annoying it is to get all of this to line up properly.
At the end of the day, the following seems to me like a very reasonable default implementation, and I am a bit hesitant about whether the infrastructure with sorted and unsorted variants is worth it:

```julia
function truncate!(::typeof(svd_trunc!), (U, S, Vᴴ), strategy::TruncationKeepBelow)
    values = diagview(S)
    atol = max(strategy.atol, strategy.rtol * norm(values))
    # values are sorted in descending order, so the kept (≤ atol) values sit at the tail
    ind = searchsortedfirst(values, atol; rev=true):length(values)
    return U[:, ind], Diagonal(values[ind]), Vᴴ[ind, :]
end
```

On a separate note, it seems like we are using the 1-norm for determining relative tolerances, which seems odd in the context of singular values, since I would argue that 2-norms are more relevant there.
Just to be clear, are you basically proposing getting rid of findtruncated?

Agreed. EDIT: Previously it was actually using the infinity norm (the maximum value), but I changed it to use the 2-norm by default, with the option to specify the norm.
As a concrete step, I'll try to collate all of the feedback above into this PR, and we'll see how general and simple we can make it.

I'll add that a nontrivial usage of the
Another option, which I suggested above, is to pass more information to findtruncated:

```julia
function findtruncated(::typeof(svd_trunc!), (U, S, Vᴴ), strategy::TruncationKeepBelow)
    values = diagview(S)
    atol = max(strategy.atol, strategy.rtol * norm(values))
    # values are sorted in descending order, so the kept (≤ atol) values sit at the tail
    return searchsortedfirst(values, atol; rev=true):length(values)
end

function truncate!(::typeof(svd_trunc!), (U, S, Vᴴ), strategy::TruncationStrategy)
    ind = findtruncated(svd_trunc!, (U, S, Vᴴ), strategy)
    return U[:, ind], Diagonal(diagview(S)[ind]), Vᴴ[ind, :]
end
```

There could still be a general fallback of
I think I've addressed all of the comments. In the latest commits, I:
This is good to go from my end; let me know what you think.
lkdvos left a comment
This definitely looks good to me.
Given that we now really heavily use and rely on findtruncated and findtruncated_sorted, it might make sense to add docstrings to these (especially to document the assumptions for the latter), but otherwise I would be happy to merge this.
Just double-checking here: I don't think this counts as a breaking change, since fieldnames are internal and the rest is fixes and improvements, so we might actually just tag this?
Sure, I can add docstrings and make them public.
The issue is that the
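For example, the docstring could read something like this (a sketch based on the assumptions discussed above, not final wording):

```julia
"""
    findtruncated_sorted(values, strategy)

Like [`findtruncated`](@ref), but assumes `values` is real and sorted in
descending order (the convention for singular values), and returns a
`UnitRange{Int}` of the indices to keep.
"""
function findtruncated_sorted end
```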
lkdvos left a comment
Oh, I thought v0.2 got tagged last week, my bad. That makes everything even easier, we can simply tag 0.2?
As the PR title says, this adds support for truncating unsorted spectra in TruncationKeepAbove/TruncationKeepBelow.

It also more systematically handles sorted vs. unsorted spectra where relevant by splitting the codepaths into findtruncated_sorted and findtruncated_unsorted, where the sorted version outputs unit ranges.

This has the downside that those cases of findtruncated become mildly type unstable (i.e. Union{Vector{Int},UnitRange{Int}}), but I think that is worth it, since slicing a spectrum/matrix with a unit range is generally faster than slicing with an arbitrary vector.

I'm open to hearing other design ideas as well; an alternative or complement to this could be that certain factorizations indicate to the truncation code that the spectrum is already sorted, so it can avoid the check and maybe become type stable.

One motivation for this is that even though the spectrum of an SVD is generally (reverse) sorted, if you do a blockwise SVD of a block-diagonal matrix, the spectra are only sorted within blocks; this PR helps with handling that case.
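A tiny illustration of that motivating case (plain Julia, no package code assumed):

```julia
using LinearAlgebra

# Each diagonal block's singular values come out sorted descending, but their
# concatenation (the spectrum of the block-diagonal matrix) generally is not.
s1 = svdvals(randn(4, 4))
s2 = svdvals(randn(4, 4))
s  = vcat(s1, s2)        # spectrum is only sorted within each block
issorted(s; rev=true)    # almost surely false
```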