Skip to content

Conversation

@mdtanker
Copy link
Member

Adds an equivalanet function to maxabs but for calculating the min and max values of arrays, or optionally user-specified percentiles of the values.

This is useful for

  1. if you want to plot a series of datasets with the same colorscale so you need to calculate the overall min/max for all the datasets
  2. if you want to get robust colormap limits, excluding outliers by using percentiles, such as the 2nd and 98th percentiles.

Relevant issues/PRs:

Implements the function requested in #525
Follows the percentiles approach from #524 and #523

Copy link
Member

@santisoler santisoler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mdtanker for pushing this forward! I just left a few minor suggestions. Let me know what do you think!

It might be also nice to check if min_percentile <= max_percentile so we are ensure that min <= max regardless of the choices of percentile values.

mdtanker and others added 6 commits November 21, 2025 09:39
Co-authored-by: Santiago Soler <santisoler@fastmail.com>
Co-authored-by: Santiago Soler <santisoler@fastmail.com>
Co-authored-by: Santiago Soler <santisoler@fastmail.com>
Co-authored-by: Santiago Soler <santisoler@fastmail.com>
Since percentile calculations are slower than min/max, I only do them if non-default values (0,100) are provided. If only 1 non-default percentile provided, only calculate that percentile and use min/max for other value.
@mdtanker
Copy link
Member Author

Here are some timings for future reference:

import numpy as np
import verde as vd
a = np.random.uniform(size=(27, 100))

  • Using np.nanmin or np.nanmax is ~x2 slower than the non-nan functions

    %timeit vd.minmax(a, nan=True)
    %timeit vd.minmax(a, nan=False)
    
    >>>48.8 μs ± 1.94 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    >>>20.2 μs ± 1.84 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  • Using percentiles is ~4x slower than min/max

    • Only calculating 1 percentile, and using min/max for the other saves some time
    %timeit vd.minmax(a, min_percentile=0, max_percentile=100, nan=True)
    %timeit vd.minmax(a, min_percentile=1, max_percentile=99, nan=True)
    %timeit vd.minmax(a, min_percentile=0, max_percentile=99, nan=True)
    
    >>>50.5 μs ± 2.56 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    >>>289 μs ± 26.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    >>>179 μs ± 3.93 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  • np.nanpercentile is ~5-10x slower than np.percentile

    %timeit vd.minmax(a, min_percentile=0, max_percentile=100, nan=False)
    %timeit vd.minmax(a, min_percentile=1, max_percentile=99, nan=False)
    %timeit vd.minmax(a, min_percentile=0, max_percentile=99, nan=False)
    
    >>>20.7 μs ± 843 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
    >>>232 μs ± 17.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
    >>>111 μs ± 5.5 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

@mdtanker mdtanker requested a review from santisoler November 21, 2025 11:16
Copy link
Member

@santisoler santisoler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, @mdtanker. Just left a few comments. Let me know what do you think!

Comment on lines +367 to +392
# get min value
if min_percentile == 0:
min_ = npmin([npmin(i) for i in arrays])

# get max value
if max_percentile == 100:
max_ = npmax([npmax(i) for i in arrays])

# calculate min, max or both percentiles
if min_percentile != 0 or max_percentile != 100:
# concatenate values of all arrays
combined_array = np.concatenate([a.ravel() for a in arrays])

# if neither percentiles are defaults, calculate them together
if min_percentile != 0 and max_percentile != 100:
min_, max_ = nppercentile(combined_array, [min_percentile, max_percentile])

# only calculate min percentile
elif min_percentile != 0:
min_ = nppercentile(combined_array, min_percentile)

# only calculate max percentile
if max_percentile != 100:
max_ = nppercentile(combined_array, max_percentile)

return min_, max_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could simplify this logic a bit. What about this:

Suggested change
# get min value
if min_percentile == 0:
min_ = npmin([npmin(i) for i in arrays])
# get max value
if max_percentile == 100:
max_ = npmax([npmax(i) for i in arrays])
# calculate min, max or both percentiles
if min_percentile != 0 or max_percentile != 100:
# concatenate values of all arrays
combined_array = np.concatenate([a.ravel() for a in arrays])
# if neither percentiles are defaults, calculate them together
if min_percentile != 0 and max_percentile != 100:
min_, max_ = nppercentile(combined_array, [min_percentile, max_percentile])
# only calculate min percentile
elif min_percentile != 0:
min_ = nppercentile(combined_array, min_percentile)
# only calculate max percentile
if max_percentile != 100:
max_ = nppercentile(combined_array, max_percentile)
return min_, max_
if min_percentile == 0 and max_percentile == 0:
min_ = npmin([npmin(i) for i in arrays])
max_ = npmax([npmax(i) for i in arrays])
return min_, max_
# concatenate values of all arrays
combined_array = np.concatenate([a.ravel() for a in arrays])
min_, max_ = nppercentile(combined_array, [min_percentile, max_percentile])
return min_, max_

If any of the min_percentile and max_percentile is different from the defaults, then we'll use nppercentile anyways. And a min_percentile=0 will always lead to the minimum value. Moreover, I suspect there's no significant computational hit in passing two percentiles, since I guess that Numpy is computing each while looping over the elements in the array once.

So, we would only use npmin and npmax if min_percentile==0 and max_percentile==100.

What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit of time savings from only calculating one percentile and using min/max for the other. But maybe the time savings is not worth the extra code? At least that's how I interpreted the benchmarking results from below. The first uses min and max, with no percentiles, the second uses 1 and 99 percentiles, and the third uses min and 99 percentiles. As you can see, using 1 percentile and 1 min/max calculation is a bit fast than using 2 percentile calculations.

What do you think?

%timeit vd.minmax(a, min_percentile=0, max_percentile=100, nan=True)
%timeit vd.minmax(a, min_percentile=1, max_percentile=99, nan=True)
%timeit vd.minmax(a, min_percentile=0, max_percentile=99, nan=True)

>>>50.5 μs ± 2.56 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>>289 μs ± 26.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>>179 μs ± 3.93 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I would have guessed that passing multiple percentiles would not impact the computation time that much. But nonetheless, those differences are not significant, so I think I would keep the code simpler. Do you agree?

Even for a large array with 100 million elements, the difference is not very noticeable by users:

import numpy as np

a = np.random.default_rng().uniform(size=100_000_000)

# benchmark np.min and np.percentile with a single value
%timeit np.min(a)
%timeit np.percentile(a, 90)
38.1 ms ± 351 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
862 ms ± 3.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# benchmark percentile with two values
%timeit np.percentile(a, [0, 90])
1.01 s ± 5.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

mdtanker and others added 4 commits December 4, 2025 15:18
Co-authored-by: Santiago Soler <santisoler@fastmail.com>
Co-authored-by: Santiago Soler <santisoler@fastmail.com>
Co-authored-by: Santiago Soler <santisoler@fastmail.com>
Co-authored-by: Santiago Soler <santisoler@fastmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants