add new `minmax` function #526

mdtanker · 2025-11-20T09:50:06Z

Adds a function equivalent to `maxabs`, but for calculating the min and max values of arrays or, optionally, user-specified percentiles of the values.

This is useful:

- if you want to plot a series of datasets with the same colorscale, so you need to calculate the overall min/max across all the datasets
- if you want robust colormap limits that exclude outliers by using percentiles, such as the 2nd and 98th percentiles
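A minimal sketch of both use cases, written with plain NumPy rather than the function added in this PR. The helper name `shared_limits` and the two example grids are hypothetical and only illustrate the idea:

```python
import numpy as np


def shared_limits(arrays, min_percentile=0, max_percentile=100):
    """Return (low, high) limits computed over the values of all arrays."""
    # Flatten every array and join them so the percentiles see all values.
    combined = np.concatenate([np.asarray(a).ravel() for a in arrays])
    low, high = np.percentile(combined, [min_percentile, max_percentile])
    return float(low), float(high)


# Two hypothetical datasets; the second one contains a large outlier.
grid_a = np.arange(0.0, 100.0)
grid_b = np.arange(100.0, 200.0)
grid_b[-1] = 1.0e6

# Overall min/max: the outlier stretches the upper limit.
full_range = shared_limits([grid_a, grid_b])
# 2nd and 98th percentiles: the outlier no longer controls the limits.
robust_range = shared_limits([grid_a, grid_b], 2, 98)

print("full range:  ", full_range)
print("robust range:", robust_range)
```

The resulting pair could then be passed as `vmin` and `vmax` to each plot so that all datasets share one colorscale.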

Relevant issues/PRs:

- Implements the function requested in #525
- Follows the percentiles approach from #524 and #523

santisoler

Thanks @mdtanker for pushing this forward! I just left a few minor suggestions. Let me know what you think!

It might also be nice to check that min_percentile <= max_percentile, so we ensure that min <= max regardless of the chosen percentile values.
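One way such a check could look, as a sketch only. The helper name `check_percentiles` and the error messages are hypothetical, and the range check against 0 and 100 is an extra assumption that goes beyond what the comment above asks for:

```python
def check_percentiles(min_percentile=0, max_percentile=100):
    """Raise ValueError if the percentiles are out of range or out of order."""
    for name, value in (
        ("min_percentile", min_percentile),
        ("max_percentile", max_percentile),
    ):
        # Assumption: percentiles are expressed in the 0-100 range.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}.")
    # The check suggested in the review: the lower percentile must not
    # exceed the upper one, otherwise min > max could be returned.
    if min_percentile > max_percentile:
        raise ValueError(
            f"min_percentile ({min_percentile}) must be less than or equal "
            f"to max_percentile ({max_percentile})."
        )


check_percentiles(2, 98)  # valid, passes silently
try:
    check_percentiles(98, 2)
except ValueError as error:
    print(error)
```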

Since percentile calculations are slower than min/max, I only do them if non-default values (0, 100) are provided. If only one non-default percentile is provided, only that percentile is calculated and min/max is used for the other value.

mdtanker · 2025-11-21T10:34:29Z

Here are some timings for future reference:

import numpy as np
import verde as vd
a = np.random.uniform(size=(27, 100))

Using np.nanmin or np.nanmax is ~2x slower than the non-nan functions

%timeit vd.minmax(a, nan=True)
%timeit vd.minmax(a, nan=False)

>>>48.8 μs ± 1.94 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>>20.2 μs ± 1.84 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Using percentiles is ~4x slower than min/max

Only calculating 1 percentile, and using min/max for the other saves some time

%timeit vd.minmax(a, min_percentile=0, max_percentile=100, nan=True)
%timeit vd.minmax(a, min_percentile=1, max_percentile=99, nan=True)
%timeit vd.minmax(a, min_percentile=0, max_percentile=99, nan=True)

>>>50.5 μs ± 2.56 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>>289 μs ± 26.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>>179 μs ± 3.93 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

np.nanpercentile is ~5-10x slower than np.percentile

%timeit vd.minmax(a, min_percentile=0, max_percentile=100, nan=False)
%timeit vd.minmax(a, min_percentile=1, max_percentile=99, nan=False)
%timeit vd.minmax(a, min_percentile=0, max_percentile=99, nan=False)

>>>20.7 μs ± 843 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
>>>232 μs ± 17.7 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>>111 μs ± 5.5 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
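The timings above use IPython's `%timeit` and the `vd.minmax` function from this PR. For anyone wanting to reproduce the general trend outside IPython, here is a rough sketch that times the underlying NumPy calls with the standard-library `timeit` module. It does not call `vd.minmax`, so the absolute numbers will differ from the ones reported above:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(size=(27, 100))

# The NumPy calls that the benchmarks above exercise indirectly.
candidates = {
    "np.min + np.max": lambda: (np.min(a), np.max(a)),
    "np.nanmin + np.nanmax": lambda: (np.nanmin(a), np.nanmax(a)),
    "np.percentile [1, 99]": lambda: np.percentile(a, [1, 99]),
    "np.nanpercentile [1, 99]": lambda: np.nanpercentile(a, [1, 99]),
}

calls = 200
timings = {}
for label, func in candidates.items():
    # Best of 5 repeats, converted to microseconds per call.
    best = min(timeit.repeat(func, number=calls, repeat=5))
    timings[label] = best / calls * 1e6
    print(f"{label:>26}: {timings[label]:8.1f} µs per call")
```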

santisoler

Looking good, @mdtanker. Just left a few comments. Let me know what you think!

santisoler · 2025-12-03T20:50:34Z

verde/utils.py

+    # get min value
+    if min_percentile == 0:
+        min_ = npmin([npmin(i) for i in arrays])
+
+    # get max value
+    if max_percentile == 100:
+        max_ = npmax([npmax(i) for i in arrays])
+
+    # calculate min, max or both percentiles
+    if min_percentile != 0 or max_percentile != 100:
+        # concatenate values of all arrays
+        combined_array = np.concatenate([a.ravel() for a in arrays])
+
+        # if neither percentiles are defaults, calculate them together
+        if min_percentile != 0 and max_percentile != 100:
+            min_, max_ = nppercentile(combined_array, [min_percentile, max_percentile])
+
+        # only calculate min percentile
+        elif min_percentile != 0:
+            min_ = nppercentile(combined_array, min_percentile)
+
+        # only calculate max percentile
+        if max_percentile != 100:
+            max_ = nppercentile(combined_array, max_percentile)
+
+    return min_, max_


I think we could simplify this logic a bit. What about this:

Suggested change

-    # get min value
-    if min_percentile == 0:
-        min_ = npmin([npmin(i) for i in arrays])
-
-    # get max value
-    if max_percentile == 100:
-        max_ = npmax([npmax(i) for i in arrays])
-
-    # calculate min, max or both percentiles
-    if min_percentile != 0 or max_percentile != 100:
-        # concatenate values of all arrays
-        combined_array = np.concatenate([a.ravel() for a in arrays])
-
-        # if neither percentiles are defaults, calculate them together
-        if min_percentile != 0 and max_percentile != 100:
-            min_, max_ = nppercentile(combined_array, [min_percentile, max_percentile])
-
-        # only calculate min percentile
-        elif min_percentile != 0:
-            min_ = nppercentile(combined_array, min_percentile)
-
-        # only calculate max percentile
-        if max_percentile != 100:
-            max_ = nppercentile(combined_array, max_percentile)
-
-    return min_, max_
+    if min_percentile == 0 and max_percentile == 100:
+        min_ = npmin([npmin(i) for i in arrays])
+        max_ = npmax([npmax(i) for i in arrays])
+        return min_, max_
+
+    # concatenate values of all arrays
+    combined_array = np.concatenate([a.ravel() for a in arrays])
+    min_, max_ = nppercentile(combined_array, [min_percentile, max_percentile])
+    return min_, max_

If either min_percentile or max_percentile differs from its default, we'll use nppercentile anyway. And min_percentile=0 will always yield the minimum value. Moreover, I suspect there's no significant computational hit in passing two percentiles, since I guess NumPy computes both while looping over the elements of the array once.

So, we would only use npmin and npmax if min_percentile==0 and max_percentile==100.
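For reference, here is a self-contained sketch of how the simplified logic could behave end to end. The name `minmax_sketch`, the `*arrays` signature, and the way `nan` selects between the NaN-aware and plain NumPy functions are assumptions made for illustration; they are not copied from the PR:

```python
import numpy as np


def minmax_sketch(*arrays, min_percentile=0, max_percentile=100, nan=True):
    """Return the min and max (or percentiles) over the values of all arrays."""
    # Assumption: ``nan=True`` means NaNs are ignored in the calculations.
    if nan:
        npmin, npmax, nppercentile = np.nanmin, np.nanmax, np.nanpercentile
    else:
        npmin, npmax, nppercentile = np.min, np.max, np.percentile

    # Fast path: only the default percentiles use the cheaper min/max.
    if min_percentile == 0 and max_percentile == 100:
        min_ = npmin([npmin(i) for i in arrays])
        max_ = npmax([npmax(i) for i in arrays])
        return min_, max_

    # Otherwise compute both percentiles in one call on the combined values.
    combined_array = np.concatenate([np.asarray(a).ravel() for a in arrays])
    min_, max_ = nppercentile(combined_array, [min_percentile, max_percentile])
    return min_, max_


a = np.array([1.0, np.nan, 3.0])
b = np.array([[-2.0, 10.0]])

limits = minmax_sketch(a, b)
lower_half = minmax_sketch(a, b, min_percentile=0, max_percentile=50)
print(limits, lower_half)
```

The trade-off is the one discussed in this thread: a single branch keeps the code short, at the cost of always paying for two percentiles whenever either value is non-default.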

What do you think?

There is a bit of time saved by calculating only one percentile and using min/max for the other. But maybe the time savings are not worth the extra code? At least that's how I interpreted the benchmarking results below. The first call uses min and max with no percentiles, the second uses the 1st and 99th percentiles, and the third uses the min and the 99th percentile. As you can see, using one percentile and one min/max calculation is a bit faster than using two percentile calculations.

What do you think?

%timeit vd.minmax(a, min_percentile=0, max_percentile=100, nan=True)
%timeit vd.minmax(a, min_percentile=1, max_percentile=99, nan=True)
%timeit vd.minmax(a, min_percentile=0, max_percentile=99, nan=True)

>>>50.5 μs ± 2.56 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>>289 μs ± 26.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
>>>179 μs ± 3.93 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Hmm, I would have guessed that passing multiple percentiles would not impact the computation time that much. Nonetheless, those differences are not significant, so I would keep the code simpler. Do you agree?

Even for a large array with 100 million elements, the difference is not very noticeable to users:

import numpy as np
a = np.random.default_rng().uniform(size=100_000_000)

# benchmark np.min and np.percentile with a single value
%timeit np.min(a)
%timeit np.percentile(a, 90)

38.1 ms ± 351 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
862 ms ± 3.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# benchmark percentile with two values
%timeit np.percentile(a, [0, 90])

1.01 s ± 5.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
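The two-value benchmark above passes 0 as one of the percentiles, which relies on the 0th percentile being the minimum (and, by the same argument, the 100th being the maximum). A quick sanity check of that equivalence with NumPy's default interpolation, on a smaller hypothetical array:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.uniform(size=10_000)

# Percentiles 0 and 100 should coincide with the extremes of the array.
low, high = np.percentile(a, [0, 100])
matches_min = bool(np.isclose(low, np.min(a)))
matches_max = bool(np.isclose(high, np.max(a)))

print("0th percentile equals min:  ", matches_min)
print("100th percentile equals max:", matches_max)
```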

mdtanker added 3 commits November 20, 2025 10:44

add new minmax function

57353a9

fix doctest for minmax

15ef75d

fix minmax test

263b450

mdtanker force-pushed the minmax branch from 87adeda to 263b450 · November 20, 2025 13:08

santisoler requested changes Nov 20, 2025

mdtanker and others added 6 commits November 21, 2025 09:39

add optional type for nan parameter

23010b3

Co-authored-by: Santiago Soler <santisoler@fastmail.com>

use min and max percentiles separetely

deafd99

Co-authored-by: Santiago Soler <santisoler@fastmail.com>

simplify return statement

ead11ad

Co-authored-by: Santiago Soler <santisoler@fastmail.com>

seperate min and max calculations

979916f

Co-authored-by: Santiago Soler <santisoler@fastmail.com>

refactor minmax to seperate min and max percentiles

f146536

update minmax tests

5712f77

remove Union operator to pass py3.9 tests

785bcd8

mdtanker requested a review from santisoler November 21, 2025 11:16

mdtanker and others added 2 commits November 25, 2025 14:26

Merge branch 'main' into minmax

aa3a41c

Merge branch 'main' into minmax

bb24ee3

santisoler requested changes Dec 3, 2025

mdtanker and others added 4 commits December 4, 2025 15:18

Update verde/utils.py

e20b4d5

Co-authored-by: Santiago Soler <santisoler@fastmail.com>

Update verde/utils.py

ef753a0

Co-authored-by: Santiago Soler <santisoler@fastmail.com>

Update verde/utils.py

0647cf4

Co-authored-by: Santiago Soler <santisoler@fastmail.com>

Update verde/utils.py

15daf3b

Co-authored-by: Santiago Soler <santisoler@fastmail.com>

santisoler Dec 3, 2025

mdtanker Dec 4, 2025

santisoler Jan 14, 2026
