More stable algorithm for variance, standard deviation #456

Draft
wants to merge 32 commits into
base: main
Changes from 1 commit
Commits
0f29529
update to nanvar to use more stable algorithm if engine is flox
jemmajeffree Jul 18, 2025
1fbf5f8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 18, 2025
322f511
[revert] only nanvar test
dcherian Jul 18, 2025
adab8e6
Some mods
dcherian Jul 18, 2025
93cd9b3
Update flox/aggregations.py to neater tuple unpacking
jemmajeffree Jul 21, 2025
2be4f74
Change np.all to all in flox/aggregate_flox.py
jemmajeffree Jul 21, 2025
edb655d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 21, 2025
dd2e4b6
delete some resolved comments
jemmajeffree Jul 21, 2025
936ed1d
Remove answered questions in comments
jemmajeffree Jul 21, 2025
1968870
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 21, 2025
d036ebc
Merge branch 'main' into var_algorithm
jemmajeffree Jul 21, 2025
12bcb0f
Remove more unnecessary comments
jemmajeffree Jul 21, 2025
6f5bece
Merge branch 'var_algorithm' of github.com:jemmajeffree/flox into var…
jemmajeffree Jul 21, 2025
b1f7b5d
Remove _version.py
jemmajeffree Jul 21, 2025
cd9a8b8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 21, 2025
27448e4
Add preliminary test for std/var precision
jemmajeffree Jul 31, 2025
10214cc
Merge branch 'var_algorithm' of github.com:jemmajeffree/flox into var…
jemmajeffree Jul 31, 2025
a81b1a3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 31, 2025
004fddc
Correct comment
jemmajeffree Jul 31, 2025
4491ce9
fix merge conflicts
jemmajeffree Jul 31, 2025
c3a6d88
Update flox/aggregate_flox.py
jemmajeffree Aug 5, 2025
4dcd7c2
Replace some list comprehension with tuple
jemmajeffree Aug 5, 2025
c101a2b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 5, 2025
98e1b4e
Fixes
dcherian Aug 5, 2025
d0d09df
minor edit for neater test reports.
dcherian Aug 5, 2025
1139a9c
Fix another list/tuple comprehension
jemmajeffree Aug 5, 2025
569629c
implement np.full
jemmajeffree Aug 5, 2025
50ad095
Implement np.full and empty chunks in var_chunk
jemmajeffree Aug 6, 2025
f88e231
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 6, 2025
77526fd
update comment
jemmajeffree Aug 6, 2025
0f5d587
Fix merge conflict
jemmajeffree Aug 6, 2025
31f30c9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 6, 2025
22 changes: 22 additions & 0 deletions tests/test_core.py
@@ -2240,3 +2240,25 @@ def test_sparse_nan_fill_value_reductions(chunks, fill_value, shape, func):
    expected = np.expand_dims(npfunc(numpy_array, axis=-1), axis=-1)
    actual, *_ = groupby_reduce(array, by, func=func, axis=-1)
    assert_equal(actual, expected)


# Expect to expand this to other functions once written.
# "var" is included from the start because I know it will fail.
@pytest.mark.parametrize("func", ("nanvar", "var"))
# Expect to expand this to other engines once written.
@pytest.mark.parametrize("engine", ("flox",))
# Should fail at 10e8 for the old algorithm and survive 10e12 for the current one.
@pytest.mark.parametrize("offset", (0, 10e2, 10e4, 10e6, 10e8, 10e10, 10e12))
def test_std_var_precision(func, engine, offset):
    # Generate a dataset with small variance and big mean.
    # Check that func with engine gives the same answer with and without the offset.

    n_points = 1000
    array = np.linspace(-1, 1, n_points)  # has zero mean
    labels = np.arange(n_points) % 2  # Ideally we'd parametrize this too.

    # These two calls must use the same function; only the offset differs.
    no_offset, _ = groupby_reduce(array, labels, engine=engine, func=func)
    with_offset, _ = groupby_reduce(array + offset, labels, engine=engine, func=func)

    tol = {"rtol": 1e-8, "atol": 1e-10}  # Not sure how stringent to be here

    # The failure threshold in my external tests depends on the dask chunk size;
    # this may need further exploration.

    assert_equal(no_offset, with_offset, tol)
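
For context on why the test adds a large offset, here is a minimal pure-Python sketch of the failure mode it targets. It is an illustration only and not necessarily the algorithm implemented in this PR: `naive_var`, `chunk_stats`, `merge_stats`, and `chunked_var` are hypothetical names, and the merge step uses the well-known pairwise update of (count, mean, sum of squared deviations) usually attributed to Chan et al. The naive form computes the mean of squares minus the squared mean, which cancels catastrophically when the mean is much larger than the spread.

```python
def naive_var(values):
    """Population variance as E[x^2] - E[x]^2 (prone to cancellation)."""
    n = len(values)
    mean = sum(values) / n
    mean_sq = sum(x * x for x in values) / n
    return mean_sq - mean * mean


def chunk_stats(values):
    """Return (count, mean, sum of squared deviations) for one non-empty chunk."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)
    return n, mean, m2


def merge_stats(a, b):
    """Combine the statistics of two chunks without revisiting the data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def chunked_var(values, chunk_size):
    """Population variance from per-chunk statistics merged left to right."""
    chunks = [values[i : i + chunk_size] for i in range(0, len(values), chunk_size)]
    stats = chunk_stats(chunks[0])
    for chunk in chunks[1:]:
        stats = merge_stats(stats, chunk_stats(chunk))
    n, _, m2 = stats
    return m2 / n


# Same shape of data as the test: small variance, large mean.
n_points = 1000
base = [-1 + 2 * i / (n_points - 1) for i in range(n_points)]
offset = 1e9  # written as 10e8 in the test's parametrization
shifted = [x + offset for x in base]

reference = chunked_var(base, 100)  # variance without the offset
stable = chunked_var(shifted, 100)  # deviation-based, merged across chunks
naive = naive_var(shifted)  # mean of squares minus squared mean

print("reference:", reference)
print("stable:   ", stable)
print("naive:    ", naive)
```

With this offset the squared values are near 1e18, where adjacent float64 numbers are 128 apart, so the naive result cannot resolve a variance of roughly 0.33. One caveat for the tolerance question raised in the test: adding the offset also rounds the inputs themselves. Near 1e13 (the largest offset, 10e12) adjacent float64 values are about 0.002 apart, which is comparable to the spacing of the `linspace` points, so a tolerance as tight as `rtol=1e-8` may be too strict at the largest offsets regardless of the algorithm.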