Skip to content
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 31 additions & 0 deletions numcodecs/bitround.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
import numpy as np


from .abc import Codec
from .compat import ensure_ndarray, ndarray_copy


class BitRound(Codec):
codec_id = 'bitround'

def __init__(self, keepbits: int):
self.keepbits = keepbits

def encode(self, buf):
# TODO: figure out if we need to make a copy
# Currently this appears to be overwriting the input buffer
# Is that the right behavior?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In BitInformation.jl the rounding is implemented as scalar version which does not overwrite the input (as float32 is immutable so a copy is created anyway), however, I define a rounding function for arrays, that can either act in-place (i.e. overwriting the bits in an existing array) or acts on a copy of the array, such that the input array in unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. This is more a question about numcodecs (e.g. for @jakirkham), rather than about BitInformation.jl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we shouldn't be overwriting the input buffer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing

b = a.view(dtype=np.int32)

to

b = a.view(dtype=np.int32).copy()

to avoid the overwriting of the input buffer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.astype(np.int32, copy=True) is more canonical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continued in PR: #608

a = ensure_ndarray(buf).view()
assert a.dtype == np.float32
b = a.view(dtype=np.int32)
maskbits = 23 - self.keepbits
mask = (0xFFFFFFFF >> maskbits) << maskbits
half_quantum1 = (1 << (maskbits - 1)) - 1
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section will error with ValueError: negative shift count if keepbits=23. That causes the test_no_rounding test to error.

Copy link

@rkouznetsov rkouznetsov Dec 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a feature. half_quantum1 is indeed inexpressible if your quantum is the least significant bit... I would just add return from the beginning or not call the sub at all if keepbits == 23. If keepbits > 23 (or < 0) you might wish to raise an exception...
What is the kind of failure the test is supposed to capture?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find a condition like keepbits >= 0 reasonable, but note that keepbits < 0 within reasonable limits (see below) shouldn't cause problems as this will just round the exponent bits. E.g.

julia> using BitInformation
julia> round(4f0,-1)
2.0f0

So in this format only ...,0.125,0.5,2,8,... are representable. However, as the exponent bits describe logarithmically distributed floats, round to nearest is now round to nearest in log-space. Meaning that while 4 is round down to 2 in the example above,

julia> round(4.1f0,-1)
8.0f0

although in lin-space 4.1 is closer to 2 than to 8. Sure, this has to be treated with caution, as NaN/Inf are defined through their exponents, meaning you end up with situations like

julia> round(NaN32,-1)
-0.0f0
julia> round(Inf32,-1)
-0.0f0

And the carry bit can propagate into the sign bit (for |x|>2), which is also weird

julia> round(4f0,-8)
-0.0f0
julia> round(-4f0,-8)
0.0f0

b += ((b >> maskbits) & 1) + half_quantum1
b &= mask
return b

def decode(self, buf, out=None):
data = ensure_ndarray(buf).view(np.float32)
out = ndarray_copy(data, out)
return out
81 changes: 81 additions & 0 deletions numcodecs/tests/test_bitround.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
import numpy as np

import pytest

from numcodecs.bitround import BitRound

# adapted from https://github.com/milankl/BitInformation.jl/blob/main/test/round_nearest.jl


# TODO: add other dtypes
@pytest.fixture(params=[np.float32])
def dtype(request):
return request.param


# number of mantissa bits for each dtype
MBITS = {np.float32: 23}


def round(data, keepbits):
codec = BitRound(keepbits=keepbits)
data = data.copy() # otherwise overwrites the input
encoded = codec.encode(data)
return codec.decode(encoded)


def test_round_zero_to_zero(dtype):
a = np.zeros((3, 2), dtype=dtype)
# Don't understand Milan's original test:
# How is it possible to have negative keepbits?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just end up rounding the exponent bits, see other comment

# for k in range(-5, 50):
for k in range(0, MBITS[dtype]):
ar = round(a, k)
np.testing.assert_equal(a, ar)


def test_round_one_to_one(dtype):
a = np.ones((3, 2), dtype=dtype)
for k in range(0, MBITS[dtype]):
ar = round(a, k)
np.testing.assert_equal(a, ar)


def test_round_minus_one_to_minus_one(dtype):
a = -np.ones((3, 2), dtype=dtype)
for k in range(0, MBITS[dtype]):
ar = round(a, k)
np.testing.assert_equal(a, ar)


# This triggers a 'negative shift count' error in the codec
def test_no_rounding(dtype):
a = np.random.random_sample((300, 200)).astype(dtype)
keepbits = MBITS[dtype]
ar = round(a, keepbits)
np.testing.assert_equal(a, ar)


APPROX_KEEPBITS = {np.float32: 10}


def test_approx_equal(dtype):
a = np.random.random_sample((300, 200)).astype(dtype)
ar = round(a, APPROX_KEEPBITS[dtype])
# Mimic julia behavior - https://docs.julialang.org/en/v1/base/math/#Base.isapprox
rtol = np.sqrt(np.finfo(np.float32).eps)
# This gets us much closer but still failing for ~6% of the array
# It does pass if we add 1 to keepbits (11 instead of 10)
# Is there an off-by-one issue here?
np.testing.assert_allclose(a, ar, rtol=rtol)
Comment on lines +57 to +65
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this test passes if we use keepbits=11 instead of 10 makes me think we are dealing with a off-by-one issue, perhaps related to julia vs. python indexing

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I just guessed the tolerances here. The motivation for this test was less to test exactness, but just to flag immediately if rounding is completely off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then I will just bump keepbits to 11 for this test.



def test_idempotence(dtype):
a = np.random.random_sample((300, 200)).astype(dtype)
for k in range(20):
ar = round(a, k)
ar2 = round(a, k)
np.testing.assert_equal(ar, ar2)


# TODO: implement tie_to_even and round_to_nearest