Special case chunk encoding for dict chunk store (#359)
* Add `values` method to `CustomMapping`
For doing a quick pass through the values within a `MutableMapping`, it
is helpful to have the `values` method. This will be needed in some of
the tests that we are adding. So add a quick implementation of `values`
in the `CustomMapping`.
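A minimal sketch of what such a `values` method might look like, assuming `CustomMapping` is a plain class wrapping a `dict` in an attribute named `inner` (the attribute name and the other methods shown here are assumptions, not taken from the test suite):

```python
class CustomMapping:
    """Assumed layout: a thin wrapper around a plain dict held in `inner`."""

    def __init__(self):
        self.inner = {}

    def keys(self):
        return self.inner.keys()

    def values(self):
        # Delegate to the wrapped dict for a quick pass over stored values.
        return self.inner.values()

    def __getitem__(self, key):
        return self.inner[key]

    def __setitem__(self, key, value):
        self.inner[key] = value


m = CustomMapping()
m["a"] = b"1"
m["b"] = b"2"
stored = sorted(m.values())
```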
* Test stores contain bytes-like data
* Add a test to check that all values are bytes
Currently this is a somewhat loose requirement that not all stores
enforce. However it is a useful check to have in some cases.
Particularly this is a useful constraint with in-memory stores where
there is a concern that the data in the store might be manipulated
externally due to ndarray views. The bytes instances don't have this
problem as they are immutable and own their data. While views can be
taken onto bytes instances, they will be read-only views and thus are not
a concern.
For other stores that place their data in some other storage backend
(e.g. on disk), this is less of a concern. Also other stores may choose
to represent their data in other ways (e.g. LMDB with `memoryview`s).
Manipulating the loaded data from these stores is less of a concern
since it doesn't affect their actual contents, only an in-memory
representation. In these cases, it may make sense to disable this test.
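The check and the hazard it guards against can be sketched roughly as follows. The helper name `all_values_are_bytes` is hypothetical, and a `bytearray` stands in for an ndarray so the example needs only the standard library:

```python
def all_values_are_bytes(store):
    """Return True if every value in the mapping is a `bytes` instance."""
    return all(isinstance(v, bytes) for v in store.values())


# A view onto `bytes` is read-only, so the stored data cannot be changed.
bytes_view_is_readonly = memoryview(b"abc").readonly

# A view onto a mutable buffer is writable, which is the hazard described.
buf = bytearray(b"abc")
store = {"good": b"abc", "bad": buf}
memoryview(buf)[0] = ord("x")  # mutates the value held by the store
```

A store that passes this check cannot be altered through a view in the way `store["bad"]` is altered here.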
* Disable bytes value test for LMDBStore w/buffers
It's possible to request that `LMDBStore` return `buffer`s/`memoryview`s
instead of returning `bytes` for values. This is incredibly useful as
LMDB is memory-mapped. So this avoids a copy when accessing the data.
However this means it will fail `test_store_has_bytes_values`. Though
this is ok as noted previously since that is not a hard requirement of
stores. So we just disable that test for this `LMDBStore` case. The
other `LMDBStore` case returns `bytes` instances instead (copying the
data); so, it passes this test without issues.
* Have CustomMapping ensure it stores bytes
Since people will come looking to see how a store should be
implemented, we should show them good behavior. Namely we should ensure
the values provided to `__setitem__` are `bytes`-like and immutable
(once stored). By coercing the data to `bytes` we ensure that the data
is `bytes`-like and we ensure the data is immutable since `bytes` are
immutable.
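One way the coercion in `__setitem__` could look is sketched below. The class name `BytesCoercingMapping` and the `inner` attribute are hypothetical, and going through `memoryview` is an assumption made here so that values without the buffer protocol are rejected rather than silently converted:

```python
class BytesCoercingMapping:
    """Hypothetical dict-backed store that keeps only immutable `bytes`."""

    def __init__(self):
        self.inner = {}

    def __getitem__(self, key):
        return self.inner[key]

    def __setitem__(self, key, value):
        # memoryview() rejects objects without the buffer protocol (e.g. int,
        # str); bytes() then copies the buffer into an immutable object.
        self.inner[key] = bytes(memoryview(value))


store = BytesCoercingMapping()
source = bytearray(b"chunk")
store["0.0"] = source
source[0] = 0  # later changes to the caller's buffer do not reach the store
```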
* Disable bytes value test for LRUStoreCache
As the point of the `LRUStoreCache` is to merely hold onto values
retrieved from the underlying store and keep them around in memory
unaltered, the caching layer doesn't have any control over what type of
values are returned to it. Thus it doesn't make much sense to test
whether the values it returns are of `bytes` type or not. Though this is
fine as there is no strict requirement that values of `bytes` type be
returned by stores, so simply disable `test_store_has_bytes_values` for
the `LRUStoreCache` test case with a note explaining this.
* Ensure `bytes` in `_encode_chunk` for `dict`
In `Array` when running `_encode_chunk` and using a `dict`-based chunk
store, ensure that the chunk data is of `bytes` type. This is done to
convert the underlying data to an immutable value to protect against
external views onto the data from changing it (as is the case with NumPy
arrays). Also this is done to ensure it is possible to compare
`dict`-based stores easily.
While coercing to a `bytes` object can introduce a copy, this
generally won't happen if a compressor is involved as it will usually
have returned a `bytes` object. Though filters may not, which could
cause this to introduce an extra copy if only filters and no compressors
are used. However this is an unlikely case and is not as important as
guaranteeing the values own their own data and are read-only. Plus this
should allow us to drop the preventive copy that happens earlier when
storing values as this neatly handles that case of no filters and no
compressors.
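The special case can be illustrated with a standalone function. This is only a sketch and not the actual `_encode_chunk` method: the function name, its parameters, the use of `isinstance` to detect a `dict` store, and `zlib.compress` as a stand-in compressor are all assumptions.

```python
import zlib


def encode_chunk(chunk, chunk_store, compress=None):
    """Encode a chunk, coercing to `bytes` only for dict-based stores."""
    cdata = compress(chunk) if compress is not None else chunk
    if isinstance(chunk_store, dict) and not isinstance(cdata, bytes):
        # A plain dict does not protect its values, so store an immutable
        # copy that owns its data. Compressors usually return bytes already,
        # in which case this branch is skipped and no extra copy is made.
        cdata = bytes(memoryview(cdata))
    return cdata


raw = bytearray(b"abcd")
stored = encode_chunk(raw, {})  # no compressor: copied into bytes
raw[0] = 0  # does not affect the stored value
packed = encode_chunk(b"abcd", {}, compress=zlib.compress)
```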
* Skip copying when no compressor or filter is used
This copy was taken primarily to protect in-memory stores from being
mutated by external views of the array. However all stores we define
(including in-memory ones like `DictStore`) already perform this
safeguarding themselves on the values they store. For the builtin `dict`
store, we perform this safeguarding for it (since `dict`s don't do this
themselves) by ensuring we only store `bytes` objects into them. As
these are immutable and own their own data, there isn't a way to mutate
their content after storing them. Thus this preventive copy here is not
needed and can be dropped.
* Note dict stores bytes and preemptive copy dropped
* Ignore fail case coverage in bytes-like value test