Commit a48787c

meggart and felixcremer authored
WIP: Complete reimplementation of getindex and setindex (#146)
* Move _DiskArray out of tests
* First commit of bikeshedding everything
* Start new indexing scheme
* More tests passing
* Tests pass for first time
* setindex tests pass as well
* delete most of batchgetindex
* start work on batch optimization
* cont
* batch reading for vector is done in slices
* All tests passing for chunkwise access
* Fix merge errors
* Remove Manifest
* more batch options
* range-based approach
* clean-up work
* Improve Readme
* some cleanup
* remove include
* Fix some type instabilities
* Fix some type instabilities and test them
* Fix a special case
* Clean up and fix vec
* Update README.md

  Co-authored-by: Felix Cremer <[email protected]>
* SMall readme fix

---------

Co-authored-by: Felix Cremer <[email protected]>
1 parent 48ba993 commit a48787c

11 files changed: +629 -420 lines changed

.github/workflows/ci.yml

Lines changed: 0 additions & 1 deletion
@@ -15,7 +15,6 @@ jobs:
       fail-fast: false
       matrix:
         version:
-          - '1.6' # Replace this with the minimum Julia version that your package supports. E.g. if your package requires Julia 1.5 or higher, change this to '1.5'.
           - '1' # Leave this line unchanged. '1' will automatically expand to the latest stable 1.x release of Julia.
           - 'nightly'
         os:

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@
 *.jl.mem
 /deps/deps.jl
 /docs/build
+Manifest.toml

Project.toml

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 name = "DiskArrays"
 uuid = "3c3547ce-8d99-4f5e-a174-61eb10b00ae3"
 authors = ["Fabian Gans <[email protected]>"]
-version = "0.3.23"
+version = "0.4.0"
 
 [deps]
 LRUCache = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637"
@@ -10,10 +10,10 @@ OffsetArrays = "6fe1bfb0-de20-5000-8ca7-80f57d26f881"
 [compat]
 Aqua = "0.8"
 LRUCache = "1"
-OffsetArrays = "1"
-Statistics = "1.6"
-Test = "1.6"
 julia = "1.6"
+OffsetArrays = "1"
+Statistics = "1.9"
+Test = "1.9"
 
 [extras]
 Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"

README.md

Lines changed: 45 additions & 13 deletions
@@ -6,12 +6,13 @@
 [![CI](https://github.com/meggart/DiskArrays.jl/actions/workflows/ci.yml/badge.svg)](https://github.com/meggart/DiskArrays.jl/actions/workflows/ci.yml)
 [![Codecov](https://codecov.io/gh/meggart/DiskArrays.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/meggart/DiskArrays.jl/tree/main)
 
-This package is an attempt to collect utilities for working with n-dimensional array-like data
-structures that do not have considerable overhead for single read operations. Most important
-examples are arrays that represent data on hard disk that are accessed through a C
-library or that are compressed in chunks. It can be inadvisable to make these arrays a subtype
-of `AbstractArray` many functions working with AbstractArrays assume fast random access into single
-values (including basic things like `getindex`, `show`, `reduce`, etc...). Currently supported features are:
+This package provides a collection of utilities for working with n-dimensional array-like data
+structures that do have considerable overhead for single read operations.
+Most important examples are arrays that represent data on hard disk that are accessed through a C
+library or that are compressed in chunks.
+It can be inadvisable to make these arrays a direct subtype of `AbstractArray` many functions working with AbstractArrays assume fast random access into single values (including basic things like `getindex`, `show`, `reduce`, etc...).
+
+Currently supported features are:
 
 - `getindex`/`setindex` with the same rules as base (trailing or singleton dimensions etc)
 - views into `DiskArrays`
@@ -23,13 +24,44 @@ values (including basic things like `getindex`, `show`, `reduce`, etc...). Curre
 - customization of `broadcast` when there is a `DiskArray` on the LHS. This at least makes things
 like `a.=5` possible and relatively fast
 
-There are basically two ways to use this package.
-Either one makes the abstraction directly a subtype of `AbstractDiskArray` which requires
-to implement a single `readblock!` method that reads a Cartesian range of data points.
-The remaining `getindex` methods will
-come for free then. The second way is to use
-the `interpret_indices_disk` function to get a translation of the user-supplied indices
-into a set of ranges and then use these to read the data from disk.
+
+## AbstractDiskArray Interface definition
+
+Package authors who want to use this library to make their disk-based array an `AbstractDiskArray` should at least
+implement methods for the following functions:
+
+````julia
+Base.size(A::CustomDiskArray)
+readblock!(A::CustomDiskArray{T,N},aout,r::Vargarg{AbstractUnitRange,N})
+writeblock!(A::CustomDiskArray{T,N},ain,r::Vargarg{AbstractUnitRange,N})
+````
+
+Here `readblock!` will read a subset of array `A` in a hyper-rectangle defined by the unit ranges `r`. The results shall be written into `aout`. `writeblock!` should write the data given by `ain` into the (hyper-)rectangle of A defined by `r`
+When defining the functions it can be safely assumed that `length(r) == ndims(A)` as well as `size(ain) == length.(r)`.
+However, bounds checking is *not* performed by the DiskArray machinery and currently should be done by the implementation.
+
+If the data on disk has rectangular chunks as underlying storage units, you should addtionally implement the following
+methods to optimize some operations like broadcast, reductions and sparse indexing:
+
+````julia
+DiskArrays.haschunks(A::CustomDiskArray) = DiskArrays.Chunked()
+DiskArrays.eachchunk(A::CustomDiskArray) = DiskArrays.GridChunks(A, chunksize)
+````
+
+where `chunksize` is a int-tuple of chunk lengths. If the array does not have an internal chunking structure, one should
+define
+
+````julia
+DiskArrays.haschunks(A::CustomDiskArray) = DiskArrays.Unchunked()
+````
+
+Implementing only these methods makes all kinds of strange indexing patterns work (Colons, StepRanges, Integer vectors,
+Boolean masks, CartesianIndices, Arrays of CartesianIndex, and mixtures of all these) while making sure that as few
+`readblock!` or `writeblock!` calls as possible are performed by reading a rectangular bounding box of the required
+array values and re-arranging the resulting values into the output array.
+
+In addition, DiskArrays.jl provides a few optimizations for sparse indexing patterns to avoid reading and discarding
+too much unnecessary data from disk, for example for indices like `A[:,:,[1,1500]]`.
 
 # Example

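The bounding-box strategy described in the new README text can be illustrated with a small standalone sketch. It does not use DiskArrays.jl and is not the package's implementation: `CountingArray`, `read_rows`, and the local `readblock!` are hypothetical names that only mimic the `readblock!` signature shown in the diff, with an in-memory array standing in for data on disk.

```julia
# Hypothetical stand-in for a disk-backed array; it counts block reads.
struct CountingArray{T,N}
    data::Array{T,N}
    nreads::Base.RefValue{Int}
end
CountingArray(data) = CountingArray(data, Ref(0))

# Local mimic of the interface: copy the hyper-rectangle given by the
# unit ranges `r` into `aout`. Each call counts as one "disk access".
function readblock!(a::CountingArray{T,N}, aout, r::Vararg{AbstractUnitRange,N}) where {T,N}
    a.nreads[] += 1
    aout .= view(a.data, r...)
    return aout
end

# Bounding-box read for an integer vector of row indices: read the
# smallest contiguous block covering all rows once, then re-arrange.
function read_rows(a::CountingArray{T,2}, rows::AbstractVector{<:Integer}) where {T}
    lo, hi = extrema(rows)
    box = Array{T}(undef, hi - lo + 1, size(a.data, 2))
    readblock!(a, box, lo:hi, axes(a.data, 2))
    return box[rows .- (lo - 1), :]  # shift indices into the box
end

a = CountingArray(reshape(collect(1:20), 5, 4))
out = read_rows(a, [4, 2, 4])
```

Here the unordered, repeated indices `[4, 2, 4]` are served by a single block read of rows `2:4`. For widely separated indices such as `[1, 1500]` a single box would read mostly unneeded data, which is the case the README says the sparse-indexing optimizations are meant to avoid.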
src/DiskArrays.jl

Lines changed: 2 additions & 3 deletions
@@ -9,7 +9,7 @@ using LRUCache: LRUCache, LRU
     read(path, String)
 end DiskArrays
 
-export AbstractDiskArray, interpret_indices_disk, eachchunk, ChunkIndex, ChunkIndices
+export AbstractDiskArray, eachchunk, ChunkIndex, ChunkIndices
 
 include("scalar.jl")
 include("diskarray.jl")
@@ -42,7 +42,6 @@ macro implement_diskarray(t)
         @implement_array_methods $t
         @implement_permutedims $t
         @implement_subarray $t
-        @implement_batchgetindex $t
         @implement_cat $t
         @implement_zip $t
         @implement_show $t
@@ -60,12 +59,12 @@ end
 @implement_array_methods AbstractDiskArray
 @implement_permutedims AbstractDiskArray
 @implement_subarray AbstractDiskArray
-@implement_batchgetindex AbstractDiskArray
 @implement_cat AbstractDiskArray
 @implement_generator AbstractDiskArray
 @implement_show AbstractDiskArray
 
 #And we define the test types
 include("util/testtypes.jl")
 
+
 end # module
