-
Notifications
You must be signed in to change notification settings - Fork 22
Add functionality to use arbitrary singular values in ConcatDiskArrays #256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
9add95a
b744d4f
2284968
eaadc95
afe15cc
6781eae
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -1,4 +1,3 @@ | ||||||
|
||||||
""" | ||||||
ConcatDiskArray <: AbstractDiskArray | ||||||
|
||||||
|
@@ -15,7 +14,7 @@ Returned from `cat` on disk arrays. | |||||
|
||||||
It is also useful on its own as it can easily concatenate an array of disk arrays. | ||||||
""" | ||||||
struct ConcatDiskArray{T,N,P,C,HC,ID} <: AbstractDiskArray{T,N} | ||||||
struct ConcatDiskArray{T,N,P,C,HC, ID} <: AbstractDiskArray{T,N} | ||||||
parents::P | ||||||
startinds::NTuple{N,Vector{Int}} | ||||||
size::NTuple{N,Int} | ||||||
|
@@ -24,23 +23,25 @@ struct ConcatDiskArray{T,N,P,C,HC,ID} <: AbstractDiskArray{T,N} | |||||
innerdims::Val{ID} | ||||||
end | ||||||
|
||||||
function ConcatDiskArray(arrays::AbstractArray{Union{<:AbstractArray,Missing}}) | ||||||
function ConcatDiskArray(arrays::AbstractArray{Union{<:AbstractArray,Missing}}; fill=missing) | ||||||
et = Base.nonmissingtype(eltype(arrays)) | ||||||
T = Union{Missing,eltype(et)} | ||||||
T = promotetype(typeof(fill), eltype(et)) | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought the passed in array would still contain either arrays or So what we have here is kinda half way. Probably we should either drop the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I prefer to drop the fill keyword. I actual thought I already did so. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But is that actually useful? I can't imagine a real use case for multiple fill values. And putting The missing value used in the array of arrays doesn't have to be coupled to the fill value. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that it might be a bit dangerous to allow arbitrary values to represent missings. OTOH one never knows when this might become useful for some reason. I am ok with both ways and would currently slightly prefer @felixcremer s solution because there is already an implementation. If we want to be a bit more on the safe side, we could introduce a small wrapper type like struct MissingTile{F}
fillvalue::F
end that explicitly marks missing entries to be concatenated and how they should be filled. The main reason why the current solution might be problematic is for DiskArrays with arrays as eltype, but maybe this is overthinking things a bit too much. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't you have to read all the fill values in the array and promote them? Missing is not necessarily in the type? |
||||||
N = ndims(arrays) | ||||||
M = ndims(et) | ||||||
_ConcatDiskArray(arrays, T, Val(N), Val(M)) | ||||||
end | ||||||
|
||||||
function infer_eltypes(arrays) | ||||||
foldl(arrays, init=(-1, Union{})) do (M, T), a | ||||||
if ismissing(a) | ||||||
(M, promote_type(Missing, T)) | ||||||
if !isa(a, AbstractArray) | ||||||
(M, promote_type(typeof(a), T)) | ||||||
else | ||||||
M == -1 || ndims(a) == M || throw(ArgumentError("All arrays to concatenate must have equal ndims")) | ||||||
(ndims(a), promote_type(eltype(a), T)) | ||||||
end | ||||||
end | ||||||
end | ||||||
|
||||||
function ConcatDiskArray(arrays::AbstractArray{<:AbstractArray}) | ||||||
N = ndims(arrays) | ||||||
T = eltype(eltype(arrays)) | ||||||
|
@@ -90,7 +91,7 @@ function arraysize_and_startinds(arrays1) | |||||
sizes = map(i -> zeros(Int, i), size(arrays1)) | ||||||
for i in CartesianIndices(arrays1) | ||||||
ai = arrays1[i] | ||||||
ismissing(ai) && continue | ||||||
ai isa AbstractArray || continue | ||||||
sizecur = extenddims(size(ai), size(arrays1), 1) | ||||||
foreach(sizecur, i.I, sizes) do si, ind, sizeall | ||||||
if sizeall[ind] == 0 | ||||||
|
@@ -123,10 +124,10 @@ function readblock!(a::ConcatDiskArray, aout, inds::AbstractUnitRange...) | |||||
# Find affected blocks and indices in blocks | ||||||
_concat_diskarray_block_io(a, inds...) do outer_range, array_range, I | ||||||
vout = view(aout, outer_range...) | ||||||
if ismissing(I) | ||||||
vout .= missing | ||||||
else | ||||||
if I isa CartesianIndex | ||||||
readblock!(a.parents[I], vout, array_range...) | ||||||
else | ||||||
vout .= I | ||||||
end | ||||||
end | ||||||
end | ||||||
|
@@ -170,10 +171,10 @@ function _concat_diskarray_block_io(f, a::ConcatDiskArray, inds...) | |||||
#Shorten array range to shape of actual array | ||||||
array_range = ntuple(j -> array_range[j], ID) | ||||||
outer_range = fix_outerrangeshape(outer_range, array_range) | ||||||
if ismissing(myar) | ||||||
f(outer_range, array_range, missing) | ||||||
else | ||||||
if myar isa AbstractArray | ||||||
f(outer_range, array_range, cI) | ||||||
else | ||||||
f(outer_range, array_range, myar) | ||||||
end | ||||||
end | ||||||
end | ||||||
|
@@ -189,13 +190,13 @@ function concat_chunksize(parents) | |||||
newchunks = map(s -> Vector{Union{RegularChunks,IrregularChunks}}(undef, s), size(parents)) | ||||||
for i in CartesianIndices(parents) | ||||||
array = parents[i] | ||||||
ismissing(array) && continue | ||||||
!isa(array,AbstractArray) && continue | ||||||
felixcremer marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||
chunks = eachchunk(array) | ||||||
foreach(chunks.chunks, i.I, newchunks) do c, ind, newc | ||||||
if !isassigned(newc, ind) | ||||||
newc[ind] = c | ||||||
elseif c != newc[ind] | ||||||
throw(ArgumentError("Chunk sizes don't forma grid")) | ||||||
throw(ArgumentError("Chunk sizes don't form a grid")) | ||||||
end | ||||||
end | ||||||
end | ||||||
|
Uh oh!
There was an error while loading. Please reload this page.