Make AbstractVariable a subtype of AbstractDiskArray #35

Open: wants to merge 15 commits into base: main

Conversation

lupemba
Contributor

@lupemba lupemba commented May 3, 2025

This PR requires JuliaIO/DiskArrays.jl#255 and JuliaIO/DiskArrays.jl#260

See #32 for background.

This is a draft of how we can make AbstractVariable <: AbstractDiskArray plus how to use the AbstractSubDiskArray for views.

This draft does not address:

  • Performance
  • (solved) Documentation
  • PermutedDiskArray

This PR will probably contain a few breaking changes, but most things will behave identically. One notable change is that SubVariable is no longer a subtype of AbstractVariable.

This update will require packages that implement the CommonDataModel interface to stop defining getindex and setindex. Instead, they should define DiskArrays.readblock! and DiskArrays.writeblock!, which most of them already do.
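
A minimal sketch of what this could look like in a downstream package, assuming a hypothetical in-memory type `MemoryVariable` (it is not part of CommonDataModel or of any dataset package). It follows the pattern from the DiskArrays.jl README: the type defines only `size`, the chunk queries, `readblock!` and `writeblock!`, and leaves `getindex`/`setindex!` to DiskArrays.

```julia
using DiskArrays

# Hypothetical variable type backed by a plain Array (illustration only).
struct MemoryVariable{T,N} <: DiskArrays.AbstractDiskArray{T,N}
    data::Array{T,N}
end

Base.size(v::MemoryVariable) = size(v.data)

# Treat the whole array as a single chunk.
DiskArrays.haschunks(::MemoryVariable) = DiskArrays.Chunked()
DiskArrays.eachchunk(v::MemoryVariable) = DiskArrays.GridChunks(v, size(v))

# Read the block selected by the unit ranges `r` into the buffer `aout`.
function DiskArrays.readblock!(v::MemoryVariable, aout, r::AbstractUnitRange...)
    aout .= view(v.data, r...)
    return aout
end

# Write the values in `ain` to the block selected by the unit ranges `r`.
function DiskArrays.writeblock!(v::MemoryVariable, ain, r::AbstractUnitRange...)
    v.data[r...] = ain
    return ain
end

# No getindex/setindex! methods are defined here: indexing goes through DiskArrays.
v = MemoryVariable(zeros(Int, 4, 3))
v[2:3, 1:2] = [1 2; 3 4]
block = v[2:3, 1:2]
```

With these definitions, `v[2:3, 1:2]` is handled by the DiskArrays `getindex`, which in turn calls `readblock!`.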

@@ -462,6 +462,9 @@ Base.BroadcastStyle(::Type{<:ReducedGroupedVariable}) = ReducedGroupedVariableSt
Base.BroadcastStyle(::DefaultArrayStyle,::ReducedGroupedVariableStyle) = ReducedGroupedVariableStyle()
Base.BroadcastStyle(::ReducedGroupedVariableStyle,::DefaultArrayStyle) = ReducedGroupedVariableStyle()

Base.BroadcastStyle(::DiskArrays.ChunkStyle,::ReducedGroupedVariableStyle) = ReducedGroupedVariableStyle()
Contributor Author

ReducedGroupedVariable uses different broadcasting than DiskArrays. This might lead to some confusion.

@@ -203,7 +203,7 @@ Defines and return the variable in the data set `ds`
copied from the variable `src`. The dimension name, attributes
and data are copied from `src` as well as the variable name (unless provide by `name`).
"""
-function defVar(dest::AbstractDataset,varname::SymbolOrString,srcvar::AbstractVariable; kwargs...)
+function defVar(dest::AbstractDataset,varname::SymbolOrString,srcvar::Union{AbstractVariable, SubVariable}; kwargs...)
Contributor Author

I updated ::AbstractVariable to ::Union{AbstractVariable, SubVariable} in the places where I thought it made sense, but I might have missed some.

Member

Isn't SubVariable <: AbstractVariable anyway?

Member

yes, this is indeed the case

struct SubVariable{T,N,TA,TI,TAttrib,TV} <: AbstractVariable{T,N}

Contributor Author

No, see the changes to types.jl

https://github.com/JuliaGeo/CommonDataModel.jl/pull/35/files#diff-525588e68b2421901be164965272940effa093de733a3fd36c4b9a4344b8c20cR55

SubVariable is no longer a subtype of AbstractVariable but is now a subtype of AbstractSubDiskArray

struct SubVariable{T,N,P,I,L} <: DiskArrays.AbstractSubDiskArray{T,N,P,I,L}

This is done to use the DiskArrays implementation of views, as discussed in JuliaGeo/NCDatasets.jl#274
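
The dispatch consequence can be sketched with a toy hierarchy. All type and function names below are hypothetical stand-ins, not the real CommonDataModel or DiskArrays types; they only mirror the shape of the change, where the view type moves to a sibling branch and method signatures need a `Union` to keep accepting it.

```julia
# Toy stand-ins for the real hierarchy (all names are hypothetical).
abstract type ToyDiskArray end
abstract type ToyVariable <: ToyDiskArray end      # plays the role of AbstractVariable
abstract type ToySubDiskArray <: ToyDiskArray end  # plays the role of AbstractSubDiskArray

struct FileVariable <: ToyVariable
    name::String
end

# The view type is a sibling of ToyVariable, not a subtype of it.
struct ToySubVariable <: ToySubDiskArray
    parent::FileVariable
end

# Old-style signature: accepts variables only, so views no longer match.
describe_old(v::ToyVariable) = "variable"

# New-style signature: the Union accepts both variables and views.
describe_new(v::Union{ToyVariable,ToySubVariable}) = "variable or view"

var = FileVariable("temperature")
sub = ToySubVariable(var)
```

Here `describe_old(sub)` would throw a `MethodError`, while `describe_new` accepts both `var` and `sub`.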

Member

Ah of course

Contributor Author

I have added ::Union{AbstractVariable, SubVariable} in all the places needed to extend the following functions to SubVariable: filter, coord, ancillaryvariables, groupby, select. I think that should cover the public interface of CommonDataModel.

@@ -310,6 +310,20 @@ function Base.show(io::IO,v::AbstractVariable)
end


function DiskArrays.haschunks(v::AbstractVariable)
storage, chunksizes = chunking(v)
Contributor Author

I tried to reuse the existing chunking method.
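
A rough sketch of how a `chunking` query might be mapped onto the DiskArrays chunk interface. This is not the code from the PR: `ToyVariable` and its local `chunking` function are hypothetical, and the sketch assumes that `chunking` returns a `(storage, chunksizes)` pair with `storage` equal to `:chunked` or `:contiguous`.

```julia
using DiskArrays

# Hypothetical variable type that records its storage layout (illustration only).
struct ToyVariable{T,N} <: DiskArrays.AbstractDiskArray{T,N}
    data::Array{T,N}
    storage::Symbol            # assumed to be :chunked or :contiguous
    chunksizes::NTuple{N,Int}
end

Base.size(v::ToyVariable) = size(v.data)

# Local stand-in for the dataset-level `chunking` query.
chunking(v::ToyVariable) = (v.storage, v.chunksizes)

# Report chunked storage to DiskArrays only when the variable is stored in chunks.
function DiskArrays.haschunks(v::ToyVariable)
    storage, _ = chunking(v)
    return storage == :chunked ? DiskArrays.Chunked() : DiskArrays.Unchunked()
end

# Build the chunk grid from the chunk sizes reported by `chunking`.
function DiskArrays.eachchunk(v::ToyVariable)
    _, chunksizes = chunking(v)
    return DiskArrays.GridChunks(v, chunksizes)
end

chunked = ToyVariable(zeros(4, 4), :chunked, (2, 2))
contiguous = ToyVariable(zeros(4, 4), :contiguous, (4, 4))
```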

@lupemba
Contributor Author

lupemba commented May 3, 2025

@rafaqz @tiemvanderdeure @Alexander-Barth,
Here is my work so far. The tests pass when I add the DiskArrays changes from JuliaIO/DiskArrays.jl#255 and JuliaIO/DiskArrays.jl#260

@rafaqz
Member

rafaqz commented May 3, 2025

Great, we should get those DiskArrays PRs in then (sorry got distracted with other things)

@tiemvanderdeure
Contributor

Yeah we just need JuliaIO/DiskArrays.jl#249 and then we can merge JuliaIO/DiskArrays.jl#255. I think both are pretty much ready to merge?

@rafaqz
Member

rafaqz commented May 5, 2025

I just need to fix views in 249

@lupemba lupemba marked this pull request as draft June 23, 2025 08:55
aout,
indexes::Vararg{OrdinalRange, N}) where {T,N}

aout .= Base.getindex(gr,indexes...)
Contributor Author

Currently, Base.getindex is defined for ReducedGroupedVariable, so the DiskArrays.jl implementation is not used. An alternative would be to use the DiskArrays getindex and implement more logic in readblock!. That would mean more work on ReducedGroupedVariable, and I cannot see any benefit in doing it right now.

@lupemba lupemba marked this pull request as ready for review June 27, 2025 12:20
@lupemba
Contributor Author

lupemba commented Jun 27, 2025

@rafaqz @tiemvanderdeure @Alexander-Barth,

This PR is now ready for review. There are still some unresolved comments that we will have to discuss before merging the PR.
I have made draft PRs to all the "datasets" packages to test that this update will work.

Most of the changes are small but TIFFDatasets required a bit more work since it was not using DiskArrays.

I think the path forward should be:

  • Review this PR and solve outstanding comments.
  • Do some minor performance tests
  • Merge this PR and release CommonDataModel.jl v0.4
  • Review NCDatasets.jl PR and rerun CI.
  • Release new NCDatasets.jl version. Some of the other packages use NCDatasets as a test dependency.
  • Update the remaining packages.

@lupemba
Contributor Author

lupemba commented Jun 27, 2025

The CI / Documentation is failing because https://psl.noaa.gov/thredds/fileServer/Datasets/noaa.oisst.v2.highres/sst.day.mean.2023.nc is unavailable at the moment.

@felixcremer
Member

Is this PR also closing #8 or is there anything else left for full DiskArrays compatibility?
I guess this also supersedes #9 as a way to achieve the same goal.

Is there something I could help with pushing this forward?

@lupemba
Contributor Author

lupemba commented Aug 11, 2025

Hi, I just did some minor adjustments and added some more tests.
I think that the PR is now in a good condition and we should be able to move forward. The next step will be to get an approval from @Alexander-Barth, squash merge the PR and then make a new breaking release. Once it is done we can update all the related PRs for the dataset packages and handle any bugs that might pop up. The majority of users will only be affected once we release the new dataset packages.

Note that the Documentation works with JuliaGeo/NCDatasets.jl#279

> Is there something I could help with pushing this forward?

@felixcremer, some performance tests would be good. I haven't really done any yet. There are some tests defined in commondatamodel/test/perf, but they require Linux and sudo access. Suggestions for extra tests to increase code coverage are also welcome.

@felixcremer
Member

I just started playing around with it in Rasters and this will also need some overhaul in Rasters.jl to bring it to the new behaviour.

5 participants