Commit 82d9954

Merge pull request #481 from JuliaDataCubes/la/new_dim
new dim fix
2 parents b0da856 + cfff98d commit 82d9954

2 files changed: +108 -8 lines changed

docs/src/UserGuide/read.md

Lines changed: 91 additions & 6 deletions
@@ -2,7 +2,17 @@
 
 This section describes how to read files, URLs, and directories into YAXArrays and datasets.
 
-## Read Zarr
+## open_dataset
+
+The usual method for reading any format is using this function. See its `docstring` for more information.
+
+````@docs
+open_dataset
+````
+
+Now, let's explore different examples.
+
+### Read Zarr
 
 Open a Zarr store as a `Dataset`:
 
@@ -23,7 +33,7 @@ Individual arrays can be accessed using subsetting:
 ds.tas
 ````
 
-## Read NetCDF
+### Read NetCDF
 
 Open a NetCDF file as a `Dataset`:
 
@@ -55,7 +65,7 @@ end
 
 This code will ensure that the data is only accessed by one thread at a time, i.e. making it actual single-threaded but thread-safe.
 
-## Read GDAL (GeoTIFF, GeoJSON)
+### Read GDAL (GeoTIFF, GeoJSON)
 
 All GDAL compatible files can be read as a `YAXArrays.Dataset` after loading [ArchGDAL](https://yeesian.com/ArchGDAL.jl/latest/):
 
@@ -68,11 +78,11 @@ path = download("https://github.com/yeesian/ArchGDALDatasets/raw/307f8f0e584a39a
 ds = open_dataset(path)
 ````
 
-## Load data into memory
+### Load data into memory
 
 For datasets or variables that could fit in RAM, you might want to load them completely into memory. This can be done using the `readcubedata` function. As an example, let's use the NetCDF workflow; the same should be true for other cases.
 
-### readcubedata
+#### readcubedata
 
 :::tabs
 
@@ -99,4 +109,79 @@ ds_loaded["tos"] # Load the variable of interest; the loaded status is shown for
 
 :::
 
-Note how the loading status changes from `loaded lazily` to `loaded in memory`.
+Note how the loading status changes from `loaded lazily` to `loaded in memory`.
+
+## open_mfdataset
+
+There are situations when we would like to open and concatenate a list of dataset paths along a certain dimension. For example, to concatenate a list of `NetCDF` files along a new `time` dimension, one can use:
+
+::: details creation of NetCDF files
+
+````@example open_list_netcdf
+using YAXArrays, NetCDF, Dates
+using YAXArrays: YAXArrays as YAX
+
+dates_1 = [Date(2020, 1, 1) + Dates.Day(i) for i in 1:3]
+dates_2 = [Date(2020, 1, 4) + Dates.Day(i) for i in 1:3]
+
+a1 = YAXArray((lon(1:5), lat(1:7)), rand(5, 7))
+a2 = YAXArray((lon(1:5), lat(1:7)), rand(5, 7))
+
+a3 = YAXArray((lon(1:5), lat(1:7), YAX.time(dates_1)), rand(5, 7, 3))
+a4 = YAXArray((lon(1:5), lat(1:7), YAX.time(dates_2)), rand(5, 7, 3))
+
+savecube(a1, "a1.nc")
+savecube(a2, "a2.nc")
+savecube(a3, "a3.nc")
+savecube(a4, "a4.nc")
+````
+:::
+
+### along a new dimension
+
+````@example open_list_netcdf
+using YAXArrays, NetCDF, Dates
+using YAXArrays: YAXArrays as YAX
+import DimensionalData as DD
+
+files = ["a1.nc", "a2.nc"]
+
+dates_read = [Date(2024, 1, 1) + Dates.Day(i) for i in 1:2]
+ds = open_mfdataset(DD.DimArray(files, YAX.time(dates_read)))
+````
+
+and even opening files along a new `Time` dimension that already have a `time` dimension
+
+````@example open_list_netcdf
+files = ["a3.nc", "a4.nc"]
+ds = open_mfdataset(DD.DimArray(files, YAX.Time(dates_read)))
+````
+
+Note that opening along a new dimension name without specifying values also works; however, it defaults to `1:length(files)` for the dimension values.
+
+````@example open_list_netcdf
+files = ["a1.nc", "a2.nc"]
+ds = open_mfdataset(DD.DimArray(files, YAX.time))
+````
+
+### along a existing dimension
+
+Another use case is when we want to open files along an existing dimension. In this case, `open_mfdataset` will concatenate the paths along the specified dimension
+
+````@example open_list_netcdf
+using YAXArrays, NetCDF, Dates
+using YAXArrays: YAXArrays as YAX
+import DimensionalData as DD
+
+files = ["a3.nc", "a4.nc"]
+
+ds = open_mfdataset(DD.DimArray(files, YAX.time()))
+````
+
+where the contents of the `time` dimension are the merged values from both files
+
+````@ansi open_list_netcdf
+ds["time"]
+````
+
+providing us with a wide range of options to work with.

src/DatasetAPI/Datasets.jl

Lines changed: 17 additions & 2 deletions
@@ -348,7 +348,11 @@ open_mfdataset(g::Vector{<:AbstractString}; kwargs...) =
     merge_datasets(map(i -> open_dataset(i; kwargs...), g))
 
 function merge_new_axis(alldatasets, firstcube,var,mergedim)
-    newdim = DD.rebuild(mergedim,1:length(alldatasets))
+    newdim = if !(typeof(DD.lookup(mergedim)) <: DD.NoLookup)
+        DD.rebuild(mergedim, DD.val(mergedim))
+    else
+        DD.rebuild(mergedim, 1:length(alldatasets))
+    end
     alldiskarrays = map(ds->ds.cubes[var].data,alldatasets).data
     newda = diskstack(alldiskarrays)
     newdims = (DD.dims(firstcube)...,newdim)
@@ -407,10 +411,21 @@ end
 
 
 """
-    open_dataset(g; driver=:all)
+    open_dataset(g; skip_keys=(), driver=:all)
 
 Open the dataset at `g` with the given `driver`.
 The default driver will search for available drivers and tries to detect the useable driver from the filename extension.
+
+### Keyword arguments
+
+- `skip_keys` are passed as symbols, i.e., `skip_keys = (:a, :b)`
+- `driver=:all`, common options are `:netcdf` or `:zarr`.
+
+Example:
+
+````julia
+ds = open_dataset(f, driver=:zarr, skip_keys = (:c,))
+````
 """
 function open_dataset(g; skip_keys=(), driver = :all)
     str_skipkeys = string.(skip_keys)
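
The `newdim` branch added to `merge_new_axis` in this commit can be exercised in isolation, without opening any files. The following is a minimal sketch, not the package's code: `pick_new_dim` is a hypothetical helper name, `DD.Ti` stands in for `YAX.time`, and the file names are only placeholders that are never read. It assumes a DimensionalData version in which a dimension passed as a bare type receives a `NoLookup`.

```julia
import DimensionalData as DD
using Dates

# Hypothetical helper mirroring the branch added in `merge_new_axis`:
# keep the user-supplied dimension values when there are any,
# otherwise fall back to 1:nfiles.
function pick_new_dim(mergedim, nfiles)
    if !(DD.lookup(mergedim) isa DD.NoLookup)
        DD.rebuild(mergedim, DD.val(mergedim))
    else
        DD.rebuild(mergedim, 1:nfiles)
    end
end

files = ["a1.nc", "a2.nc"]  # placeholder paths, not opened here
dates = [Date(2024, 1, 1) + Day(i) for i in 1:2]

# Dimension constructed with explicit values.
with_values = DD.dims(DD.DimArray(files, (DD.Ti(dates),)), 1)
# Dimension constructed from the bare type (assumed to carry a NoLookup).
without_values = DD.dims(DD.DimArray(files, (DD.Ti,)), 1)

kept = pick_new_dim(with_values, length(files))
fallback = pick_new_dim(without_values, length(files))
```

Under these assumptions, `kept` carries the supplied dates and `fallback` carries `1:length(files)`, which matches the behaviour the updated user guide describes for `open_mfdataset`.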
