# How do I ...

## ... create datacubes from a directory of GeoTIFFs

Files with GeoTIFF format cannot be opened directly by `zappend` unless
you add [rioxarray](https://corteva.github.io/rioxarray/) to your
Python environment.

Then write your own [slice source](guide.md#slice-sources) and
use the configuration setting [`slice_source`](config.md#slice_source):

```python
import glob
import numpy as np
import rioxarray as rxr
import xarray as xr
from zappend.api import zappend

def get_dataset_from_geotiff(tiff_path):
    ds = rxr.open_rasterio(tiff_path)
    # Add missing time dimension
    slice_time = get_slice_time(tiff_path)
    slice_ds = ds.expand_dims("time", axis=0)
    slice_ds.coords["time"] = xr.DataArray(np.array([slice_time]), dims="time")
    try:
        yield slice_ds
    finally:
        ds.close()

zappend(sorted(glob.glob("inputs/*.tif")),
        slice_source=get_dataset_from_geotiff,
        target_dir="output/tif-cube.zarr")
```

In the example above, the function `get_slice_time()` returns the time label
of a given GeoTIFF file as a value of type `np.datetime64`.
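
A minimal sketch of such a helper, assuming a hypothetical file naming
convention in which each GeoTIFF name contains an ISO date, such as
`scene-2024-05-01.tif`:

```python
import re

import numpy as np


def get_slice_time(tiff_path):
    # Hypothetical convention: the file name contains an ISO date,
    # e.g. "inputs/scene-2024-05-01.tif"
    match = re.search(r"\d{4}-\d{2}-\d{2}", tiff_path)
    if match is None:
        raise ValueError(f"no date found in file name {tiff_path!r}")
    return np.datetime64(match.group(0))
```

Adapt the pattern to whatever naming scheme your input files actually follow.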

## ... create datacubes from datasets without append dimension

`zappend` expects the append dimension to exist in slice datasets and
that at least one variable makes use of that dimension.
For example, if you are appending spatial 2-d images with dimensions `x` and `y`
along a `time` dimension, you first need to expand the images into the `time`
dimension. Here the 2-d image dataset is called `image_ds` and `slice_time`
is its associated time value of type `np.datetime64`:

```python
slice_ds = image_ds.expand_dims("time", axis=0)
slice_ds.coords["time"] = xr.DataArray(np.array([slice_time]), dims="time")
```
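
To see the effect, here is a self-contained sketch using a synthetic 2-d
dataset (the names `image_ds` and `slice_time` are as in the text above):

```python
import numpy as np
import xarray as xr

# Synthetic 2-d image dataset with dimensions y and x
image_ds = xr.Dataset({"band_1": (("y", "x"), np.zeros((3, 4)))})
slice_time = np.datetime64("2024-05-01")

# Expand into the time dimension and attach the time coordinate
slice_ds = image_ds.expand_dims("time", axis=0)
slice_ds.coords["time"] = xr.DataArray(np.array([slice_time]), dims="time")

print(slice_ds.band_1.dims)  # ('time', 'y', 'x')
```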

See also [How do I create datacubes from a directory of GeoTIFFs](#create-datacubes-from-a-directory-of-geotiffs)
above.

## ... dynamically update global metadata attributes

Refer to the section on [target attributes](guide.md#attributes)
in the user guide to learn how to use dynamic attributes.

## ... find out what is limiting the performance

Use the [logging](guide.md#logging) configuration to see which processing steps
take most of the time.
Use the [profiling](guide.md#profiling) configuration to inspect in more
detail which parts of the processing are the bottlenecks.

## ... speed-up/parallelize datacube creation

TODO

## ... mitigate network data transfer becoming a bottleneck

TODO

## ... avoid cumulative memory usage

TODO

## ... write a log file

Use the following [logging](guide.md#logging) configuration:

```json
{
  "logging": {
    "version": 1,
    "formatters": {
      "normal": {
        "format": "%(asctime)s %(levelname)s %(message)s",
        "style": "%"
      }
    },
    "handlers": {
      "console": {
        "class": "logging.StreamHandler",
        "formatter": "normal"
      },
      "file": {
        "class": "logging.FileHandler",
        "formatter": "normal",
        "filename": "zappend.log",
        "mode": "w",
        "encoding": "utf-8"
      }
    },
    "loggers": {
      "zappend": {
        "level": "INFO",
        "handlers": ["console", "file"]
      }
    }
  }
}
```
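
The `logging` object follows the dictionary schema of Python's standard
`logging.config` module, so you can also try the same settings directly from
Python (a sketch for illustration; `zappend` applies the configuration for you
when it appears in the configuration file):

```python
import logging
import logging.config

# Same structure as the "logging" object above, reduced to the file handler
log_config = {
    "version": 1,
    "formatters": {
        "normal": {"format": "%(asctime)s %(levelname)s %(message)s", "style": "%"}
    },
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "formatter": "normal",
            "filename": "zappend.log",
            "mode": "w",
            "encoding": "utf-8",
        }
    },
    "loggers": {"zappend": {"level": "INFO", "handlers": ["file"]}},
}
logging.config.dictConfig(log_config)
logging.getLogger("zappend").info("log file test")
```

Running this writes a timestamped `INFO` line to `zappend.log` in the
working directory.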

## ... address common errors

### Error `Target parent directory does not exist`

For security reasons, `zappend` does not create target parent directories
automatically. Make sure the parent directory exists before
calling `zappend`.

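For a local target path this can be done with the standard library, for
example (the path is illustrative):

```python
from pathlib import Path

target_dir = Path("output/tif-cube.zarr")
# Create the parent directory ("output") if it does not yet exist;
# zappend itself creates the .zarr directory inside it.
target_dir.parent.mkdir(parents=True, exist_ok=True)
```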
### Error `Target is locked`

In this case the target lock file still exists, which means that a former
rollback did not complete nominally. You can no longer trust the integrity of
any existing target dataset. The recommended remedy is to remove the lock file
and any target dataset artifacts. You can do that manually or use the
configuration setting `force_new`.

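For example, assuming the setting is given in a JSON configuration file, it
could look like this:

```json
{
  "force_new": true
}
```

With this setting, `zappend` discards any existing target dataset and leftover
lock file and starts a fresh target.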
### Error `Append dimension 'foo' not found in dataset`

Refer to [How do I create datacubes from datasets without append dimension](#create-datacubes-from-datasets-without-append-dimension).