Commit 65d8263

tasansal, dmitriyrepin, BrianMichell, and Dmitriy Repin authored
MDIO v1 beta release (TGSAI#641)
* update project metadata and deps * update project metadata and deps * add schemas * Relocate quickstart notebook to tutorials directory * update docs dependencies * add new docs * remove incorrect exclude * remove duplicate doc directive * fix creation notebook * Add basic unit test for v1 dataset schema validation * update lockfile * fix broken creation nb * update lockfile * lint v1 files * update lock file * schema_v1-dataset_builder-add_dimension * V1 schema review (TGSAI#553) * Update from list to discrete values for coordinate metadata * Add docs to help users understand difference * Update docs and fix case sensitivity. * Linting * Add CoordianteMetadata to docs * First take on add_dimension(), add_coordinate(), add_variable() * Finished add_dimension, add_coordinate, add_variable * Work on build * Generalize _to_dictionary() * build * [v1] Update dependencies to latest (TGSAI#567) * Update dependencies to latest versions * Update linter type-checking code to 'TC' in pyproject.toml https://astral.sh/blog/ruff-v0.8.0#new-error-codes-for-flake8-type-checking-rules * Refactor: Move Zarr codec imports to top-level * disable safety in CI (temporary) * Refactor: Replace Zarr codec imports with numcodecs equivalents * Refactor: Remove unused numcodecs imports and related methods * pin zarr due to zarr 3.0.9 bug * Dataset Build - pass one * unpin zarr because breaking bug fixed (TGSAI#569) * Revert .container changes * PR review: remove DEVELOPER_NOTES.md * PR Review: add_coordinate() should accept only data_type: ScalarType * PR review: add_variable() data_type remove default * RE review: do not add dimension variable * PR Review: get api version from the package version * PR Review: remove add_dimension_coordinate * PR Review: add_coordinate() remove data_type default value * PR Review: improve unit tests by extracting common functionality in validate* functions * Remove the Dockerfile changes. 
They are not supposed to be a part of this PR * PR Review: run ruff * PR Review: fix pre-commit errors * remove some noqa overrides * Implement MDIO Dataset builder to create in-memory instance of schemas.v1.dataset.Dataset (TGSAI#568) * schema_v1-dataset_builder-add_dimension * First take on add_dimension(), add_coordinate(), add_variable() * Finished add_dimension, add_coordinate, add_variable * Work on build * Generalize _to_dictionary() * build * Dataset Build - pass one * Revert .container changes * PR review: remove DEVELOPER_NOTES.md * PR Review: add_coordinate() should accept only data_type: ScalarType * PR review: add_variable() data_type remove default * RE review: do not add dimension variable * PR Review: get api version from the package version * PR Review: remove add_dimension_coordinate * PR Review: add_coordinate() remove data_type default value * PR Review: improve unit tests by extracting common functionality in validate* functions * Remove the Dockerfile changes. They are not supposed to be a part of this PR * PR Review: run ruff * PR Review: fix pre-commit errors * remove some noqa overrides --------- Co-authored-by: Altay Sansal <[email protected]> * Writing XArray / Zarr * gitignore * to_zarr() fix compression * Fix precommit issues * Use only make_campos_3d_acceptance_dataset * PR Review: address the review comments * Update _get_fill_value for StructuredType * Fix fill type issue for the Structured Types * Improve code coverage * Fix spelling * Revert "Fix spelling" This reverts commit 0447659. * extend per-file ignores for PLR2004 and remove noqa overrides in specific tests * Refactor tests: clarify Zarr-related test names, fix type hints, and clean unused `# noqa` comments. 
* MDIO v1 Templates and Template Registry (TGSAI#573) * Templates and TemplateRegistry * Fix pre-commit issues * Rever dev container changes * PR Review: address issues * PR Review: register default templates at registry initialization * update deps * address issues with VS Code dev containers (see issue 559) (TGSAI#576) * Templates and TemplateRegistry * Fix pre-commit issues * Rever dev container changes * PR Review: address issues * PR Review: register default templates at registry initialization * Dockerfile.dev --------- Co-authored-by: Altay Sansal <[email protected]> * segy_to_mdio_v1 (TGSAI#577) * Templates and TemplateRegistry * Fix pre-commit issues * Rever dev container changes * PR Review: address issues * PR Review: register default templates at registry initialization * Dockerfile.dev * segy_to_mdio_v1 * Clean up * Prototype review notes * Add dev comment * Add notes that will be deleted later * segy_to_mdio_v1 pass 1 * indexing_v1 and blocked_io_v1 * Remove DEV notes * Clean up * Document bug location * Work around IndexError * Clean temporary code * More clean up * Remove *_1 infrastructure files * Restore accidently removed dask.array * Created an issue reproducer * Make the required template properties public * Simplify type converter * Improve templates * Move test_type_converter.py * Move test_type_converter.py * Revert to use the original grid * Integrate segy_to_mdio_v1_customized, fix indexing * Add dimension coordinates in tem,plates * Write statistics to Zarr * Delete factory_v1.py * Complete integrationtest. 
Fix coordinates * Fir pre-commit errors * PR review: fix trace_worker docstring * Review: address some of the issue * Fix bug * dding todo for sum squares calculation * Refactor ChunkIterator * Refactor ChunkIterator into ChunkIteratorV1 * Remove segy_to_mdio_v1_customized, dataset_serializer.to_zarr * Add support for trace headers without using _FillValue * Use StorageLocation in trace_worker_v1 * Fix statistics attribute name * PR review changes * PR Improvements: do a single write * TODO: chunked write for non-dimensional coordinates and trace_mask * Update StorageLocation to use fsspec * Reformat with pre-commit * Use domain name in get_grid_plan * Fix non-dim coords and chunk_samples=False * Convert test_3d_import_v1 to V1 * Fix test_meta_dataset_read * remove whitespace * clean up comments * update deps in lockfile * simplify dim and non-dim coordinate handling after scan * remove compatibility tests * add filtering capability to header worker * accept subset filter to pass to workers * make v1 grid planner awesome * double to single underscores in test names * fix broken test harnesses due to incorrect Sequence import * clean up dev comment * clean up whitespace * use new module name * clean up segy_to_mdio_v1 * fix whitespace and remove unnecessary list call * these are defined as float64 in template Previous check was passing due to an error in assignment during creation of coordinate variables * fix missing dimension coordinate for vertical axis * fix incorrect dtype comparison for time variable * simplify and fix critical bugs * rename v1 out of things and get rid of old code * remove fixed todo * remove more v1 from names * rename chunk iterator * fix dimensionality in tests due to new (missing) vertical dimension coordinate * add todo for numpy ingestion * fix references to non-v1 naming * extract grid operations to its own function * fix typo Co-authored-by: Brian Michell <[email protected]> * add todo for simplifying storage location * Remove 
no_fill_var_names, add domain var to Seismic3DPreStackShotTemplate * Part 2 of the previous commit * pre-commit formatting * remove dev mount --------- Co-authored-by: Dmitriy Repin <[email protected]> Co-authored-by: Altay Sansal <[email protected]> Co-authored-by: Brian Michell <[email protected]> * Make some integration tests for work with new `segy_to_mdio` (TGSAI#599) * Fix integration import tests * Fix integration import tests * mask_and_scale=False * PR Review * pre-commit * PR Review issues * add todo for headers * update line length limit to 120 in pyproject.toml * compact nested code for improved readability in validation tests * compact coordinate and dimension name definitions in 2D/3D prestack shot templates * refactor names in header validation in SEG-Y export tests * remove v1 suffix * compact code by merging multi-line blocks into single lines where possible * bump prettier to v3.1.0 and remove prettier-plugin-toml * update lock file --------- Co-authored-by: Altay Sansal <[email protected]> * remove developer tests * Serialize text and binary headers (TGSAI#600) * Fix integration import tests * mask_and_scale=False * pre-commit * PR Review issues * serialize-text-and-binary-headers * remove dev test data * add back whitespace * revert import changes * fix attribute initialization in `_add_text_binary_headers` * Add tests * refactor: improve type annotations and docstrings in test utilities * fix formatting * remove redundant `str()` casting in `xr.open_dataset` calls --------- Co-authored-by: Altay Sansal <[email protected]> * shot_point (TGSAI#602) * Add template: Offset + Azimuth binned CDP gathers (COCA) (TGSAI#605) * update helper to support structured types in variable validation * add Seismic3DPreStackCocaTemplate and corresponding unit tests * register Seismic3DPreStackCocaTemplate in template registry * reorganize template registrations in template_registry and remove depth ones from shots. 
* use registered templates instead of listing them all by hand. * simplify template instantiation in unit tests * fix default templates and add missing ones * refactor default template assertions using shared constant * Eager memory allocation fix (TGSAI#609) * Implement fixes to ensure lazy allocation of data arrays on serialization * Avoid unnecessary copies of data in memory * Linting * Eliminate immediate overwrite of `data` bug * Remove unused import * Set appropriate fill value for lazy arrays * Clean up header value handler * Resolve data serialization issues * Ensure all encodings are captured * Simplify dataset coordinate population logic by removing unused imports and redundant variable handling * Refactor `_workers.py` to streamline variable handling, replace manual Variable creation with direct assignment, and resolve redundant imports. * make better use of grid * fix type hint * make better use of grid * fix(regression): make dataset serialization less eager * update zarr * remove comment --------- Co-authored-by: Altay Sansal <[email protected]> * Fix memory and core utilization regressions * Export functionality for MDIO v1 ingested files (TGSAI#611) * Export part 1 * Enable header value validation * Revert the test names back * Remove Endianness, new_chunks API args and traceDomain, * PR review * lint * create/use new api location and lint * allow configuring opener chunks * clarify xarray open parameters * fix regression of not-opening with native dask re-chunking * fix regression of not-opening with native dask re-chunking * make export rechunker work with named dimension sizes and chunks * make StorageLocation available at library level and update mdio to segy example * pre-open with zarr backend and simplify dataset slicing after lazy loading * better opener docs * more explicit xarray selection * rename trace variable name to default variable name * remove the guard for setting storage options to empty dictionary. new zarr is ok with None. 
* update lockfile * fix broken tests and inconsistent type hints * clean up comments * clarify binary header scaling * make test names clearer * fix broken unit tests due to storage_options handling --------- Co-authored-by: Altay Sansal <[email protected]> * v1 implementation of AutoChannelWrap grid override (TGSAI#632) * AutoChannelWrap over updated-v1 * Fix test * rename function for new behaviour and improve type hint for grid_overrides * simplify metadata handling * lint * gridOverride is not required * remove unnecessary byte order change, handled upstream. * remove rtol adds, tests pass. * remove expected behaviour comment * clean up tests * use grouped assignments to fix PLR915 * add comments to clarify --------- Co-authored-by: Altay Sansal <[email protected]> * Move to Zarr v3 as default for on disk storage format (TGSAI#630) * remove all zarr v2 refs and fix fill_value attributes * fix codec initialization for zarr3 * use correct kwargs for compressor definition * fix fill value for structs * fix numpy imports * fix creation logic * make numpy import namespace * ensure fill value is correct for structured arrays * fill value all fields * remove legacy test for bug in v2 * fix codec related issues and warning spamming * use UPath instead of StorageLocation and remove all v0 stuff * undo warning suppression for now * remove v0 dataset schema * make immutable metadata tuples, performance optimizations. consistent code styling as well - remove old zarr APIs - Ensure grid attrs (map and live) get compressed properly. 
- move grid_map slicing to worker from main process - * fix cloud i/o issue (TGSAI#637) * snake-case to camelCase (TGSAI#638) * Fix output URI handling for remote stores (TGSAI#639) * fix output uri handling for remote stores * switch from `as_uri` to `as_posix` for compatibility with xarray * allow legacy v2 support (TGSAI#640) * Reorganize code and simplify schemas and logic everywhere (TGSAI#642) * reorg and simplify * fix comparison of stats * fix regression in dataset attribute serialization * ensure histogram alias is compared correctly * update docs references * fix broken refs * remove top level metadata ref * remove blosc config refs (we now get from zarr) * delete removed stats metadata wrapper * update deps and remove safety - reason for removal: pyupio/safety#673 * fix numpy rng lint errors * exclude lower level members * remove singleton from template registry title * make template registry api ref with autodoc * First pass review and alignment of templates (TGSAI#643) * rename things to be more sensible and add angle gathers configuration to PreStackCdp templates. - add missing 2d test * align shot data template with prod * fix tests for 3d pre-stack shot * remove deleted attribute (processingStage) * rename gatherType for coca * lint and fix 1 bug * rename gather -> ensemble or raw field data * add missing 2d shot * fix docstrings * fix wrong validation namings * Fix ingestion of coordinates without full dimensions (TGSAI#644) * fix correct ingestion for coordinates that don't share all dimensions * add todo for verification of reduced dimensions * Disable unimplemented tests (TGSAI#647) * add todo markers for disabled tests. 
* set coverage minimum to 85% due to disabled tests * remove todo, it has correct behaviour, also rename .build_dataset `header` to `header_dtype` for clarity (TGSAI#648) * set version to 1.0.0 * unpin hardcoded version from tests --------- Co-authored-by: Dmitriy Repin <[email protected]> Co-authored-by: Brian Michell <[email protected]> Co-authored-by: Dima From Texas <[email protected]> Co-authored-by: Dmitriy Repin <[email protected]>
1 parent 494fb4e commit 65d8263

108 files changed: +8463 -4372 lines changed

.devcontainer/Dockerfile.dev

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+# USAGE:
+# This file will be used by the VS Code DevContainer extension
+# to create a development environment for the mdio-python project.
+# HOW TO MANUALLY BUILD AND DEBUG THE CONTAINER
+# docker build -t mdio-dev -f .devcontainer/Dockerfile.dev .
+# docker run -it --rm --entrypoint /bin/bash --name mdio-dev mdio-dev
+# NOTES:
+# 1. The container will be run as the non-root user 'vscode' with UID 1000.
+# 2. The virtual environment will be setup at /home/vscode/.venv
+# 3. The project source code will be host-mounted at /workspaces/mdio-python
+ARG PYTHON_VERSION="3.13"
+ARG LINUX_DISTRO="bookworm"
+ARG UV_VERSION="0.6.11"
+FROM mcr.microsoft.com/devcontainers/python:1-${PYTHON_VERSION}-${LINUX_DISTRO}
+
+ENV USERNAME="vscode"
+USER $USERNAME
+
+COPY --chown=$USERNAME:$USERNAME ./ /workspaces/mdio-python
+
+WORKDIR /workspaces/mdio-python
+
+ARG UV_VERSION
+# Install UV as described in https://devblogs.microsoft.com/ise/dockerizing-uv/
+RUN python3 -m pip install --no-cache-dir uv==${UV_VERSION}
+# Prevent uv from trying to create hard links, which does not work in a container
+# that mounts local file systems (e.g. VS Code Dev Containers)
+ENV UV_LINK_MODE=copy
+# Add path to the site-packages
+ENV PYTHONUSERBASE=/home/$USERNAME/.local
+ENV PATH="$PYTHONUSERBASE/bin:$PATH"
+
+# Initialize virtual environment in the container
+ENV VIRTUAL_ENV="/home/$USERNAME/.venv"
+ENV UV_PROJECT_ENVIRONMENT=$VIRTUAL_ENV
+ENV PATH="$VIRTUAL_ENV/bin:$PATH"
+RUN uv venv $VIRTUAL_ENV
+
+# Install the project in the editable mode
+# https://setuptools.pypa.io/en/latest/userguide/development_mode.html
+# This allows for live reloading of the code during development
+RUN uv pip install -e .
+# Install "extras" (development dependencies) in pyproject.toml
+RUN uv sync --group dev
+# Now one can run:
+# pre-commit run --all-files

.devcontainer/devcontainer.json

Lines changed: 12 additions & 9 deletions
@@ -2,12 +2,12 @@
 // README at: https://github.com/devcontainers/templates/tree/main/src/python
 {
   "build": {
-    "dockerfile": "Dockerfile",
+    "dockerfile": "Dockerfile.dev",
     "context": ".."
   },
   // Use 'postCreateCommand' to run commands after the container is created.
   "postCreateCommand": {
-    "post_create_script": "bash ./.devcontainer/post-install.sh"
+    // "post_create_script": "bash ./.devcontainer/post-install.sh"
   },
   // Forward 8787 to enable us to view dask dashboard
   "forwardPorts": [8787],
@@ -16,8 +16,9 @@
     // Configure properties specific to VS Code.
     "vscode": {
       "settings": {
-        "python.terminal.activateEnvInCurrentTerminal": true,
-        "python.defaultInterpreterPath": "/opt/venv/bin/python"
+        "python.testing.pytestArgs": ["tests"],
+        "python.testing.unittestEnabled": false,
+        "python.testing.pytestEnabled": true
       },
       "extensions": [
         "ms-python.python",
@@ -27,17 +28,19 @@
         "ms-toolsai.jupyter-renderers",
         "vscode-icons-team.vscode-icons",
         "wayou.vscode-todo-highlight",
-        "streetsidesoftware.code-spell-checker"
+        "streetsidesoftware.code-spell-checker",
+        "eamodio.gitlens",
+        "visualstudioexptteam.vscodeintellicode",
+        "richie5um2.vscode-sort-json"
       ]
     }
   },
   // Uncomment to connect as root instead. More info: https://aka.ms/dev-containers-non-root.
   // "remoteUser": "root",
   "updateRemoteUserUID": true,
+  "workspaceMount": "source=${localWorkspaceFolder},target=/workspaces/mdio-python,type=bind",
+  "workspaceFolder": "/workspaces/mdio-python",
   "mounts": [
-    // Re-use local Git configuration
-    "source=${localEnv:HOME}/.gitconfig,target=/home/vscode/.gitconfig_tmp,type=bind,consistency=cached",
-    "source=${localEnv:HOME}/.gitconfig,target=/root/.gitconfig_tmp,type=bind,consistency=cached",
-    "source=${localEnv:SCRATCH_DIR}/${localEnv:USER},target=/scratch/,type=bind,consistency=cached"
+    // "source=${localWorkspaceFolder}/../DATA/,target=/DATA/,type=bind,consistency=cached"
   ]
 }

.github/workflows/tests.yml

Lines changed: 0 additions & 1 deletion
@@ -13,7 +13,6 @@ jobs:
       matrix:
         include:
           - { python: "3.13", os: "ubuntu-latest", session: "pre-commit" }
-          - { python: "3.13", os: "ubuntu-latest", session: "safety" }
           # - { python: "3.13", os: "ubuntu-latest", session: "mypy" }
           # - { python: "3.12", os: "ubuntu-latest", session: "mypy" }
           # - { python: "3.11", os: "ubuntu-latest", session: "mypy" }

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -46,6 +46,6 @@ repos:
         stages: [pre-commit, pre-push, manual]
         args: [--markdown-linebreak-ext=md]
   - repo: https://github.com/pre-commit/mirrors-prettier
-    rev: v2.7.1
+    rev: v3.1.0
     hooks:
       - id: prettier

docs/api_reference.md

Lines changed: 1 addition & 33 deletions
@@ -1,12 +1,5 @@
 # API Reference
 
-## Readers / Writers
-
-```{eval-rst}
-.. automodule:: mdio.api.accessor
-    :members:
-```
-
 ## Data Converters
 
 ### Seismic Data
@@ -33,21 +26,10 @@ and
 ```{eval-rst}
 .. automodule:: mdio.converters.segy
     :members:
-    :exclude-members: grid_density_qc, parse_index_types, get_compressor
+    :exclude-members: grid_density_qc, parse_index_types, get_compressor, populate_dim_coordinates, populate_non_dim_coordinates
 
 .. automodule:: mdio.converters.mdio
     :members:
-
-.. automodule:: mdio.converters.numpy
-    :members:
-```
-
-## Convenience Functions
-
-```{eval-rst}
-.. automodule:: mdio.api.convenience
-    :members:
-    :exclude-members: create_rechunk_plan, write_rechunked_values
 ```
 
 ## Core Functionality
@@ -58,17 +40,3 @@ and
 .. automodule:: mdio.core.dimension
     :members:
 ```
-
-### Creation
-
-```{eval-rst}
-.. automodule:: mdio.core.factory
-    :members:
-```
-
-### Data I/O
-
-```{eval-rst}
-.. automodule:: mdio.core.serialization
-    :members:
-```

docs/conf.py

Lines changed: 10 additions & 0 deletions
@@ -17,6 +17,7 @@
     "sphinx.ext.napoleon",
     "sphinx.ext.intersphinx",
     "sphinx.ext.autosummary",
+    "sphinxcontrib.autodoc_pydantic",
     "sphinx.ext.autosectionlabel",
     "sphinx_click",
     "sphinx_copybutton",
@@ -38,6 +39,7 @@
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3", None),
     "numpy": ("https://numpy.org/doc/stable/", None),
+    "pydantic": ("https://docs.pydantic.dev/latest/", None),
     "zarr": ("https://zarr.readthedocs.io/en/stable/", None),
 }
 
@@ -50,6 +52,14 @@
 autoclass_content = "class"
 autosectionlabel_prefix_document = True
 
+autodoc_pydantic_field_list_validators = False
+autodoc_pydantic_field_swap_name_and_alias = True
+autodoc_pydantic_field_show_alias = False
+autodoc_pydantic_model_show_config_summary = False
+autodoc_pydantic_model_show_validator_summary = False
+autodoc_pydantic_model_show_validator_members = False
+autodoc_pydantic_model_show_field_summary = False
+
 html_theme = "furo"
 
 myst_number_code_blocks = ["python"]

docs/data_models/chunk_grids.md

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
+```{eval-rst}
+:tocdepth: 3
+```
+
+```{currentModule} mdio.builder.schemas.chunk_grid
+
+```
+
+# Chunk Grid Models
+
+```{article-info}
+:author: Altay Sansal
+:date: "{sub-ref}`today`"
+:read-time: "{sub-ref}`wordcount-minutes` min read"
+:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
+```
+
+The variables in the MDIO data model can represent different types of chunk grids.
+These grids are essential for managing multi-dimensional data arrays efficiently.
+In this breakdown, we will explore four distinct data models within the MDIO schema,
+each serving a specific purpose in data handling and organization.
+
+MDIO implements data models following the guidelines of the Zarr v3 spec and ZEPs:
+
+- [Zarr core specification (version 3)](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)
+- [ZEP 1 — Zarr specification version 3](https://zarr.dev/zeps/accepted/ZEP0001.html)
+- [ZEP 3 — Variable chunking](https://zarr.dev/zeps/draft/ZEP0003.html)
+
+## Regular Grid
+
+The regular grid models are designed to represent a rectangular and regularly
+spaced chunk grid.
+
+```{eval-rst}
+.. autosummary::
+    RegularChunkGrid
+    RegularChunkShape
+```
+
+For a 1D array with `size = 31`{l=python}, we can divide it into 5 equally sized
+chunks. Note that the last chunk will be truncated to match the size of the array.
+
+`{ "name": "regular", "configuration": { "chunkShape": [7] } }`{l=json}
+
+Using the above schema, the resulting array chunks will look like this:
+
+```bash
+ ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ↔ 3
+┌───────┬───────┬───────┬───────┬───┐
+└───────┴───────┴───────┴───────┴───┘
+```
+
+For a 2D array with shape `rows, cols = (7, 17)`{l=python}, we can divide it into 9
+equally sized chunks.
+
+`{ "name": "regular", "configuration": { "chunkShape": [3, 7] } }`{l=json}
+
+Using the above schema, the resulting 2D array chunks will look like below.
+Note that the rows and columns are conceptual and visually not to scale.
+
+```bash
+ ←─ 7 ─→ ←─ 7 ─→ ↔ 3
+┌───────┬───────┬───┐
+│       ╎       ╎   │ ↑
+│       ╎       ╎   │ 3
+│       ╎       ╎   │ ↓
+├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
+│       ╎       ╎   │ ↑
+│       ╎       ╎   │ 3
+│       ╎       ╎   │ ↓
+├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
+│       ╎       ╎   │ ↕ 1
+└───────┴───────┴───┘
+```
+
+## Rectilinear Grid
+
+The [RectilinearChunkGrid](RectilinearChunkGrid) model extends
+the concept of chunk grids to accommodate rectangular and irregularly spaced chunks.
+This model is useful in data structures where non-uniform chunk sizes are necessary.
+[RectilinearChunkShape](RectilinearChunkShape) specifies the chunk sizes for each
+dimension as a list, allowing for irregular intervals.
+
+```{eval-rst}
+.. autosummary::
+    RectilinearChunkGrid
+    RectilinearChunkShape
+```
+
+:::{note}
+It's important to ensure that the sum of the irregular spacings specified
+in the `chunkShape` matches the size of the respective array dimension.
+:::
+
+For a 1D array with `size = 39`{l=python}, we can divide it into 5 irregularly sized
+chunks.
+
+`{ "name": "rectilinear", "configuration": { "chunkShape": [[10, 7, 5, 7, 10]] } }`{l=json}
+
+Using the above schema, the resulting array chunks will look like this:
+
+```bash
+ ←── 10 ──→ ←─ 7 ─→ ← 5 → ←─ 7 ─→ ←── 10 ──→
+┌──────────┬───────┬─────┬───────┬──────────┐
+└──────────┴───────┴─────┴───────┴──────────┘
+```
+
+For a 2D array with shape `rows, cols = (7, 25)`{l=python}, we can divide it into 12
+rectilinear (rectangular but irregular) chunks. Note that the rows and columns are
+conceptual and visually not to scale.
+
+`{ "name": "rectilinear", "configuration": { "chunkShape": [[3, 1, 3], [10, 5, 7, 3]] } }`{l=json}
+
+```bash
+ ←── 10 ──→ ← 5 → ←─ 7 ─→ ↔ 3
+┌──────────┬─────┬───────┬───┐
+│          ╎     ╎       ╎   │ ↑
+│          ╎     ╎       ╎   │ 3
+│          ╎     ╎       ╎   │ ↓
+├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
+│          ╎     ╎       ╎   │ ↕ 1
+├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
+│          ╎     ╎       ╎   │ ↑
+│          ╎     ╎       ╎   │ 3
+│          ╎     ╎       ╎   │ ↓
+└──────────┴─────┴───────┴───┘
+```
+
+## Model Reference
+
+:::{dropdown} RegularChunkGrid
+:animate: fade-in-slide-down
+
+```{eval-rst}
+.. autopydantic_model:: RegularChunkGrid
+
+----------
+
+.. autopydantic_model:: RegularChunkShape
+```
+
+:::
+:::{dropdown} RectilinearChunkGrid
+:animate: fade-in-slide-down
+
+```{eval-rst}
+.. autopydantic_model:: RectilinearChunkGrid
+
+----------
+
+.. autopydantic_model:: RectilinearChunkShape
+```
+
+:::
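
The chunk arithmetic described in `docs/data_models/chunk_grids.md` can be checked with a short, standalone sketch. It is plain Python and does not use the MDIO schema classes; the helper names `regular_chunk_sizes` and `rectilinear_chunk_bounds` are hypothetical and exist only for illustration. The first helper reproduces the truncated last chunk of a regular grid, and the second enforces the rule from the note that rectilinear chunk sizes must sum to the dimension size.

```python
import math
from itertools import accumulate


def regular_chunk_sizes(dim_size: int, chunk_size: int) -> list[int]:
    """Chunk sizes along one dimension of a regular grid (hypothetical helper)."""
    if dim_size <= 0 or chunk_size <= 0:
        raise ValueError("dim_size and chunk_size must be positive")
    n_chunks = math.ceil(dim_size / chunk_size)
    sizes = [chunk_size] * n_chunks
    # The last chunk is truncated so that the sizes add up to the dimension size.
    sizes[-1] = dim_size - chunk_size * (n_chunks - 1)
    return sizes


def rectilinear_chunk_bounds(dim_size: int, sizes: list[int]) -> list[tuple[int, int]]:
    """(start, stop) index pairs along one dimension of a rectilinear grid (hypothetical helper)."""
    if sum(sizes) != dim_size:
        raise ValueError(f"chunk sizes sum to {sum(sizes)}, expected {dim_size}")
    stops = list(accumulate(sizes))
    starts = [0, *stops[:-1]]
    return list(zip(starts, stops))


# Regular grid: chunkShape [7] on a 1D array of size 31.
print(regular_chunk_sizes(31, 7))  # [7, 7, 7, 7, 3]

# Regular grid: chunkShape [3, 7] on a (7, 17) array gives 3 x 3 = 9 chunks.
rows = regular_chunk_sizes(7, 3)
cols = regular_chunk_sizes(17, 7)
print(rows, cols, len(rows) * len(cols))  # [3, 3, 1] [7, 7, 3] 9

# Rectilinear grid: chunkShape [[10, 7, 5, 7, 10]] on a 1D array of size 39.
print(rectilinear_chunk_bounds(39, [10, 7, 5, 7, 10]))
# [(0, 10), (10, 17), (17, 22), (22, 29), (29, 39)]
```

The same bounds helper applied per axis to the 2D example (`[3, 1, 3]` over 7 rows and `[10, 5, 7, 3]` over 25 columns) yields 3 x 4 = 12 chunks, matching the count stated in the document.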
