Merged
Changes from 10 commits
22 commits
3fc7889
Initial dim slicing: WIP no groups handling, Slicer untested as yet.
pp-mo Mar 6, 2025
8c05e8f
Start of indexer testing (WIP: incomplete).
pp-mo Mar 25, 2025
43b8056
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 2, 2025
2eddbab
Small improvements.
pp-mo Sep 3, 2025
a407070
Generalise testing + extend to Slicer tests.
pp-mo Sep 3, 2025
d01e707
More tests; small fixes; more docs and api-docs.
pp-mo Sep 7, 2025
05b5071
Add whatsnew.
pp-mo Sep 7, 2025
f1ce8fe
Add whastnew for the new utilities page.
pp-mo Sep 7, 2025
96f1bde
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 7, 2025
4413ea0
Merge branch 'main' into dim_slicer
pp-mo Sep 8, 2025
7e6f028
Define .slicer for Ncdata; support full == checking on datasets and v…
pp-mo Oct 2, 2025
fc7af0f
Simplify usage modes and API of indexing utilities.
pp-mo Oct 2, 2025
5c59ad1
Fix tests for simplified indexing features.
pp-mo Oct 2, 2025
e5c7d29
Add tests for new core object methods.
pp-mo Oct 2, 2025
77a2ed9
Fix docstrings + doctests.
pp-mo Oct 2, 2025
e4524d6
Replace dataset difference with equality tests in examples.
pp-mo Oct 2, 2025
18728a4
Document relation between equality testing and difference utilities.
pp-mo Oct 2, 2025
73c3733
Added whatsnew for dataset/variable equality support.
pp-mo Oct 2, 2025
28e8518
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 2, 2025
23b39e2
Merge branch 'main' into dim_slicer
pp-mo Oct 2, 2025
d4fc389
Don't keep fragments when building html.
pp-mo Oct 2, 2025
6ddfea8
Improved indexing whatsnew.
pp-mo Oct 2, 2025
1 change: 1 addition & 0 deletions docs/changelog_fragments/161.doc.rst
@@ -0,0 +1 @@
Added a `userguide page <userdocs/user_guide/utilities.html>`_ summarising all the utility features in :mod:`ncdata.utils`.
1 change: 1 addition & 0 deletions docs/changelog_fragments/68.feat.rst
@@ -0,0 +1 @@
Added utilities to extract sub-regions by indexing on dimensions: :func:`~ncdata.utils.index_by_dimensions` and :class:`~ncdata.utils.Slicer`.
3 changes: 3 additions & 0 deletions docs/userdocs/user_guide/common_operations.rst
@@ -73,6 +73,8 @@ Example :
The utility function :func:`~ncdata.utils.rename_dimension` is provided for this.
See : :ref:`howto_rename_dimension`.

.. _copy_notes:

Copying
-------
All core objects support a ``.copy()`` method. See for instance
@@ -132,6 +134,7 @@ comprehensive and may be very costly for instance comparing large data arrays, b
also allow more nuanced and controllable checking, e.g. to skip data array comparisons
or ignore variable ordering.

.. _object_creation:

Object Creation
---------------
49 changes: 48 additions & 1 deletion docs/userdocs/user_guide/howtos.rst
@@ -288,7 +288,7 @@ attribute already exists or not.
.. Note::

Assigning attributes when *creating* a dataset, variable or group is somewhat
- simpler, discussed :ref:`here <todo>`.
+ simpler, discussed :ref:`here <object_creation>`.


.. _howto_create_variable:
@@ -356,6 +356,53 @@ It can be freely overwritten by the user.
valid dimensions, and that ``.data`` arrays match the dimensions.


.. _howto_copy:

Make a copy of data
-------------------
Use the :meth:`ncdata.NcData.copy` method to make a copy.

.. testsetup::

>>> from ncdata.utils import dataset_differences

.. doctest::

>>> data2 = data.copy()
>>> assert dataset_differences(data, data2) == []

Note that this creates all-new, independent ncdata objects, but the variable data
arrays are not copied: each new variable refers to the same array as its original.

See: :ref:`copy_notes`
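
The difference between copying the structure and copying the arrays can be illustrated with plain dictionaries and NumPy. This is only a sketch of the idea, assuming nothing about how ncdata implements ``.copy()``; the dictionary layout is hypothetical.

```python
import numpy as np

# Hypothetical stand-in for a dataset: variable name -> dimensions + data array.
original = {"vx": {"dimensions": ["x"], "data": np.arange(5)}}

# Structural copy: new containers, but the very same array objects.
copied = {
    name: {"dimensions": list(var["dimensions"]), "data": var["data"]}
    for name, var in original.items()
}

# Writing *into* the shared array is visible through both datasets.
copied["vx"]["data"][0] = 99
print(original["vx"]["data"][0])  # 99

# Assigning a *new* array only changes the copy.
copied["vx"]["data"] = np.array([1, 3])
print(original["vx"]["data"].shape)  # (5,)
```

To make a copy fully independent, assign each variable a fresh array (for example a NumPy ``.copy()`` of the original data).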

.. _howto_slice:

Extract a subsection by indexing
--------------------------------
The neatest way is usually to use a :class:`~ncdata.utils.Slicer`.

.. testsetup::

>>> import numpy as np
>>> from ncdata import NcData, NcDimension, NcVariable
>>> from ncdata.utils import Slicer
>>> full_data = NcData(dimensions=[NcDimension("time", 5), NcDimension("level", 6), NcDimension("z", 3)])
>>> for nn, dim in full_data.dimensions.items():
...     full_data.variables.add(NcVariable(nn, dimensions=[nn], data=np.arange(dim.size)))

.. doctest::

>>> slice_TLZ = Slicer(full_data, ["time", "level", "z"])
>>> data_region = slice_TLZ[:3, 1::2, 2]

.. doctest::

>>> print({nn: full_data.variables[nn].data for nn in full_data.dimensions})
{'time': array([0, 1, 2, 3, 4]), 'level': array([0, 1, 2, 3, 4, 5]), 'z': array([0, 1, 2])}
>>> print({nn: data_region.variables[nn].data for nn in data_region.dimensions})
{'time': array([0, 1, 2]), 'level': array([1, 3, 5])}


Read data from a NetCDF file
----------------------------
Use the :func:`ncdata.netcdf4.from_nc4` function to load a dataset from a netCDF file.
1 change: 1 addition & 0 deletions docs/userdocs/user_guide/user_guide.rst
@@ -9,5 +9,6 @@ Detailed explanations, beyond the basic tutorial-style introductions
design_principles
data_objects
common_operations
utilities
general_topics
howtos
148 changes: 148 additions & 0 deletions docs/userdocs/user_guide/utilities.rst
@@ -0,0 +1,148 @@
Utilities and Conveniences
==========================
This section provides a short overview of the more involved operations which are
provided in the :mod:`~ncdata.utils` module. In all cases, more detail is available in
the `API pages <../../details/api/ncdata.utils.html>`_.

Rename Dimensions
-----------------
The :func:`~ncdata.utils.rename_dimension` utility renames a dimension and also
updates the variables which refer to it, so that the result is safe and consistent.
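
What a "consistent" rename involves can be sketched with plain Python containers. This is an illustration only, not the ncdata implementation: ``rename_dim_sketch`` and the dictionary layout are hypothetical names invented for this example.

```python
def rename_dim_sketch(dims, variables, old, new):
    """Rename a dimension in `dims` and in every variable that refers to it."""
    if old not in dims:
        raise KeyError(f"no dimension named {old!r}")
    if new in dims:
        raise ValueError(f"a dimension named {new!r} already exists")
    # Rebuild the mapping so the renamed dimension keeps its original position.
    renamed = {(new if name == old else name): size for name, size in dims.items()}
    dims.clear()
    dims.update(renamed)
    # Update every variable's dimension list, so nothing still uses the old name.
    for var_dims in variables.values():
        var_dims[:] = [new if name == old else name for name in var_dims]


dims = {"y": 4, "x": 10}
variables = {"v1": ["y", "x"], "vx": ["x"]}
rename_dim_sketch(dims, variables, "x", "lon")
print(dims)       # {'y': 4, 'lon': 10}
print(variables)  # {'v1': ['y', 'lon'], 'vx': ['lon']}
```

Renaming only the entry in the dimensions mapping would leave ``v1`` and ``vx`` referring to a dimension that no longer exists, which is the inconsistency a dedicated utility avoids.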

Dataset Equality Testing
------------------------
The function :func:`~ncdata.utils.dataset_differences` produces a list of messages
detailing all the ways in which two datasets are different.

For Example:
^^^^^^^^^^^^
.. testsetup::

>>> from ncdata import NcData, NcDimension, NcVariable
>>> from ncdata.utils import dataset_differences
>>> import numpy as np

.. doctest::

>>> data1 = NcData(
... dimensions=[NcDimension("x", 5)],
... variables=[NcVariable("vx", dimensions=["x"], data=np.arange(5))]
... )
>>> data2 = data1.copy()
>>> print(dataset_differences(data1, data2))
[]

.. doctest::

>>> data2.dimensions["x"].unlimited = True
>>> data2.variables["vx"].data = np.array([1, 3]) # NB must be a *new* array !

.. doctest::

>>> diffs = dataset_differences(data1, data2)
>>> for msg in diffs:
...     print(msg)
Dataset "x" dimension has different "unlimited" status : False != True
Dataset variable "vx" shapes differ : (5,) != (2,)

.. note::
To compare isolated variables, a subsidiary routine
:func:`~ncdata.utils.variable_differences` is also provided.

Sub-indexing
------------
A new dataset can be derived by indexing over dimensions, analogous to sub-indexing
an array. This operation indexes all the variables appropriately, to produce a new
independent dataset which is complete and self-consistent.

The function :func:`~ncdata.utils.index_by_dimensions` provides indexing where the
indices are passed as arguments or keywords for the specific dimensions.

For example:

.. testsetup::

>>> from ncdata.utils import index_by_dimensions

.. doctest::

>>> data = NcData(
... dimensions=[NcDimension("y", 4), NcDimension("x", 10)],
... variables=[NcVariable(
... "v1", dimensions=["y", "x"],
... data=np.arange(40).reshape((4, 10))
... )]
... )

.. doctest::

>>> subdata = index_by_dimensions(data, y=2, x=slice(None, 4))
>>> print(subdata)
<NcData: <'no-name'>
dimensions:
x = 4
<BLANKLINE>
variables:
<NcVariable(int64): v1(x)>
>
>>> print(subdata.variables["v1"].data)
[20 21 22 23]
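
The idea of applying one set of per-dimension keys to every variable can be sketched with plain NumPy. This is a simplified illustration, not the ncdata implementation: ``index_vars_sketch`` and its data layout are hypothetical, and it handles only integer and slice keys.

```python
import numpy as np


def index_vars_sketch(dim_sizes, variables, **dim_keys):
    """Index each (dims, array) variable with the keys given per dimension name."""
    new_vars = {}
    for name, (dims, array) in variables.items():
        # Use the key given for each of the variable's dimensions, else keep it whole.
        keys = tuple(dim_keys.get(dim, slice(None)) for dim in dims)
        # A dimension indexed with a single integer disappears from the result.
        kept = [dim for dim, key in zip(dims, keys) if not isinstance(key, int)]
        new_vars[name] = (kept, array[keys])
    new_sizes = {}
    for dim, size in dim_sizes.items():
        key = dim_keys.get(dim, slice(None))
        if not isinstance(key, int):
            # Length of the dimension after applying the slice.
            new_sizes[dim] = len(range(size)[key])
    return new_sizes, new_vars


dim_sizes = {"y": 4, "x": 10}
variables = {"v1": (["y", "x"], np.arange(40).reshape((4, 10)))}
sizes, subvars = index_vars_sketch(dim_sizes, variables, y=2, x=slice(None, 4))
print(sizes)            # {'x': 4}
print(subvars["v1"][0])  # ['x']
print(subvars["v1"][1])  # [20 21 22 23]
```

Because every variable is indexed with keys derived from the same dimension names, the new dimension sizes and the new array shapes stay in agreement.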

Slicing syntax
^^^^^^^^^^^^^^
The :class:`~ncdata.utils.Slicer` class is provided to enable the same operation to be
expressed using multi-dimensional slicing syntax.

So for example, the above is more neatly expressed like this ...

.. testsetup::

>>> from ncdata.utils import Slicer

.. doctest::

>>> data_slicer = Slicer(data, ["y", "x"])
>>> subdata2 = data_slicer[2, :4]

.. doctest::

>>> dataset_differences(subdata, subdata2) == []
True
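
How slicing syntax can be translated into per-dimension keys is sketched below. ``DimSlicerSketch`` is a hypothetical class written for illustration; it shows only the key-mapping step and makes no claim about how :class:`~ncdata.utils.Slicer` is implemented.

```python
class DimSlicerSketch:
    """Map positional index keys onto a fixed, ordered list of dimension names."""

    def __init__(self, dim_names):
        self.dim_names = list(dim_names)

    def __getitem__(self, keys):
        # Python passes a single key bare, and several keys as a tuple.
        if not isinstance(keys, tuple):
            keys = (keys,)
        if len(keys) > len(self.dim_names):
            raise IndexError("more index keys than dimensions")
        # Pair keys with names in order; dimensions without a key are left out.
        return dict(zip(self.dim_names, keys))


slicer = DimSlicerSketch(["y", "x"])
print(slicer[2, :4])  # {'y': 2, 'x': slice(None, 4, None)}
```

The resulting mapping has the same form as the keyword arguments in the ``index_by_dimensions(data, y=2, x=slice(None, 4))`` call shown earlier.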


Consistency Checking
--------------------
The :func:`~ncdata.utils.save_errors` function provides a general
correctness-and-consistency check.

For example:

.. testsetup::

>>> from ncdata.utils import save_errors

.. doctest::

>>> data_bad = data.copy()
>>> array = data_bad.variables["v1"].data
>>> data_bad.variables["v1"].data = array[:2]
>>> data_bad.variables.add(NcVariable("q", data={"x": 4}))

.. doctest::

>>> for msg in save_errors(data_bad):
...     print(msg)
Variable 'v1' data shape = (2, 10), does not match that of its dimensions = (4, 10).
Variable 'q' has a dtype which cannot be saved to netcdf : dtype('O').


See : :ref:`correctness-checks`
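
The shape check reported in the first message above can be sketched with NumPy alone. ``shape_errors_sketch`` is a hypothetical helper for illustration; the real :func:`~ncdata.utils.save_errors` performs more checks than this, and its internals are not shown here.

```python
import numpy as np


def shape_errors_sketch(dim_sizes, variables):
    """Return messages for variables whose array shape disagrees with their dimensions."""
    errors = []
    for name, (dims, array) in variables.items():
        missing = [dim for dim in dims if dim not in dim_sizes]
        if missing:
            errors.append(f"Variable {name!r} references unknown dimensions {missing}.")
            continue
        # The shape implied by the dataset's dimension sizes.
        expected = tuple(dim_sizes[dim] for dim in dims)
        if array.shape != expected:
            errors.append(
                f"Variable {name!r} data shape = {array.shape}, "
                f"does not match that of its dimensions = {expected}."
            )
    return errors


dim_sizes = {"y": 4, "x": 10}
full_array = np.arange(40).reshape((4, 10))
checked_vars = {"v1": (["y", "x"], full_array[:2])}
for msg in shape_errors_sketch(dim_sizes, checked_vars):
    print(msg)
```

Returning a list of messages, rather than raising on the first problem, lets a caller see every inconsistency in one pass.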


Data Copying
------------
The :func:`~ncdata.utils.ncdata_copy` function makes structural copies of datasets.
However, this can more easily be accessed as :meth:`ncdata.NcData.copy`, which is the
same operation.

See: :ref:`copy_notes`
3 changes: 3 additions & 0 deletions lib/ncdata/utils/__init__.py
@@ -2,11 +2,14 @@

from ._compare_nc_datasets import dataset_differences, variable_differences
from ._copy import ncdata_copy
from ._dim_indexing import Slicer, index_by_dimensions
from ._rename_dim import rename_dimension
from ._save_errors import save_errors

__all__ = [
"Slicer",
"dataset_differences",
"index_by_dimensions",
"ncdata_copy",
"rename_dimension",
"save_errors",