diff --git a/docs/changelog_fragments/68.feat.rst b/docs/changelog_fragments/68.feat.rst index 20eae62..1d8fed0 100644 --- a/docs/changelog_fragments/68.feat.rst +++ b/docs/changelog_fragments/68.feat.rst @@ -3,4 +3,4 @@ The :class:`ncdata.NcData` objects can be indexed with the ``[]`` operation, or specifed dimensions with the :meth:`~ncdata.NcData.slicer` method. This is based on the new :meth:`~ncdata.utils.index_by_dimensions()` utility method and :class:`~ncdata.utils.Slicer` class. -See: :ref:`indexing_overview` \ No newline at end of file +See: :ref:`utils_indexing` \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py index d9f62c1..32f3f87 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -93,7 +93,13 @@ # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. # This pattern also affects html_static_path and html_extra_path. -exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] +exclude_patterns = [ + "_build", + "Thumbs.db", + ".DS_Store", + "changelog_fragments", + "details/api/modules.rst", +] # -- Options for HTML output ------------------------------------------------- diff --git a/docs/details/developer_notes.rst b/docs/details/developer_notes.rst index ee27069..d0b052a 100644 --- a/docs/details/developer_notes.rst +++ b/docs/details/developer_notes.rst @@ -11,12 +11,21 @@ A new change-note fragment file should be included in each PR, but is normally c with a ``towncrier`` command-line command: * shortly, with ``towncrier create --content "mynotes..." ..rst`` -* ... or for longer forms, use ``towncrier create --edit``. -* Here, "" is one of feat/doc/bug/dev/misc. Which are: user features; - bug fixes; documentation changes; general developer-relevant changes; - or "miscellaneous". + + ... or, for longer content, use ``towncrier create --edit``. 
+ +* Here, "<type>" is one of: + + * "feat": user features + * "doc": documentation changes + * "bug": bug fixes + * "dev": general developer-relevant changes + * "misc": miscellaneous + (For reference, these categories are configured in ``pyproject.toml``). + * the fragment files are stored in ``docs/changelog_fragments``. + * N.B. for this to work well, every change should be identified with a matching github issue. If there are multiple associated PRs, they should all be linked to the issue. @@ -26,17 +35,20 @@ Documentation build For a full docs-build: -* a simple ``$ make html`` will do for now +* The most useful way is simply ``$ cd docs`` and ``$ make html-keeplog``. + * Note: the plainer ``$ make html`` is the same, but "-keeplog" additionally preserves the + changelog fragments **and** reverts ``change_log.rst`` after the html build. + This stops you accidentally including a "built" changelog when making further commits. * The ``docs/Makefile`` wipes the API docs and invokes sphinx-apidoc for a full rebuild * It also calls towncrier to clear out the changelog fragments + update ``docs/change_log.rst``. - This should be reverted before pushing your PR -- i.e. leave changenotes in the fragments. -* the results is then available at ``docs/_build/html/index.html``. +* ( *assuming "-keeplog"*: fragments and ``change_log.rst`` are then reverted, undoing the towncrier build ). +* the result is then available at ``docs/_build/html/index.html``. .. note:: - * the above is just for *local testing*, if required. + * the above is just for **local testing**, when required. * For PRs (and releases), we also provide *automatic* builds on GitHub, - via `ReadTheDocs `_ + via ReadTheDocs_. Release actions @@ -44,15 +56,15 @@ #. Update the :ref:`change-log page ` in the details section - #. ensure all major changes + PRs are referenced in the :ref:`change_notes` section. + #.
start with ``$ towncrier build`` - * The starting point for this is now just : ``$ towncrier build``. + #. ensure all major changes + PRs are referenced in the :ref:`change_notes` section. #. update the "latest version" stated in the :ref:`development_status` section #. Cut a release on GitHub - * this triggers a new docs version on `ReadTheDocs `_. + * this triggers a new docs version on ReadTheDocs_. #. Build the distribution @@ -109,3 +121,6 @@ Release actions * wait a few hours.. * check that the new version appears in the output of ``$ conda search ncdata`` + + +.. _ReadTheDocs: https://readthedocs.org/projects/ncdata diff --git a/docs/userdocs/user_guide/common_operations.rst b/docs/userdocs/user_guide/common_operations.rst index 0637a69..454b1d3 100644 --- a/docs/userdocs/user_guide/common_operations.rst +++ b/docs/userdocs/user_guide/common_operations.rst @@ -55,6 +55,19 @@ Example : >>> dataset.variables["x"].avals["units"] = "m s-1" + +There is also an :meth:`~ncdata.NameMap.addall` method, which adds multiple content +objects in one operation. + +.. doctest:: python + + >>> vars = [NcVariable(name) for name in ("a", "b", "c")] + >>> dataset.variables.addall(vars) + >>> list(dataset.variables) + ['x', 'a', 'b', 'c'] + +.. _operations_rename: + Rename ------ A component can be renamed with the :meth:`~ncdata.NameMap.rename` method. This changes @@ -67,6 +80,18 @@ Example : >>> dataset.variables.rename("x", "y") +result: + +.. doctest:: python + + >>> print(dataset.variables.get("x")) + None + >>> print(dataset.variables.get("y")) + ): y() + y:units = 'm s-1' + > + + .. warning:: Renaming a dimension will not rename references to it (i.e. in variables), which obviously may cause problems. @@ -123,14 +148,29 @@ Equality Testing ---------------- We implement equality operations ``==`` / ``!=`` for all the core data objects. 
-However, simple equality testing on :class:`@ncdata.NcData` and :class:`@ncdata.NcVariable` -objects can be very costly if it requires comparing large data arrays. +.. doctest:: + + >>> vA = dataset.variables["a"] + >>> vB = dataset.variables["b"] + >>> vA == vB + False + +.. doctest:: + + >>> dataset == dataset.copy() + True + +.. warning:: + Equality testing for :class:`~ncdata.NcData` and :class:`~ncdata.NcVariable` actually + calls the :func:`ncdata.utils.dataset_differences` and + :func:`ncdata.utils.variable_differences` utilities. + + This can be very costly if it needs to compare large data arrays. If you need to avoid comparing large (and possibly lazy) arrays then you can use the :func:`ncdata.utils.dataset_differences` and -:func:`ncdata.utils.variable_differences` utility functions. -These functions also provide multiple options to enable more tolerant comparison, -such as allowing variables to have a different ordering. +:func:`ncdata.utils.variable_differences` utility functions directly instead. +These provide a ``check_var_data=False`` option, to ignore differences in data content. See: :ref:`utils_equality` diff --git a/docs/userdocs/user_guide/data_objects.rst b/docs/userdocs/user_guide/data_objects.rst index 28b3f77..3b4d914 100644 --- a/docs/userdocs/user_guide/data_objects.rst +++ b/docs/userdocs/user_guide/data_objects.rst @@ -186,7 +186,9 @@ However, for most operations on attributes, it is much easier to use the ``.aval property instead. This accesses *the same attributes*, but in the form of a simple "name: value" dictionary. -Thus for example, to fetch an attribute you would usually write just : +Get attribute value +^^^^^^^^^^^^^^^^^^^ +For example, to fetch an attribute you would usually write just : .. testsetup:: @@ -205,23 +207,15 @@ and **not** : .. 
doctest:: python - >>> # WRONG: this reads an NcAttribute, not its value + >>> # WRONG: this gets the NcAttribute object, not its value >>> unit = dataset.variables["x"].attributes["units"] -or: - -.. doctest:: python - - >>> # WRONG: this gets NcAttribute.value as a character array, not a string + >>> # WRONG: this returns a character array, not a string >>> unit = dataset.variables["x"].attributes["units"].value -or even (which is at least correct): - -.. doctest:: python - - >>> unit = dataset.variables["x"].attributes["units"].as_python_value() - +Set attribute value +^^^^^^^^^^^^^^^^^^^ Likewise, to **set** a value, you would normally just .. doctest:: python @@ -236,9 +230,11 @@ >>> dataset.variables["x"].attributes["units"].value = "K" -Note also, that as the ``.avals`` is a dictionary, you can use standard dictionary -methods such as ``update`` and ``get`` to perform other operations in a relatively -natural, Pythonic way. +``.avals`` as a dictionary +^^^^^^^^^^^^^^^^^^^^^^^^^^ +Note also that, as ``.avals`` is a dictionary, you can use standard dictionary +methods such as ``pop``, ``update`` and ``get`` to perform other operations in a +relatively natural, Pythonic way. .. doctest:: python @@ -247,6 +243,12 @@ natural, Pythonic way. >>> dataset.attributes.update({"experiment": "A407", "expt_run": 704}) +.. note:: + The new ``.avals`` property effectively replaces the old + :meth:`~ncdata.NcData.get_attrval` and :meth:`~ncdata.NcData.set_attrval` methods, + which are now deprecated and will eventually be removed. + + ..
_data-constructors: Core Object Constructors diff --git a/docs/userdocs/user_guide/howtos.rst b/docs/userdocs/user_guide/howtos.rst index f821817..9cfebbd 100644 --- a/docs/userdocs/user_guide/howtos.rst +++ b/docs/userdocs/user_guide/howtos.rst @@ -377,8 +377,8 @@ See: :ref:`copy_notes` Extract a subsection by indexing -------------------------------- -The nicest way is usually just to use the :meth:`~ncdata.Ncdata.slicer` method to specify -dimensions to index, and then index the result. +The nicest way is usually to use the NcData :meth:`~ncdata.NcData.slicer` method to +specify dimensions to index, and then index the result. .. testsetup:: @@ -388,22 +388,22 @@ dimensions to index, and then index the result. >>> for nn, dim in full_data.dimensions.items(): ... full_data.variables.add(NcVariable(nn, dimensions=[nn], data=np.arange(dim.size))) -.. doctest:: - - >>> for dimname in full_data.dimensions: - ... print(dimname, ':', full_data.variables[dimname].data) - x : [0 1 2 3 4 5 6] - y : [0 1 2 3 4 5] - .. doctest:: >>> data_region = full_data.slicer("y", "x")[3, 1::2] +effect: + .. doctest:: + >>> for dimname in full_data.dimensions: + ... print("(original)", dimname, ':', full_data.variables[dimname].data) + (original) x : [0 1 2 3 4 5 6] + (original) y : [0 1 2 3 4 5] + >>> for dimname in data_region.dimensions: - ... print(dimname, ':', data_region.variables[dimname].data) - x : [1 3 5] + ...
print("(new)", dimname, ':', data_region.variables[dimname].data) + (new) x : [1 3 5] You can also slice data directly, which simply acts on the dimensions in order: @@ -413,7 +413,7 @@ You can also slice data directly, which simply acts on the dimensions in order: >>> data_region_2 == data_region True -See: :ref:`indexing_overview` +See: :ref:`utils_indexing` Read data from a NetCDF file @@ -454,8 +454,8 @@ Use the ``dim_chunks`` argument in the :func:`ncdata.netcdf4.from_nc4` function >>> from ncdata.netcdf4 import from_nc4 >>> ds = from_nc4(filepath, dim_chunks={"time": 3}) - >>> print(ds.variables["time"].data.chunksize) - (3,) + >>> print(ds.variables["time"].data.chunks) + ((3, 3, 3, 1),) Save data to a new file @@ -531,8 +531,28 @@ Use :func:`ncdata.xarray.to_xarray` and :func:`ncdata.xarray.from_xarray`. >>> from ncdata.xarray import from_xarray, to_xarray >>> dataset = xarray.open_dataset(filepath) >>> ncdata = from_xarray(dataset) - >>> + + >>> print(ncdata) + + variables: + + + global attributes: + :experiment = 'A301.7' + > + >>> ds2 = to_xarray(ncdata) + >>> print(ds2) + Size: 8B + Dimensions: () + Data variables: + vx float64 8B nan + Attributes: + experiment: A301.7 Note that: @@ -573,7 +593,7 @@ passed using specific dictionary keywords, e.g. ... iris_load_kwargs={'constraints': 'air_temperature'}, ... xr_save_kwargs={'unlimited_dims': ('time',)}, ... ) - ... + Combine data from different input files into one output ------------------------------------------------------- diff --git a/docs/userdocs/user_guide/utilities.rst b/docs/userdocs/user_guide/utilities.rst index 859bb25..e8b6faf 100644 --- a/docs/userdocs/user_guide/utilities.rst +++ b/docs/userdocs/user_guide/utilities.rst @@ -9,12 +9,16 @@ Rename Dimensions The :func:`~ncdata.utils.rename_dimension` utility does this, in a way which ensures a safe and consistent result. +See: :ref:`operations_rename` + + .. 
_utils_equality: Dataset Equality Testing ------------------------ -The function :func:`~ncdata.utils.dataset_differences` produces a list of messages -detailing all the ways in which two datasets are different. +The functions :func:`~ncdata.utils.dataset_differences` and +:func:`~ncdata.utils.variable_differences` produce a list of messages detailing all the +ways in which two datasets or variables are different. For Example: ^^^^^^^^^^^^ @@ -47,24 +51,32 @@ Dataset "x" dimension has different "unlimited" status : False != True Dataset variable "vx" shapes differ : (5,) != (2,) -.. note:: - To compare isolated variables, a subsidiary routine - :func:`~ncdata.utils.variable_differences` is also provided. +For a short-form test that two things are the same, you can just check that the +result ``== []``. + +By default, these functions compare **everything** about the two arguments. +However, they also have multiple keywords which allow certain *types* of differences to +be ignored, e.g. ``check_dims_order=False``, ``check_var_data=False``. .. note:: - The ``==`` and ``!-`` operations on :class:`ncdata.NcData` and - :class:`ncdata.NcVariable` are implemented to call these utility functions. - However, lacking a keyword interface to enable any tolerance options, the operations - compare absolutely everything, and so can be very performance intensive if large data - arrays are present. + The ``==`` and ``!=`` operations on :class:`ncdata.NcData` and + :class:`ncdata.NcVariable` use these utility functions to check for differences. + + .. warning:: + As they lack a keyword interface, these operations provide no tolerance options, + so they always check absolutely everything. In particular, they perform **full + data-array comparisons**, which can have very high performance costs if data + arrays are large. -.. _indexing_overview: +..
_utils_indexing: Sub-indexing ------------ A new dataset can be derived by indexing over dimensions, analagous to sub-indexing -an array. This operation indexes all the variables appropriately, to produce a new -independent dataset which is complete and self-consistent. +an array. + +This operation indexes all the variables appropriately, to produce a new, independent +dataset which is complete and self-consistent. The basic indexing operation is provided in three forms: @@ -197,6 +209,8 @@ Consistency Checking The :func:`~ncdata.utils.save_errors` function provides a general correctness-and-consistency check. +See: :ref:`correctness-checks` + For example: .. testsetup:: @@ -218,13 +232,10 @@ For example: Variable 'q' has a dtype which cannot be saved to netcdf : dtype('O'). -See : :ref:`correctness-checks` - - Data Copying ------------ -The :func:`~ncdata.utils.ncdata_copy` makes structural copies of datasets. -However, this can be easily be accessed as :meth:`ncdata.NcData.copy`, which is the same -operation. +The :func:`~ncdata.utils.ncdata_copy` function makes structural copies of datasets. +However, this can now be more easily accessed as :meth:`ncdata.NcData.copy`, which is +the same operation. 
See: :ref:`copy_notes` \ No newline at end of file diff --git a/lib/ncdata/utils/__init__.py b/lib/ncdata/utils/__init__.py index 297b3cf..f44fe95 100644 --- a/lib/ncdata/utils/__init__.py +++ b/lib/ncdata/utils/__init__.py @@ -6,12 +6,12 @@ from ._rename_dim import rename_dimension from ._save_errors import save_errors -__all__ = [ - "Slicer", +__all__ = [ # noqa: RUF022 + "rename_dimension", "dataset_differences", + "variable_differences", "index_by_dimensions", - "ncdata_copy", - "rename_dimension", + "Slicer", "save_errors", - "variable_differences", + "ncdata_copy", ] diff --git a/lib/ncdata/utils/_compare_nc_datasets.py b/lib/ncdata/utils/_compare_nc_datasets.py index f70003f..1d1d1ff 100644 --- a/lib/ncdata/utils/_compare_nc_datasets.py +++ b/lib/ncdata/utils/_compare_nc_datasets.py @@ -37,6 +37,8 @@ def dataset_differences( :class:`~ncdata.NcData` objects. File paths are opened with the :mod:`netCDF4` module. + See: :ref:`equality_testing` + Parameters ---------- dataset_or_path_1 : str or Path or netCDF4.Dataset or NcData @@ -93,6 +95,28 @@ A list of "error" strings, describing differences between the inputs. If empty, no differences were found. + Examples + -------- + .. doctest:: + + >>> import numpy as np + >>> from ncdata import NcData, NcVariable + >>> data = NcData( ... name="a", ... variables=[NcVariable("b", data=[1, 2, 3, 4])], ... attributes={"a1": 4} ... ) + >>> data2 = data.copy() + >>> data2.avals.update({"a1": 3, "v": 7}) + >>> data2.variables["b"].data = np.array([1, 7, 3, 99]) # must be an array!
+ >>> print('\n'.join(dataset_differences(data, data2))) + Dataset attribute lists do not match: ['a1'] != ['a1', 'v'] + Dataset "a1" attribute values differ : 4 != 3 + Dataset variable "b" data contents differ, at 2 points: @INDICES[(1,), (3,)] : LHS=[2, 4], RHS=[7, 99] + + See Also + -------- + :func:`~ncdata.utils.variable_differences` """ ds1_was_path = not hasattr(dataset_or_path_1, "variables") ds2_was_path = not hasattr(dataset_or_path_2, "variables") @@ -322,6 +344,8 @@ def variable_differences( r""" Compare variables. + See: :ref:`equality_testing` + Parameters ---------- v1, v2 : NcVariable @@ -347,6 +371,9 @@ def variable_differences( A list of "error" strings, describing differences between the inputs. If empty, no differences were found. + See Also + -------- + :func:`~ncdata.utils.dataset_differences` """ errs = [] diff --git a/lib/ncdata/utils/_copy.py b/lib/ncdata/utils/_copy.py index f631df4..7ae7c04 100644 --- a/lib/ncdata/utils/_copy.py +++ b/lib/ncdata/utils/_copy.py @@ -17,6 +17,8 @@ def ncdata_copy(ncdata: NcData) -> NcData: The operation makes fresh copies of all ncdata objects, but does not copy variable data arrays. + See: :ref:`copy_notes` + Parameters ---------- ncdata @@ -27,6 +29,28 @@ def ncdata_copy(ncdata: NcData) -> NcData: ncdata identical but distinct copy of input + Notes + ----- + This operation is now also available as an object method: + :meth:`~ncdata.NcData.copy`. + + Syntactically, this is generally more convenient, but the operation is identical. + + For example: + + .. testsetup:: + + >>> from ncdata import NcData + >>> from ncdata.utils import ncdata_copy + >>> data = NcData() + + .. 
doctest:: + + >>> data1 = ncdata_copy(data) + >>> data2 = data.copy() + >>> data1 == data2 + True + """ return NcData( name=ncdata.name, diff --git a/lib/ncdata/utils/_rename_dim.py b/lib/ncdata/utils/_rename_dim.py index 8043b22..8f86d6f 100644 --- a/lib/ncdata/utils/_rename_dim.py +++ b/lib/ncdata/utils/_rename_dim.py @@ -46,6 +46,8 @@ def rename_dimension(ncdata: NcData, name_from: str, name_to: str) -> None: This function calls ``ncdata.dimensions.rename``, but then it *also* renames the dimension in all the variables which reference it, including those in sub-groups. + See: :ref:`operations_rename` + Parameters ---------- ncdata : NcData diff --git a/lib/ncdata/utils/_save_errors.py b/lib/ncdata/utils/_save_errors.py index b6b0713..2002245 100644 --- a/lib/ncdata/utils/_save_errors.py +++ b/lib/ncdata/utils/_save_errors.py @@ -180,29 +180,15 @@ def _save_errors_inner( def save_errors(ncdata: NcData) -> List[str]: """ - Scan a dataset for it's consistency and completeness. + Scan a dataset for consistency and completeness. - Reports on anything that will make this fail to save. + See: :ref:`correctness-checks` + + Describe any aspects of this dataset which would prevent it from saving (cause an + error). If there are any such problems, then an attempt to save the ncdata to a netcdf file will fail. If there are none, then a save should succeed. - The checks made are roughly the following - - (1) check names in all components (dimensions, variables, attributes and groups): - - * all names are valid netcdf names - * all element names match their key in the component, - i.e. "component[key].name == key" - - (2) check that all attribute values have netcdf-compatible dtypes. - (E.G. no object or compound (recarray) dtypes). 
- - (3) check that, for all contained variables : - - * it's dimensions are all present in the enclosing dataset - * it has an attached data array, of a netcdf-compatible dtype - * the shape of it's data matches the lengths of it's dimensions - Parameters ---------- ncdata @@ -213,5 +199,25 @@ errors A list of strings, error messages describing problems with the dataset. If no errors, returns an empty list. + + Notes + ----- + The checks made are roughly the following: + + **(1)** check names in all components (dimensions, variables, attributes and groups): + + * all names are valid netcdf names + * all element names match their key in the component, + i.e. ``component[key].name == key`` + + **(2)** check that all attribute values have netcdf-compatible dtypes. + + * ( e.g. no object or compound (recarray) dtypes ) + + **(3)** check that, for all contained variables: + + * its dimensions are all present in the enclosing dataset + * it has an attached data array, of a netcdf-compatible dtype + * the shape of its data matches the lengths of its dimensions """ return _save_errors_inner(ncdata)