11"""
2- Operations for making caching keys for a given dataset.
2+ Operations for making cache keys based on dataset geometry.
3+
4+ Some operations such as :func:`~.operations.triangulate.triangulate_dataset`
5+ only depend on the dataset geometry and are expensive to compute.
6+ For applications that need to derive data from the dataset geometry
7+ it would be useful if the derived data could be reused between different runs of the same application
8+ or between multiple time slices of the same geometry distributed across multiple files.
9+ This module provides :func:`.make_cache_key` to assist in this process
10+ by deriving a cache key from the important parts of a dataset geometry.
11+ Applications can use this cache key
12+ as part of a filename when saving derived geometry data to disk
13+ or as a key to an in-memory cache of derived geometry.
14+
15+ The derived cache keys will be identical between different instances of an application,
16+ and between different files in multi-file datasets split over an unlimited dimension.
17+
18+ This module does not provide an actual cache implementation.
319"""
420import hashlib
521import marshal
1228
1329def hash_attributes(hash: "hashlib._Hash", attributes: dict) -> None:
1430 """
15- Updates the provided hash with with a marshal serialised byte representation of the given attribute dictionary.
31+ Adds the contents of an :attr:`attributes dictionary <xarray.DataArray.attrs>`
32+ to a hash.
1633
1734 Parameters
1835 ----------
1936 hash : hashlib-style hash instance
20- The hash instance to update with the given attribute dict.
37+ The hash instance to add the attribute dictionary to.
2138 This must follow the interface defined in :mod:`hashlib`.
22- attributes: dict
23- Expects a marshal compatible dictionary.
39+ attributes : dict
40+ A dictionary of attributes from a :class:`~xarray.Dataset` or :class:`~xarray.DataArray`.
41+
42+ Notes
43+ -----
44+ The attribute dictionary is serialized to bytes using :func:`marshal.dumps`.
45+ This is an implementation detail that may change in future releases.
2446 """
2547 # Prepend the marshal encoding version
2648 marshal_version = 4
@@ -36,32 +58,44 @@ def hash_attributes(hash: "hashlib._Hash", attributes: dict) -> None:
3658
3759def hash_string(hash: "hashlib._Hash", value: str) -> None:
3860 """
39- Updates the provided hash with with a utf-8 encoded byte representation of the provided string.
61+ Adds a :class:`string <str>` to a hash.
4062
4163 Parameters
4264 ----------
4365 hash : hashlib-style hash instance
44- The hash instance to update with the given attribute dict.
66+ The hash instance to add the string to.
4567 This must follow the interface defined in :mod:`hashlib`.
46- attributes: str
47- Expects a string that can be encoded in utf-8.
68+ value : str
69+ Any unicode string.
70+
71+ Notes
72+ -----
73+ The string is UTF-8 encoded as part of being added to the hash.
74+ This is an implementation detail that may change in future releases.
4875 """
49- # Prepend the str length
76+ # Prepend the length of the string to the hash
77+ # to prevent malicious datasets generating overlapping string hashes.
5078 hash_int(hash, len(value))
5179 hash.update(value.encode('utf-8'))
5280
5381
5482def hash_int(hash: "hashlib._Hash", value: int) -> None:
5583 """
56- Updates the provided hash with an encoded byte representation of the provided int.
84+ Adds an :class:`int` to a hash.
5785
5886 Parameters
5987 ----------
6088 hash : hashlib-style hash instance
61- The hash instance to update with the given attribute dict.
89+ The hash instance to add the integer to.
6290 This must follow the interface defined in :mod:`hashlib`.
63- attributes: int
64- Expects an int that can be represented in a numpy int32.
91+ value : int
92+ Any int representable as a :data:`numpy.int32`.
93+
94+ Notes
95+ -----
96+ The int is cast to a :data:`numpy.int32` as part of being added to the hash.
97+ This is an implementation detail that may change in the future
98+ if larger integers are required.
6599 """
66100 with numpy.errstate(over='raise'):
67101 # Manual overflow check as older numpy versions don't throw the exception
@@ -73,17 +107,16 @@ def hash_int(hash: "hashlib._Hash", value: int) -> None:
73107
74108def make_cache_key(dataset: xarray.Dataset, hash: "hashlib._Hash | None" = None) -> str:
75109 """
76- Generate a key suitable for caching data derived from the geometry of a dataset.
110+ Derive a cache key from the geometry of a dataset.
77111
78112 Parameters
79113 ----------
80114 dataset : xarray.Dataset
81115 The dataset to generate a cache key from.
82- hash : hashlib._Hash
116+ hash : :mod:`hashlib`-compatible hash instance, optional
83117 An instance of a hashlib hash class.
84- Defaults to `hashlib.blake2b`, which is secure enough and fast enough for most purposes.
85- The hash algorithm does not need to be cryptographically secure,
86- so faster algorithms such as `xxhash` can be swapped in if desired.
118+ Defaults to :func:`hashlib.blake2b` with a digest size of 32,
119+ which is secure enough and fast enough for most purposes.
87120
88121 Returns
89122 -------
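The new comment in `hash_string` says the length prefix exists to stop different strings from producing overlapping hashes. The sketch below illustrates that point in isolation. It is not the module's implementation: `add_int`, `add_string` and `digest_of` are hypothetical names, and `struct.pack` stands in for the `numpy.int32` cast so that the example needs only the standard library. The `blake2b` digest size of 32 follows the default described in the `make_cache_key` docstring.

```python
import hashlib
import struct


def add_int(h, value: int) -> None:
    # Pack as a fixed-width signed 32-bit integer (hypothetical stand-in for
    # the numpy.int32 cast). struct.error is raised if the value does not fit.
    h.update(struct.pack("<i", value))


def add_string(h, value: str) -> None:
    # Prepend the string length so consecutive strings cannot run together.
    add_int(h, len(value))
    h.update(value.encode("utf-8"))


def digest_of(strings, length_prefix: bool) -> str:
    # Hash a sequence of strings, with or without the length prefix.
    h = hashlib.blake2b(digest_size=32)
    for s in strings:
        if length_prefix:
            add_string(h, s)
        else:
            h.update(s.encode("utf-8"))
    return h.hexdigest()


# Without the prefix, ["ab", "c"] and ["a", "bc"] feed identical bytes to the hash.
print(digest_of(["ab", "c"], False) == digest_of(["a", "bc"], False))
# With the prefix, the two sequences produce different digests.
print(digest_of(["ab", "c"], True) == digest_of(["a", "bc"], True))
```

A hex digest produced this way is 64 characters long, so it can be used directly as part of a filename or as a dictionary key, which matches the usage the module docstring describes.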