Commit 368181c

kthyng and dcherian authored
Using regex package for match (#408)
* Using regex package for match

  The built-in re package does not allow for global flags like "(?i)" to be anywhere but the start of a pattern string now. The package `regex` still allows this, so it is optionally used for the `match` function if available.

* test added
* updating whats new
* Update whats-new.rst
* updates to PR
* changed to import full modules due to mypy error
* oops on the previous commit. hopefully better now.
* added regex to environment.yaml
* updated custom-criteria.md
* Update doc/whats-new.rst

  Co-authored-by: Deepak Cherian <[email protected]>

* Add ignore for mypy
* added link to regex package in docs and regex to doc env
* example in doc was not being shown but now is
* Update doc/custom-criteria.md
* Update doc/custom-criteria.md

  Co-authored-by: Deepak Cherian <[email protected]>

* Update doc/custom-criteria.md

---------

Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: dcherian <[email protected]>
1 parent b751c7c commit 368181c
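
To see the standard-library behavior the commit message describes, the following sketch checks whether `re` accepts an inline global flag that is not at the start of a pattern. The helper name `late_flag_outcome` is hypothetical and not part of this commit. The exact failure mode depends on the Python version: releases from 3.6 through 3.10 emit a `DeprecationWarning`, while 3.11 and later raise `re.error`.

```python
import re
import warnings


def late_flag_outcome(pattern):
    """Report whether the stdlib `re` module accepts `pattern` without complaint.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings():
        # Turn the pre-3.11 deprecation warning into an exception so that
        # both old and new Python versions take the same code path.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            re.compile(pattern)
        except (re.error, DeprecationWarning) as exc:
            return f"rejected: {type(exc).__name__}"
    return "accepted"


# Flag at the start of the pattern: fine for the stdlib.
print(late_flag_outcome("(?i)temp"))

# Flag after the alternation: the case this commit works around.
print(late_flag_outcome("tem|(?i)temp"))
```

The second pattern is the same one used in the new `test_regex_match` test in this commit.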

8 files changed: +51 -4 lines changed

cf_xarray/accessor.py

Lines changed: 7 additions & 2 deletions
@@ -207,6 +207,11 @@ def _get_custom_criteria(
     List[str], Variable name(s) in parent xarray object that matches axis, coordinate, or custom `key`
     """

+    try:
+        from regex import match as regex_match
+    except ImportError:
+        from re import match as regex_match  # type: ignore
+
     if isinstance(obj, DataArray):
         obj = obj._to_temp_dataset()

@@ -223,13 +228,13 @@ def _get_custom_criteria(
     if key in criteria_map:
         for criterion, patterns in criteria_map[key].items():
             for var in obj.variables:
-                if re.match(patterns, obj[var].attrs.get(criterion, "")):
+                if regex_match(patterns, obj[var].attrs.get(criterion, "")):
                     results.update((var,))
                 # also check name specifically since not in attributes
                 elif (
                     criterion == "name"
                     and isinstance(var, str)
-                    and re.match(patterns, var)
+                    and regex_match(patterns, var)
                 ):
                     results.update((var,))
     return list(results)
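
The optional-import pattern in this hunk can be tried in isolation. The sketch below is a minimal stand-alone version, not the library's code: `HAVE_REGEX` and `matching_names` are hypothetical names introduced for illustration. It binds `regex_match` to the third-party `regex.match` when that package is installed and to `re.match` otherwise.

```python
try:
    # Third-party package; only used when it is installed.
    from regex import match as regex_match

    HAVE_REGEX = True
except ImportError:
    # Standard-library fallback, called the same way in this sketch.
    from re import match as regex_match

    HAVE_REGEX = False


def matching_names(pattern, names):
    """Return the names that `pattern` matches from the start of the string.

    Hypothetical helper; it mirrors the loop over variable names only loosely.
    """
    return [name for name in names if regex_match(pattern, name)]


names = ["temperature", "Tempblah", "salt"]

# A flag at the start of the pattern is accepted by both packages.
print(matching_names("(?i)temp", names))

# A flag later in the pattern is only attempted when `regex` is available,
# since the stdlib fallback may warn or raise on it.
if HAVE_REGEX:
    print(matching_names("tem|(?i)temp", names))
```

Because both functions are called as `match(pattern, string)`, the call sites only need the rename from `re.match` to `regex_match`, which is what the second hunk does.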

cf_xarray/tests/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -67,3 +67,4 @@ def LooseVersion(vstring):
 has_scipy, requires_scipy = _importorskip("scipy")
 has_shapely, requires_shapely = _importorskip("shapely")
 has_pint, requires_pint = _importorskip("pint")
+has_regex, requires_regex = _importorskip("regex")
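
The body of `_importorskip` is not part of this diff, so its implementation is not shown here. As a rough illustration of the idea (a flag plus a skip decorator for an optional dependency), here is a sketch built only on the standard library. `importorskip_sketch` and `RegexOnlyTests` are hypothetical names, and it uses `unittest.skipUnless` as a stand-in, which may differ from what the test suite actually uses.

```python
import importlib.util
import unittest


def importorskip_sketch(modname):
    """Return (has_module, skip_decorator) for an optional top-level module.

    Hypothetical stand-in for the project's `_importorskip` helper.
    """
    has = importlib.util.find_spec(modname) is not None
    decorator = unittest.skipUnless(has, f"requires {modname}")
    return has, decorator


has_regex, requires_regex = importorskip_sketch("regex")


@requires_regex
class RegexOnlyTests(unittest.TestCase):
    def test_import(self):
        # Only runs when the optional package is installed.
        import regex

        self.assertTrue(hasattr(regex, "match"))
```

With this shape, a test that needs `regex` is reported as skipped rather than failed on machines where the package is missing.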

cf_xarray/tests/test_accessor.py

Lines changed: 18 additions & 1 deletion
@@ -34,7 +34,13 @@
     rotds,
     vert,
 )
-from . import raise_if_dask_computes, requires_cftime, requires_pint, requires_scipy
+from . import (
+    raise_if_dask_computes,
+    requires_cftime,
+    requires_pint,
+    requires_regex,
+    requires_scipy,
+)

 mpl.use("Agg")

@@ -1585,6 +1591,17 @@ def test_custom_criteria() -> None:
     assert_identical(ds.cf["temp"], ds["temperature"])


+@requires_regex
+def test_regex_match():
+    # test that having a global regex expression flag later in the expression will work if
+    # regex is found
+    vocab = {"temp": {"name": "tem|(?i)temp"}}
+    ds = xr.Dataset()
+    ds["Tempblah"] = [0, 1, 2]
+    with cf_xarray.set_options(custom_criteria=vocab):
+        assert_identical(ds.cf["temp"], ds["Tempblah"])
+
+
 def test_cf_standard_name_table_version() -> None:

     url = (

ci/doc.yml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ dependencies:
   - pandas
   - pooch
   - pint
+  - regex
   - furo
   - pip:
     - git+https://github.com/xarray-contrib/cf-xarray

ci/environment.yml

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ dependencies:
   - pandas
   - pint
   - pooch
+  - regex
   - scipy
   - shapely
   - xarray

doc/custom-criteria.md

Lines changed: 17 additions & 0 deletions
@@ -102,3 +102,20 @@ cfxr.set_options(custom_criteria=salt_criteria)

 ds.cf[["salinity"]]
 ```
+
+## More complex matches with `regex`
+
+Here is an example of a more complicated custom criteria, which requires the package [`regex`](https://github.com/mrabarnett/mrab-regex) to be installed since a behavior (allowing global flags like "(?i)" for matching case insensitive) was recently deprecated in the `re` package. The custom criteria, called "vocab", matches – case insensitive – to the variable alias "sea_ice_u" a variable whose name includes "sea" and "ice" and "u" but not "qc" or "status", or "sea" and "ice" and "x" and "vel" but not "qc" or "status".
+
+```{code-cell}
+import cf_xarray as cfxr
+import xarray as xr
+
+vocab = {"sea_ice_u": {"name": "(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*u)|(?i)^(?!.*(qc|status))(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)"}}
+ds = xr.Dataset()
+ds["sea_ice_velocity_x"] = [0,1,2]
+
+with cfxr.set_options(custom_criteria=vocab):
+    seaiceu = ds.cf["sea_ice_u"]
+seaiceu
+```
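
To make the lookahead logic in the "sea_ice_u" pattern easier to follow, the sketch below rebuilds the same two alternatives from named pieces and runs them against a few sample names. This is an illustration, not what the documentation does: it drops the inline "(?i)" flags and passes `re.IGNORECASE` instead so that it runs with the standard library alone. The names `NOT_FLAGGED`, `compiled`, and the sample variable names are assumptions made for the example.

```python
import re

# Negative lookahead: reject any name containing "qc" or "status".
NOT_FLAGGED = r"^(?!.*(qc|status))"

# Two alternatives, each a chain of positive lookaheads that must all hold:
#   1. "sea", "ice" and "u" appear somewhere in the name, or
#   2. "sea", "ice", "x" and "vel" appear somewhere in the name.
pattern = (
    NOT_FLAGGED + r"(?=.*sea)(?=.*ice)(?=.*u)"
    + "|"
    + NOT_FLAGGED + r"(?=.*sea)(?=.*ice)(?=.*x)(?=.*vel)"
)

# Case-insensitivity is passed as a compile flag here instead of inline "(?i)".
compiled = re.compile(pattern, re.IGNORECASE)

names = [
    "sea_ice_velocity_x",    # no "u", but has "x" and "vel": second alternative
    "Sea_Ice_U",             # matches the first alternative, ignoring case
    "sea_ice_u_qc",          # rejected: contains "qc"
    "sea_ice_status_x_vel",  # rejected: contains "status"
    "air_temp",              # rejected: no "sea" or "ice"
]
matches = {name: compiled.match(name) is not None for name in names}
for name, matched in matches.items():
    print(f"{name}: {matched}")
```

Each lookahead is zero-width, so the order in which the substrings appear in the variable name does not matter; only their presence or absence does.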

doc/whats-new.rst

Lines changed: 5 additions & 0 deletions
@@ -3,6 +3,11 @@
 What's New
 ----------

+v0.7.8 (unreleased)
+===================
+
+- Optionally use the `regex` package to continue supporting global flags in regular expressions that are not at start of pattern. (:pr:`408`). By `Kristen Thyng`_
+
 v0.7.7 (Jan 14, 2023)
 =====================


pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ dependencies = [
 dynamic = ["version"]

 [project.optional-dependencies]
-all = ["matplotlib", "pint", "shapely"]
+all = ["matplotlib", "pint", "shapely", "regex"]

 [project.urls]
 homepage = "https://cf-xarray.readthedocs.io"
