Skip to content

Commit b40ea48

Browse files
chahankluseverinChahan Kropf
authored
Feature/relative matching distance (#1080)
* Add function to test if lat and lon are in geographic coords * Add test for geographic coords in dist_to_coast * Add test for geographic coords in coord_on_land * Switch branch feature/remove_implicite_use_crs to black format * Extend allowed extent for geo coords * Modify allowed extents in geo_coords check function to allow for bounds * Include possible wrong lat lon extension in error message * Add check for geo coords in get_country_code * Add test for geocoords in convert_wgs_to_utm * Update error message in coord_on_land * Harmonize error messages * Adapt test get_country_geometries_fail * Check for geo coords in match_coordinates * Add check for lat in get_gridcellarea * Modify check geo coords function to please linter * Add test for geo coords in calc of coriolis param * Add tests for check_if_geo_coords function * Incorporate suggestions from review 1 * Start adapating match_coordinates function to other crs * Add handling of unit in match_centroids * Rename deg to degrees * Add function to infer unit of coordinates * Use infer_unit_coords in match_centroids function * Add tests for unit_infer and unit-adapted match_coordinates * Make test_tack_land_params compliant with new requirement of geodetic coords * Correct wrong is_geodetic crs attribute * Add restriction for haversine with non-degree units * Finish adding test for handling of different units * Add test invalid unit for match coords and docstrings * Update docstrings * Try to fix new linter issues * Correct error in logger warning * Force check_antimeridian to False in distance euclidean if unit is not degree * Remove outdated test for raise of ValueError when check_antimeridian is attempted in non-degree units * Add function to get coordinates from axis of coordinates'crs * Update unit names to be consistent with epsg notation and remove unit kw from match_centroids * Clarify docstring check_if_geo_coords * Make conversion between radians and degrees more consistent * Remove deprecated infer_unit_coords function * Update tests * Force input coords and centroids to have crs attribute in match_centroids * Re-harmonize unit names and raise error when crs of centroids or coords is None * Add missing tests * Fix linter issues * Refix linter * Wrap valuerror raising in check_if_geo_coords * Update tests * Remove useless info in get_crs_unit docstring * Raise separate exceptions for missing crs in coord_gdf and centroids Co-authored-by: Chahan M. Kropf <[email protected]> * Fix error in _nearest_neighbor_approx docstring * Make clearer error message for _nearest_neighbor with non-allowed units * Simplify crs check in match_centroids * Correct typo and update tests for new error messages in match_centroids * Only warn for check_antimeridian when it is true and add tests * Allow for units other than degree km and m by keeping distance threshold in the given unit * Update thresholds in test to be consisten with handling of thresholds in non-degree units * Remove mention to calling functions in error message in _nearest_neighbor_haversine Co-authored-by: Chahan M. Kropf <[email protected]> * Remove mention to calling functions in error message in _nearest_neighbor_approx Co-authored-by: Chahan M. Kropf <[email protected]> * Update docstrings match_centroids * update docstrings for match_coordinates * Update changelog * Update docstring to removed grid centroids + black * Add error to docstring and return value * Replace default threshold by dynamic one and apply black * Update tests to use dynamic threshold * Simplify non degree crs handling * Use crs instead of unit * Update test to use crs as input * Require explicit threshold for single centroid coordinates * Add docstrings for km to degree and inverse * Add docstrings for estimate threshold * Update impact tests * Add test for projected crs * Add docstring in impact calc * Update impact calc test to use correct distance threshold * Add impact calc test with other crs * Improve docstrings * Separate check geo coordinates into two methods * Improve docstrings * Correct docstrings * Add new coordinates methods to changelog * Improve wording * Fix resolution to be only positive * Add coordinate checks for bounds * Add test test resolution * Add test assign other crs * Update changelog * Revert bounds and normalize degree check as test were broadly failing. * Check that lon normalize values are actually proper longitudes * Revert "Check that lon normalize values are actually proper longitudes" This reverts commit 86bb4f5. * Raise error if no res above min_threshold can be found in get_resolution_1d * Add test exception exp.assign_centroids when crs not equal * Add test get_unit_crs for undefined unit * Add test unknown unit get_gridcell_area * Add test estimate_matching_threshold * check geocoords in lon_normalize and handle empty arrays in check geocoords * Update tests lon_normalize * Increase allowed lon extent in is_geo_coords to avoid failing tests with ibtracks * Update offensive value for test_get_country_geometries_fail * Add handling of nans in is_geo_coords * Update changelog * Update climada/engine/impact_calc.py * Clarify meaning of highest resolution in docstrings Co-authored-by: Chahan M. Kropf <[email protected]> * Clarify meaning of highest resolution in docstrings Co-authored-by: Chahan M. Kropf <[email protected]> * Correct typo in docstring Co-authored-by: Luca Severino <[email protected]> * Fix typo in docstring Co-authored-by: Luca Severino <[email protected]> * Resolve linter issue --------- Co-authored-by: luseverin <[email protected]> Co-authored-by: luseverin <[email protected]> Co-authored-by: Chahan Kropf <[email protected]>
1 parent ab7556a commit b40ea48

File tree

9 files changed

+897
-163
lines changed

9 files changed

+897
-163
lines changed

CHANGELOG.md

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -28,13 +28,17 @@ Removed:
2828
- Added optional parameter to `geo_im_from_array`, `plot_from_gdf`, `plot_rp_imp`, `plot_rp_intensity`,
2929
`plot_intensity`, `plot_fraction`, `_event_plot` to mask plotting when regions are too far from data points [#1047](https://github.com/CLIMADA-project/climada_python/pull/1047). To recreate previous plots (no masking), the parameter can be set to None.
3030
- Added instructions to install Climada petals on Euler cluster in `doc.guide.Guide_Euler.ipynb` [#1029](https://github.com/CLIMADA-project/climada_python/pull/1029)
31-
31+
- Added util methods to handle crs coordinates consistently: `is_geo_coords`, `check_if_geo_coords`, `get_crs_unit`, `estimate_matching_threshold`, `degree_to_km`, and `km_to_degree` [#1080](https://github.com/CLIMADA-project/climada_python/pull/1080)
3232
- `ImpactFunc` and `ImpactFuncSet` now support equality comparisons via `==` [#1027](https://github.com/CLIMADA-project/climada_python/pull/1027)
3333

3434
### Changed
35-
35+
- Changed default distance threshold for nearest neighbor matching in `util.coordinates.match_coordinates` from a fixed value of 100km to twice the highest resolution of the coords_to_assign [#1080](https://github.com/CLIMADA-project/climada_python/pull/1080).
3636
- Changed the default mask_distance in `util.plot.geo_im_from_array` to 0.03 to avoid white gaps in gridded hazard data with comparably low resolution (>80 centroids per axis) [#1073](https://github.com/CLIMADA-project/climada_python/pull/1073)
3737
- Increased speed of `util.plot.add_shapes` by avoiding for loops, substantially speeding up `Hazard.plot_intensity` and other functions. [#1073](https://github.com/CLIMADA-project/climada_python/pull/1073)
38+
- Update `util.coordinates.match_centroids`, `util.coordinates.match_coordinates`, so that they also
39+
accept coordinates that are not defined in degree. [#1080](https://github.com/CLIMADA-project/climada_python/pull/1080)
40+
- Implement cheap test to check that input coordinates at least seem geographic for functions that require
41+
geographic coordinates as input (e.g. `util.coordinates.dist_to_coast`, `util.coordinates.coord_on_land`, `util.coordinates.lon_normalize`, `util.coordinates.lon_bounds`). [#1080](https://github.com/CLIMADA-project/climada_python/pull/1080)
3842
- `Hazard.local_exceedance_intensity`, `Hazard.local_return_period` and `Impact.local_exceedance_impact`, `Impact.local_return_period`, using the `climada.util.interpolation` module: New default (no binning), binning on decimals, and faster implementation [#1012](https://github.com/CLIMADA-project/climada_python/pull/1012)
3943
- World Bank indicator data is now downloaded directly from their API via the function `download_world_bank_indicator`, instead of relying on the `pandas-datareader` package [#1033](https://github.com/CLIMADA-project/climada_python/pull/1033)
4044
- `Exposures.write_hdf5` pickles geometry data in WKB format, which is faster and more sustainable. [#1051](https://github.com/CLIMADA-project/climada_python/pull/1051)

climada/engine/impact_calc.py

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,12 @@ def impact(
9292
Default: True
9393
assign_centroids : bool, optional
9494
indicates whether centroids are assigned to the self.exposures object.
95+
The default centroids assignment uses Euclidean distance and a threshold equal to
96+
twice the highest resolution (smallest distance between any two nearest neighbours)
97+
of the centroids (assuming a regular grid).
98+
Recommendation: assign the centroids directly with control
99+
over the distance metric and the distance threshold using
100+
`exposure.assign_centroids(hazard)`. In this case, set assign_centroids to ``False``.
95101
Centroids assignment is an expensive operation; set this to ``False`` to save
96102
computation time if the hazards' centroids are already assigned to the exposures
97103
object.
@@ -114,10 +120,20 @@ def impact(
114120
>>> imp = impcalc.impact(insured=True)
115121
>>> imp.aai_agg
116122
123+
>>> haz = Hazard.from_hdf5(HAZ_DEMO_H5) # Set hazard
124+
>>> impfset = ImpactFuncSet.from_excel(ENT_TEMPLATE_XLS)
125+
>>> exp = Exposures(pd.read_excel(ENT_TEMPLATE_XLS))
126+
>>> #Adjust threshold and distance to centroids/exposure resolution
127+
>>> exp.assign_centroids(hazard, threshold=1.5, distance='euclidean')
128+
>>> impcalc = ImpactCal(exp, impfset, haz)
129+
>>> imp = impcalc.impact(insured=True, assign_centroids=False)
130+
>>> imp.aai_agg
131+
117132
See also
118133
--------
119134
apply_deductible_to_mat : apply deductible to impact matrix
120135
apply_cover_to_mat : apply cover to impact matrix
136+
climada.entity.exposures.assign_centroids : assign centroids to exposures explicitly
121137
"""
122138
# TODO: consider refactoring, making use of Exposures.hazard_impf
123139
# check for compatibility of exposures and hazard type

climada/engine/test/test_impact_calc.py

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -210,6 +210,28 @@ def test_calc_impact_TC_pass(self):
210210
self.assertAlmostEqual(6.570532945599105e11, impact.tot_value)
211211
self.assertAlmostEqual(6.512201157564421e09 * x, impact.aai_agg, 5)
212212

213+
def test_calc_impact_TC_change_crs_pass(self):
214+
"""Test compute impact"""
215+
crs = "EPSG:4087"
216+
exp = ENT.exposures.copy()
217+
exp.to_crs(crs)
218+
haz = deepcopy(HAZ)
219+
haz.centroids.to_crs(crs)
220+
icalc = ImpactCalc(exp, ENT.impact_funcs, haz)
221+
impact = icalc.impact()
222+
self.assertEqual(icalc.n_events, len(impact.at_event))
223+
self.assertEqual(0, impact.at_event[0])
224+
self.assertEqual(0, impact.at_event[7225])
225+
self.assertAlmostEqual(1.472482938320243e08, impact.at_event[13809], delta=1)
226+
self.assertAlmostEqual(7.076504723057620e10, impact.at_event[12147], delta=1)
227+
self.assertEqual(0, impact.at_event[14449])
228+
self.assertEqual(icalc.n_exp_pnt, len(impact.eai_exp))
229+
self.assertAlmostEqual(1.518553670803242e08, impact.eai_exp[0], delta=1)
230+
self.assertAlmostEqual(1.373490457046383e08, impact.eai_exp[25], 6)
231+
self.assertAlmostEqual(1.066837260150042e08, impact.eai_exp[49], 6)
232+
self.assertAlmostEqual(6.570532945599105e11, impact.tot_value)
233+
self.assertAlmostEqual(6.512201157564421e09, impact.aai_agg, 5)
234+
213235
def test_calc_impact_RF_pass(self):
214236
haz = Hazard.from_hdf5(get_test_file("test_hazard_US_flood_random_locations"))
215237
exp = Exposures.from_hdf5(
@@ -574,10 +596,13 @@ def test_single_exp_zero_mdr(self):
574596
eai_exp = np.array([aai_agg])
575597
at_event = np.array([imp_evt, 0])
576598
exp.set_geometry_points()
599+
exp.assign_centroids(haz, threshold=1)
577600
impf_tc = ImpfTropCyclone.from_emanuel_usa()
578601
impf_set = ImpactFuncSet([impf_tc])
579602
impf_set.check()
580-
imp = ImpactCalc(exp, impf_set, haz).impact(save_mat=True)
603+
imp = ImpactCalc(exp, impf_set, haz).impact(
604+
assign_centroids=False, save_mat=True
605+
)
581606
check_impact(self, imp, haz, exp, aai_agg, eai_exp, at_event, at_event)
582607

583608

climada/entity/exposures/base.py

Lines changed: 41 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -591,65 +591,81 @@ def assign_centroids(
591591
self,
592592
hazard,
593593
distance="euclidean",
594-
threshold=u_coord.NEAREST_NEIGHBOR_THRESHOLD,
594+
threshold=None,
595595
overwrite=True,
596596
):
597-
"""Assign for each exposure coordinate closest hazard coordinate.
597+
"""Assign for each exposure coordinate the closest hazard coordinate.
598598
The Exposures ``gdf`` will be altered by this method. It will have an additional
599599
(or modified) column named ``centr_[hazard.HAZ_TYPE]`` after the call.
600600
601601
Uses the utility function ``u_coord.match_centroids``. See there for details
602602
and parameters.
603603
604604
The value -1 is used for distances larger than ``threshold`` in point distances.
605-
In case of raster hazards the value -1 is used for centroids outside of the raster.
606605
607606
Parameters
608607
----------
609608
hazard : Hazard
610-
Hazard to match (with raster or vector centroids).
609+
Hazard to match. The hazard centroids must be in the same coordinate
610+
reference system (crs) as the exposures.
611611
distance : str, optional
612-
Distance to use in case of vector centroids.
612+
Distance to use for matching exposures points to hazard centroids
613613
Possible values are "euclidean", "haversine" and "approx".
614614
Default: "euclidean"
615615
threshold : float
616-
If the distance (in km) to the nearest neighbor exceeds `threshold`,
616+
Assignement threshold in units of the exposure's crs.
617+
If the distance to the nearest neighbor exceeds `threshold`,
617618
the index `-1` is assigned.
618-
Set `threshold` to 0, to disable nearest neighbor matching.
619-
Default: 100 (km)
619+
Set `threshold` to 0, to disable nearest neighbor matching and enforce
620+
exact matching.
621+
Default: twice the highest resolution (smallest distance between any two nearest neighbours) of the hazard centroids
620622
overwrite: bool
621623
If True, overwrite centroids already present. If False, do
622624
not assign new centroids. Default is True.
623625
624626
See Also
625627
--------
626-
climada.util.coordinates.match_grid_points: method to associate centroids to
627-
exposure points when centroids is a raster
628628
climada.util.coordinates.match_coordinates:
629629
method to associate centroids to exposure points
630+
climada.util.coordinates.estimate_matching_threshold:
631+
method to calculate the default threshold
632+
climada.util.coordinates.km_to_degree:
633+
method to approximately convert kilometers to degrees
634+
climada.util.coordinates.degree_to_km:
635+
method to approximately convert degrees to kilometers
636+
630637
Notes
631638
-----
632-
The default order of use is:
633-
634-
1. if centroid raster is defined, assign exposures points to
635-
the closest raster point.
636-
2. if no raster, assign centroids to the nearest neighbor using
637-
euclidian metric
638-
639-
Both cases can introduce innacuracies for coordinates in lat/lon
640-
coordinates as distances in degrees differ from distances in meters
641-
on the Earth surface, in particular for higher latitude and distances
642-
larger than 100km. If more accuracy is needed, please use 'haversine'
643-
distance metric. This however is slower for (quasi-)gridded data,
644-
and works only for non-gridded data.
639+
For coordinates in lat/lon coordinates distances in degrees differ from
640+
distances in meters on the Earth surface, in particular for higher
641+
latitude and distances larger than 100km. If more accuracy for degree
642+
coordinates is needed, please use 'haversine' distance metric,
643+
which however is slower.
644+
645+
Caution: nearest neighbourg matching can introduce serious artefacts
646+
such as:
647+
- exposure and hazard centroids with shifted grids can lead
648+
to systematically wrong assignements.
649+
- hazard centroids covering larger areas than exposures may lead
650+
to sub-optimal matching if the threshold is too large
651+
- projected crs often diverge at the anti-meridian and close points
652+
on either side will be at a large distance. For proper handling
653+
of the anti-meridian please use degree coordinates in EPSG:4326.
654+
This might be relevant for countries like the Fidji or the US that
655+
cross the anti-meridian.
656+
657+
Users are free to implement their own matching alrogithm and save the
658+
matching centroid index in the appropriate column ``centr_[hazard.HAZ_TYPE]``.
645659
"""
646660
haz_type = hazard.haz_type
647661
centr_haz = INDICATOR_CENTR + haz_type
648662
if centr_haz in self.gdf:
649-
LOGGER.info("Exposures matching centroids already found for %s", haz_type)
650663
if overwrite:
651664
LOGGER.info("Existing centroids will be overwritten for %s", haz_type)
652665
else:
666+
LOGGER.info(
667+
"Exposures matching centroids already found for %s", haz_type
668+
)
653669
return
654670

655671
LOGGER.info(
@@ -659,7 +675,7 @@ def assign_centroids(
659675
)
660676

661677
if not u_coord.equal_crs(self.crs, hazard.centroids.crs):
662-
raise ValueError("Set hazard and exposure to same CRS first!")
678+
raise ValueError("Set hazard and exposure to the same CRS first!")
663679
# Note: equal_crs is tested here, rather than within match_centroids(),
664680
# because exp.gdf.crs may not be defined, but exp.crs must be defined.
665681

climada/entity/exposures/test/test_base.py

Lines changed: 54 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,59 @@ def test_assign_pass(self):
103103
self.assertEqual(exp.gdf.shape[0], len(exp.hazard_centroids("FL")))
104104
np.testing.assert_array_equal(exp.hazard_centroids("FL"), expected_result)
105105

106+
def test_assign_meter_crs_pass(self):
107+
"""Check that attribute `assigned` is correctly set."""
108+
np_rand = np.random.RandomState(123456789)
109+
110+
haz = Hazard.from_raster(
111+
[HAZ_DEMO_FL], haz_type="FL", window=Window(10, 20, 50, 60)
112+
)
113+
ncentroids = haz.centroids.size
114+
crs_meters = "EPSG:4087"
115+
haz.centroids.to_crs(crs_meters, inplace=True)
116+
exp = Exposures(
117+
crs=haz.centroids.crs,
118+
lon=np.concatenate(
119+
[
120+
haz.centroids.lon,
121+
haz.centroids.lon + 0.001 * (-0.5 + np_rand.rand(ncentroids)),
122+
]
123+
),
124+
lat=np.concatenate(
125+
[
126+
haz.centroids.lat,
127+
haz.centroids.lat + 0.001 * (-0.5 + np_rand.rand(ncentroids)),
128+
]
129+
),
130+
)
131+
expected_result = np.concatenate([np.arange(ncentroids), np.arange(ncentroids)])
132+
133+
# make sure that it works for both float32 and float64
134+
for test_dtype in [np.float64, np.float32]:
135+
haz.centroids.gdf["lat"] = haz.centroids.lat.astype(test_dtype)
136+
haz.centroids.gdf["lon"] = haz.centroids.lon.astype(test_dtype)
137+
exp.assign_centroids(haz)
138+
self.assertEqual(exp.gdf.shape[0], len(exp.hazard_centroids("FL")))
139+
np.testing.assert_array_equal(exp.hazard_centroids("FL"), expected_result)
140+
141+
def test_assign_no_equal_crs(self):
142+
haz = Hazard.from_raster(
143+
[HAZ_DEMO_FL], haz_type="FL", window=Window(10, 20, 50, 60)
144+
)
145+
crs_km = "EPSG:4326"
146+
crs_meters = "EPSG:4087"
147+
exp = Exposures(
148+
crs=crs_meters,
149+
lon=haz.centroids.lon,
150+
lat=haz.centroids.lat,
151+
)
152+
haz.centroids.to_crs(crs_km, inplace=True)
153+
with self.assertRaises(ValueError) as cm:
154+
exp.assign_centroids(haz)
155+
self.assertEqual(
156+
"Set hazard and exposure to the same CRS first!", str(cm.exception)
157+
)
158+
106159
def test__init__meta_type(self):
107160
"""Check if meta of type list raises a ValueError in __init__"""
108161
with self.assertRaises(TypeError) as cm:
@@ -212,7 +265,7 @@ def test_assign_raster_pass(self):
212265
0.0,
213266
],
214267
)
215-
exp.assign_centroids(haz)
268+
exp.assign_centroids(haz, threshold=u_coord.km_to_degree(100))
216269

217270
expected_result = [
218271
# constant y-value, varying x-value

climada/hazard/test/test_tc_tracks.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1459,7 +1459,7 @@ def test_track_land_params(self):
14591459
lon_test = np.array([170, 179.18, 180.05])
14601460
lat_test = np.array([-60, -16.56, -16.85])
14611461
on_land = np.array([False, True, True])
1462-
lon_shift = np.array([-360, 0, 360])
1462+
lon_shift = np.array([360, 360, 360])
14631463
# ensure both points are considered on land as is
14641464
np.testing.assert_array_equal(
14651465
u_coord.coord_on_land(lat=lat_test, lon=lon_test), on_land

climada/hazard/trop_cyclone/trop_cyclone_windfields.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -902,6 +902,8 @@ def _coriolis_parameter(lat: np.ndarray) -> np.ndarray:
902902
cp : np.ndarray of same shape as input
903903
Coriolis parameter.
904904
"""
905+
u_coord.check_if_geo_coords(lat, 0)
906+
905907
return 2 * V_ANG_EARTH * np.sin(np.radians(np.abs(lat)))
906908

907909

0 commit comments

Comments
 (0)