Add a function to fill NaNs in grids by interpolation #440
Phssilva wants to merge 8 commits into fatiando:main from
💖 Thank you for opening your first pull request in this repository! 💖 A few things to keep in mind:
⭐ No matter what, we are really grateful that you put in the effort to do this! ⭐
    def test_fill_nans():
        """
        This function tests the fill_nans function.
        """

        grid = np.array([[1, np.nan, 3],
                         [4, 5, np.nan],
                         [np.nan, 7, 8]])
        filled_grid = fill_nans(grid)
        assert np.isnan(filled_grid).sum() == 0
The function is supposed to take an xarray.DataArray but the test gives it a numpy array. The test should match the expected use of the function. Please make the grid into a DataArray.
It would also be good to check if the final grid has the correct values in the NaNs. Right now, this only checks that the NaNs aren't there but the values could be completely wrong and we'd never know.
Hi Leo!
I changed the test to use a DataArray and added a check on the values filled into it. Could you please verify that the changes are correct? Thank you for your attention.
    import pandas as pd
    import xarray as xr
    from scipy.spatial import cKDTree
    from sklearn.impute import KNNImputer
Please use verde.KNeighbors instead.
I switched to the KNeighbors class and used its fit and predict methods to fill in the values of the DataArray.
    Parameters
    ----------
    grid : :class:`xarray.Dataset` or :class:`xarray.DataArray`
This should be only a DataArray, not a Dataset.
    for i, idx in enumerate(unknown_indices):
        grid[tuple(idx)] = predicted_values[i]

    return grid
The output grid should be a copy of the input grid. We want to avoid changing the input values in-place. The code above will actually alter the input and could cause problems for users since their original grid with NaNs is now gone.
I used a copy of the grid in the variable filled_grid, which is returned at the end of the function.
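The copy-versus-in-place point can be illustrated with a minimal sketch. The helper `fill_with_constant` is hypothetical and is not the PR's `fill_nans`; it only shows one way to write filled values into a copy so that the caller's grid keeps its NaNs.

```python
import numpy as np
import xarray as xr


def fill_with_constant(grid, value=0.0):
    """Hypothetical helper: replace NaNs with a constant without touching the input."""
    values = grid.values.copy()  # work on a copy of the underlying array
    values[np.isnan(values)] = value
    # DataArray.copy(data=...) keeps dims, coordinates, and attrs but uses the new values
    return grid.copy(data=values)


grid = xr.DataArray(
    [[1.0, np.nan], [3.0, 4.0]],
    coords={"northing": [0.0, 1.0], "easting": [0.0, 1.0]},
    dims=("northing", "easting"),
)
filled = fill_with_constant(grid)
```

After the call, `grid` still contains its NaN while `filled` does not, which is the behavior the review asks for.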
    expected_values = xr.DataArray([[1, 1, 3],
                                    [4, 5, 3],
                                    [4, 7, 8]])
The DataArrays should contain coordinates as well. They should be as close as possible to the format of a real dataset. That's how we make the tests more robust.
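A test along these lines might look like the sketch below. The coordinate values are made up for illustration and chosen so that every NaN has a single nearest neighbor. `nearest_fill` is a hypothetical brute-force stand-in for the function under test, not the PR's implementation. `xr.testing.assert_allclose` compares coordinates as well as values, so a result with wrong or missing coordinates fails the test.

```python
import numpy as np
import xarray as xr


def nearest_fill(grid):
    """Hypothetical stand-in for fill_nans: brute-force nearest-neighbor fill."""
    east, north = np.meshgrid(grid.easting.values, grid.northing.values)
    values = grid.values.astype(float)
    known = ~np.isnan(values)
    filled = values.copy()
    for i, j in np.argwhere(~known):
        # Distance from the missing cell to every cell that has data
        distance = np.hypot(east[known] - east[i, j], north[known] - north[i, j])
        filled[i, j] = values[known][np.argmin(distance)]
    return grid.copy(data=filled)


# Assumed, irregularly spaced coordinates (illustration only)
coords = {"northing": [0.0, 2.0, 2.5], "easting": [0.0, 1.0, 3.0]}
dims = ("northing", "easting")
grid = xr.DataArray(
    [[1, np.nan, 3], [4, 5, np.nan], [np.nan, 7, 8]], coords=coords, dims=dims
)
expected = xr.DataArray(
    [[1.0, 1, 3], [4, 5, 8], [4, 7, 8]], coords=coords, dims=dims
)

# Checks both the filled values and the coordinates
xr.testing.assert_allclose(nearest_fill(grid), expected)
```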
    unknown_indices = np.argwhere(np.isnan(grid.values))

    knn_imputer = vd.KNeighbors()
    easting, northing = not_nan_values[:, 0], not_nan_values[:, 1]
Use the actual coordinates of the grid instead of generating indices. This makes the interpolation work even if the grid is not uniform.
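The difference between index positions and real coordinates can be seen in a small sketch. The grid and its irregular coordinates are assumptions chosen for illustration. `np.argwhere` returns row and column numbers, which implicitly treat the spacing as uniform, while a meshgrid of the grid's own coordinates gives the true locations of the missing cells.

```python
import numpy as np
import xarray as xr

grid = xr.DataArray(
    [[1, np.nan, 3], [4, 5, np.nan], [np.nan, 7, 8]],
    coords={"northing": [0.0, 2.0, 2.5], "easting": [0.0, 1.0, 3.0]},
    dims=("northing", "easting"),
)
missing = np.isnan(grid.values)

# Index-based positions: (row, column) pairs that ignore the actual spacing
indices = np.argwhere(missing)

# Coordinate-based positions: read the real locations from the grid itself
easting, northing = np.meshgrid(grid.easting.values, grid.northing.values)
missing_coordinates = (easting[missing], northing[missing])
```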
    not_nan_values = np.argwhere(~np.isnan(grid.values))
    unknown_indices = np.argwhere(np.isnan(grid.values))
Use the verde.grid_to_table function and then drop the NaNs from it. It's easier and preserves the coordinates of the grid.
    predicted_values = knn_imputer.predict((easting, northing))

    for i, idx in enumerate(unknown_indices):
        filled_grid[tuple(idx)] = predicted_values[i]
Instead of this, you could use knn.grid and pass in the coordinates of the original grid.
Co-authored-by: Leonardo Uieda <leo@uieda.com>
This PR adds a function that fills NaN values in a grid, along with a test for it.
Please review. I'm available for further revisions.
Relevant issues/PRs: