Skip to content

Add a function to fill NaNs in grids by interpolation#440

Open
Phssilva wants to merge 8 commits intofatiando:mainfrom
Phssilva:issue#439
Open

Add a function to fill NaNs in grids by interpolation#440
Phssilva wants to merge 8 commits intofatiando:mainfrom
Phssilva:issue#439

Conversation

@Phssilva
Copy link
Copy Markdown

It has been added the function that fills NaN data in a grid and a test has been performed for this function.

Please review, and I'm available for further revisions.

Relevant issues/PRs:

@welcome
Copy link
Copy Markdown

welcome Bot commented Mar 19, 2024

💖 Thank you for opening your first pull request in this repository! 💖

A few things to keep in mind:

No matter what, we are really grateful that you put in the effort to do this!

Comment thread verde/utils.py Outdated
Comment thread verde/utils.py Outdated
@leouieda leouieda changed the title issue #439 Add a function to fill NaNs in grids by interpolation Mar 21, 2024
Copy link
Copy Markdown
Member

@leouieda leouieda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Phssilva thank you for the implementation! I left some comments in the code with changes that are needed.

I really appreciate you taking the time to do this!

Comment thread verde/tests/test_utils.py Outdated
Comment on lines +34 to +43
def test_fill_nans():
"""
This function tests the fill_nans function.
"""

grid = np.array([[1, np.nan, 3],
[4, 5, np.nan],
[np.nan, 7, 8]])
filled_grid = fill_nans(grid)
assert np.isnan(filled_grid).sum() == 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is supposed to take an xarray.DataArray but the test gives it a numpy array. The test should match the expected use of the function. Please make the grid into a DataArray.

It would also be good to check if the final grid has the correct values in the NaNs. Right now, this only checks that the NaNs aren't there but the values could be completely wrong and we'd never know.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Leo!
A DataArray was added to the test function, and a check was made on the filled values in the DataArray. Could you please verify if the changes made are correct? Thank you for your attention.

Comment thread verde/utils.py Outdated
import pandas as pd
import xarray as xr
from scipy.spatial import cKDTree
from sklearn.impute import KNNImputer
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use verde.KNeighbors instead.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the KNeighbors class and used the predict and fit methods to fill in the values of the data array.

Comment thread verde/utils.py Outdated

Parameters
----------
grid : :class:`xarray.Dataset` or :class:`xarray.DataArray`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be only a DataArray and not a Dataset.

Comment thread verde/utils.py Outdated
for i, idx in enumerate(unknown_indices):
grid[tuple(idx)] = predicted_values[i]

return grid
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output grid should be a copy of the input grid. We want to avoid changing the input values in-place. The code above will actually alter the input and could cause problems for users since their original grid with NaNs is now gone.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a copy of the grid in the variable filled_grid, which is returned at the end of the function.

Copy link
Copy Markdown
Member

@leouieda leouieda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the work @Phssilva! I left some more comments to get this on its way.

Comment thread verde/tests/test_utils.py
Comment on lines +44 to +47
expected_values = xr.DataArray([[1, 1, 3],
[4, 5, 3],
[4, 7, 8]])

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DataArrays should contain coordinates as well. They should be as close as possible to the format of a real dataset. That's how we make the tests more robust.

Comment thread verde/tests/test_utils.py Outdated
Comment thread verde/tests/test_utils.py Outdated
Comment thread verde/tests/test_utils.py Outdated
Comment thread verde/utils.py Outdated
Comment thread verde/utils.py
unknown_indices = np.argwhere(np.isnan(grid.values))

knn_imputer = vd.KNeighbors()
easting, northing = not_nan_values[:, 0], not_nan_values[:, 1]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the actual coordinates of the grid instead of generating indices. This makes the interpolation work even if the grid is not uniform.

Comment thread verde/utils.py Outdated
Comment thread verde/utils.py Outdated
Comment thread verde/utils.py
Comment on lines +702 to +703
not_nan_values = np.argwhere(~np.isnan(grid.values))
unknown_indices = np.argwhere(np.isnan(grid.values))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the verde.grid_to_table function and then drop the NaNs from it. It's easier and preserves the coordinates of the grid.

Comment thread verde/utils.py
Comment on lines +709 to +712
predicted_values = knn_imputer.predict((easting, northing))

for i, idx in enumerate(unknown_indices):
filled_grid[tuple(idx)] = predicted_values[i]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, you could use knn.grid and pass in the coordinates of the original grid.

Phssilva and others added 6 commits April 11, 2024 19:57
Co-authored-by: Leonardo Uieda <leo@uieda.com>
Co-authored-by: Leonardo Uieda <leo@uieda.com>
Co-authored-by: Leonardo Uieda <leo@uieda.com>
Co-authored-by: Leonardo Uieda <leo@uieda.com>
Co-authored-by: Leonardo Uieda <leo@uieda.com>
Co-authored-by: Leonardo Uieda <leo@uieda.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants