Allow FieldTimeSeries to be read from NetCDF without reading the original architecture#5271
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #5271 +/- ##
=======================================
Coverage 73.51% 73.51%
=======================================
Files 395 395
Lines 22190 22193 +3
=======================================
+ Hits 16313 16316 +3
Misses 5877 5877
Suggested change:

```diff
  end

- function reconstruct_grid(ds)
+ function reconstruct_grid(ds; arch=nothing)
```
Also, the kwarg should be architecture if using a name (that's what FieldTimeSeries uses as well).
I think a positional arg would be fine here; in that case you can keep arch.
If this is only called by FieldTimeSeries, then I suggest just making arch a mandatory positional argument; a default of architecture=nothing isn't valid.
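For illustration, here is a sketch of the two signatures under discussion (the function bodies are elided; reconstruct_grid and ds come from the diff above, and this is not a claim about the actual implementation):

```julia
# Current PR: architecture passed as an optional keyword
function reconstruct_grid(ds; arch=nothing)
    # ...
end

# Suggested alternative: a mandatory positional argument, so the caller
# (FieldTimeSeries) must always state which architecture to build on
function reconstruct_grid(ds, arch)
    # ...
end
```

With the positional form there is no default to argue about: FieldTimeSeries decides the architecture and passes it through explicitly.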
Note that we cannot save grids with a "native architecture". This produces problems in a variety of places, both with GPU grids and distributed grids (although the distributed case is to be solved in the future, not yet). Thus CPU() is the default. GPU() is supported as an option, if we want to do analysis on GPU. But there is no concept of reconstructing a grid on the architecture that it was saved on.
I think the default behavior should be to reconstruct the grid on the same architecture it was originally constructed on, no?
Here's the rationale:

* The default becomes invalid _most_ of the time, because we cannot reconstruct distributed fields or GPU fields.
* Likewise, _most_ of the time (100% of the time in our current examples) we load from file to CPU.
* Even if we can deserialize a grid directly onto an available GPU device, we cannot do this for fields. In other words, CuArray is serialized to Array. Likewise, a GPU grid should be serialized to a CPU grid.

Why do you think it would be preferred to change the default behavior? Just as an example, this would require peppering most post-processing scripts with architecture=CPU(), which is quite a bit of boilerplate.
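A sketch of what that boilerplate would look like in a post-processing script, assuming the FieldTimeSeries(path, name; architecture) form discussed above (the file and field names here are hypothetical):

```julia
using Oceananigans

# If the default were the file's original (possibly GPU) architecture,
# every CPU-side analysis script would need an explicit override like this:
u_t = FieldTimeSeries("simulation_output.nc", "u", architecture=CPU())
```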
I don't understand the statement that it's not possible to reconstruct the grid on a GPU. Does this have to do with serialization to JLD2? On NetCDF, serialization isn't possible, and the grid is reconstructed by just re-issuing the construction_arguments(grid), which are saved on the NetCDF file. So we don't serialize and then deserialize; we just rebuild the grid with arguments that ensure the new grid is the same as the original grid that generated the NetCDF file.
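A minimal sketch of that rebuild path. Only construction_arguments appears in the comment above; saved_construction_arguments and grid_type_from are hypothetical helper names used for illustration, not Oceananigans API:

```julia
# Rebuild rather than deserialize: the grid's construction arguments were
# written to the NetCDF file, so at read time we simply re-issue them on
# whatever architecture the caller requests.
function reconstruct_grid(ds, arch)
    args, kwargs = saved_construction_arguments(ds)  # hypothetical reader
    GridType = grid_type_from(ds)                    # hypothetical, e.g. RectilinearGrid
    return GridType(arch, args...; kwargs...)
end
```

Since the arguments are architecture-agnostic, nothing in this path requires instantiating the architecture the file was written on.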
If you don't have a GPU you cannot reconstruct a grid on GPU.
Yes, that's obviously true, and hence there's an architecture kwarg. But if you do have a GPU, surely you'd like to rebuild a GPU-written grid on it, no?
Also could you please answer my questions in #5271 (comment)? I'm confused as to why you're saying it's not possible to reconstruct a grid on a GPU.
glwagner left a comment
Some significant changes asked for, but once those are done I am happy.
On main, if I try to read from a GPU-generated NetCDF file into a FieldTimeSeries, I get the following error if I don't have a GPU:

This PR makes it so that we can bypass this by passing an architecture keyword to reconstruct_grid() and, if the user passes it, we never have to instantiate the original architecture the file was written in.

Here's a MWE for the record: