-
Notifications
You must be signed in to change notification settings - Fork 16
Added generic fallback method to to_device
#2362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for this is identical to what is written above, so it doesn't make sense to add this if everything goes through the same thing anyway.
Also, I don't think this fallback should be added, since there could be correctness issue if
Adapt.adapt
didn't throw an error, but the object isn't meant to be put onto the GPU.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not really. What's written above limits
to_device
to a union of certain ClimaCore data structures. We could either extendto_device
via a generic fallback (as in my PR), or remove the restrictions, if we wantto_device
to be more versatile.I suppose you are correct to be more cautious, but I submitted this PR due to an issue I faced. Specifically, the full orographic gravity wave pipeline requires loading in an external orography file and doing some preprocessing analysis on this dataset. These steps are done on the CPU.
Now, in one of the tests, see for example the link below, I explicitly move the arrays initialised on the CPU to the GPU via
to_device
. These GPU arrays are then used in the GPU orographic gravity wave parameterization. One of the obstacles was the existingto_device
would not work to move instances ofThermodynamicsParameters
to the GPU, and this PR resolves that issue.I understand that my integral test is not very idiomatic Clima, but that is because I wanted to integrate CPU preprocessing with GPU ClimaAtmos computations into one integral test. But if you have a better solution to this problem, please let me know! :)
https://github.com/ray-chew/ClimaAtmos.jl/blob/2def4f0003c8326e80126ee9db29eb09d6e2b06a/test/parameterized_tendencies/gravity_wave/orographic_gravity_wave/ogwd_3d_gpu_integral.jl#L216C2-L219C1
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Aside: @ray-chew We should drop support for the
Fields.bycolumn
usage in the linked test above ; I also think we may be able to replaceinterp_latlong2cg
with theSpaceVaryingInput
utility. )There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the whole
bycolumn
,parent
, and etc part of the test is the CPU part I mentioned in my reply to Kevin. Which is the reason for me moving between host and device withto_device
.In ClimaAtmos #3867 point 3, I mentioned that we should move all these to GPU-friendly code. However, if we can already use the existing machinery in a preprocessing step, the cost of refactoring these right now is too high with too little benefits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ph-kev : Given @ray-chew 's workflow in ClimaAtmos - is there a reasonable alternative that avoids this generic method addition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ray-chew
@ph-kev and I are a bit confused by this. The
thermo_params
shouldn't need to be moved to the device because it is already isbits.For example:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ph-kev @imreddyTeja : Confirming that I can replicate your results where
c
is aClimaCore.Fields.Field (VIJFH layout)
andparams
has the same type as above. @ray-chew and I looked over the test setup where this issue popped up - turns out it was related to theBroadcast space mismatch
error since the test problem involves computing subgrid variables on the CPU on a lat-long grid given some source dataset, and then moving them to the GPU : inconsistent spaces meant that theto_device
was being used as a somewhat hacky solution. A better solution seems to be to use theFields.Field(Fields.field_values(x),S)
and ensuring that the target spaceS
is always identical (thermo_params
don't need additional manipulation). (His test case now runs on GPU following this change). We can discuss this further, but I'm closing this issue for now.Thanks @ray-chew @ph-kev @imreddyTeja.