Execute GEOS via calling gcm_run.j and create new model specific tasks. #677
base: develop
| ''' | ||
| Method to provide "forecast" directory to geos class | ||
| If paths are provided, it is combined with the forecast directory and returned | ||
| ''' |
Better! But, FWIW, where I'd like to get with these is something like this:
| ''' | |
| Method to provide "forecast" directory to geos class | |
| If paths are provided, it is combined with the forecast directory and returned | |
| ''' | |
| """Set structure of 'forecast' directory for GEOS | |
| Args: | |
| paths: One or more paths to concatenate into a nested forecast directory structure. | |
| Returns: | |
| The full forecast directory path, appended to `self.cycle_forecast_dir`, as a single string. | |
| Examples: | |
| For example, if `obj.cycle_forecast_dir = "/path/to/cycle"` then: | |
| >>> obj.forecast_dir("forecast") | |
| "/path/to/cycle/forecast" | |
| >>> obj.forecast_dir(["nested", "dir", "structure"]) | |
| "/path/to/cycle/nested/dir/structure" | |
| """ |
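As a rough check of the suggested docstring, here is a minimal sketch that embeds it in a stand-in class and runs the examples through doctest. `ForecastDirDemo` is a hypothetical class made up for illustration (it is not the real geos class, and it does not create any directories), and the expected paths assume a POSIX system. Note that doctest compares against the `repr` of the result, so the expected strings need single quotes rather than the double quotes shown above.

```python
import doctest
import os


class ForecastDirDemo:
    """Hypothetical stand-in for the geos class; only the path logic is sketched."""

    def __init__(self, cycle_forecast_dir: str) -> None:
        self.cycle_forecast_dir = cycle_forecast_dir

    def forecast_dir(self, paths=None) -> str:
        """Set structure of 'forecast' directory for GEOS.

        Args:
            paths: One or more paths to concatenate into a nested
                forecast directory structure.

        Returns:
            The full forecast directory path, appended to
            `self.cycle_forecast_dir`, as a single string.

        Examples:
            >>> obj = ForecastDirDemo("/path/to/cycle")
            >>> obj.forecast_dir("forecast")
            '/path/to/cycle/forecast'
            >>> obj.forecast_dir(["nested", "dir", "structure"])
            '/path/to/cycle/nested/dir/structure'
        """
        if paths is None:
            return self.cycle_forecast_dir
        if isinstance(paths, str):
            # Assumption: a single string is treated as a one-element list.
            paths = [paths]
        return os.path.join(self.cycle_forecast_dir, *paths)


# Run only the examples embedded in the docstring above.
finder = doctest.DocTestFinder(recurse=False)
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(
    ForecastDirDemo.forecast_dir,
    "forecast_dir",
    module=False,
    globs={"ForecastDirDemo": ForecastDirDemo},
):
    runner.run(test)
results = runner.summarize(verbose=False)
print(f"doctest examples attempted: {results.attempted}, failed: {results.failed}")
```

Keeping the examples executable like this means the docstring fails loudly if the method's behavior drifts.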
For more examples of Google-style docstrings, see here:
https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
Also, you don't have to change it here (because we use forecast_dir a lot, and this PR touches a lot of things!), but just flagging that it's best to avoid functions that can take multiple types whenever possible. In this case, this should either always take a string and literally concatenate it, or (perhaps, better) it should always take a list of strings (even if there's just one path). That requires much less code with, IMHO, much clearer and more consistent behavior.
def forecast_dir(self, paths: list[str]) -> str:
    os.makedirs(self.cycle_forecast_dir, 0o755, exist_ok=True)
    return os.path.join(self.cycle_forecast_dir, *paths)

| ds = ds.rename({'aice': 'aice_h', 'hi': 'hi_h', 'hs': 'hs_h'}) | ||
| # Save as a new file | ||
| ds.to_netcdf(dst_history, mode='w') |
Suggest being explicit about format here.
| ds.to_netcdf(dst_history, mode='w') | |
| ds.to_netcdf(dst_history, mode='w', format='NETCDF4') |
This would write a *.nc4 file, correct?
format="NETCDF4" has nothing to do with the file extension. It will write to whatever the full value of dst_history is. If dst_history = "somefile.crazy", it will write a file called somefile.crazy.
The point here is to (1) make it clear to other/future programmers that we're writing NetCDF4 files; and (2) make this code agnostic to the current dependencies installed and to xarray.to_netcdf's internal rules about how it decides what kinds of "netcdf" files to write by default (e.g., right now, I think the rule is something like: Use NetCDF4 if h5netcdf or netcdf4 is installed; otherwise, fall back to netcdf3...but that might change in the future).
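To make the "extension doesn't matter" point concrete, here is a small sketch that guesses the on-disk flavor from the first bytes of a file instead of its name. `sniff_netcdf_kind` is a hypothetical helper (not part of xarray or swell). It relies on NetCDF-4 files being HDF5 containers that start with the HDF5 signature, and on the NetCDF-3 family starting with `CDF` plus a version byte. The demo files below contain only stand-in header bytes, not real datasets.

```python
import os
import tempfile

HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # NetCDF-4 files are HDF5 containers
CLASSIC_MAGIC = b"CDF"             # NetCDF-3 family: b"CDF" followed by a version byte


def sniff_netcdf_kind(path: str) -> str:
    """Guess the NetCDF flavor from the leading bytes, ignoring the file name."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(HDF5_MAGIC):
        return "NETCDF4 (HDF5)"
    if head.startswith(CLASSIC_MAGIC):
        return "NETCDF3 (classic family)"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in headers only: deliberately "wrong" extensions on both files.
    crazy = os.path.join(tmp, "somefile.crazy")
    nc4 = os.path.join(tmp, "somefile.nc4")
    with open(crazy, "wb") as f:
        f.write(HDF5_MAGIC + b"\x00" * 8)
    with open(nc4, "wb") as f:
        f.write(CLASSIC_MAGIC + b"\x01" + b"\x00" * 8)

    for path in (crazy, nc4):
        print(os.path.basename(path), "->", sniff_netcdf_kind(path))
```

A check like this could go in a test if we ever want to assert that the history files really are NetCDF4, independent of which backend xarray picked.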
| except Exception: | ||
| logger.abort('Copying failed, see if source files exists') | ||
| except Exception as e: |
In general, avoid catch-all except Exception.
If shutil is most likely to throw an OSError, we can catch that explicitly and let all other errors bubble up naturally.
except OSError as e:
    logger.abort("...")
Better yet, in general, we should start avoiding this pattern altogether and just let exceptions get raised natively by Python and bubble up through the call stack.
I understand OSError is a subset of Exception. Are you also suggesting that we avoid the as e part?
Yes, OSError is a subclass of Exception, but that's exactly the point: It's bad engineering practice to be overly indiscriminate in handling exceptions (i.e., except Exception). Basically, the problem is that almost every ordinary error is rolled up under Exception: not just the OSError you expect from a failed copy, but also TypeError, NameError, AttributeError, and lots of other things that usually indicate a bug in our own code. In most cases, you do not want those handled in the same way as a missing file. (SystemExit and KeyboardInterrupt, the exception triggered by pressing Control-C, derive from BaseException rather than Exception, so except Exception leaves them alone; a bare except: would swallow them too, which is even worse.)
As a specific example: As written right now, if a bug inside this try-except block raises a TypeError or NameError, it gets reported through logger.abort as "Copying failed, see if source files exists", which points you at the wrong problem.
Here is a more detailed writeup: https://jerrynsh.com/stop-using-exceptions-like-this-in-python/
All that said: logger.abort will just re-raise the same exception with a logging message, so the practical consequences of this are relatively minor. But we could simplify a lot of our code by just not bothering with logger.abort in the first place and just letting exceptions raise themselves --- Python is perfectly good at printing out backtraces and other useful information when it raises errors without it needing to be wrapped in a logger.
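A minimal sketch of the difference between the two handlers, using made-up helper names (`copy_catch_all`, `copy_catch_oserror`) and returning messages instead of calling logger.abort, which is not reproduced here. The "bug" is simulated with `os.path.join(None, ...)`, which raises a TypeError before any file is touched.

```python
import os
import shutil
import tempfile


def copy_catch_all(action):
    """Mimic the current pattern: any ordinary error becomes a 'copy failed' message."""
    try:
        action()
    except Exception:
        return "Copying failed, see if source files exist"
    return "ok"


def copy_catch_oserror(action):
    """Handle only filesystem errors; anything else propagates to the caller."""
    try:
        action()
    except OSError as e:
        return f"Copying failed: {type(e).__name__}"
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.nc")

    def missing_source():
        # Genuine filesystem problem: the source file does not exist.
        shutil.copy(missing, tmp)

    def programming_bug():
        # Bug unrelated to the filesystem: os.path.join(None, ...) raises TypeError.
        shutil.copy(os.path.join(None, "history.nc"), tmp)

    print(copy_catch_all(missing_source))
    print(copy_catch_oserror(missing_source))
    # With the catch-all, the bug is misreported as a copy failure.
    print(copy_catch_all(programming_bug))
    try:
        copy_catch_oserror(programming_bug)
    except TypeError as e:
        # With the narrow handler, the real error reaches the caller.
        print("bug surfaced:", type(e).__name__)
```

With the narrow handler, the traceback points at the actual mistake instead of sending someone to look for missing source files.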
| except Exception: | ||
| logger.abort('Moving failed, see if source files exist') | ||
| except Exception as e: |
Ditto to above.
| "sea_ice_snow_thickness" | ||
| ]), | ||
| qd.window_length("PT6H"), | ||
| qd.window_offset("PT3H"), |
The window_offset causes failures since it's been removed from swell.
Yeah, I had that change in my full PR. I don't think this PR will work without the remainder of the parts.
Following the comments on my previous PR, I created a more "lightweight" PR. I realize this is not an easy PR to review for others who have never executed GEOS within SWELL. Changes to R2D2 and the GEOS model version haven't helped simplify this either. This is still a work in progress, and I don't even think it will work in this form, but if #626 goes in first I could simplify this a little bit more.
I added files to ignore at the end. I realize I should be more selective about what goes into my commits. Also, better docstrings are required (I see @ashiklom's comments in other PRs). With those in mind:
There are two main things happening here (read the section about gcm_run.j at the end of the description for the model execution task):
[...] gcm_run.j directly. To facilitate this, a forecast directory was created under {swell_exp_dir}/GEOSgcm/forecast. This forecast folder is a replication of a GEOS experiment folder, with only a few changes regarding where HOMDIR and EXPDIR are defined. Model execution happens under {swell_exp_dir}/GEOSgcm/forecast/scratch, similar to typical GEOS model runs.
Why was this change necessary:
[...] /RC files) in the forecast directory. This creates incompatibility while running/testing different GEOSgcm versions.
[...] forecast dir can't be updated in the final flow.cylc if it is templated in a time-dependent way.
[...] subprocess simply didn't run with GEOSv12 on Milan nodes. I tried many combinations, but it didn't get beyond initialization.
[...] gcm_run.j. If users make a mistake when requesting sufficient SLURM nodes, GEOS tries submitting hundreds of instances to compensate for the lack of compute resources, and then NCCS will yell at you.
[...] gcm_run.j and gcm_setup.j scripts are being or will be modernized. This work is underway but might take a long time (especially gcm_run.j).
[...] gcm_run.j in SWELL, some parts should be erased or commented out. Or, my idea is that there could be conditional sections in gcm_run.j, say SWELL_active, so that gcm_run.j can skip those sections, which are mainly postprocessing anyway.
[...] 3dfgat_coupled_cycle (not the best name) is added.
Ignore these changes (they have no impact or are related to the grid change):
src/swell/configuration/jedi/interfaces/geos_marine/model/background_error.yaml
src/swell/tasks/generate_b_climatology.py
Finally, a little primer on gcm_run.j
Let's consider gcm_run.j in 4 stages:
[...]
In the current implementation, SWELL handles 2 & 3 via Python and subprocess, and 1 is assumed to be set properly by the user, which caused trouble with the NCCS. For DA purposes, 4 (postprocessing) is explicitly handled by SWELL, but that is not the focus of this PR.
In this proposed implementation, the main difference is that we rely on gcm_run.j for 2 and 3 by conducting surgical edits via PrepCoupledGeosRundir at a few locations and running gcm_run.j directly from Cylc (which doesn't capture failed exit status):
[...]
I created the 3dfgat_coupled_cycle suite for testing; it should work by default if anyone has time to check it out.