Conversation
There was a problem hiding this comment.
Pull request overview
This pull request restructures the cmip7-prep project to follow standard Python package conventions by moving source code to a src/cmip7_prep/ layout and reorganizing data files. The PR also adds a new utility script find_unmapped_vars.py to analyze cmor_driver output and categorize variables that failed to generate.
Changes:
- Restructured package layout from flat to src-layout with
src/cmip7_prep/containing all modules - Added new modules for vertical interpolation, regridding, pipeline orchestration, MOM6 utilities, and caching
- Modified parameter names from
tables_pathtotables_rootin cmor_writer for clearer semantics - Added
find_unmapped_vars.pyscript to parse and categorize failed variables from cmor_driver output - Removed obsolete
testone.pyscript
Reviewed changes
Copilot reviewed 8 out of 19 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/conftest.py | Removed redundant comment about top-level import |
| src/cmip7_prep/vertical.py | New module for vertical coordinate handling and pressure level interpolation |
| src/cmip7_prep/regrid.py | New module for ESMF-based regridding with caching support |
| src/cmip7_prep/pipeline.py | New module orchestrating native file opening, realization, and regridding |
| src/cmip7_prep/mom6_static.py | New module for MOM6 ocean model grid utilities |
| src/cmip7_prep/mapping_compat.py | Updated packaged resource path construction (contains critical bug) |
| src/cmip7_prep/cmor_writer.py | Renamed tables_path to tables_root, updated CV file reference |
| src/cmip7_prep/cmor_utils.py | New module with CMOR utility functions |
| src/cmip7_prep/cache_tools.py | New module for regridder and FX field caching |
| src/cmip7_prep/init.py | New package initialization file |
| scripts/testone.py | Deleted obsolete test script |
| scripts/find_unmapped_vars.py | New script to analyze cmor_driver output for failed variables |
| scripts/cmor_driver.py | Updated to use tables_root instead of tables_path, added 3hr frequency |
| pyproject.toml | Updated package structure to src-layout (contains critical packaging bug) |
| data/data/* | Reorganized data files into data/data/ subdirectory |
Comments suppressed due to low confidence (2)
src/cmip7_prep/cmor_writer.py:191
- The controlled vocabulary file path is set to
"../tables-cvs/cmor-cvs.json"using a relative path. This is fragile because:
- It's wrapped in
str()which is redundant since it's already a string - The relative path assumes a specific directory structure that may not be consistent across environments
- This contradicts the pattern used for other files which reference files in the
tables/directory
The path should be constructed relative to tables_root like: str(tables_root / "tables-cvs/cmor-cvs.json") or use a similar approach as with the table files.
src/cmip7_prep/mapping_compat.py:30
- The
packaged_mapping_resourcefunction uses a manual path construction (Path(__file__).parent.parent.parent / "data" / filename) but still imports and callsas_file()on aPathobject, which is incorrect. Theas_file()function fromimportlib.resourcesexpects aTraversableobject fromfiles(), not aPath.
Additionally, the function no longer uses the files import that was removed, but the implementation should be consistent with packaged_dataset_json in cmor_utils.py which correctly uses files("cmip7_prep").joinpath(f"data/{filename}").
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 19 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (3)
src/cmip7_prep/cmor_writer.py:76
self.tables_rootis stored as the incomingtables_rootvalue (which may be astr), but later code uses path operations likeself.tables_root / "tables". This will raise aTypeErrorwhen callers pass a string. Convert once in__init__(e.g.,self.tables_root = Path(tables_root)) and keep the attribute consistently typed asPath.
src/cmip7_prep/cmor_writer.py:191_controlled_vocabulary_fileis being set to a relative path (../tables-cvs/cmor-cvs.json), which depends on the current working directory and may not exist when running from other locations. Prefer resolving this relative totables_root(or validate existence and raise a clear error) so CMOR can reliably find the CV file.
src/cmip7_prep/mapping_compat.py:30packaged_mapping_resource()builds a filesystem path viaPath(__file__).parent.parent.parent / "data" / filenameand then passes thatPathintoimportlib.resources.as_file().as_file()expects animportlib.resources.abc.Traversable(typically fromimportlib.resources.files()), so this will raise aTypeErrorand is also brittle in installed wheels. Usefiles("cmip7_prep").joinpath(f"data/{filename}")(and ensure the data directory is packaged undercmip7_prep/data).
mvertens
left a comment
There was a problem hiding this comment.
What happened to the contents of the data directory? It looks like cesm_to_cmip7.yaml is still referenced but is no longer in this repository.
|
@mvertens my mistake - it should be okay now. |
|
@jedwardsrb - according to what I have been reading - small datasets should go under src/. But I can see how in this case it might be more transparent to have it at the top. What determined your decision to put the data/ directory parallel to src/? |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 25 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
src/cmip7_prep/cmor_writer.py:190
- The
str()call is redundant here since"../tables-cvs/cmor-cvs.json"is already a string literal. The code can be simplified to just"../tables-cvs/cmor-cvs.json".
src/cmip7_prep/mapping_compat.py:30 - The
packaged_mapping_resourcefunction is trying to useas_file()on aPathobject, which is incorrect. Theas_file()function fromimportlib.resourcesexpects aTraversableobject fromfiles(), not aPath.
The correct approach should be one of:
- Use
files("cmip7_prep").joinpath("../data/" + filename)and thenas_file()on that - Simply return the Path directly without using
as_file()context manager - Use
files("data").joinpath(filename)if data is treated as a separate package
The current implementation will fail at runtime because Path objects don't have the required interface for as_file().
src/cmip7_prep/cmor_writer.py:643
- The code uses
Path(__file__).parent / "data" / "ocean_geometry.nc"to access data files, which assumes data is located atsrc/cmip7_prep/data/. However, based on the pyproject.toml configuration, data files are at the rootdata/directory, not inside the package. This path will fail at runtime because the data files won't be found at this location.
Restructure project directory tree to be a better python package.
Add a script to analyze cmor_driver output and list variables which failed to generate - in two categories.