Dev021726 #18

Open
jedwards4b wants to merge 10 commits into main from dev021726

@jedwards4b

Restructure the project directory tree to follow Python packaging conventions.
Add a script that analyzes cmor_driver output and lists the variables that failed to generate, in two categories.

Copilot AI left a comment

Pull request overview

This pull request restructures the cmip7-prep project to follow standard Python package conventions by moving source code to a src/cmip7_prep/ layout and reorganizing data files. The PR also adds a new utility script find_unmapped_vars.py to analyze cmor_driver output and categorize variables that failed to generate.

Changes:

  • Restructured package layout from flat to src-layout with src/cmip7_prep/ containing all modules
  • Added new modules for vertical interpolation, regridding, pipeline orchestration, MOM6 utilities, and caching
  • Modified parameter names from tables_path to tables_root in cmor_writer for clearer semantics
  • Added find_unmapped_vars.py script to parse and categorize failed variables from cmor_driver output
  • Removed obsolete testone.py script

Reviewed changes

Copilot reviewed 8 out of 19 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/conftest.py | Removed redundant comment about top-level import |
| src/cmip7_prep/vertical.py | New module for vertical coordinate handling and pressure level interpolation |
| src/cmip7_prep/regrid.py | New module for ESMF-based regridding with caching support |
| src/cmip7_prep/pipeline.py | New module orchestrating native file opening, realization, and regridding |
| src/cmip7_prep/mom6_static.py | New module for MOM6 ocean model grid utilities |
| src/cmip7_prep/mapping_compat.py | Updated packaged resource path construction (contains critical bug) |
| src/cmip7_prep/cmor_writer.py | Renamed tables_path to tables_root, updated CV file reference |
| src/cmip7_prep/cmor_utils.py | New module with CMOR utility functions |
| src/cmip7_prep/cache_tools.py | New module for regridder and FX field caching |
| src/cmip7_prep/__init__.py | New package initialization file |
| scripts/testone.py | Deleted obsolete test script |
| scripts/find_unmapped_vars.py | New script to analyze cmor_driver output for failed variables |
| scripts/cmor_driver.py | Updated to use tables_root instead of tables_path, added 3hr frequency |
| pyproject.toml | Updated package structure to src-layout (contains critical packaging bug) |
| data/data/* | Reorganized data files into data/data/ subdirectory |

Comments suppressed due to low confidence (2)

src/cmip7_prep/cmor_writer.py:191

  • The controlled vocabulary file path is set to "../tables-cvs/cmor-cvs.json" using a relative path. This is fragile because:
  1. It's wrapped in str() which is redundant since it's already a string
  2. The relative path assumes a specific directory structure that may not be consistent across environments
  3. This contradicts the pattern used for other files which reference files in the tables/ directory

The path should be constructed relative to tables_root like: str(tables_root / "tables-cvs/cmor-cvs.json") or use a similar approach as with the table files.
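
A minimal sketch of the suggestion above, resolving the CV file against tables_root and failing early if it is missing. The helper name resolve_cv_file is hypothetical, and the assumption that tables-cvs/ sits directly under tables_root comes from the review comment, not from the repository layout.

```python
from pathlib import Path


def resolve_cv_file(tables_root, relative="tables-cvs/cmor-cvs.json"):
    """Return the absolute CV file path under tables_root, or raise if missing.

    Hypothetical helper: the default relative location is an assumption
    taken from the review comment.
    """
    # Accept str or Path, and make the result independent of the CWD.
    root = Path(tables_root).expanduser().resolve()
    cv_path = root / relative
    if not cv_path.is_file():
        raise FileNotFoundError(f"Controlled vocabulary file not found: {cv_path}")
    # Return a plain string, since the original code passes a str.
    return str(cv_path)
```

Validating at setup time turns a late, hard-to-trace lookup failure into an immediate error that names the path that was tried.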

src/cmip7_prep/mapping_compat.py:30

  • The packaged_mapping_resource function builds its path manually (Path(__file__).parent.parent.parent / "data" / filename) and then passes that Path to as_file(). as_file() from importlib.resources is intended for a Traversable object returned by files(); a plain pathlib.Path is accepted but simply yielded unchanged, so the call adds nothing here, and the hard-coded parent chain depends on the source-tree layout.

Additionally, the files import was removed. The implementation should stay consistent with packaged_dataset_json in cmor_utils.py, which correctly uses files("cmip7_prep").joinpath(f"data/{filename}").
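
The files()/as_file() pattern referred to above can be sketched as follows. The function name packaged_resource and its parameters are hypothetical, and the sketch assumes the data files are shipped inside the package under a data/ subdirectory, which is what the review recommends rather than what the PR currently does.

```python
from contextlib import contextmanager
from importlib.resources import as_file, files


@contextmanager
def packaged_resource(package, filename, subdir="data"):
    """Yield a real filesystem path for <package>/<subdir>/<filename>.

    Hypothetical helper: assumes the data directory is packaged inside
    the importable package.
    """
    # files() returns a Traversable rooted at the installed package.
    resource = files(package).joinpath(subdir).joinpath(filename)
    # as_file() yields a concrete path, extracting to a temporary file
    # if the package is not stored as plain files on disk.
    with as_file(resource) as path:
        yield path
```

Usage would look like `with packaged_resource("cmip7_prep", "cesm_to_cmip7.yaml") as p: ...`, provided that file is actually packaged at that location.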

jedwards4b and others added 3 commits February 20, 2026 16:18
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 19 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (3)

src/cmip7_prep/cmor_writer.py:76

  • self.tables_root is stored as the incoming tables_root value (which may be a str), but later code uses path operations like self.tables_root / "tables". This will raise a TypeError when callers pass a string. Convert once in __init__ (e.g., self.tables_root = Path(tables_root)) and keep the attribute consistently typed as Path.

src/cmip7_prep/cmor_writer.py:191

  • _controlled_vocabulary_file is being set to a relative path (../tables-cvs/cmor-cvs.json), which depends on the current working directory and may not exist when running from other locations. Prefer resolving this relative to tables_root (or validate existence and raise a clear error) so CMOR can reliably find the CV file.

src/cmip7_prep/mapping_compat.py:30

  • packaged_mapping_resource() builds a filesystem path via Path(__file__).parent.parent.parent / "data" / filename and then passes that Path into importlib.resources.as_file(). as_file() is meant for an importlib.resources.abc.Traversable (typically from importlib.resources.files()); a plain Path is accepted but simply yielded unchanged, so the call adds nothing, and the hard-coded parent chain is brittle in installed wheels. Use files("cmip7_prep").joinpath(f"data/{filename}") (and ensure the data directory is packaged under cmip7_prep/data).
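
As a small illustration of the tables_root point above, the sketch below normalizes the argument once in the constructor. The class TablesLocation and its table_file method are hypothetical stand-ins for the writer class, showing only the path handling.

```python
from pathlib import Path


class TablesLocation:
    """Hypothetical stand-in for the writer class; only path handling is shown."""

    def __init__(self, tables_root):
        # Accept str or os.PathLike and store a Path, so the "/" operator
        # works everywhere the attribute is used later.
        self.tables_root = Path(tables_root)

    def table_file(self, name):
        # Would raise TypeError if tables_root were still a plain str.
        return self.tables_root / "tables" / name
```

Converting at the boundary keeps the attribute consistently typed, so callers can pass either a string or a Path without the rest of the class needing to care.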

mvertens left a comment

What happened to the contents of the data directory? It looks like cesm_to_cmip7.yaml is still referenced but is no longer in this repository.

@jedwards4b

@mvertens my mistake - it should be okay now.

@mvertens

@jedwards4b - according to what I have been reading, small datasets should go under src/. But I can see how in this case it might be more transparent to have them at the top level. What determined your decision to put the data/ directory parallel to src/?

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 25 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (3)

src/cmip7_prep/cmor_writer.py:190

  • The str() call is redundant here since "../tables-cvs/cmor-cvs.json" is already a string literal. The code can be simplified to just "../tables-cvs/cmor-cvs.json".

src/cmip7_prep/mapping_compat.py:30

  • The packaged_mapping_resource function calls as_file() on a manually constructed Path. as_file() from importlib.resources is intended for a Traversable object from files(); a plain Path is passed through unchanged, so the call adds nothing.

The correct approach should be one of:

  1. Use files("cmip7_prep").joinpath("../data/" + filename) and then as_file() on that
  2. Simply return the Path directly without using as_file() context manager
  3. Use files("data").joinpath(filename) if data is treated as a separate package

The current implementation derives the path from __file__, so it depends on the source-tree layout and is likely to break once the package is installed elsewhere.
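
Option 2 above could be sketched like this, keeping the `with` interface that as_file() provides so existing callers need no changes. The function name mapping_resource_from_tree and the data_dir parameter are hypothetical; how the data directory is located is left to the caller.

```python
from contextlib import nullcontext
from pathlib import Path


def mapping_resource_from_tree(filename, data_dir):
    """Return a context manager yielding data_dir/filename as a Path.

    Hypothetical helper: data_dir is supplied by the caller rather than
    derived from __file__.
    """
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"Mapping file not found: {path}")
    # nullcontext makes a plain value usable in a "with" statement
    # without any setup or cleanup.
    return nullcontext(path)
```

This only works when the data directory exists on disk, so it suits a source checkout better than an installed wheel.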

src/cmip7_prep/cmor_writer.py:643

  • The code uses Path(__file__).parent / "data" / "ocean_geometry.nc" to access data files, which assumes data is located at src/cmip7_prep/data/. However, based on the pyproject.toml configuration, data files are at the root data/ directory, not inside the package. This path will fail at runtime because the data files won't be found at this location.
