[Bug]: pydantic causing simple parse script to fail on build #141

@riley-brady

What happened?

When I run `Builder.build(custom_parser)`, I get a pydantic `ValidationError` that stops the catalog build, even though I followed the tutorial and the parser runs successfully on test files.

What did you expect to happen?

The catalog builds successfully.

Minimal Complete Verifiable Example

I am trying to build a simple custom parser. I am following [this guide](https://ecgtools.readthedocs.io/en/latest/how-to/use-a-custom-parser.html).

0. Make some fake data.

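```python
import pathlib
import xarray as xr

# Make sure the output directory exists before writing the test files.
pathlib.Path("./data").mkdir(exist_ok=True)
for var in ['pr', 'tas', 'tasmax']:
    ds = xr.DataArray([1, 2, 3], dims='time').rename(var).to_dataset()
    ds.to_netcdf(f"./data/{var}_daily_EC-Earth3_historical.nc")
```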


1. Put together a list of paths.

```python
import pathlib

root_path = pathlib.Path("./data/")
files = list(root_path.glob("*.nc"))
# Convert to string paths since the builder only takes string paths.
files = [str(f) for f in files]
```

2. I built a simple custom parser to test this out, following the guide.

```python
import traceback

from ecgtools.builder import INVALID_ASSET, TRACEBACK


def parse_dummy(file):
    fpath = pathlib.Path(file)
    info = {}

    try:
        # Just extracting metadata from the filename in this case. The same
        # errors occur when the file is also loaded into a dataset.
        variable, temporal_resolution, model, scenario = fpath.stem.split('_')
        info = {
            'variable': variable,
            'temporal': temporal_resolution,
            'source': model,
            'path': str(file),
        }

        return info

    except Exception:
        return {INVALID_ASSET: file, TRACEBACK: traceback.format_exc()}
```

3. I tested this on a single file and it was successful.

```python
parse_dummy(files[0])
```

```
{'variable': 'pr',
 'temporal': 'daily',
 'source': 'EC-Earth3',
 'path': './data/pr_daily_EC-Earth3_historical.nc'}
```

4. Now I make a builder object. The object successfully returns the expected list of files.

```python
from ecgtools import Builder

# I tried passing the pathlib object here, following the demo, but got an
# error that pydantic wanted strings only.
cat_builder = Builder(files)
cat_builder
```

```
Builder(paths=['data/pr_daily_EC-Earth3_historical.nc', 'data/tasmax_daily_EC-Earth3_historical.nc', 'data/tas_daily_EC-Earth3_historical.nc'], storage_options={}, depth=0, exclude_patterns=[], include_patterns=[], joblib_parallel_kwargs={})
```

5. Build the catalog with the parse script.

```python
cat_builder.build(parse_dummy)
```

Relevant log output

```
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Input In [84], in <cell line: 1>()
----> 1 cat_builder.build(parse_fake)

File ~/miniconda3/envs/analysis_py39/lib/python3.9/site-packages/pydantic/decorator.py:40, in pydantic.decorator.validate_arguments.validate.wrapper_function()

File ~/miniconda3/envs/analysis_py39/lib/python3.9/site-packages/pydantic/decorator.py:133, in pydantic.decorator.ValidatedFunction.call()

File ~/miniconda3/envs/analysis_py39/lib/python3.9/site-packages/pydantic/decorator.py:130, in pydantic.decorator.ValidatedFunction.init_model_instance()

File ~/miniconda3/envs/analysis_py39/lib/python3.9/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()

ValidationError: 2 validation errors for Build
parsing_func
  field required (type=value_error.missing)
args
  1 positional arguments expected but 2 given (type=type_error)
```

Anything else we need to know?

pydantic version: '1.10.2'
ecgtools version: '2022.10.7'
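
One possible reading of the two validation errors, not verified against the ecgtools source: `parsing_func` may be declared keyword-only on `Builder.build`, so a parser passed positionally is counted as an extra positional argument (alongside `self`) while `parsing_func` itself is reported missing. The plain-Python sketch below reproduces that pattern without pydantic or ecgtools. `FakeBuilder` and its `build` signature are hypothetical stand-ins, and the simplified `parse_dummy` here only echoes the path.

```python
class FakeBuilder:
    """Hypothetical stand-in for ecgtools' Builder; not the real class."""

    def __init__(self, paths):
        self.paths = paths

    def build(self, *, parsing_func):
        # The bare `*` makes `parsing_func` keyword-only. This is an assumption
        # about the real `Builder.build`, inferred from the validation errors.
        return [parsing_func(path) for path in self.paths]


def parse_dummy(file):
    # Simplified parser: just echo the path back.
    return {"path": file}


cat_builder = FakeBuilder(["data/pr_daily_EC-Earth3_historical.nc"])

# Positional call: rejected, since only `self` may be passed by position.
try:
    cat_builder.build(parse_dummy)
    positional_error = None
except TypeError as err:
    positional_error = str(err)

# Keyword call: accepted.
keyword_result = cat_builder.build(parsing_func=parse_dummy)

print(positional_error)
print(keyword_result)
```

If that reading holds, `cat_builder.build(parsing_func=parse_dummy)` would be the call to try against the real builder.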
