[ECmean] startdate and enddate management #168

@mcadau

Description

machine

Lumi

catalog

climatedt-phase1

version

main

What happened?

While working on #162, I ran aqua analysis several times with only a few months of data, to check whether NotEnoughdataError is raised properly when the time span is shorter than the required number of months.
During these runs, I noticed that ECmean (which in theory requires at least 12 months of data) never failed in this situation, even with only 1 month of data: plots are generated with no major issues.

Looking in more detail, it seems that reader_data, which basically retrieves data via the Reader, does not take startdate and enddate into account (lines 79-87 of the ECmean CLI):

    # Try to read the data, if dataset is not available return None
    try:
        reader = Reader(
            model=model, exp=exp, source=source, catalog=catalog, 
            regrid=regrid, **reader_kwargs
        )
        xfield = reader.retrieve()
        if regrid is not None:
            xfield = reader.regrid(xfield)

startdate and enddate are in fact extracted from the arguments or the config:

        startdate = get_arg(args, 'startdate', dataset.get('startdate'))
        enddate = get_arg(args, 'enddate', dataset.get('enddate'))

But they are never passed on to the Reader call:

data_atm = reader_data(model=model, exp=exp, source=source_atm,
                       catalog=catalog, keep_vars=atm_vars, regrid=regrid,
                       reader_kwargs=reader_kwargs)

As a result, the reader retrieves ALL data from the source, and only later does time_check look at startdate and enddate and check whether the data are correct.
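To make the point concrete, here is a minimal sketch of what forwarding the time window could look like. `FakeReader` is a hypothetical stand-in, not the real Reader, and this `reader_data` signature is made up for illustration; whether the real `Reader.retrieve` accepts `startdate` and `enddate` would need to be checked.

```python
from datetime import date


class FakeReader:
    """Hypothetical stand-in for the real Reader: one timestamp per month."""

    def __init__(self, months):
        self.months = list(months)

    def retrieve(self, startdate=None, enddate=None):
        # Keep only the months inside the requested window (inclusive)
        return [
            m for m in self.months
            if (startdate is None or m >= startdate)
            and (enddate is None or m <= enddate)
        ]


def reader_data(reader, startdate=None, enddate=None):
    """Hypothetical variant of reader_data that forwards the time window."""
    return reader.retrieve(startdate=startdate, enddate=enddate)


# Two years of monthly data
source = FakeReader(date(y, m, 1) for y in (1990, 1991) for m in range(1, 13))
subset = reader_data(source, startdate=date(1990, 3, 1), enddate=date(1990, 5, 1))
print(len(subset))  # 3
```

With the window forwarded, only the requested months are retrieved, so any later check sees the data that were actually asked for.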

So,

  1. if you give a startdate and enddate spanning less than 12 months, ECmean doesn't break, but generates the plots for the whole year that the selected startdate and enddate fall in; I don't know if this is intentional, but it seems like a bug, or at least misleading, since I ask for a couple of months and get plots covering months that I may not have meant to ask for
  2. I think that retrieving every month from the source is quite inefficient if you only need a subset smaller than the whole source timeframe. Am I missing something?
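For point 1, a check on the requested window itself (rather than on the retrieved data) might look roughly like the sketch below. The exception class is a local stand-in defined here, and `months_covered` / `check_window` are hypothetical helpers, not existing functions in the code base.

```python
from datetime import date


class NotEnoughDataError(Exception):
    """Local stand-in for the project's own exception."""


def months_covered(startdate, enddate):
    """Number of calendar months touched by [startdate, enddate], inclusive."""
    return (enddate.year - startdate.year) * 12 + (enddate.month - startdate.month) + 1


def check_window(startdate, enddate, min_months=12):
    """Raise if the requested window is shorter than min_months."""
    n = months_covered(startdate, enddate)
    if n < min_months:
        raise NotEnoughDataError(
            f"{n} months requested, at least {min_months} required"
        )
    return n


print(check_window(date(1990, 1, 1), date(1990, 12, 31)))  # 12
```

Calling `check_window(date(1990, 3, 1), date(1990, 5, 31))` would raise, since only 3 months are requested.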

Are you interested in making a pull request?

Maybe

Labels

bug: Something isn't working
