diff --git a/doc/api/esmvalcore.esgf.rst b/doc/api/esmvalcore.esgf.rst
index 58f84f9401..abbb1722ba 100644
--- a/doc/api/esmvalcore.esgf.rst
+++ b/doc/api/esmvalcore.esgf.rst
@@ -5,6 +5,7 @@ esmvalcore.esgf
 ---------------
 .. automodule:: esmvalcore.esgf
    :no-inherited-members:
+   :no-index:
 
 esmvalcore.esgf.facets
 ----------------------
diff --git a/doc/api/esmvalcore.io.esgf.rst b/doc/api/esmvalcore.io.esgf.rst
new file mode 100644
index 0000000000..17b10e1b69
--- /dev/null
+++ b/doc/api/esmvalcore.io.esgf.rst
@@ -0,0 +1,8 @@
+esmvalcore.io.esgf
+==================
+.. automodule:: esmvalcore.io.esgf
+   :no-inherited-members:
+
+esmvalcore.io.esgf.facets
+-------------------------
+.. automodule:: esmvalcore.io.esgf.facets
diff --git a/doc/api/esmvalcore.io.local.rst b/doc/api/esmvalcore.io.local.rst
new file mode 100644
index 0000000000..63f3655327
--- /dev/null
+++ b/doc/api/esmvalcore.io.local.rst
@@ -0,0 +1,5 @@
+esmvalcore.io.local
+===================
+
+.. automodule:: esmvalcore.io.local
+   :no-inherited-members:
diff --git a/doc/api/esmvalcore.io.rst b/doc/api/esmvalcore.io.rst
index 5d41a029c0..831f07c25a 100644
--- a/doc/api/esmvalcore.io.rst
+++ b/doc/api/esmvalcore.io.rst
@@ -7,12 +7,17 @@
 In the future, this module may be extended with support for writing output data.
 The interface is defined in the :mod:`esmvalcore.io.protocol` module and the
 other modules here provide an implementation for a particular data source.
 
+esmvalcore.io
+-------------
+.. automodule:: esmvalcore.io
+
+Submodules
+``````````
+
 .. toctree::
    :maxdepth: 1
 
-   esmvalcore.io.protocol
+   esmvalcore.io.esgf
    esmvalcore.io.intake_esgf
-
-esmvalcore.io
--------------
-.. automodule:: esmvalcore.io
+   esmvalcore.io.local
+   esmvalcore.io.protocol
diff --git a/doc/quickstart/configure.rst b/doc/quickstart/configure.rst
index c34fab928a..9b1383e7ca 100644
--- a/doc/quickstart/configure.rst
+++ b/doc/quickstart/configure.rst
@@ -732,8 +732,8 @@ There are three modules available as part of ESMValCore that provide data source
 - :mod:`esmvalcore.io.intake_esgf`: Use the `intake-esgf `_
   library to load data that is available from ESGF.
-- :mod:`esmvalcore.local`: Use :mod:`glob` patterns to find files on a filesystem.
-- :mod:`esmvalcore.esgf`: Use the legacy `esgf-pyclient
+- :mod:`esmvalcore.io.local`: Use :mod:`glob` patterns to find files on a filesystem.
+- :mod:`esmvalcore.io.esgf`: Use the legacy `esgf-pyclient
   `_ library to find and download data from ESGF.
@@ -755,7 +755,7 @@ commands:
 
    esmvaltool config copy data-local-esmvaltool.yml
 
 This will use the :mod:`esmvalcore.io.intake_esgf` module to access data
-that is available through ESGF and use :mod:`esmvalcore.local` to find
+that is available through ESGF and use :mod:`esmvalcore.io.local` to find
 observational and reanalysis datasets that have been
 :ref:`CMORized with ESMValTool `
 (``OBS6`` and ``OBS`` projects for CMIP6- and CMIP5-style CMORization
@@ -805,7 +805,7 @@ and tailor it for your system.
 .. note::
 
    Deduplicating data found via :mod:`esmvalcore.io.intake_esgf` data sources
-   and the :mod:`esmvalcore.local` data sources has not yet been implemented.
+   and the :mod:`esmvalcore.io.local` data sources has not yet been implemented.
    Therefore it is recommended not to use the configuration option
    ``search_data: complete`` when using both data sources for the same project.
    The ``search_data: quick`` option can be safely used.
@@ -831,7 +831,7 @@ This is particularly useful for native datasets which do not follow the CMOR
 standard by default and consequently produce a lot of warnings when handled by
 Iris.
 This can be configured using the ``ignore_warnings`` argument to
-:class:`esmvalcore.local.LocalDataSource`.
+:class:`esmvalcore.io.local.LocalDataSource`.
 Here is an example on how to ignore specific warnings when loading data from
 the ``EMAC`` model in its native format:
@@ -964,7 +964,7 @@ The ``esmvaltool run`` command can automatically download the files required to
 run a recipe from ESGF for the projects CMIP3, CMIP5, CMIP6, CORDEX, and
 obs4MIPs.
 Refer to :ref:`config-data-sources` for instructions on how to set this up. This
-section describes additional configuration options for the :mod:`esmvalcore.esgf`
+section describes additional configuration options for the :mod:`esmvalcore.io.esgf`
 module, which is based on the legacy esgf-pyclient_ library. Most users
 will not need this.
@@ -987,7 +987,7 @@ will not need this.
 Configuration file
 ------------------
 
 An optional configuration file can be created for configuring how the
-:class:`esmvalcore.esgf.ESGFDataSource` uses esgf-pyclient_
+:class:`esmvalcore.io.esgf.ESGFDataSource` uses esgf-pyclient_
 to find and download data.
 The name of this file is ``~/.esmvaltool/esgf-pyclient.yml``.
@@ -1076,7 +1076,7 @@ but it may be useful to understand its content.
 The settings from this file are being moved to the
 :ref:`new configuration system `.
 In particular, the ``input_dir``, ``input_file``, and ``ignore_warnings`` settings have already
-been replaced by the :class:`esmvalcore.local.LocalDataSource` that can be
+been replaced by the :class:`esmvalcore.io.local.LocalDataSource` that can be
 configured via :ref:`data sources `.
 The developer configuration file will be installed along with ESMValCore and
 can also be viewed on GitHub:
@@ -1121,7 +1121,7 @@ Preprocessor output files
 -------------------------
 
 The filename to use for preprocessed data is configured using ``output_file``,
-similar to the filename template in :class:`esmvalcore.local.LocalDataSource`.
+similar to the filename template in :class:`esmvalcore.io.local.LocalDataSource`.
 Note that the extension ``.nc`` (and if applicable, a start and end time) will
 automatically be appended to the filename.
diff --git a/doc/quickstart/find_data.rst b/doc/quickstart/find_data.rst
index 939e65f378..920e734023 100644
--- a/doc/quickstart/find_data.rst
+++ b/doc/quickstart/find_data.rst
@@ -810,7 +810,7 @@ file name than for the netCDF4 variable name.
 
 To apply the extra facets for this purpose, simply use the corresponding tag
 in the applicable ``filename_template`` or ``dirname_template`` in
-:class:`esmvalcore.local.LocalDataSource`.
+:class:`esmvalcore.io.local.LocalDataSource`.
 
 For example, given the extra facets
@@ -834,5 +834,5 @@ a corresponding entry in the configuration file could look like:
 
 The same replacement mechanism can be employed everywhere where tags can be
 used, particularly in ``dirname_template`` and ``filename_template`` in
-:class:`esmvalcore.local.LocalDataSource`, and in ``output_file`` in
+:class:`esmvalcore.io.local.LocalDataSource`, and in ``output_file`` in
 :ref:`config-developer.yml `.
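The documentation hunks above describe facet tags such as ``{short_name}`` being substituted into ``dirname_template`` and ``filename_template`` and then matched with :mod:`glob` patterns. As an illustrative aside, the sketch below shows that mechanism with a hypothetical ``fill_template`` helper — it is not part of ESMValCore, and the real implementation also supports modifiers such as ``{project.lower}``, which this sketch leaves untouched:

```python
import glob
import os
import re


def fill_template(template: str, facets: dict) -> str:
    """Replace {tag} placeholders with facet values.

    Unknown tags fall back to "*" so the resulting pattern still
    matches files on disk.  Simplified sketch only; tags containing
    modifiers (e.g. {project.lower}) are left as-is here.
    """

    def substitute(match: re.Match) -> str:
        tag = match.group(1)
        return str(facets.get(tag, "*"))

    return re.sub(r"\{(\w+)\}", substitute, template)


facets = {
    "short_name": "tas",
    "mip": "Amon",
    "dataset": "EC-Earth3",
    "exp": "historical",
    "ensemble": "r1i1p1f1",
    "grid": "gr",
}
pattern = fill_template(
    "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc", facets
)
# pattern is now "tas_Amon_EC-Earth3_historical_r1i1p1f1_gr*.nc"
files = glob.glob(os.path.join(os.path.expanduser("~/climate_data"), pattern))
```

Falling back to ``*`` for missing facets is what makes partially specified datasets findable: any facet left unset simply widens the glob instead of failing the search.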
diff --git a/esmvalcore/_recipe/check.py b/esmvalcore/_recipe/check.py
index ea528c71f8..1aae4b6aef 100644
--- a/esmvalcore/_recipe/check.py
+++ b/esmvalcore/_recipe/check.py
@@ -16,7 +16,7 @@
 import esmvalcore.preprocessor
 from esmvalcore.exceptions import InputFilesNotFound, RecipeError
-from esmvalcore.local import _parse_period
+from esmvalcore.io.local import _parse_period
 from esmvalcore.preprocessor import TIME_PREPROCESSORS, PreprocessingTask
 from esmvalcore.preprocessor._multimodel import _get_operator_and_kwargs
 from esmvalcore.preprocessor._other import _get_var_info
diff --git a/esmvalcore/_recipe/recipe.py b/esmvalcore/_recipe/recipe.py
index 4d11956467..7476c125b9 100644
--- a/esmvalcore/_recipe/recipe.py
+++ b/esmvalcore/_recipe/recipe.py
@@ -14,7 +14,8 @@
 import yaml
 
-from esmvalcore import __version__, esgf
+import esmvalcore.io.esgf
+from esmvalcore import __version__
 from esmvalcore._provenance import get_recipe_provenance
 from esmvalcore._task import DiagnosticTask, ResumeTask, TaskSet
 from esmvalcore.config._config import TASKSEP
@@ -22,7 +23,7 @@
 from esmvalcore.config._diagnostics import TAGS
 from esmvalcore.dataset import Dataset
 from esmvalcore.exceptions import InputFilesNotFound, RecipeError
-from esmvalcore.local import (
+from esmvalcore.io.local import (
     GRIB_FORMATS,
     _dates_to_timerange,
     _get_multiproduct_filename,
@@ -1327,7 +1328,7 @@ def run(self) -> None:
 
         # Download required data
         # Add a special case for ESGF files to enable parallel downloads
-        esgf.download(self._download_files)
+        esmvalcore.io.esgf.download(self._download_files)
         for file in self._download_files:
             file.prepare()
diff --git a/esmvalcore/_recipe/to_datasets.py b/esmvalcore/_recipe/to_datasets.py
index 65951450d1..54b26393ae 100644
--- a/esmvalcore/_recipe/to_datasets.py
+++ b/esmvalcore/_recipe/to_datasets.py
@@ -10,9 +10,9 @@
 from esmvalcore._recipe.check import get_no_data_message
 from esmvalcore.cmor.table import _CMOR_KEYS, _update_cmor_facets
 from esmvalcore.dataset import INHERITED_FACETS, Dataset, _isglob
-from esmvalcore.esgf.facets import FACETS
 from esmvalcore.exceptions import RecipeError
-from esmvalcore.local import _replace_years_with_timerange
+from esmvalcore.io.esgf.facets import FACETS
+from esmvalcore.io.local import _replace_years_with_timerange
 from esmvalcore.preprocessor._derive import get_required
 from esmvalcore.preprocessor._io import DATASET_KEYS
 from esmvalcore.preprocessor._supplementary_vars import (
diff --git a/esmvalcore/cmor/_fixes/icon/_base_fixes.py b/esmvalcore/cmor/_fixes/icon/_base_fixes.py
index c94b3483c6..8d669191cd 100644
--- a/esmvalcore/cmor/_fixes/icon/_base_fixes.py
+++ b/esmvalcore/cmor/_fixes/icon/_base_fixes.py
@@ -23,7 +23,7 @@
 from iris.cube import CubeList
 from iris.mesh import Connectivity, MeshXY
 
-import esmvalcore.local
+import esmvalcore.io.local
 from esmvalcore.cmor._fixes.native_datasets import NativeDatasetFix
 from esmvalcore.config._data_sources import _get_data_sources
 from esmvalcore.iris_helpers import add_leading_dim_to_cube, date2num
@@ -328,7 +328,7 @@ def _get_grid_from_rootpath(self, grid_name: str) -> CubeList | None:
         """Try to get grid from the ICON rootpath."""
         glob_patterns: list[Path] = []
         for data_source in _get_data_sources(self.session, "ICON"):  # type: ignore[arg-type]
-            if isinstance(data_source, esmvalcore.local.LocalDataSource):
+            if isinstance(data_source, esmvalcore.io.local.LocalDataSource):
                 glob_patterns.extend(
                     data_source._get_glob_patterns(**self.extra_facets),  # noqa: SLF001
                 )
diff --git a/esmvalcore/config/_data_sources.py b/esmvalcore/config/_data_sources.py
index d4f8ad80c3..db4d1feaff 100644
--- a/esmvalcore/config/_data_sources.py
+++ b/esmvalcore/config/_data_sources.py
@@ -7,8 +7,8 @@
 import yaml
 
-import esmvalcore.esgf
-import esmvalcore.esgf.facets
+import esmvalcore.io.esgf
+import esmvalcore.io.esgf.facets
 import esmvalcore.local
 from esmvalcore.exceptions import InvalidConfigParameter, RecipeError
 from esmvalcore.io import load_data_sources
@@ -52,16 +52,18 @@ def _get_data_sources(
     # Use legacy data sources from config-user.yml and config-developer.yml.
     data_sources: list[DataSource] = []
     try:
-        legacy_local_data_sources = esmvalcore.local._get_data_sources(project)  # noqa: SLF001
+        legacy_local_data_sources = esmvalcore.local._get_data_sources(  # noqa: SLF001
+            project,
+        )
     except (RecipeError, KeyError):
         # The project is not configured in config-developer.yml
         legacy_local_data_sources = []
     else:
         if (
             session.get("search_esgf", "") != "never"
-            and project in esmvalcore.esgf.facets.FACETS
+            and project in esmvalcore.io.esgf.facets.FACETS
         ):
-            data_source = esmvalcore.esgf.ESGFDataSource(
+            data_source = esmvalcore.io.esgf.ESGFDataSource(
                 name="legacy-esgf",
                 project=project,
                 priority=2,
diff --git a/esmvalcore/config/configurations/data-esmvalcore-esgf.yml b/esmvalcore/config/configurations/data-esmvalcore-esgf.yml
index 4ebae5e04a..267dd829f9 100644
--- a/esmvalcore/config/configurations/data-esmvalcore-esgf.yml
+++ b/esmvalcore/config/configurations/data-esmvalcore-esgf.yml
@@ -1,12 +1,12 @@
-# Download CMIP, CORDEX, and obs4MIPs data from ESGF using the `esmvalcore.esgf`
+# Download CMIP, CORDEX, and obs4MIPs data from ESGF using the `esmvalcore.io.esgf`
 # module, which uses the legacy ESGF search interface.
 projects:
   CMIP6: &esgf-pyclient-data
     data:
       esgf-pyclient:
-        type: "esmvalcore.esgf.ESGFDataSource"
+        type: "esmvalcore.io.esgf.ESGFDataSource"
         download_dir: ~/climate_data
-        # Use a lower priority than for esmvalcore.local.LocalDataSource
+        # Use a lower priority than for esmvalcore.io.local.LocalDataSource
         # to avoid searching ESGF with the setting `search_esgf: when_missing`.
         priority: 10
   CMIP5:
diff --git a/esmvalcore/config/configurations/data-hpc-badc.yml b/esmvalcore/config/configurations/data-hpc-badc.yml
index 9bcc90e8d9..98225d10a9 100644
--- a/esmvalcore/config/configurations/data-hpc-badc.yml
+++ b/esmvalcore/config/configurations/data-hpc-badc.yml
@@ -3,49 +3,49 @@ projects:
   CMIP6:
     data:
       badc:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /badc/cmip6/data
         dirname_template: "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc"
   CMIP5:
     data:
       badc:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /badc/cmip5/data
         dirname_template: "{project.lower}/{product}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"
   CMIP3:
     data:
       badc:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /badc/cmip3_drs/data
         dirname_template: "{project.lower}/output/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{short_name}/{ensemble}/{version}"
         filename_template: "{short_name}_*.nc"
   CORDEX:
     data:
       badc:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /badc/cordex/data
         dirname_template: "{project}/output/{domain}/{institute}/{driver}/{exp}/{ensemble}/{institute}-{dataset}/{rcm_version}/{mip}/{short_name}/{version}"
         filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc"
   obs4MIPs:
     data:
       badc:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /gws/nopw/j04/esmeval/obsdata-v2
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{short_name}_*.nc"
   OBS6:
     data:
       badc:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /gws/nopw/j04/esmeval/obsdata-v2
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
   OBS:
     data:
       badc:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /gws/nopw/j04/esmeval/obsdata-v2
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
diff --git a/esmvalcore/config/configurations/data-hpc-dkrz.yml b/esmvalcore/config/configurations/data-hpc-dkrz.yml
index 3ad4a4fb31..9f519490dd 100644
--- a/esmvalcore/config/configurations/data-hpc-dkrz.yml
+++ b/esmvalcore/config/configurations/data-hpc-dkrz.yml
@@ -3,88 +3,88 @@ projects:
   CMIP6:
     data:
       dkrz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/ik1017/CMIP6/data
         dirname_template: "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc"
       esgf-cache:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/download
         dirname_template: "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc"
   CMIP5:
     data:
       dkrz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/kd0956/CMIP5/data
         dirname_template: "{project.lower}/{product}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"
       esgf-cache:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/download
         dirname_template: "{project.lower}/{product}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"
   CMIP3:
     data:
       dkrz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/CMIP3
         dirname_template: "{exp}/{modeling_realm}/{frequency}/{short_name}/{dataset}/{ensemble}"
         filename_template: "{short_name}_*.nc"
       esgf-cache:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/download
         dirname_template: "{project.lower}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{ensemble}/{short_name}/{version}"
         filename_template: "{short_name}_*.nc"
   CORDEX:
     data:
       dkrz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/ik1017/C3SCORDEX/data/c3s-cordex/output
         dirname_template: "{domain}/{institute}/{driver}/{exp}/{ensemble}/{institute}-{dataset}/{rcm_version}/{mip}/{short_name}/{version}"
         filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc"
       esgf-cache:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/download
         dirname_template: "{project.lower}/output/{domain}/{institute}/{driver}/{exp}/{ensemble}/{dataset}/{rcm_version}/{frequency}/{short_name}/{version}"
         filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc"
   obs4MIPs:
     data:
       dkrz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/OBS
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{short_name}_*.nc"
       esgf-cache:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/download
         dirname_template: "{project}/{dataset}/{version}"
         filename_template: "{short_name}_*.nc"
   native6:
     data:
       dkrz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/RAWOBS
         dirname_template: "Tier{tier}/{dataset}/{version}/{frequency}/{short_name}"
         filename_template: "*.nc"
       # ERA5 data in GRIB format:
       # https://docs.dkrz.de/doc/dataservices/finding_and_accessing_data/era_data/index.html#pool-data-era5-file-and-directory-names
       dkrz-era5:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /pool/data/ERA5
         dirname_template: "{family}/{level}/{type}/{tres}/{grib_id}"
         filename_template: "{family}{level}{typeid}_{tres}_*_{grib_id}.grb"
   OBS6:
     data:
       dkrz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/OBS
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
   OBS:
     data:
       dkrz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /work/bd0854/DATA/ESMValTool2/OBS
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
diff --git a/esmvalcore/config/configurations/data-hpc-ethz.yml b/esmvalcore/config/configurations/data-hpc-ethz.yml
index 390ed055ac..ea53dc7471 100644
--- a/esmvalcore/config/configurations/data-hpc-ethz.yml
+++ b/esmvalcore/config/configurations/data-hpc-ethz.yml
@@ -3,28 +3,28 @@ projects:
   CMIP6:
     data:
       ethz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /net/atmos/data
         dirname_template: "{project.lower}/{exp}/{mip}/{short_name}/{dataset}/{ensemble}/{grid}/"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc"
   CMIP5:
     data:
       ethz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /net/atmos/data
         dirname_template: "{project.lower}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"
   CMIP3:
     data:
       ethz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /net/atmos/data
         dirname_template: "{project.lower}/{exp}/{modeling_realm}/{frequency}/{short_name}/{dataset}/{ensemble}"
         filename_template: "{short_name}_*.nc"
   OBS:
     data:
       ethz:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /net/exo/landclim/PROJECTS/C3S/datadir/obsdir/
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
diff --git a/esmvalcore/config/configurations/data-hpc-ipsl.yml b/esmvalcore/config/configurations/data-hpc-ipsl.yml
index 914409643d..d3725bb4ca 100644
--- a/esmvalcore/config/configurations/data-hpc-ipsl.yml
+++ b/esmvalcore/config/configurations/data-hpc-ipsl.yml
@@ -3,35 +3,35 @@ projects:
   CMIP6:
     data:
       ipsl:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /bdd
         dirname_template: "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc"
   CMIP5:
     data:
       ipsl:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /bdd
         dirname_template: "{project}/output/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"
   CMIP3:
     data:
       ipsl:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /bdd
         dirname_template: "{project}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{ensemble}/{short_name}/{version}/{short_name}"
         filename_template: "{short_name}_*.nc"
   CORDEX:
     data:
       ipsl:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /bdd
         dirname_template: "{project}/output/{domain}/{institute}/{driver}/{exp}/{ensemble}/{institute}-{dataset}/{rcm_version}/{mip}/{short_name}/{version}"
         filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc"
   obs4MIPs:
     data:
       ipsl:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /bdd
         dirname_template: "{project}/obs-CFMIP/observations/{realm}/{short_name}/{frequency}/{grid}/{institute}/{dataset}/{version}"
         filename_template: "{short_name}_*.nc"
diff --git a/esmvalcore/config/configurations/data-hpc-mo.yml b/esmvalcore/config/configurations/data-hpc-mo.yml
index 686c93b4d9..8b10c5b4df 100644
--- a/esmvalcore/config/configurations/data-hpc-mo.yml
+++ b/esmvalcore/config/configurations/data-hpc-mo.yml
@@ -3,7 +3,7 @@ projects:
   CMIP6:
     data:
       mo: &cmip6
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /data/users/managecmip/champ
         dirname_template: "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc"
@@ -14,7 +14,7 @@ projects:
   CMIP5:
     data:
       mo: &cmip5
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /data/users/managecmip/champ
         dirname_template: "{project.lower}/{product}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"
@@ -25,7 +25,7 @@ projects:
   CORDEX:
     data:
       mo: &cordex
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /data/users/managecmip/champ
         dirname_template: "{project.lower}/output/{domain}/{institute}/{driver}/{exp}/{ensemble}/{institute}-{dataset}/{rcm_version}/{mip}/{short_name}/{version}"
         filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc"
@@ -36,28 +36,28 @@ projects:
   obs4MIPs:
     data:
       mo:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /data/users/esmval/ESMValTool/obs
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{short_name}_*.nc"
   native6:
     data:
       mo:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /data/users/esmval/ESMValTool/rawobs
         dirname_template: "Tier{tier}/{dataset}/{version}/{frequency}/{short_name}"
         filename_template: "*.nc"
   OBS6:
     data:
       mo:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /data/users/esmval/ESMValTool/obs
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
   OBS:
     data:
       mo:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /data/users/esmval/ESMValTool/obs
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
diff --git a/esmvalcore/config/configurations/data-hpc-nci.yml b/esmvalcore/config/configurations/data-hpc-nci.yml
index dc960c0efc..1c4c85552c 100644
--- a/esmvalcore/config/configurations/data-hpc-nci.yml
+++ b/esmvalcore/config/configurations/data-hpc-nci.yml
@@ -3,7 +3,7 @@ projects:
   CMIP6:
     data:
       oi10: &cmip6
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /g/data/oi10/replicas
         dirname_template: "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc"
@@ -16,7 +16,7 @@ projects:
   CMIP5:
     data:
       r87: &cmip5
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /g/data/r87/DRSv3/CMIP5
         dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"
@@ -33,35 +33,35 @@ projects:
   CMIP3:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /g/data/r87/DRSv3/CMIP3
         dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{ensemble}/{short_name}/{latestversion}"
         filename_template: "{short_name}_*.nc"
   obs4MIPs:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{short_name}_*.nc"
   native6:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /g/data/xp65/public/apps/esmvaltool/native6
         dirname_template: "Tier{tier}/{dataset}/{version}/{frequency}/{short_name}"
         filename_template: "*.nc"
   OBS6:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
   OBS:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: /g/data/ct11/access-nri/replicas/esmvaltool/obsdata-v2
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
diff --git a/esmvalcore/config/configurations/data-local-esmvaltool.yml b/esmvalcore/config/configurations/data-local-esmvaltool.yml
index f4549e7775..474cca3e8f 100644
--- a/esmvalcore/config/configurations/data-local-esmvaltool.yml
+++ b/esmvalcore/config/configurations/data-local-esmvaltool.yml
@@ -4,7 +4,7 @@ projects:
   native6:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "Tier{tier}/{dataset}/{version}/{frequency}/{short_name}"
         filename_template: "*.nc"
@@ -12,7 +12,7 @@ projects:
   OBS6:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
@@ -20,7 +20,7 @@ projects:
   OBS:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "Tier{tier}/{dataset}"
         filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc"
diff --git a/esmvalcore/config/configurations/data-local.yml b/esmvalcore/config/configurations/data-local.yml
index 1080bf2c17..91616dedbf 100644
--- a/esmvalcore/config/configurations/data-local.yml
+++ b/esmvalcore/config/configurations/data-local.yml
@@ -3,35 +3,35 @@ projects:
   CMIP6:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc"
   CMIP5:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{project.lower}/{product}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}"
         filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc"
   CMIP3:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{project.lower}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{ensemble}/{short_name}/{version}"
         filename_template: "{short_name}_*.nc"
   CORDEX:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{project.lower}/output/{domain}/{institute}/{driver}/{exp}/{ensemble}/{dataset}/{rcm_version}/{frequency}/{short_name}/{version}"
         filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc"
   obs4MIPs:
     data:
       local:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{project}/{dataset}/{version}"
         filename_template: "{short_name}_*.nc"
diff --git a/esmvalcore/config/configurations/data-native-access.yml b/esmvalcore/config/configurations/data-native-access.yml
index 832479c3e2..7a58d7762a 100644
--- a/esmvalcore/config/configurations/data-native-access.yml
+++ b/esmvalcore/config/configurations/data-native-access.yml
@@ -3,12 +3,12 @@ projects:
   ACCESS:
     data:
      access-sub-dataset:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{dataset}/{sub_dataset}/{exp}/{modeling_realm}/netCDF"
         filename_template: "{sub_dataset}.{freq_attribute}-*.nc"
       access-ocean:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{dataset}/{sub_dataset}/{exp}/{modeling_realm}/netCDF"
         filename_template: "ocean_{freq_attribute}.nc-*"
diff --git a/esmvalcore/config/configurations/data-native-cesm.yml b/esmvalcore/config/configurations/data-native-cesm.yml
index fdfb84bb5f..2681dcb14e 100644
--- a/esmvalcore/config/configurations/data-native-cesm.yml
+++ b/esmvalcore/config/configurations/data-native-cesm.yml
@@ -3,7 +3,7 @@ projects:
   CESM:
     data:
       run: &cesm
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: ""  # run directory
         filename_template: "{case}.{scomp}.{type}.{string}*nc"
diff --git a/esmvalcore/config/configurations/data-native-emac.yml b/esmvalcore/config/configurations/data-native-emac.yml
index eb894e7115..7fa7f69844 100644
--- a/esmvalcore/config/configurations/data-native-emac.yml
+++ b/esmvalcore/config/configurations/data-native-emac.yml
@@ -3,7 +3,7 @@ projects:
   EMAC:
     data:
       emac:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{exp}/{channel}"
         filename_template: "{exp}*{channel}{postproc_flag}.nc"
diff --git a/esmvalcore/config/configurations/data-native-icon.yml b/esmvalcore/config/configurations/data-native-icon.yml
index 6f5332dfd2..95acfc0318 100644
--- a/esmvalcore/config/configurations/data-native-icon.yml
+++ b/esmvalcore/config/configurations/data-native-icon.yml
@@ -3,7 +3,7 @@ projects:
   ICON:
     data:
       icon: &icon
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{exp}"
         filename_template: "{exp}_{var_type}*.nc"
diff --git a/esmvalcore/config/configurations/data-native-ipslcm.yml b/esmvalcore/config/configurations/data-native-ipslcm.yml
index 109ff7a5c8..80bc366977 100644
--- a/esmvalcore/config/configurations/data-native-ipslcm.yml
+++ b/esmvalcore/config/configurations/data-native-ipslcm.yml
@@ -3,12 +3,12 @@ projects:
   IPSLCM:
     data:
       ipslcm-varname:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{root}/{account}/{model}/{status}/{exp}/{simulation}/{dir}/{out}/{freq}"
         filename_template: "{simulation}_*_{ipsl_varname}.nc"
       ipslcm-group:
-        type: "esmvalcore.local.LocalDataSource"
+        type: "esmvalcore.io.local.LocalDataSource"
         rootpath: ~/climate_data
         dirname_template: "{root}/{account}/{model}/{status}/{exp}/{simulation}/{dir}/{out}/{freq}"
         filename_template: "{simulation}_*_{group}.nc"
diff --git a/esmvalcore/dataset.py b/esmvalcore/dataset.py
index 3d1083b331..87e9818368 100644
--- a/esmvalcore/dataset.py
+++ b/esmvalcore/dataset.py
@@ -27,7 +27,7 @@
 )
 from esmvalcore.config._data_sources import _get_data_sources
 from esmvalcore.exceptions import InputFilesNotFound, RecipeError
-from esmvalcore.local import _dates_to_timerange, _get_output_file
+from esmvalcore.io.local import _dates_to_timerange, _get_output_file
 from esmvalcore.preprocessor import preprocess
 
 if TYPE_CHECKING:
@@ -94,7 +94,7 @@ class Dataset:
     ----------
     **facets
         Facets describing the dataset. See
-        :obj:`esmvalcore.esgf.facets.FACETS` for the mapping between
+        :obj:`esmvalcore.io.esgf.facets.FACETS` for the mapping between
         the facet names used by ESMValCore and those used on ESGF.
 
     Attributes
diff --git a/esmvalcore/esgf/__init__.py b/esmvalcore/esgf/__init__.py
index 84ff8c1d95..d98b5c0f33 100644
--- a/esmvalcore/esgf/__init__.py
+++ b/esmvalcore/esgf/__init__.py
@@ -1,46 +1,16 @@
 """Find files on the ESGF and download them.
 
-.. note::
-
-    This module uses `esgf-pyclient `_
-    to search for and download files from the Earth System Grid Federation (ESGF).
-    `esgf-pyclient`_ uses a
-    `deprecated API `__
-    that is scheduled to be taken offline and replaced by new APIs based on
-    STAC (ESGF East) and Globus (ESGF West). An ESGF node mimicking the deprecated
-    API but built op top of Globus will be kept online for some time at
-    https://esgf-node.ornl.gov/esgf-1-5-bridge, but users are encouraged
-    to migrate to the new APIs as soon as possible by using the
-    :mod:`esmvalcore.io.intake_esgf` module instead.
- -This module provides the function :py:func:`esmvalcore.esgf.find_files` -for searching for files on ESGF using the ESMValTool vocabulary. -It returns :class:`esmvalcore.esgf.ESGFFile` objects, which have a convenient -:meth:`esmvalcore.esgf.ESGFFile.download` method for downloading the file. -A :func:`esmvalcore.esgf.download` function for downloading multiple files in -parallel is also available. - -It also provides an :class:`esmvalcore.esgf.ESGFDataSource` that can be -used to find files on ESGF from the :class:`~esmvalcore.dataset.Dataset` -or the :ref:`recipe `. To use it, run the command - -.. code:: bash - - esmvalcore config copy data-esmvalcore-esgf.yml - -to copy the default configuration file for this module to your configuration -directory. This will create a file with the following content: - -.. literalinclude:: ../configurations/data-esmvalcore-esgf.yml - :caption: Contents of ``data-esmvalcore-esgf.yml`` - :language: yaml - -See :ref:`config-data-sources` for more information on configuring data sources -and :ref:`config-esgf` for additional configuration options of this module. +.. deprecated:: 2.14.0 + This module has been moved to :mod:`esmvalcore.io.esgf`. Importing it as + :mod:`esmvalcore.esgf` is deprecated and will be removed in version 2.16.0. """ -from esmvalcore.esgf._download import ESGFFile, download -from esmvalcore.esgf._search import ESGFDataSource, find_files +from esmvalcore.io.esgf import ( + ESGFDataSource, + ESGFFile, + download, + find_files, +) __all__ = [ "ESGFFile", diff --git a/esmvalcore/esgf/facets.py b/esmvalcore/esgf/facets.py index 6e314a7e1c..4d6030a618 100644 --- a/esmvalcore/esgf/facets.py +++ b/esmvalcore/esgf/facets.py @@ -1,138 +1,17 @@ -"""Module containing mappings from our names to ESGF names.""" +"""Module containing mappings from our names to ESGF names. -import pyesgf.search +.. deprecated:: 2.14.0 + This module has been moved to :mod:`esmvalcore.io.esgf.facets`. 
Importing it as + :mod:`esmvalcore.esgf.facets` is deprecated and will be removed in version 2.16.0. +""" -from esmvalcore.config._esgf_pyclient import get_esgf_config - -FACETS = { - "CMIP3": { - "dataset": "model", - "ensemble": "ensemble", - "exp": "experiment", - "frequency": "time_frequency", - "short_name": "variable", - }, - "CMIP5": { - "dataset": "model", - "ensemble": "ensemble", - "exp": "experiment", - "frequency": "time_frequency", - "institute": "institute", - "mip": "cmor_table", - "product": "product", - "short_name": "variable", - }, - "CMIP6": { - "activity": "activity_drs", - "dataset": "source_id", - "ensemble": "member_id", - "exp": "experiment_id", - "institute": "institution_id", - "grid": "grid_label", - "mip": "table_id", - "short_name": "variable", - }, - "CORDEX": { - "dataset": "rcm_name", - "driver": "driving_model", - "domain": "domain", - "ensemble": "ensemble", - "exp": "experiment", - "frequency": "time_frequency", - "institute": "institute", - "product": "product", - "short_name": "variable", - }, - "obs4MIPs": { - "dataset": "source_id", - "frequency": "time_frequency", - "institute": "institute", - "short_name": "variable", - }, -} -"""Mapping between the recipe and ESGF facet names.""" - -DATASET_MAP = { - "CMIP3": {}, - "CMIP5": { - "ACCESS1-0": "ACCESS1.0", - "ACCESS1-3": "ACCESS1.3", - "bcc-csm1-1": "BCC-CSM1.1", - "bcc-csm1-1-m": "BCC-CSM1.1(m)", - "CESM1-BGC": "CESM1(BGC)", - "CESM1-CAM5": "CESM1(CAM5)", - "CESM1-CAM5-1-FV2": "CESM1(CAM5.1,FV2)", - "CESM1-FASTCHEM": "CESM1(FASTCHEM)", - "CESM1-WACCM": "CESM1(WACCM)", - "CSIRO-Mk3-6-0": "CSIRO-Mk3.6.0", - "fio-esm": "FIO-ESM", - "GFDL-CM2p1": "GFDL-CM2.1", - "inmcm4": "INM-CM4", - "MRI-AGCM3-2H": "MRI-AGCM3.2H", - "MRI-AGCM3-2S": "MRI-AGCM3.2S", - }, - "CMIP6": {}, - "CORDEX": {}, - "obs4MIPs": {}, -} -"""Cache for the mapping between recipe/filesystem and ESGF dataset names.""" - - -def create_dataset_map(): - """Create the DATASET_MAP from recipe datasets to ESGF 
dataset names. - - Run `python -m esmvalcore.esgf.facets` to print an up to date map. - """ - cfg = get_esgf_config() - search_args = dict(cfg["search_connection"]) - url = search_args.pop("urls")[0] - connection = pyesgf.search.SearchConnection(url=url, **search_args) - - dataset_map = {} - indices = { - "CMIP3": 2, - "CMIP5": 3, - "CMIP6": 3, - "CORDEX": 7, - "obs4MIPs": 2, - } - - for project in FACETS: - dataset_map[project] = {} - dataset_key = FACETS[project]["dataset"] - ctx = connection.new_context( - project=project, - facets=[dataset_key], - fields=["id"], - latest=True, - ) - available_datasets = sorted(ctx.facet_counts[dataset_key]) - print(f"The following datasets are available for project {project}:") # noqa: T201 - for dataset in available_datasets: - print(dataset) # noqa: T201 - - # Figure out the ESGF name of the requested dataset - n_available = len(available_datasets) - for i, dataset in enumerate(available_datasets, 1): - print( # noqa: T201 - f"Looking for dataset name of facet name" - f" {dataset} ({i} of {n_available})", - ) - query = {dataset_key: dataset} - dataset_result = next(iter(ctx.search(batch_size=1, **query))) - print(f"Dataset id: {dataset_result.dataset_id}") # noqa: T201 - dataset_id = dataset_result.dataset_id - if dataset not in dataset_id: - idx = indices[project] - dataset_alias = dataset_id.split(".")[idx] - print( # noqa: T201 - f"Found dataset name '{dataset_alias}'" - f" for facet '{dataset}',", - ) - dataset_map[project][dataset_alias] = dataset - - return dataset_map +from esmvalcore.io.esgf.facets import DATASET_MAP, FACETS, create_dataset_map +__all__ = [ + "FACETS", + "DATASET_MAP", + "create_dataset_map", +] if __name__ == "__main__": # Run this module to create an up to date DATASET_MAP diff --git a/esmvalcore/io/__init__.py b/esmvalcore/io/__init__.py index ef6e056282..e115c462f2 100644 --- a/esmvalcore/io/__init__.py +++ b/esmvalcore/io/__init__.py @@ -7,7 +7,7 @@ >>> from esmvalcore.config import CFG >>> 
CFG["projects"]["CMIP6"]["data"]["local"] = { - "type": "esmvalcore.local.LocalDataSource", + "type": "esmvalcore.io.local.LocalDataSource", "rootpath": "~/climate_data", "dirname_template": "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}", "filename_template": "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc", @@ -21,14 +21,14 @@ CMIP6: data: local: - type: "esmvalcore.local.LocalDataSource" + type: "esmvalcore.io.local.LocalDataSource" rootpath: "~/climate_data" dirname_template: "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}" filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc" where ``CMIP6`` is a project, and ``local`` is a unique name describing the data source. The data source type, -:class:`esmvalcore.local.LocalDataSource`, in the example above, needs to +:class:`esmvalcore.io.local.LocalDataSource`, in the example above, needs to implement the :class:`esmvalcore.io.protocol.DataSource` protocol. Any remaining key-value pairs in the configuration, ``rootpath``, ``dirname_template``, and ``filename_template`` in this example, are passed diff --git a/esmvalcore/io/esgf/__init__.py b/esmvalcore/io/esgf/__init__.py new file mode 100644 index 0000000000..7568cc0400 --- /dev/null +++ b/esmvalcore/io/esgf/__init__.py @@ -0,0 +1,50 @@ +"""Find files on the ESGF and download them. + +.. note:: + + This module uses `esgf-pyclient `_ + to search for and download files from the Earth System Grid Federation (ESGF). + `esgf-pyclient`_ uses a + `deprecated API `__ + that is scheduled to be taken offline and replaced by new APIs based on + STAC (ESGF East) and Globus (ESGF West). 
An ESGF node mimicking the deprecated + API but built on top of Globus will be kept online for some time at + https://esgf-node.ornl.gov/esgf-1-5-bridge, but users are encouraged + to migrate to the new APIs as soon as possible by using the + :mod:`esmvalcore.io.intake_esgf` module instead. + +This module provides the function :py:func:`esmvalcore.io.esgf.find_files` +for searching for files on ESGF using the ESMValTool vocabulary. +It returns :class:`esmvalcore.io.esgf.ESGFFile` objects, which have a convenient +:meth:`esmvalcore.io.esgf.ESGFFile.download` method for downloading the file. +A :func:`esmvalcore.io.esgf.download` function for downloading multiple files in +parallel is also available. + +It also provides an :class:`esmvalcore.io.esgf.ESGFDataSource` that can be +used to find files on ESGF from the :class:`~esmvalcore.dataset.Dataset` +or the :ref:`recipe `. To use it, run the command + +.. code:: bash + + esmvalcore config copy data-esmvalcore-esgf.yml + +to copy the default configuration file for this module to your configuration +directory. This will create a file with the following content: + +.. literalinclude:: ../configurations/data-esmvalcore-esgf.yml + :caption: Contents of ``data-esmvalcore-esgf.yml`` + :language: yaml + +See :ref:`config-data-sources` for more information on configuring data sources +and :ref:`config-esgf` for additional configuration options of this module. 
+""" + +from esmvalcore.io.esgf._download import ESGFFile, download +from esmvalcore.io.esgf._search import ESGFDataSource, find_files + +__all__ = [ + "ESGFFile", + "ESGFDataSource", + "download", + "find_files", +] diff --git a/esmvalcore/esgf/_download.py b/esmvalcore/io/esgf/_download.py similarity index 99% rename from esmvalcore/esgf/_download.py rename to esmvalcore/io/esgf/_download.py index bb565963b9..abcb67ccf5 100644 --- a/esmvalcore/esgf/_download.py +++ b/esmvalcore/io/esgf/_download.py @@ -24,12 +24,12 @@ from humanfriendly import format_size, format_timespan from esmvalcore.config import CFG -from esmvalcore.io.protocol import DataElement -from esmvalcore.local import ( +from esmvalcore.io.local import ( LocalFile, _dates_to_timerange, _get_start_end_date_from_filename, ) +from esmvalcore.io.protocol import DataElement from .facets import DATASET_MAP, FACETS @@ -185,7 +185,7 @@ def sort_hosts(urls): class ESGFFile(DataElement): """File on the ESGF. - This is the object returned by :func:`esmvalcore.esgf.find_files`. + This is the object returned by :func:`esmvalcore.io.esgf.find_files`. Attributes ---------- @@ -450,7 +450,7 @@ def local_file(self, dest_folder: Path | None) -> LocalFile: Returns ------- - LocalFile + : The path where the file will be located after download. """ dest_folder = self.dest_folder if dest_folder is None else dest_folder @@ -473,7 +473,7 @@ def download(self, dest_folder: Path | None) -> LocalFile: Returns ------- - LocalFile + : The path where the file will be located after download. 
""" local_file = self.local_file(dest_folder) diff --git a/esmvalcore/esgf/_search.py b/esmvalcore/io/esgf/_search.py similarity index 98% rename from esmvalcore/esgf/_search.py rename to esmvalcore/io/esgf/_search.py index c067f326c5..01e42a0bb4 100644 --- a/esmvalcore/esgf/_search.py +++ b/esmvalcore/io/esgf/_search.py @@ -14,12 +14,12 @@ import requests.exceptions from esmvalcore.config._esgf_pyclient import get_esgf_config -from esmvalcore.io.protocol import DataSource -from esmvalcore.local import ( +from esmvalcore.io.local import ( _parse_period, _replace_years_with_timerange, _truncate_dates, ) +from esmvalcore.io.protocol import DataSource from ._download import ESGFFile from .facets import DATASET_MAP, FACETS @@ -345,7 +345,7 @@ def find_files(*, project, short_name, dataset, **facets): if project not in FACETS: msg = ( f"Unable to download from ESGF, because project {project} is not" - " on it or is not supported by the esmvalcore.esgf module." + " on it or is not supported by the esmvalcore.io.esgf module." ) raise ValueError( msg, @@ -420,7 +420,7 @@ def find_data(self, **facets: FacetValue) -> list[ESGFFile]: Returns ------- - :obj:`list` of :obj:`esmvalcore.esgf.ESGFFile` + :obj:`list` of :obj:`esmvalcore.io.esgf.ESGFFile` A list of files that have been found on ESGF. 
""" files = find_files(**facets) diff --git a/esmvalcore/io/esgf/facets.py b/esmvalcore/io/esgf/facets.py new file mode 100644 index 0000000000..a731a25bab --- /dev/null +++ b/esmvalcore/io/esgf/facets.py @@ -0,0 +1,139 @@ +"""Module containing mappings from our names to ESGF names.""" + +import pyesgf.search + +from esmvalcore.config._esgf_pyclient import get_esgf_config + +FACETS = { + "CMIP3": { + "dataset": "model", + "ensemble": "ensemble", + "exp": "experiment", + "frequency": "time_frequency", + "short_name": "variable", + }, + "CMIP5": { + "dataset": "model", + "ensemble": "ensemble", + "exp": "experiment", + "frequency": "time_frequency", + "institute": "institute", + "mip": "cmor_table", + "product": "product", + "short_name": "variable", + }, + "CMIP6": { + "activity": "activity_drs", + "dataset": "source_id", + "ensemble": "member_id", + "exp": "experiment_id", + "institute": "institution_id", + "grid": "grid_label", + "mip": "table_id", + "short_name": "variable", + }, + "CORDEX": { + "dataset": "rcm_name", + "driver": "driving_model", + "domain": "domain", + "ensemble": "ensemble", + "exp": "experiment", + "frequency": "time_frequency", + "institute": "institute", + "product": "product", + "short_name": "variable", + }, + "obs4MIPs": { + "dataset": "source_id", + "frequency": "time_frequency", + "institute": "institute", + "short_name": "variable", + }, +} +"""Mapping between the recipe and ESGF facet names.""" + +DATASET_MAP = { + "CMIP3": {}, + "CMIP5": { + "ACCESS1-0": "ACCESS1.0", + "ACCESS1-3": "ACCESS1.3", + "bcc-csm1-1": "BCC-CSM1.1", + "bcc-csm1-1-m": "BCC-CSM1.1(m)", + "CESM1-BGC": "CESM1(BGC)", + "CESM1-CAM5": "CESM1(CAM5)", + "CESM1-CAM5-1-FV2": "CESM1(CAM5.1,FV2)", + "CESM1-FASTCHEM": "CESM1(FASTCHEM)", + "CESM1-WACCM": "CESM1(WACCM)", + "CSIRO-Mk3-6-0": "CSIRO-Mk3.6.0", + "fio-esm": "FIO-ESM", + "GFDL-CM2p1": "GFDL-CM2.1", + "inmcm4": "INM-CM4", + "MRI-AGCM3-2H": "MRI-AGCM3.2H", + "MRI-AGCM3-2S": "MRI-AGCM3.2S", + }, + "CMIP6": {}, + 
"CORDEX": {}, + "obs4MIPs": {}, +} +"""Cache for the mapping between recipe/filesystem and ESGF dataset names.""" + + +def create_dataset_map(): + """Create the DATASET_MAP from recipe datasets to ESGF dataset names. + + Run `python -m esmvalcore.io.esgf.facets` to print an up to date map. + """ + cfg = get_esgf_config() + search_args = dict(cfg["search_connection"]) + url = search_args.pop("urls")[0] + connection = pyesgf.search.SearchConnection(url=url, **search_args) + + dataset_map = {} + indices = { + "CMIP3": 2, + "CMIP5": 3, + "CMIP6": 3, + "CORDEX": 7, + "obs4MIPs": 2, + } + + for project in FACETS: + dataset_map[project] = {} + dataset_key = FACETS[project]["dataset"] + ctx = connection.new_context( + project=project, + facets=[dataset_key], + fields=["id"], + latest=True, + ) + available_datasets = sorted(ctx.facet_counts[dataset_key]) + print(f"The following datasets are available for project {project}:") # noqa: T201 + for dataset in available_datasets: + print(dataset) # noqa: T201 + + # Figure out the ESGF name of the requested dataset + n_available = len(available_datasets) + for i, dataset in enumerate(available_datasets, 1): + print( # noqa: T201 + f"Looking for dataset name of facet name" + f" {dataset} ({i} of {n_available})", + ) + query = {dataset_key: dataset} + dataset_result = next(iter(ctx.search(batch_size=1, **query))) + print(f"Dataset id: {dataset_result.dataset_id}") # noqa: T201 + dataset_id = dataset_result.dataset_id + if dataset not in dataset_id: + idx = indices[project] + dataset_alias = dataset_id.split(".")[idx] + print( # noqa: T201 + f"Found dataset name '{dataset_alias}'" + f" for facet '{dataset}',", + ) + dataset_map[project][dataset_alias] = dataset + + return dataset_map + + +if __name__ == "__main__": + # Run this module to create an up to date DATASET_MAP + print(create_dataset_map()) # noqa: T201 diff --git a/esmvalcore/io/intake_esgf.py b/esmvalcore/io/intake_esgf.py index 930ef6ad60..e1b7298ed5 100644 --- 
a/esmvalcore/io/intake_esgf.py +++ b/esmvalcore/io/intake_esgf.py @@ -32,9 +32,9 @@ import isodate from esmvalcore.dataset import _isglob, _ismatch +from esmvalcore.io.local import _parse_period from esmvalcore.io.protocol import DataElement, DataSource from esmvalcore.iris_helpers import dataset_to_iris -from esmvalcore.local import _parse_period if TYPE_CHECKING: import iris.cube diff --git a/esmvalcore/io/local.py b/esmvalcore/io/local.py new file mode 100644 index 0000000000..c7b2059357 --- /dev/null +++ b/esmvalcore/io/local.py @@ -0,0 +1,906 @@ +"""Find files on the local filesystem. + +Example configuration to find CMIP6 data on a personal computer: + +.. literalinclude:: ../configurations/data-local.yml + :language: yaml + :caption: Contents of ``data-local.yml`` + :start-at: projects: + :end-before: CMIP5: + +The module will find files matching the :func:`glob.glob` pattern formed by +``rootpath/dirname_template/filename_template``, where the facets defined +inside the curly braces of the templates are replaced by their values +from the :class:`~esmvalcore.dataset.Dataset` or the :ref:`recipe ` +plus any facet-value pairs that can be automatically added using +:meth:`~esmvalcore.dataset.Dataset.augment_facets`. +Note that the name of the data source, ``local`` in the example above, +must be unique within each project but can otherwise be chosen freely. + +To start using this module on a personal computer, copy the example +configuration file into your configuration directory by running the command: + +.. code-block:: bash + + esmvaltool config copy data-local.yml + +and tailor it for your own system if needed. + +Example configuration files for popular HPC systems and some +:ref:`supported climate models ` are also available. View +the list of available files by running the command: + +.. code-block:: bash + + esmvaltool config list + +Further information is available in :ref:`config-data-sources`. 
+ +""" + +from __future__ import annotations + +import copy +import itertools +import logging +import os +import os.path +import re +from dataclasses import dataclass, field +from glob import glob +from pathlib import Path +from typing import TYPE_CHECKING, Any + +import iris.cube +import iris.fileformats.cf +import isodate +from cf_units import Unit +from netCDF4 import Dataset + +import esmvalcore.io.protocol +from esmvalcore.config._config import get_project_config +from esmvalcore.exceptions import RecipeError +from esmvalcore.iris_helpers import ignore_warnings_context + +if TYPE_CHECKING: + from collections.abc import Iterable + + from netCDF4 import Variable + + from esmvalcore.typing import Facets, FacetValue + +logger = logging.getLogger(__name__) + + +def _get_from_pattern( + pattern: str, + date_range_pattern: str, + stem: str, + group: str, +) -> tuple[str | None, str | None]: + """Get time, date or datetime from date range patterns in file names.""" + # Next string allows to test that there is an allowed delimiter (or + # string start or end) close to date range (or to single date) + start_point: str | None = None + end_point: str | None = None + context = r"(?:^|[-_]|$)" + + # First check for a block of two potential dates + date_range_pattern_with_context = context + date_range_pattern + context + daterange = re.search(date_range_pattern_with_context, stem) + if not daterange: + # Retry with extended context for CMIP3 + context = r"(?:^|[-_.]|$)" + date_range_pattern_with_context = ( + context + date_range_pattern + context + ) + daterange = re.search(date_range_pattern_with_context, stem) + + if daterange: + start_point = daterange.group(group) + end_group = f"{group}_end" + end_point = daterange.group(end_group) + else: + # Check for single dates in the filename + single_date_pattern = context + pattern + context + dates = re.findall(single_date_pattern, stem) + if len(dates) == 1: + start_point = end_point = dates[0][0] + elif len(dates) > 1: + # 
Check for dates at start or (exclusive or) end of filename + start = re.search(r"^" + pattern, stem) + end = re.search(pattern + r"$", stem) + if start and not end: + start_point = end_point = start.group(group) + elif end: + start_point = end_point = end.group(group) + + return start_point, end_point + + +def _get_var_name(variable: Variable) -> str: + """Get variable name (following Iris' Cube.name()).""" + for attr in ("standard_name", "long_name"): + if attr in variable.ncattrs(): + return str(variable.getncattr(attr)) + return str(variable.name) + + +def _get_start_end_date_from_filename( + file: str | Path, +) -> tuple[str | None, str | None]: + """Get the start and end dates as a string from a file name. + + Examples of allowed dates: 1980, 198001, 1980-01, 19801231, 1980-12-31, + 1980123123, 19801231T23, 19801231T2359, 19801231T235959, 19801231T235959Z + (ISO 8601). + + Dates must be surrounded by '-', '_' or '.' (the latter is used by CMIP3 + data), or string start or string end (after removing filename suffix). + + Look first for two dates separated by '-', '_' or '_cat_' (the latter is + used by CMIP3 data), then for one single date, and if there are multiple, + for one date at start or end. + + Parameters + ---------- + file: + The file to read the start and end dates from. + + Returns + ------- + tuple[str, str] + The start and end date. + + Raises + ------ + ValueError + Start or end date cannot be determined. + """ + start_date = end_date = None + + # Build regex + time_pattern = ( + r"(?P<hour>[0-2][0-9]" + r"(?P<minute>[0-5][0-9]" + r"(?P<second>[0-5][0-9])?)?Z?)" + ) + date_pattern = ( + r"(?P<year>[0-9]{4})" + r"(?P<month>-?[01][0-9]" + r"(?P<day>-?[0-3][0-9]" + rf"(T?{time_pattern})?)?)?" + ) + datetime_pattern = rf"(?P<datetime>{date_pattern})" + end_datetime_pattern = datetime_pattern.replace(">", "_end>") + + # Dates can either be delimited by '-', '_', or '_cat_' (the latter for + # CMIP3) + date_range_pattern = ( + datetime_pattern + r"[-_](?:cat_)?" 
+ end_datetime_pattern + ) + + # Find dates using the regex + start_date, end_date = _get_from_pattern( + datetime_pattern, + date_range_pattern, + Path(file).stem, + "datetime", + ) + return start_date, end_date + + +def _get_start_end_date(file: str | Path) -> tuple[str, str]: + """Get the start and end dates as a string from a file. + + This function first tries to read the dates from the filename and only + if that fails, it will try to read them from the content of the file. + + Parameters + ---------- + file: + The file to read the start and end dates from. + + Returns + ------- + tuple[str, str] + The start and end date. + + Raises + ------ + ValueError + Start or end date cannot be determined. + """ + start_date, end_date = _get_start_end_date_from_filename(file) + + # As a final resort, try to get the dates from the file contents + if ( + (start_date is None or end_date is None) + and isinstance(file, (str, Path)) + and Path(file).exists() + ): + logger.debug("Must load file %s for daterange ", file) + with Dataset(file) as dataset: + for variable in dataset.variables.values(): + var_name = _get_var_name(variable) + attrs = variable.ncattrs() + if ( + var_name == "time" + and "units" in attrs + and "calendar" in attrs + ): + time_units = Unit( + variable.getncattr("units"), + calendar=variable.getncattr("calendar"), + ) + start_date = isodate.date_isoformat( + time_units.num2date(variable[0]), + format=isodate.isostrf.DATE_BAS_COMPLETE, + ) + end_date = isodate.date_isoformat( + time_units.num2date(variable[-1]), + format=isodate.isostrf.DATE_BAS_COMPLETE, + ) + break + + if start_date is None or end_date is None: + msg = ( + f"File {file} datetimes do not match a recognized pattern and " + f"time coordinate can not be read from the file" + ) + raise ValueError(msg) + + # Remove potential '-' characters from datetimes + start_date = start_date.replace("-", "") + end_date = end_date.replace("-", "") + + return start_date, end_date + + +def 
_dates_to_timerange(start_date: int | str, end_date: int | str) -> str: + """Convert ``start_date`` and ``end_date`` to ``timerange``. + + Note + ---- + This function ensures that dates in years format follow the pattern YYYY + (i.e., that they have at least 4 digits). Other formats, such as wildcards + (``'*'``) and relative time ranges (e.g., ``'P6Y'``) are used unchanged. + + Parameters + ---------- + start_date: + Start date. + end_date: + End date. + + Returns + ------- + str + ``timerange`` in the form ``'start_date/end_date'``. + """ + start_date = str(start_date) + end_date = str(end_date) + + # Pad years with 0s if not wildcard or relative time range + if start_date != "*" and not start_date.startswith("P"): + start_date = start_date.zfill(4) + if end_date != "*" and not end_date.startswith("P"): + end_date = end_date.zfill(4) + + return f"{start_date}/{end_date}" + + +def _replace_years_with_timerange(variable: dict[str, Any]) -> None: + """Set `timerange` tag from tags `start_year` and `end_year`.""" + start_year = variable.get("start_year") + end_year = variable.get("end_year") + if start_year and end_year: + variable["timerange"] = _dates_to_timerange(start_year, end_year) + elif start_year: + variable["timerange"] = _dates_to_timerange(start_year, start_year) + elif end_year: + variable["timerange"] = _dates_to_timerange(end_year, end_year) + variable.pop("start_year", None) + variable.pop("end_year", None) + + +def _parse_period(timerange: FacetValue) -> tuple[str, str]: + """Parse `timerange` values given as duration periods. + + Sum the duration periods to the `timerange` value given as a + reference point in order to compute the start and end dates needed + for file selection. 
+ """ + if not isinstance(timerange, str): + msg = f"`timerange` should be a `str`, got {type(timerange)}" + raise TypeError(msg) + start_date: str | None = None + end_date: str | None = None + time_format = None + datetime_format = ( + isodate.DATE_BAS_COMPLETE + "T" + isodate.TIME_BAS_COMPLETE + ) + if timerange.split("/")[0].startswith("P"): + try: + end_date = isodate.parse_datetime(timerange.split("/")[1]) + time_format = datetime_format + except isodate.ISO8601Error: + end_date = isodate.parse_date(timerange.split("/")[1]) + time_format = isodate.DATE_BAS_COMPLETE + delta = isodate.parse_duration(timerange.split("/")[0]) + start_date = end_date - delta + elif timerange.split("/")[1].startswith("P"): + try: + start_date = isodate.parse_datetime(timerange.split("/")[0]) + time_format = datetime_format + except isodate.ISO8601Error: + start_date = isodate.parse_date(timerange.split("/")[0]) + time_format = isodate.DATE_BAS_COMPLETE + delta = isodate.parse_duration(timerange.split("/")[1]) + end_date = start_date + delta + + if time_format == datetime_format: + start_date = str( + isodate.datetime_isoformat(start_date, format=datetime_format), + ) + end_date = str( + isodate.datetime_isoformat(end_date, format=datetime_format), + ) + elif time_format == isodate.DATE_BAS_COMPLETE: + start_date = str( + isodate.date_isoformat(start_date, format=time_format), + ) + end_date = str(isodate.date_isoformat(end_date, format=time_format)) + + if start_date is None: + start_date = timerange.split("/")[0] + if end_date is None: + end_date = timerange.split("/")[1] + + return start_date, end_date + + +def _truncate_dates(date: str, file_date: str) -> tuple[int, int]: + """Truncate dates of different lengths and convert to integers. + + This allows to compare the dates chronologically. For example, this allows + comparisons between the formats 'YYYY' and 'YYYYMM', and 'YYYYMM' and + 'YYYYMMDD'. 
+ + Warning + ------- + This function assumes that the years in ``date`` and ``file_date`` have the + same number of digits. If this is not the case, pad the dates with leading + zeros (e.g., use ``date='0100'`` and ``file_date='199901'`` for a correct + comparison). + """ + date = re.sub("[^0-9]", "", date) + file_date = re.sub("[^0-9]", "", file_date) + if len(date) < len(file_date): + file_date = file_date[0 : len(date)] + elif len(date) > len(file_date): + date = date[0 : len(file_date)] + + return int(date), int(file_date) + + +def _select_files( + filenames: Iterable[LocalFile], + timerange: FacetValue, +) -> list[LocalFile]: + """Select files containing data between a given timerange. + + If the timerange is given as a period, the file selection occurs + taking only the years into account. + + Otherwise, the file selection occurs taking into account the time + resolution of the file. + """ + if not isinstance(timerange, str): + msg = f"`timerange` should be a `str`, got {type(timerange)}" + raise TypeError(msg) + if "*" in timerange: + # TODO: support * combined with a period + return list(filenames) + + selection: list[LocalFile] = [] + + for filename in filenames: + start_date, end_date = _parse_period(timerange) + start, end = _get_start_end_date(filename) + + start_date_int, end_int = _truncate_dates(start_date, end) + end_date_int, start_int = _truncate_dates(end_date, start) + if start_int <= end_date_int and end_int >= start_date_int: + selection.append(filename) + + return selection + + +def _replace_tags( + paths: str | list[str], + variable: Facets, +) -> list[Path]: + """Replace tags in the config-developer's file with actual values.""" + pathset: Iterable[str] + if isinstance(paths, str): + pathset = {paths.strip("/")} + else: + pathset = {path.strip("/") for path in paths} + tlist: set[str] = set() + for path in pathset: + tlist = tlist.union(re.findall(r"{([^}]*)}", path)) + if "sub_experiment" in variable: + new_paths: set[str] = set() + for 
path in pathset: + new_paths.update( + ( + re.sub(r"(\b{ensemble}\b)", r"{sub_experiment}-\1", path), + re.sub(r"({ensemble})", r"{sub_experiment}-\1", path), + ), + ) + tlist.add("sub_experiment") + pathset = new_paths + + for original_tag in tlist: + tag, _, _ = _get_caps_options(original_tag) + + if tag in variable: + replacewith = variable[tag] + elif tag == "version": + replacewith = "*" + else: + msg = ( + f"Dataset key '{tag}' must be specified for {variable}, check " + f"your recipe entry and/or extra facet file(s)" + ) + raise RecipeError(msg) + pathset = _replace_tag(pathset, original_tag, replacewith) + return [Path(p) for p in pathset] + + +def _replace_tag( + paths: Iterable[str], + tag: str, + replacewith: FacetValue, +) -> list[str]: + """Replace tag by replacewith in paths.""" + _, lower, upper = _get_caps_options(tag) + result: list[str] = [] + if isinstance(replacewith, (list, tuple)): + for item in replacewith: + result.extend(_replace_tag(paths, tag, item)) + else: + text = _apply_caps(str(replacewith), lower, upper) + result.extend(p.replace("{" + tag + "}", text) for p in paths) + return list(set(result)) + + +def _get_caps_options(tag: str) -> tuple[str, bool, bool]: + lower = False + upper = False + if tag.endswith(".lower"): + lower = True + tag = tag[0:-6] + elif tag.endswith(".upper"): + upper = True + tag = tag[0:-6] + return tag, lower, upper + + +def _apply_caps(original: str, lower: bool, upper: bool) -> str: + if lower: + return original.lower() + if upper: + return original.upper() + return original + + +@dataclass(order=True) +class LocalDataSource(esmvalcore.io.protocol.DataSource): + """Data source for finding files on a local filesystem.""" + + name: str + """A name identifying the data source.""" + + project: str + """The project that the data source provides data for.""" + + priority: int + """The priority of the data source. 
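The `_replace_tags`/`_replace_tag` helpers above expand `{tag}` placeholders in the directory and filename templates into concrete glob patterns. A simplified stdlib-only sketch of that expansion, with a hypothetical `expand_template` name; unlike the real code it defaults any missing facet to `*` (the real code does so only for `version` and raises `RecipeError` otherwise) and omits the `sub_experiment` handling:

```python
import re


def expand_template(template: str, facets: dict) -> list[str]:
    """Expand {tag} placeholders into glob patterns (simplified sketch)."""
    patterns = [template]
    for tag in set(re.findall(r"{([^}]*)}", template)):
        # "{tag.lower}" and "{tag.upper}" refer to facet "tag" with case folding.
        name = tag.removesuffix(".lower").removesuffix(".upper")
        values = facets.get(name, "*")
        if not isinstance(values, (list, tuple)):
            values = [values]
        expanded = []
        for pattern in patterns:
            # List-valued facets fan out into one pattern per value.
            for value in values:
                value = str(value)
                if tag.endswith(".lower"):
                    value = value.lower()
                elif tag.endswith(".upper"):
                    value = value.upper()
                expanded.append(pattern.replace("{" + tag + "}", value))
        patterns = expanded
    return sorted(set(patterns))
```

For example, a list-valued `ensemble` facet produces one glob pattern per ensemble member, which is why `_get_glob_patterns` can return multiple globs for a single dataset.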
Lower values have priority.""" + + debug_info: str = field(init=False, repr=False, default="") + """A string containing debug information when no data is found.""" + + rootpath: Path + """The path where the directories are located.""" + + dirname_template: str + """The template for the directory names.""" + + filename_template: str + """The template for the file names.""" + + ignore_warnings: list[dict[str, Any]] | None = field(default_factory=list) + """Warnings to ignore when loading the data. + + The list should contain :class:`dict` objects with keyword arguments that + will be passed to the :func:`warnings.filterwarnings` function when + calling :meth:`LocalFile.to_iris`. + """ + + def __post_init__(self) -> None: + """Set further attributes.""" + self.rootpath = Path(os.path.expandvars(self.rootpath)).expanduser() + self._regex_pattern = self._templates_to_regex() + + def _get_glob_patterns(self, **facets: FacetValue) -> list[Path]: + """Compose the globs that will be used to look for files.""" + dirname_globs = _replace_tags(self.dirname_template, facets) + filename_globs = _replace_tags(self.filename_template, facets) + return sorted( + self.rootpath / d / f + for d in dirname_globs + for f in filename_globs + ) + + def find_data(self, **facets: FacetValue) -> list[LocalFile]: + """Find data locally. + + Parameters + ---------- + **facets : + Find data matching these facets. + + Returns + ------- + : + A list of files.
+ + """ + facets = dict(facets) + if "original_short_name" in facets: + facets["short_name"] = facets["original_short_name"] + + globs = self._get_glob_patterns(**facets) + self.debug_info = "No files found matching glob pattern " + "\n".join( + str(g) for g in globs + ) + logger.debug("Looking for files matching %s", globs) + + files: list[LocalFile] = [] + for glob_ in globs: + for filename in glob(str(glob_)): + file = LocalFile(filename) + file.facets.update( + self._path2facets( + file, + add_timerange=facets.get("frequency", "fx") != "fx", + ), + ) + file.ignore_warnings = self.ignore_warnings + files.append(file) + + files = _filter_versions_called_latest(files) + + if "version" not in facets: + files = _select_latest_version(files) + + files.sort() # sorting makes it easier to see what was found + + if "timerange" in facets: + files = _select_files(files, facets["timerange"]) + return files + + def _path2facets(self, path: Path, add_timerange: bool) -> dict[str, str]: + """Extract facets from path.""" + facets: dict[str, str] = {} + + if (match := re.search(self._regex_pattern, str(path))) is not None: + for facet, value in match.groupdict().items(): + if value: + facets[facet] = value + + if add_timerange: + try: + start_date, end_date = _get_start_end_date(path) + except ValueError: + pass + else: + facets["timerange"] = _dates_to_timerange(start_date, end_date) + + return facets + + def _templates_to_regex(self) -> str: + r"""Convert template strings to regex pattern. + + The resulting regex pattern can be used to extract facets from paths + using :func:`re.search`. + + Note + ---- + Facets must not contain "/" or "_". + + Examples + -------- + - rootpath: "/root" + dirname_template: "{f2.upper}" + filename_template: "{f3}[._]{f4}*" + --> regex_pattern: + "/root/(?P[^_/]*?)/(?P[^_/]*?)[\._](?P[^_/]*?).*?" 
+ - rootpath: "/root" + dirname_template: "{f1}/{f1}-{f2}" + filename_template: "*.nc" + --> regex_pattern: + "/root/(?P[^_/]*?)/(?P=f1)\-(?P[^_/]*?)/.*?\.nc" + - rootpath: "/root" + dirname_template: "{f1}/{f2}{f3}" + filename_template: "*.nc" + --> regex_pattern: + "/root/(?P[^_/]*?)/(?:[^_/]*?)/.*?\.nc" + + """ + dirname_template = self.dirname_template + filename_template = self.filename_template + + # Templates must not be absolute paths (i.e., start with /), otherwise + # the roopath is ignored (see + # https://docs.python.org/3/library/pathlib.html#operators) + if self.dirname_template.startswith(os.sep): + dirname_template = dirname_template[1:] + if self.filename_template.startswith(os.sep): + filename_template = filename_template[1:] + + pattern = re.escape( + str(self.rootpath / dirname_template / filename_template), + ) + + # Remove all tags that are in between other tags, e.g., + # {tag1}{tag2}{tag3} -> {tag1}{tag2} (there is no way to reliably + # extract facets from those) + pattern = re.sub(r"(?<=\})\\\{[^\}]+?\\\}(?=\\(?=\{))", "", pattern) + + # Replace consecutive tags, e.g. {tag1}{tag2} with non-capturing groups + # (?:[^_/]*?) 
(there is no way to reliably extract facets from those) + # Note: This assumes that facets do NOT contain / or _ + pattern = re.sub( + r"\\\{[^\{]+?\}\\\{[^\}]+?\\\}", + rf"(?:[^_{os.sep}]*?)", + pattern, + ) + + # Convert tags {tag} to named capture groups (?P<tag>[^_/]*?); for + # duplicates use named backreferences (?P=tag) + # Note: This assumes that facets do NOT contain / or _ + already_used_tags: set[str] = set() + for full_tag in re.findall(r"\\\{(.+?)\\\}", pattern): + # Ignore .upper and .lower (full_tag: {tag.lower}, tag: {tag}) + if full_tag.endswith((r"\.upper", r"\.lower")): + tag = full_tag[:-7] + else: + tag = full_tag + + old_str = rf"\{{{full_tag}\}}" + if tag in already_used_tags: + new_str = rf"(?P={tag})" + else: + new_str = rf"(?P<{tag}>[^_{os.sep}]*?)" + already_used_tags.add(tag) + + pattern = pattern.replace(old_str, new_str, 1) + + # Convert fnmatch wildcards * and [] to regex wildcards + pattern = pattern.replace(r"\*", ".*?") + for chars in re.findall(r"\\\[(.*?)\\\]", pattern): + pattern = pattern.replace(rf"\[{chars}\]", f"[{chars}]") + + return pattern + + +def _get_output_file(variable: dict[str, Any], preproc_dir: Path) -> Path: + """Return the full path to the output (preprocessed) file.""" + cfg = get_project_config(variable["project"]) + + # Join different experiment names + if isinstance(variable.get("exp"), (list, tuple)): + variable = dict(variable) + variable["exp"] = "-".join(variable["exp"]) + outfile = _replace_tags(cfg["output_file"], variable)[0] + if "timerange" in variable: + timerange = variable["timerange"].replace("/", "-") + outfile = Path(f"{outfile}_{timerange}") + outfile = Path(f"{outfile}.nc") + return Path( + preproc_dir, + variable.get("diagnostic", ""), + variable.get("variable_group", ""), + outfile, + ) + + +def _get_multiproduct_filename(attributes: dict, preproc_dir: Path) -> Path: + """Get ensemble/multi-model filename depending on settings.""" + relevant_keys = [ + "project", + "dataset", + "exp", +
"ensemble_statistics", + "multi_model_statistics", + "mip", + "short_name", + ] + + filename_segments = [] + for key in relevant_keys: + if key in attributes: + attribute = attributes[key] + if isinstance(attribute, (list, tuple)): + attribute = "-".join(attribute) + filename_segments.extend(attribute.split("_")) + + # Remove duplicate segments: + filename_segments = list(dict.fromkeys(filename_segments)) + + # Add time period if possible + if "timerange" in attributes: + filename_segments.append( + f"{attributes['timerange'].replace('/', '-')}", + ) + + filename = f"{'_'.join(filename_segments)}.nc" + return Path( + preproc_dir, + attributes["diagnostic"], + attributes["variable_group"], + filename, + ) + + +def _filter_versions_called_latest( + files: list[LocalFile], +) -> list[LocalFile]: + """Filter out versions called 'latest' if they are duplicates. + + On compute clusters it is usual to have a symbolic link to the + latest version called 'latest'. Those need to be skipped in order to + find valid version names and avoid duplicate results. 
+ """ + resolved_valid_versions = { + f.resolve(strict=False) + for f in files + if f.facets.get("version") != "latest" + } + return [ + f + for f in files + if f.facets.get("version") != "latest" + or f.resolve(strict=False) not in resolved_valid_versions + ] + + +def _select_latest_version(files: list[LocalFile]) -> list[LocalFile]: + """Select only the latest version of files.""" + + def filename(file): + return file.name + + def version(file): + return file.facets.get("version", "") + + result = [] + for _, group in itertools.groupby( + sorted(files, key=filename), + key=filename, + ): + duplicates = sorted(group, key=version) + latest = duplicates[-1] + result.append(latest) + return result + + +GRIB_FORMATS = (".grib2", ".grib", ".grb2", ".grb", ".gb2", ".gb") +"""GRIB file extensions.""" + + +def _get_attr_from_field_coord( + ncfield: iris.fileformats.cf.CFVariable, + coord_name: str | None, + attr: str, +) -> Any: # noqa: ANN401 + """Get attribute from netCDF field coordinate.""" + if coord_name is not None: + attrs = ncfield.cf_group[coord_name].cf_attrs() + attr_val = [value for (key, value) in attrs if key == attr] + if attr_val: + return attr_val[0] + return None + + +def _restore_lat_lon_units( + cube: iris.cube.Cube, + field: iris.fileformats.cf.CFVariable, + filename: str, # noqa: ARG001 +) -> None: # pylint: disable=unused-argument + """Use this callback to restore the original lat/lon units.""" + # Iris chooses to change longitude and latitude units to degrees + # regardless of value in file, so reinstating file value + for coord in cube.coords(): + if coord.standard_name in ["longitude", "latitude"]: + units = _get_attr_from_field_coord(field, coord.var_name, "units") + if units is not None: + coord.units = units + + +class LocalFile(type(Path()), esmvalcore.io.protocol.DataElement): # type: ignore + """File on the local filesystem.""" + + def prepare(self) -> None: + """Prepare the data for access.""" + + @property + def facets(self) -> Facets: + 
"""Facets are key-value pairs that were used to find this data.""" + if not hasattr(self, "_facets"): + self._facets: Facets = {} + return self._facets + + @facets.setter + def facets(self, value: Facets) -> None: + self._facets = value + + @property + def attributes(self) -> dict[str, Any]: + """Attributes are key-value pairs describing the data.""" + if not hasattr(self, "_attributes"): + msg = ( + "Attributes have not been read yet. Call the `to_iris` method " + "first to read the attributes from the file." + ) + raise ValueError(msg) + return self._attributes + + @attributes.setter + def attributes(self, value: dict[str, Any]) -> None: + self._attributes = value + + @property + def ignore_warnings(self) -> list[dict[str, Any]] | None: + """Warnings to ignore when loading the data. + + The list should contain :class:`dict`s with keyword arguments that + will be passed to the :func:`warnings.filterwarnings` function when + calling the ``to_iris`` method. + """ + if not hasattr(self, "_ignore_warnings"): + self._ignore_warnings: list[dict[str, Any]] | None = None + return self._ignore_warnings + + @ignore_warnings.setter + def ignore_warnings(self, value: list[dict[str, Any]] | None) -> None: + self._ignore_warnings = value + + def to_iris(self) -> iris.cube.CubeList: + """Load the data as Iris cubes. + + Returns + ------- + iris.cube.CubeList + The loaded data. + """ + file = Path(self) + + with ignore_warnings_context(self.ignore_warnings): + # GRIB files need to be loaded with iris.load, otherwise we will + # get separate (lat, lon) slices for each time step, pressure + # level, etc. + if file.suffix in GRIB_FORMATS: + cubes = iris.load(file, callback=_restore_lat_lon_units) + else: + cubes = iris.load_raw(file, callback=_restore_lat_lon_units) + + for cube in cubes: + cube.attributes.globals["source_file"] = str(file) + + # Cache the attributes. 
+ self.attributes = copy.deepcopy(dict(cubes[0].attributes.globals)) + return cubes diff --git a/esmvalcore/io/protocol.py b/esmvalcore/io/protocol.py index 37239eef89..a52941b610 100644 --- a/esmvalcore/io/protocol.py +++ b/esmvalcore/io/protocol.py @@ -27,7 +27,7 @@ class DataElement(Protocol): """A data element represents some data that can be loaded. - An :class:`esmvalcore.local.LocalFile` is an example of a data element. + An :class:`esmvalcore.io.local.LocalFile` is an example of a data element. """ name: str diff --git a/esmvalcore/local.py b/esmvalcore/local.py index e8d63abcc7..e1b36b3572 100644 --- a/esmvalcore/local.py +++ b/esmvalcore/local.py @@ -1,503 +1,40 @@ """Find files on the local filesystem. -Example configuration to find CMIP6 data on a personal computer: - -.. literalinclude:: ../configurations/data-local.yml - :language: yaml - :caption: Contents of ``data-local.yml`` - :start-at: projects: - :end-before: CMIP5: - -The module will find files matching the :func:`glob.glob` pattern formed by -``rootpath/dirname_template/filename_template``, where the facets defined -inside the curly braces of the templates are replaced by their values -from the :class:`~esmvalcore.dataset.Dataset` or the :ref:`recipe ` -plus any facet-value pairs that can be automatically added using -:meth:`~esmvalcore.dataset.Dataset.augment_facets`. -Note that the name of the data source, ``local-data`` in the example above, -must be unique within each project but can otherwise be chosen freely. - -To start using this module on a personal computer, copy the example -configuration file into your configuration directory by running the command: - -.. code-block:: bash - - esmvaltool config copy data-local.yml - -and tailor it for your own system if needed. - -Example configuration files for popular HPC systems and some -:ref:`supported climate models ` are also available. View -the list of available files by running the command: - -.. 
code-block:: bash - - esmvaltool config list - -Further information is available in :ref:`config-data-sources`. - +.. deprecated:: 2.14.0 + This module has been moved to :mod:`esmvalcore.io.local`. Importing it as + :mod:`esmvalcore.local` is deprecated and will be removed in version 2.16.0. """ from __future__ import annotations -import copy -import itertools import logging -import os import os.path -import re import warnings -from dataclasses import dataclass, field -from glob import glob from pathlib import Path -from typing import TYPE_CHECKING, Any +from typing import TYPE_CHECKING -import iris.cube -import iris.fileformats.cf -import isodate -from cf_units import Unit -from netCDF4 import Dataset - -import esmvalcore.io.protocol from esmvalcore.config import CFG from esmvalcore.config._config import get_ignored_warnings, get_project_config -from esmvalcore.exceptions import RecipeError -from esmvalcore.iris_helpers import ignore_warnings_context +from esmvalcore.io.local import ( + LocalDataSource, + LocalFile, + _filter_versions_called_latest, + _select_latest_version, +) if TYPE_CHECKING: - from collections.abc import Iterable - - from netCDF4 import Variable + from esmvalcore.typing import FacetValue - from esmvalcore.typing import Facets, FacetValue +__all__ = [ + "DataSource", + "LocalDataSource", + "LocalFile", + "find_files", +] logger = logging.getLogger(__name__) -def _get_from_pattern( - pattern: str, - date_range_pattern: str, - stem: str, - group: str, -) -> tuple[str | None, str | None]: - """Get time, date or datetime from date range patterns in file names.""" - # Next string allows to test that there is an allowed delimiter (or - # string start or end) close to date range (or to single date) - start_point: str | None = None - end_point: str | None = None - context = r"(?:^|[-_]|$)" - - # First check for a block of two potential dates - date_range_pattern_with_context = context + date_range_pattern + context - daterange = 
re.search(date_range_pattern_with_context, stem) - if not daterange: - # Retry with extended context for CMIP3 - context = r"(?:^|[-_.]|$)" - date_range_pattern_with_context = ( - context + date_range_pattern + context - ) - daterange = re.search(date_range_pattern_with_context, stem) - - if daterange: - start_point = daterange.group(group) - end_group = f"{group}_end" - end_point = daterange.group(end_group) - else: - # Check for single dates in the filename - single_date_pattern = context + pattern + context - dates = re.findall(single_date_pattern, stem) - if len(dates) == 1: - start_point = end_point = dates[0][0] - elif len(dates) > 1: - # Check for dates at start or (exclusive or) end of filename - start = re.search(r"^" + pattern, stem) - end = re.search(pattern + r"$", stem) - if start and not end: - start_point = end_point = start.group(group) - elif end: - start_point = end_point = end.group(group) - - return start_point, end_point - - -def _get_var_name(variable: Variable) -> str: - """Get variable name (following Iris' Cube.name()).""" - for attr in ("standard_name", "long_name"): - if attr in variable.ncattrs(): - return str(variable.getncattr(attr)) - return str(variable.name) - - -def _get_start_end_date_from_filename( - file: str | Path, -) -> tuple[str | None, str | None]: - """Get the start and end dates as a string from a file name. - - Examples of allowed dates: 1980, 198001, 1980-01, 19801231, 1980-12-31, - 1980123123, 19801231T23, 19801231T2359, 19801231T235959, 19801231T235959Z - (ISO 8601). - - Dates must be surrounded by '-', '_' or '.' (the latter is used by CMIP3 - data), or string start or string end (after removing filename suffix). - - Look first for two dates separated by '-', '_' or '_cat_' (the latter is - used by CMIP3 data), then for one single date, and if there are multiple, - for one date at start or end. - - Parameters - ---------- - file: - The file to read the start and end data from. 
- - Returns - ------- - tuple[str, str] - The start and end date. - - Raises - ------ - ValueError - Start or end date cannot be determined. - """ - start_date = end_date = None - - # Build regex - time_pattern = ( - r"(?P<hour>[0-2][0-9]" - r"(?P<minute>[0-5][0-9]" - r"(?P<second>[0-5][0-9])?)?Z?)" - ) - date_pattern = ( - r"(?P<year>[0-9]{4})" - r"(?P<month>-?[01][0-9]" - r"(?P<day>-?[0-3][0-9]" - rf"(T?{time_pattern})?)?)?" - ) - datetime_pattern = rf"(?P<datetime>{date_pattern})" - end_datetime_pattern = datetime_pattern.replace(">", "_end>") - - # Dates can either be delimited by '-', '_', or '_cat_' (the latter for - # CMIP3) - date_range_pattern = ( - datetime_pattern + r"[-_](?:cat_)?" + end_datetime_pattern - ) - - # Find dates using the regex - start_date, end_date = _get_from_pattern( - datetime_pattern, - date_range_pattern, - Path(file).stem, - "datetime", - ) - return start_date, end_date - - -def _get_start_end_date(file: str | Path) -> tuple[str, str]: - """Get the start and end dates as a string from a file. - - This function first tries to read the dates from the filename and only - if that fails, it will try to read them from the content of the file. - - Parameters - ---------- - file: - The file to read the start and end data from. - - Returns - ------- - tuple[str, str] - The start and end date. - - Raises - ------ - ValueError - Start or end date cannot be determined.
- """ - start_date, end_date = _get_start_end_date_from_filename(file) - - # As final resort, try to get the dates from the file contents - if ( - (start_date is None or end_date is None) - and isinstance(file, (str, Path)) - and Path(file).exists() - ): - logger.debug("Must load file %s for daterange ", file) - with Dataset(file) as dataset: - for variable in dataset.variables.values(): - var_name = _get_var_name(variable) - attrs = variable.ncattrs() - if ( - var_name == "time" - and "units" in attrs - and "calendar" in attrs - ): - time_units = Unit( - variable.getncattr("units"), - calendar=variable.getncattr("calendar"), - ) - start_date = isodate.date_isoformat( - time_units.num2date(variable[0]), - format=isodate.isostrf.DATE_BAS_COMPLETE, - ) - end_date = isodate.date_isoformat( - time_units.num2date(variable[-1]), - format=isodate.isostrf.DATE_BAS_COMPLETE, - ) - break - - if start_date is None or end_date is None: - msg = ( - f"File {file} datetimes do not match a recognized pattern and " - f"time coordinate can not be read from the file" - ) - raise ValueError(msg) - - # Remove potential '-' characters from datetimes - start_date = start_date.replace("-", "") - end_date = end_date.replace("-", "") - - return start_date, end_date - - -def _dates_to_timerange(start_date: int | str, end_date: int | str) -> str: - """Convert ``start_date`` and ``end_date`` to ``timerange``. - - Note - ---- - This function ensures that dates in years format follow the pattern YYYY - (i.e., that they have at least 4 digits). Other formats, such as wildcards - (``'*'``) and relative time ranges (e.g., ``'P6Y'``) are used unchanged. - - Parameters - ---------- - start_date: - Start date. - end_date: - End date. - - Returns - ------- - str - ``timerange`` in the form ``'start_date/end_date'``. 
- """ - start_date = str(start_date) - end_date = str(end_date) - - # Pad years with 0s if not wildcard or relative time range - if start_date != "*" and not start_date.startswith("P"): - start_date = start_date.zfill(4) - if end_date != "*" and not end_date.startswith("P"): - end_date = end_date.zfill(4) - - return f"{start_date}/{end_date}" - - -def _replace_years_with_timerange(variable: dict[str, Any]) -> None: - """Set `timerange` tag from tags `start_year` and `end_year`.""" - start_year = variable.get("start_year") - end_year = variable.get("end_year") - if start_year and end_year: - variable["timerange"] = _dates_to_timerange(start_year, end_year) - elif start_year: - variable["timerange"] = _dates_to_timerange(start_year, start_year) - elif end_year: - variable["timerange"] = _dates_to_timerange(end_year, end_year) - variable.pop("start_year", None) - variable.pop("end_year", None) - - -def _parse_period(timerange: FacetValue) -> tuple[str, str]: - """Parse `timerange` values given as duration periods. - - Sum the duration periods to the `timerange` value given as a - reference point in order to compute the start and end dates needed - for file selection. 
- """ - if not isinstance(timerange, str): - msg = f"`timerange` should be a `str`, got {type(timerange)}" - raise TypeError(msg) - start_date: str | None = None - end_date: str | None = None - time_format = None - datetime_format = ( - isodate.DATE_BAS_COMPLETE + "T" + isodate.TIME_BAS_COMPLETE - ) - if timerange.split("/")[0].startswith("P"): - try: - end_date = isodate.parse_datetime(timerange.split("/")[1]) - time_format = datetime_format - except isodate.ISO8601Error: - end_date = isodate.parse_date(timerange.split("/")[1]) - time_format = isodate.DATE_BAS_COMPLETE - delta = isodate.parse_duration(timerange.split("/")[0]) - start_date = end_date - delta - elif timerange.split("/")[1].startswith("P"): - try: - start_date = isodate.parse_datetime(timerange.split("/")[0]) - time_format = datetime_format - except isodate.ISO8601Error: - start_date = isodate.parse_date(timerange.split("/")[0]) - time_format = isodate.DATE_BAS_COMPLETE - delta = isodate.parse_duration(timerange.split("/")[1]) - end_date = start_date + delta - - if time_format == datetime_format: - start_date = str( - isodate.datetime_isoformat(start_date, format=datetime_format), - ) - end_date = str( - isodate.datetime_isoformat(end_date, format=datetime_format), - ) - elif time_format == isodate.DATE_BAS_COMPLETE: - start_date = str( - isodate.date_isoformat(start_date, format=time_format), - ) - end_date = str(isodate.date_isoformat(end_date, format=time_format)) - - if start_date is None: - start_date = timerange.split("/")[0] - if end_date is None: - end_date = timerange.split("/")[1] - - return start_date, end_date - - -def _truncate_dates(date: str, file_date: str) -> tuple[int, int]: - """Truncate dates of different lengths and convert to integers. - - This allows to compare the dates chronologically. For example, this allows - comparisons between the formats 'YYYY' and 'YYYYMM', and 'YYYYMM' and - 'YYYYMMDD'. 
- - Warning - ------- - This function assumes that the years in ``date`` and ``file_date`` have the - same number of digits. If this is not the case, pad the dates with leading - zeros (e.g., use ``date='0100'`` and ``file_date='199901'`` for a correct - comparison). - """ - date = re.sub("[^0-9]", "", date) - file_date = re.sub("[^0-9]", "", file_date) - if len(date) < len(file_date): - file_date = file_date[0 : len(date)] - elif len(date) > len(file_date): - date = date[0 : len(file_date)] - - return int(date), int(file_date) - - -def _select_files( - filenames: Iterable[LocalFile], - timerange: FacetValue, -) -> list[LocalFile]: - """Select files containing data between a given timerange. - - If the timerange is given as a period, the file selection occurs - taking only the years into account. - - Otherwise, the file selection occurs taking into account the time - resolution of the file. - """ - if not isinstance(timerange, str): - msg = f"`timerange` should be a `str`, got {type(timerange)}" - raise TypeError(msg) - if "*" in timerange: - # TODO: support * combined with a period - return list(filenames) - - selection: list[LocalFile] = [] - - for filename in filenames: - start_date, end_date = _parse_period(timerange) - start, end = _get_start_end_date(filename) - - start_date_int, end_int = _truncate_dates(start_date, end) - end_date_int, start_int = _truncate_dates(end_date, start) - if start_int <= end_date_int and end_int >= start_date_int: - selection.append(filename) - - return selection - - -def _replace_tags( - paths: str | list[str], - variable: Facets, -) -> list[Path]: - """Replace tags in the config-developer's file with actual values.""" - pathset: Iterable[str] - if isinstance(paths, str): - pathset = {paths.strip("/")} - else: - pathset = {path.strip("/") for path in paths} - tlist: set[str] = set() - for path in pathset: - tlist = tlist.union(re.findall(r"{([^}]*)}", path)) - if "sub_experiment" in variable: - new_paths: set[str] = set() - for 
path in pathset: - new_paths.update( - ( - re.sub(r"(\b{ensemble}\b)", r"{sub_experiment}-\1", path), - re.sub(r"({ensemble})", r"{sub_experiment}-\1", path), - ), - ) - tlist.add("sub_experiment") - pathset = new_paths - - for original_tag in tlist: - tag, _, _ = _get_caps_options(original_tag) - - if tag in variable: - replacewith = variable[tag] - elif tag == "version": - replacewith = "*" - else: - msg = ( - f"Dataset key '{tag}' must be specified for {variable}, check " - f"your recipe entry and/or extra facet file(s)" - ) - raise RecipeError(msg) - pathset = _replace_tag(pathset, original_tag, replacewith) - return [Path(p) for p in pathset] - - -def _replace_tag( - paths: Iterable[str], - tag: str, - replacewith: FacetValue, -) -> list[str]: - """Replace tag by replacewith in paths.""" - _, lower, upper = _get_caps_options(tag) - result: list[str] = [] - if isinstance(replacewith, (list, tuple)): - for item in replacewith: - result.extend(_replace_tag(paths, tag, item)) - else: - text = _apply_caps(str(replacewith), lower, upper) - result.extend(p.replace("{" + tag + "}", text) for p in paths) - return list(set(result)) - - -def _get_caps_options(tag: str) -> tuple[str, bool, bool]: - lower = False - upper = False - if tag.endswith(".lower"): - lower = True - tag = tag[0:-6] - elif tag.endswith(".upper"): - upper = True - tag = tag[0:-6] - return tag, lower, upper - - -def _apply_caps(original: str, lower: bool, upper: bool) -> str: - if lower: - return original.lower() - if upper: - return original.upper() - return original - - def _select_drs(input_type: str, project: str, structure: str) -> list[str]: """Select the directory structure of input path.""" cfg = get_project_config(project) @@ -515,241 +52,6 @@ def _select_drs(input_type: str, project: str, structure: str) -> list[str]: raise KeyError(msg) -@dataclass(order=True) -class LocalDataSource(esmvalcore.io.protocol.DataSource): - """Data source for finding files on a local filesystem.""" - - name: 
str - """A name identifying the data source.""" - - project: str - """The project that the data source provides data for.""" - - priority: int - """The priority of the data source. Lower values have priority.""" - - debug_info: str = field(init=False, repr=False, default="") - """A string containing debug information when no data is found.""" - - rootpath: Path - """The path where the directories are located.""" - - dirname_template: str - """The template for the directory names.""" - - filename_template: str - """The template for the file names.""" - - ignore_warnings: list[dict[str, Any]] | None = field(default_factory=list) - """Warnings to ignore when loading the data. - - The list should contain :class:`dict`s with keyword arguments that - will be passed to the :func:`warnings.filterwarnings` function when - calling :meth:`LocalFile.to_iris`. - """ - - def __post_init__(self) -> None: - """Set further attributes.""" - self.rootpath = Path(os.path.expandvars(self.rootpath)).expanduser() - self._regex_pattern = self._templates_to_regex() - - def _get_glob_patterns(self, **facets: FacetValue) -> list[Path]: - """Compose the globs that will be used to look for files.""" - dirname_globs = _replace_tags(self.dirname_template, facets) - filename_globs = _replace_tags(self.filename_template, facets) - return sorted( - self.rootpath / d / f - for d in dirname_globs - for f in filename_globs - ) - - def find_data(self, **facets: FacetValue) -> list[LocalFile]: - """Find data locally. - - Parameters - ---------- - **facets : - Find data matching these facets. - - Returns - ------- - : - A list of files. 
- - """ - facets = dict(facets) - if "original_short_name" in facets: - facets["short_name"] = facets["original_short_name"] - - globs = self._get_glob_patterns(**facets) - self.debug_info = "No files found matching glob pattern " + "\n".join( - str(g) for g in globs - ) - logger.debug("Looking for files matching %s", globs) - - files: list[LocalFile] = [] - for glob_ in globs: - for filename in glob(str(glob_)): - file = LocalFile(filename) - file.facets.update( - self._path2facets( - file, - add_timerange=facets.get("frequency", "fx") != "fx", - ), - ) - file.ignore_warnings = self.ignore_warnings - files.append(file) - - files = _filter_versions_called_latest(files) - - if "version" not in facets: - files = _select_latest_version(files) - - files.sort() # sorting makes it easier to see what was found - - if "timerange" in facets: - files = _select_files(files, facets["timerange"]) - return files - - def _path2facets(self, path: Path, add_timerange: bool) -> dict[str, str]: - """Extract facets from path.""" - facets: dict[str, str] = {} - - if (match := re.search(self._regex_pattern, str(path))) is not None: - for facet, value in match.groupdict().items(): - if value: - facets[facet] = value - - if add_timerange: - try: - start_date, end_date = _get_start_end_date(path) - except ValueError: - pass - else: - facets["timerange"] = _dates_to_timerange(start_date, end_date) - - return facets - - def _templates_to_regex(self) -> str: - r"""Convert template strings to regex pattern. - - The resulting regex pattern can be used to extract facets from paths - using :func:`re.search`. - - Note - ---- - Facets must not contain "/" or "_". - - Examples - -------- - - rootpath: "/root" - dirname_template: "{f2.upper}" - filename_template: "{f3}[._]{f4}*" - --> regex_pattern: - "/root/(?P[^_/]*?)/(?P[^_/]*?)[\._](?P[^_/]*?).*?" 
- - rootpath: "/root" - dirname_template: "{f1}/{f1}-{f2}" - filename_template: "*.nc" - --> regex_pattern: - "/root/(?P[^_/]*?)/(?P=f1)\-(?P[^_/]*?)/.*?\.nc" - - rootpath: "/root" - dirname_template: "{f1}/{f2}{f3}" - filename_template: "*.nc" - --> regex_pattern: - "/root/(?P[^_/]*?)/(?:[^_/]*?)/.*?\.nc" - - """ - dirname_template = self.dirname_template - filename_template = self.filename_template - - # Templates must not be absolute paths (i.e., start with /), otherwise - # the roopath is ignored (see - # https://docs.python.org/3/library/pathlib.html#operators) - if self.dirname_template.startswith(os.sep): - dirname_template = dirname_template[1:] - if self.filename_template.startswith(os.sep): - filename_template = filename_template[1:] - - pattern = re.escape( - str(self.rootpath / dirname_template / filename_template), - ) - - # Remove all tags that are in between other tags, e.g., - # {tag1}{tag2}{tag3} -> {tag1}{tag2} (there is no way to reliably - # extract facets from those) - pattern = re.sub(r"(?<=\})\\\{[^\}]+?\\\}(?=\\(?=\{))", "", pattern) - - # Replace consecutive tags, e.g. {tag1}{tag2} with non-capturing groups - # (?:[^_/]*?) 
(there is no way to reliably extract facets from those) - # Note: This assumes that facets do NOT contain / or _ - pattern = re.sub( - r"\\\{[^\{]+?\}\\\{[^\}]+?\\\}", - rf"(?:[^_{os.sep}]*?)", - pattern, - ) - - # Convert tags {tag} to named capture groups (?P[^_/]*?); for - # duplicates use named backreferences (?P=tag) - # Note: This assumes that facets do NOT contain / or _ - already_used_tags: set[str] = set() - for full_tag in re.findall(r"\\\{(.+?)\\\}", pattern): - # Ignore .upper and .lower (full_tag: {tag.lower}, tag: {tag}) - if full_tag.endswith((r"\.upper", r"\.lower")): - tag = full_tag[:-7] - else: - tag = full_tag - - old_str = rf"\{{{full_tag}\}}" - if tag in already_used_tags: - new_str = rf"(?P={tag})" - else: - new_str = rf"(?P<{tag}>[^_{os.sep}]*?)" - already_used_tags.add(tag) - - pattern = pattern.replace(old_str, new_str, 1) - - # Convert fnmatch wildcards * and [] to regex wildcards - pattern = pattern.replace(r"\*", ".*?") - for chars in re.findall(r"\\\[(.*?)\\\]", pattern): - pattern = pattern.replace(rf"\[{chars}\]", f"[{chars}]") - - return pattern - - -class DataSource(LocalDataSource): - """Data source for finding files on a local filesystem. - - .. deprecated:: 2.14.0 - This class is deprecated and will be removed in version 2.16.0. - Please use :class:`esmvalcore.local.LocalDataSource` instead. - """ - - def __init__(self, *args, **kwargs): - msg = ( - "The 'esmvalcore.local.LocalDataSource' class is deprecated and will be " - "removed in version 2.16.0. 
Please use 'esmvalcore.local.LocalDataSource'" - ) - warnings.warn(msg, DeprecationWarning, stacklevel=2) - super().__init__(*args, **kwargs) - - @property - def regex_pattern(self) -> str: - """Get regex pattern that can be used to extract facets from paths.""" - return self._regex_pattern - - def get_glob_patterns(self, **facets: FacetValue) -> list[Path]: - """Compose the globs that will be used to look for files.""" - return self._get_glob_patterns(**facets) - - def path2facets(self, path: Path, add_timerange: bool) -> dict[str, str]: - """Extract facets from path.""" - return self._path2facets(path, add_timerange) - - def find_files(self, **facets: FacetValue) -> list[LocalFile]: - """Find files.""" - return self.find_data(**facets) - - _ROOTPATH_WARNED: set[tuple[str, tuple[str]]] = set() _LEGACY_DATA_SOURCES_WARNED: set[str] = set() @@ -818,105 +120,38 @@ def _get_data_sources(project: str) -> list[LocalDataSource]: raise KeyError(msg) -def _get_output_file(variable: dict[str, Any], preproc_dir: Path) -> Path: - """Return the full path to the output (preprocessed) file.""" - cfg = get_project_config(variable["project"]) - - # Join different experiment names - if isinstance(variable.get("exp"), (list, tuple)): - variable = dict(variable) - variable["exp"] = "-".join(variable["exp"]) - outfile = _replace_tags(cfg["output_file"], variable)[0] - if "timerange" in variable: - timerange = variable["timerange"].replace("/", "-") - outfile = Path(f"{outfile}_{timerange}") - outfile = Path(f"{outfile}.nc") - return Path( - preproc_dir, - variable.get("diagnostic", ""), - variable.get("variable_group", ""), - outfile, - ) - - -def _get_multiproduct_filename(attributes: dict, preproc_dir: Path) -> Path: - """Get ensemble/multi-model filename depending on settings.""" - relevant_keys = [ - "project", - "dataset", - "exp", - "ensemble_statistics", - "multi_model_statistics", - "mip", - "short_name", - ] - - filename_segments = [] - for key in relevant_keys: - if key in 
attributes: - attribute = attributes[key] - if isinstance(attribute, (list, tuple)): - attribute = "-".join(attribute) - filename_segments.extend(attribute.split("_")) - - # Remove duplicate segments: - filename_segments = list(dict.fromkeys(filename_segments)) - - # Add time period if possible - if "timerange" in attributes: - filename_segments.append( - f"{attributes['timerange'].replace('/', '-')}", - ) - - filename = f"{'_'.join(filename_segments)}.nc" - return Path( - preproc_dir, - attributes["diagnostic"], - attributes["variable_group"], - filename, - ) - - -def _filter_versions_called_latest( - files: list[LocalFile], -) -> list[LocalFile]: - """Filter out versions called 'latest' if they are duplicates. +class DataSource(LocalDataSource): + """Data source for finding files on a local filesystem. - On compute clusters it is usual to have a symbolic link to the - latest version called 'latest'. Those need to be skipped in order to - find valid version names and avoid duplicate results. + .. deprecated:: 2.14.0 + This class is deprecated and will be removed in version 2.16.0. + Please use :class:`esmvalcore.local.LocalDataSource` instead. """ - resolved_valid_versions = { - f.resolve(strict=False) - for f in files - if f.facets.get("version") != "latest" - } - return [ - f - for f in files - if f.facets.get("version") != "latest" - or f.resolve(strict=False) not in resolved_valid_versions - ] + def __init__(self, *args, **kwargs): + msg = ( + "The 'esmvalcore.local.DataSource' class is deprecated and will be " + "removed in version 2.16.0. 
Please use 'esmvalcore.local.LocalDataSource'" + ) + warnings.warn(msg, DeprecationWarning, stacklevel=2) + super().__init__(*args, **kwargs) -def _select_latest_version(files: list[LocalFile]) -> list[LocalFile]: - """Select only the latest version of files.""" + @property + def regex_pattern(self) -> str: + """Get regex pattern that can be used to extract facets from paths.""" + return self._regex_pattern - def filename(file): - return file.name + def get_glob_patterns(self, **facets: FacetValue) -> list[Path]: + """Compose the globs that will be used to look for files.""" + return self._get_glob_patterns(**facets) - def version(file): - return file.facets.get("version", "") + def path2facets(self, path: Path, add_timerange: bool) -> dict[str, str]: + """Extract facets from path.""" + return self._path2facets(path, add_timerange) - result = [] - for _, group in itertools.groupby( - sorted(files, key=filename), - key=filename, - ): - duplicates = sorted(group, key=version) - latest = duplicates[-1] - result.append(latest) - return result + def find_files(self, **facets: FacetValue) -> list[LocalFile]: + """Find files.""" + return self.find_data(**facets) def find_files( @@ -1020,111 +255,3 @@ def find_files( globs.extend(data_source._get_glob_patterns(**facets)) # noqa: SLF001 return files, sorted(globs) return files - - -GRIB_FORMATS = (".grib2", ".grib", ".grb2", ".grb", ".gb2", ".gb") -"""GRIB file extensions.""" - - -def _get_attr_from_field_coord( - ncfield: iris.fileformats.cf.CFVariable, - coord_name: str | None, - attr: str, -) -> Any: # noqa: ANN401 - """Get attribute from netCDF field coordinate.""" - if coord_name is not None: - attrs = ncfield.cf_group[coord_name].cf_attrs() - attr_val = [value for (key, value) in attrs if key == attr] - if attr_val: - return attr_val[0] - return None - - -def _restore_lat_lon_units( - cube: iris.cube.Cube, - field: iris.fileformats.cf.CFVariable, - filename: str, # noqa: ARG001 -) -> None: # pylint: 
disable=unused-argument - """Use this callback to restore the original lat/lon units.""" - # Iris chooses to change longitude and latitude units to degrees - # regardless of value in file, so reinstating file value - for coord in cube.coords(): - if coord.standard_name in ["longitude", "latitude"]: - units = _get_attr_from_field_coord(field, coord.var_name, "units") - if units is not None: - coord.units = units - - -class LocalFile(type(Path()), esmvalcore.io.protocol.DataElement): # type: ignore - """File on the local filesystem.""" - - def prepare(self) -> None: - """Prepare the data for access.""" - - @property - def facets(self) -> Facets: - """Facets are key-value pairs that were used to find this data.""" - if not hasattr(self, "_facets"): - self._facets: Facets = {} - return self._facets - - @facets.setter - def facets(self, value: Facets) -> None: - self._facets = value - - @property - def attributes(self) -> dict[str, Any]: - """Attributes are key-value pairs describing the data.""" - if not hasattr(self, "_attributes"): - msg = ( - "Attributes have not been read yet. Call the `to_iris` method " - "first to read the attributes from the file." - ) - raise ValueError(msg) - return self._attributes - - @attributes.setter - def attributes(self, value: dict[str, Any]) -> None: - self._attributes = value - - @property - def ignore_warnings(self) -> list[dict[str, Any]] | None: - """Warnings to ignore when loading the data. - - The list should contain :class:`dict`s with keyword arguments that - will be passed to the :func:`warnings.filterwarnings` function when - calling the ``to_iris`` method. - """ - if not hasattr(self, "_ignore_warnings"): - self._ignore_warnings: list[dict[str, Any]] | None = None - return self._ignore_warnings - - @ignore_warnings.setter - def ignore_warnings(self, value: list[dict[str, Any]] | None) -> None: - self._ignore_warnings = value - - def to_iris(self) -> iris.cube.CubeList: - """Load the data as Iris cubes. 
- - Returns - ------- - iris.cube.CubeList - The loaded data. - """ - file = Path(self) - - with ignore_warnings_context(self.ignore_warnings): - # GRIB files need to be loaded with iris.load, otherwise we will - # get separate (lat, lon) slices for each time step, pressure - # level, etc. - if file.suffix in GRIB_FORMATS: - cubes = iris.load(file, callback=_restore_lat_lon_units) - else: - cubes = iris.load_raw(file, callback=_restore_lat_lon_units) - - for cube in cubes: - cube.attributes.globals["source_file"] = str(file) - - # Cache the attributes. - self.attributes = copy.deepcopy(dict(cubes[0].attributes.globals)) - return cubes diff --git a/esmvalcore/preprocessor/_concatenate.py b/esmvalcore/preprocessor/_concatenate.py index f3142ca18a..0ab4ad00e1 100644 --- a/esmvalcore/preprocessor/_concatenate.py +++ b/esmvalcore/preprocessor/_concatenate.py @@ -11,7 +11,7 @@ from iris.cube import CubeList from esmvalcore.cmor.check import CheckLevels -from esmvalcore.esgf.facets import FACETS +from esmvalcore.io.esgf.facets import FACETS from esmvalcore.iris_helpers import merge_cube_attributes from esmvalcore.preprocessor._shared import _rechunk_aux_factory_dependencies diff --git a/esmvalcore/preprocessor/_io.py b/esmvalcore/preprocessor/_io.py index 57a79bf0d6..78914e0f7c 100644 --- a/esmvalcore/preprocessor/_io.py +++ b/esmvalcore/preprocessor/_io.py @@ -20,9 +20,9 @@ from esmvalcore._task import write_ncl_settings from esmvalcore.exceptions import ESMValCoreLoadWarning +from esmvalcore.io.local import LocalFile from esmvalcore.io.protocol import DataElement from esmvalcore.iris_helpers import dataset_to_iris -from esmvalcore.local import LocalFile if TYPE_CHECKING: from collections.abc import Sequence diff --git a/tests/integration/conftest.py b/tests/integration/conftest.py index bb5f4a0324..3fae8cd8e6 100644 --- a/tests/integration/conftest.py +++ b/tests/integration/conftest.py @@ -7,13 +7,13 @@ import iris import pytest -import esmvalcore.local -from 
esmvalcore.local import ( +import esmvalcore.io.local +from esmvalcore.io.local import ( LocalFile, _replace_tags, - _select_drs, _select_files, ) +from esmvalcore.local import _select_drs if TYPE_CHECKING: from collections.abc import Callable, Iterator @@ -139,7 +139,7 @@ def _get_find_files_func( tracking_id = _tracking_ids() def find_files( - self: esmvalcore.local.LocalDataSource, + self: esmvalcore.io.local.LocalDataSource, *, debug: bool = False, **facets: FacetValue, @@ -156,7 +156,7 @@ def find_files( def patched_datafinder(tmp_path: Path, monkeypatch: pytest.MonkeyPath) -> None: find_files = _get_find_files_func(tmp_path) monkeypatch.setattr( - esmvalcore.local.LocalDataSource, + esmvalcore.io.local.LocalDataSource, "find_data", find_files, ) @@ -169,7 +169,7 @@ def patched_datafinder_grib( ) -> None: find_files = _get_find_files_func(tmp_path, suffix="grib") monkeypatch.setattr( - esmvalcore.local.LocalDataSource, + esmvalcore.io.local.LocalDataSource, "find_data", find_files, ) @@ -192,7 +192,7 @@ def patched_failing_datafinder( tracking_id = _tracking_ids() def find_files( - self: esmvalcore.local.LocalDataSource, + self: esmvalcore.io.local.LocalDataSource, *, debug: bool = False, **facets: FacetValue, @@ -209,7 +209,7 @@ def find_files( return returned_files monkeypatch.setattr( - esmvalcore.local.LocalDataSource, + esmvalcore.io.local.LocalDataSource, "find_data", find_files, ) diff --git a/tests/integration/dataset/test_dataset.py b/tests/integration/dataset/test_dataset.py index 5bbb7e22d3..3f172f01b8 100644 --- a/tests/integration/dataset/test_dataset.py +++ b/tests/integration/dataset/test_dataset.py @@ -54,7 +54,7 @@ def example_data_source(tmp_path: Path) -> dict[str, str]: areacella_tgt.parent.mkdir(parents=True, exist_ok=True) areacella_tgt.symlink_to(areacella_src) return { - "type": "esmvalcore.local.LocalDataSource", + "type": "esmvalcore.io.local.LocalDataSource", "rootpath": str(rootpath), "dirname_template": 
"{project.lower}/{product}/{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}", "filename_template": "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc", diff --git a/tests/integration/io/__init__.py b/tests/integration/io/__init__.py new file mode 100644 index 0000000000..b9806aae16 --- /dev/null +++ b/tests/integration/io/__init__.py @@ -0,0 +1 @@ +"""Tests for `esmvalcore.io`.""" diff --git a/tests/integration/data_finder.yml b/tests/integration/io/data_finder.yml similarity index 78% rename from tests/integration/data_finder.yml rename to tests/integration/io/data_finder.yml index 6c18253c2b..7288b63cbd 100644 --- a/tests/integration/data_finder.yml +++ b/tests/integration/io/data_finder.yml @@ -1,5 +1,4 @@ --- - get_output_file: - variable: &variable variable_group: test @@ -13,7 +12,7 @@ get_output_file: mip: Amon exp: historical ensemble: r1i1p1 - timerange: '1960/1980' + timerange: "1960/1980" diagnostic: test_diag preprocessor: test_preproc preproc_dir: this/is/a/path @@ -37,7 +36,7 @@ get_output_file: mip: Amon exp: amip channel: Amon - timerange: '1960/1980' + timerange: "1960/1980" diagnostic: test_diag preprocessor: test_preproc preproc_dir: this/is/a/path @@ -54,7 +53,7 @@ get_output_file: exp: piControl channel: CH postproc_flag: -p-mm - timerange: '199001/199002' + timerange: "199001/199002" diagnostic: test_diag preprocessor: test_preproc preproc_dir: this/is/a/path @@ -72,7 +71,7 @@ get_output_file: mip: Amon exp: amip var_type: atm_2d_ml - timerange: '1960/1980' + timerange: "1960/1980" diagnostic: test_diag preprocessor: test_preproc preproc_dir: this/is/a/path @@ -88,7 +87,7 @@ get_output_file: mip: Amon exp: amip var_type: custom_var_type - timerange: '20000101/20000102' + timerange: "20000101/20000102" diagnostic: test_diag preprocessor: test_preproc preproc_dir: this/is/a/path @@ -109,7 +108,7 @@ get_output_file: gcomp: atm scomp: cam type: h0 - timerange: '2000/2002' + timerange: "2000/2002" diagnostic: 
test_diag preprocessor: test_preproc preproc_dir: this/is/a/path @@ -131,15 +130,16 @@ get_output_file: tdir: tseries tperiod: month_1 string: TREFHT - timerange: '2015/2025' + timerange: "2015/2025" diagnostic: test_diag preprocessor: test_preproc preproc_dir: this/is/a/path output_file: this/is/a/path/test_diag/test/CESM_CESM2_f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_gcomp_scomp_h1_Amon_tas_2015-2025.nc - get_input_filelist: - drs: default + dirname_template: / + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable available_files: @@ -147,13 +147,15 @@ get_input_filelist: - ta_Amon_HadGEM2-ES_historical_r1i1p1_195912-198411.nc - ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc dirs: - - '' + - "" file_patterns: - ta_Amon_HadGEM2-ES_historical_r1i1p1*.nc found_files: - ta_Amon_HadGEM2-ES_historical_r1i1p1_195912-198411.nc - drs: default + dirname_template: / + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: variable_group: test short_name: ta @@ -166,7 +168,7 @@ get_input_filelist: mip: Amon exp: historical ensemble: r1i1p1 - timerange: '*' + timerange: "*" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -174,7 +176,7 @@ get_input_filelist: - ta_Amon_HadGEM2-ES_historical_r1i1p1_195912-198411.nc - ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc dirs: - - '' + - "" file_patterns: - ta_Amon_HadGEM2-ES_historical_r1i1p1*.nc found_files: @@ -183,6 +185,8 @@ get_input_filelist: - ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: default + dirname_template: / + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc" variable: variable_group: test short_name: tro3 @@ -195,7 +199,7 @@ get_input_filelist: mip: Amon exp: historical ensemble: r1i1p1 - timerange: '1960/1980' + timerange: "1960/1980" diagnostic: test_diag preprocessor: test_preproc grid: gn @@ -204,16 +208,18 @@ get_input_filelist: - 
o3_Amon_HadGEM2-ES_historical_r1i1p1_gn_195912-198411.nc - o3_Amon_HadGEM2-ES_historical_r1i1p1_gn_198412-200511.nc dirs: - - '' + - "" file_patterns: - o3_Amon_HadGEM2-ES_historical_r1i1p1_gn*.nc found_files: - o3_Amon_HadGEM2-ES_historical_r1i1p1_gn_195912-198411.nc - drs: default + dirname_template: / + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '1960/2060' + timerange: "1960/2060" exp: [historical, rcp85] available_files: - ta_Amon_HadGEM2-ES_historical_r1i1p1_193412-195911.nc @@ -221,7 +227,7 @@ get_input_filelist: - ta_Amon_HadGEM2-ES_historical_r1i1p1_198413-200512.nc - ta_Amon_HadGEM2-ES_rcp85_r1i1p1_200601-210012.nc dirs: - - '' + - "" file_patterns: - ta_Amon_HadGEM2-ES_historical_r1i1p1*.nc - ta_Amon_HadGEM2-ES_rcp85_r1i1p1*.nc @@ -231,6 +237,8 @@ get_input_filelist: - ta_Amon_HadGEM2-ES_rcp85_r1i1p1_200601-210012.nc - drs: default + dirname_template: / + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: variable_group: test short_name: ta @@ -242,7 +250,7 @@ get_input_filelist: modeling_realm: [atmos] mip: Amon ensemble: r1i1p1 - timerange: '*' + timerange: "*" diagnostic: test_diag preprocessor: test_preproc exp: [historical, rcp85] @@ -252,7 +260,7 @@ get_input_filelist: - ta_Amon_HadGEM2-ES_historical_r1i1p1_198413-200512.nc - ta_Amon_HadGEM2-ES_rcp85_r1i1p1_200601-210012.nc dirs: - - '' + - "" file_patterns: - ta_Amon_HadGEM2-ES_historical_r1i1p1*.nc - ta_Amon_HadGEM2-ES_rcp85_r1i1p1*.nc @@ -263,33 +271,38 @@ get_input_filelist: - ta_Amon_HadGEM2-ES_rcp85_r1i1p1_200601-210012.nc - drs: default + dirname_template: / + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '2010/2100' + timerange: "2010/2100" available_files: - ta_Amon_HadGEM2-ES_historical_r1i1p1_193412-195911.nc - ta_Amon_HadGEM2-ES_historical_r1i1p1_195912-198411.nc - ta_Amon_HadGEM2-ES_historical_r1i1p1_198413-200512.nc - 
ta_Amon_HadGEM2-ES_rcp85_r1i1p1_200601-210012.nc dirs: - - '' + - "" file_patterns: - ta_Amon_HadGEM2-ES_historical_r1i1p1*.nc found_files: [] - drs: default + dirname_template: / + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: *variable dirs: - - '' + - "" file_patterns: - ta_Amon_HadGEM2-ES_historical_r1i1p1*.nc found_files: [] - - drs: BADC + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '1980/2002' + timerange: "1980/2002" available_files: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_193412-195911.nc - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_195912-198411.nc @@ -307,9 +320,11 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20120928/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: BADC + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '2000/2005' + timerange: "2000/2005" version: v20110329 available_files: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc @@ -323,10 +338,12 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: BADC + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - ensemble: '*' - timerange: '2000/2005' + ensemble: "*" + timerange: "2000/2005" available_files: - 
MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r2i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r2i1p1_198412-200511.nc @@ -342,9 +359,11 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r2i1p1/v20120928/ta/ta_Amon_HadGEM2-ES_historical_r2i1p1_198412-200511.nc - drs: BADC + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '1980/2002' + timerange: "1980/2002" available_files: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_193412-195911.nc - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_195912-198411.nc @@ -365,9 +384,11 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20120928/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: DKRZ + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '1980/2002' + timerange: "1980/2002" available_files: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110330/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_185912-188411.nc - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110330/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_188412-190911.nc @@ -385,10 +406,12 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110330/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: DKRZ + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: 
*variable exp: [historical, rcp45, rcp85] - timerange: '1980/2100' + timerange: "1980/2100" available_files: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110330/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_185912-188411.nc - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110330/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_188412-190911.nc @@ -416,9 +439,11 @@ get_input_filelist: - MOHC/HadGEM2-ES/rcp85/mon/atmos/Amon/r1i1p1/v20110330/ta/ta_Amon_HadGEM2-ES_rcp85_r1i1p1_200601-210012.nc - drs: ETHZ + dirname_template: "{exp}/{mip}/{short_name}/{dataset}/{ensemble}/" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '1980/2002' + timerange: "1980/2002" available_files: - historical/Amon/ta/HadGEM2-ES/r1i1p1/ta_Amon_HadGEM2-ES_historical_r1i1p1_185912-188411.nc - historical/Amon/ta/HadGEM2-ES/r1i1p1/ta_Amon_HadGEM2-ES_historical_r1i1p1_188412-190911.nc @@ -435,9 +460,11 @@ get_input_filelist: - historical/Amon/ta/HadGEM2-ES/r1i1p1/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: ETHZ + dirname_template: "{exp}/{mip}/{short_name}/{dataset}/{ensemble}/" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '2000/2100' + timerange: "2000/2100" available_files: - historical/Amon/ta/HadGEM2-ES/r1i1p1/ta_Amon_HadGEM2-ES_historical_r1i1p1_185912-188411.nc - historical/Amon/ta/HadGEM2-ES/r1i1p1/ta_Amon_HadGEM2-ES_historical_r1i1p1_188412-190911.nc @@ -454,9 +481,11 @@ get_input_filelist: - historical/Amon/ta/HadGEM2-ES/r1i1p1/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: NCI + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '1980/2002' + timerange: "1980/2002" available_files: - 
MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_193412-195911.nc - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_195912-198411.nc @@ -474,9 +503,11 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20120928/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: NCI + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '2000/2005' + timerange: "2000/2005" version: v20110329 available_files: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc @@ -490,10 +521,12 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - drs: NCI + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - ensemble: '*' - timerange: '2000/2005' + ensemble: "*" + timerange: "2000/2005" available_files: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_198412-200511.nc - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r2i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r2i1p1_198412-200511.nc @@ -509,9 +542,11 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r2i1p1/v20120928/ta/ta_Amon_HadGEM2-ES_historical_r2i1p1_198412-200511.nc - drs: NCI + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: <<: *variable - timerange: '1980/2002' + timerange: "1980/2002" available_files: 
- MOHC/HadGEM2-ES/historical/mon/atmos/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_193412-195911.nc - MOHC/HadGEM2-ES/historical/mon/atmo1/Amon/r1i1p1/v20110329/ta/ta_Amon_HadGEM2-ES_historical_r1i1p1_195912-198411.nc @@ -534,6 +569,8 @@ get_input_filelist: # Test other projects - drs: DKRZ + dirname_template: "{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc" variable: variable_group: test short_name: ta @@ -548,7 +585,7 @@ get_input_filelist: exp: historical grid: gn ensemble: r1i1p1f1 - timerange: '1999/2000' + timerange: "1999/2000" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -566,6 +603,8 @@ get_input_filelist: - CMIP/MOHC/HadGEM3-GC31-LL/historical/r1i1p1f1/Amon/ta/gn/v20200101/ta_Amon_HadGEM3-GC31-LL_historical_r1i1p1f1_gn_200001-201412.nc - drs: DKRZ + dirname_template: "{exp}/{modeling_realm}/{frequency}/{short_name}/{dataset}/{ensemble}" + filename_template: "{short_name}_*.nc" variable: variable_group: test short_name: ta @@ -578,7 +617,7 @@ get_input_filelist: mip: Amon exp: historical ensemble: r1i1p1 - timerange: '1999/2000' + timerange: "1999/2000" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -595,6 +634,8 @@ get_input_filelist: - historical/atmos/mon/ta/HADGEM1/r1i1p1/ta_HADGEM1_200001-200112.nc - drs: NCI + dirname_template: "{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc" variable: variable_group: test short_name: ta @@ -609,7 +650,7 @@ get_input_filelist: exp: historical grid: gn ensemble: r1i1p1f1 - timerange: '1999/2000' + timerange: "1999/2000" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -627,6 +668,8 @@ get_input_filelist: - 
CMIP/MOHC/HadGEM3-GC31-LL/historical/r1i1p1f1/Amon/ta/gn/v20200101/ta_Amon_HadGEM3-GC31-LL_historical_r1i1p1f1_gn_200001-201412.nc - drs: default + dirname_template: "Tier{tier}/{dataset}" + filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc" variable: variable_group: test short_name: tas @@ -638,7 +681,7 @@ get_input_filelist: tier: 3 type: reanaly version: 42 - timerange: '1999/2000' + timerange: "1999/2000" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -656,6 +699,8 @@ get_input_filelist: - Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_42_Amon_tas_200001-201012.nc - drs: default + dirname_template: "Tier{tier}/{dataset}" + filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc" variable: variable_group: test short_name: tas @@ -667,7 +712,7 @@ get_input_filelist: tier: 3 type: reanaly version: 42 - timerange: '1999/2000' + timerange: "1999/2000" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -701,7 +746,7 @@ get_input_filelist: simulation: simulation out: Output freq: MO - timerange: '1850/1851' + timerange: "1850/1851" diagnostic: test_diag preprocessor: test_preproc # Next are extra facets, which must be provided explicitly here @@ -718,6 +763,40 @@ get_input_filelist: found_files: - thredds/tgcc/store/p86caub/IPSLCM6/PROD/historical/simulation/ATM/Output/MO/simulation_18500101_18591231_1M_histmth.nc + - dirname_template: "{root}/{account}/{model}/{status}/{exp}/{simulation}/{dir}/{out}/{freq}" + filename_template: "{simulation}_*_{group}.nc" + variable: + variable_group: test + short_name: tas + original_short_name: tas + dataset: IPSL-CM6 + root: thredds/tgcc/store + exp: historical + project: IPSLCM + frequency: mon + mip: Amon + account: p86caub + model: IPSLCM6 + status: PROD + simulation: simulation + out: Output + freq: MO + timerange: "1850/1851" + diagnostic: test_diag + preprocessor: test_preproc + # Next are extra facets, which must be provided 
explicitly here + ipsl_varname: t2m + group: histmth + dir: ATM + available_files: + - thredds/tgcc/store/p86caub/IPSLCM6/PROD/historical/simulation/ATM/Output/MO/simulation_18500101_18591231_1M_histmth.nc + dirs: + - thredds/tgcc/store/p86caub/IPSLCM6/PROD/historical/simulation/ATM/Output/MO + file_patterns: + - simulation_*_histmth.nc + found_files: + - thredds/tgcc/store/p86caub/IPSLCM6/PROD/historical/simulation/ATM/Output/MO/simulation_18500101_18591231_1M_histmth.nc + - drs: default variable: <<: *ipsl_variable @@ -733,9 +812,26 @@ get_input_filelist: found_files: - thredds/tgcc/store/p86caub/IPSLCM6/PROD/historical/simulation/ATM/Analyse/TS_MO/simulation_18500101_20141231_1M_t2m.nc + - dirname_template: "{root}/{account}/{model}/{status}/{exp}/{simulation}/{dir}/{out}/{freq}" + filename_template: "{simulation}_*_{ipsl_varname}.nc" + variable: + <<: *ipsl_variable + out: Analyse + freq: TS_MO + available_files: + - thredds/tgcc/store/p86caub/IPSLCM6/PROD/historical/simulation/ATM/Analyse/TS_MO/simulation_18500101_20141231_1M_t2m.nc + dirs: + - thredds/tgcc/store/p86caub/IPSLCM6/PROD/historical/simulation/ATM/Analyse/TS_MO + file_patterns: + - simulation_*_t2m.nc + found_files: + - thredds/tgcc/store/p86caub/IPSLCM6/PROD/historical/simulation/ATM/Analyse/TS_MO/simulation_18500101_20141231_1M_t2m.nc + # Test fx files - drs: default + dirname_template: "/" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: variable_group: test short_name: areacella @@ -754,13 +850,15 @@ get_input_filelist: - areacella_fx_HadGEM2-ES_historical_r1i1p1.nc - areacella_fx_HadGEM2-ES_historical_r0i0p0.nc dirs: - - '' + - "" file_patterns: - areacella_fx_HadGEM2-ES_historical_r0i0p0*.nc found_files: - areacella_fx_HadGEM2-ES_historical_r0i0p0.nc - drs: DKRZ + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" 
variable: variable_group: test short_name: sftlf @@ -787,6 +885,8 @@ get_input_filelist: - MOHC/HadGEM2-ES/historical/fx/atmos/fx/r0i0p0/v20110330/sftlf/sftlf_fx_HadGEM2-ES_historical_r0i0p0.nc - drs: DKRZ + dirname_template: "{institute}/{dataset}/{exp}/{frequency}/{modeling_realm}/{mip}/{ensemble}/{version}/{short_name}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}*.nc" variable: variable_group: test short_name: orog @@ -812,6 +912,8 @@ get_input_filelist: found_files: [] - drs: DKRZ + dirname_template: "{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc" variable: variable_group: test short_name: areacello @@ -841,6 +943,8 @@ get_input_filelist: - CMIP/MOHC/HadGEM3-GC31-LL/historical/r1i1p1f1/Ofx/areacello/gn/v20200101/areacello_Ofx_HadGEM3-GC31-LL_historical_r1i1p1f1_gn.nc - drs: DKRZ + dirname_template: "{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc" variable: variable_group: test short_name: areacello @@ -855,7 +959,7 @@ get_input_filelist: exp: historical grid: gn ensemble: r1i1p1f1 - timerange: '2000/2000' + timerange: "2000/2000" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -871,6 +975,8 @@ get_input_filelist: - CMIP/MOHC/HadGEM3-GC31-LL/historical/r1i1p1f1/Omon/areacello/gn/v20200101/areacello_Omon_HadGEM3-GC31-LL_historical_r1i1p1f1_gn_199901-200012.nc - drs: DKRZ + dirname_template: "{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc" variable: variable_group: test short_name: volcello @@ -885,7 +991,7 @@ get_input_filelist: exp: historical grid: gn ensemble: r1i1p1f1 - timerange: '2000/2000' + timerange: "2000/2000" diagnostic: test_diag preprocessor: 
test_preproc available_files: @@ -900,6 +1006,8 @@ get_input_filelist: found_files: [] - drs: DKRZ + dirname_template: "{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}" + filename_template: "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc" variable: variable_group: test short_name: volcello @@ -914,7 +1022,7 @@ get_input_filelist: exp: historical grid: gn ensemble: r1i1p1f1 - timerange: '2000/2000' + timerange: "2000/2000" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -927,6 +1035,8 @@ get_input_filelist: found_files: [] - drs: DKRZ + dirname_template: "{exp}/{modeling_realm}/{frequency}/{short_name}/{dataset}/{ensemble}" + filename_template: "{short_name}_*.nc" variable: variable_group: test short_name: areacella @@ -952,6 +1062,8 @@ get_input_filelist: - historical/atmos/fx/areacella/HADGEM1/r1i1p1/areacella_HADGEM1.nc - drs: default + dirname_template: "Tier{tier}/{dataset}" + filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc" variable: variable_group: test short_name: basin @@ -977,6 +1089,8 @@ get_input_filelist: - Tier3/ERA-Interim/OBS_ERA-Interim_reanaly_42_fx_basin.nc - drs: default + dirname_template: "Tier{tier}/{dataset}" + filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc" variable: variable_group: test short_name: deptho @@ -988,7 +1102,7 @@ get_input_filelist: tier: 3 type: reanaly version: 42 - timerange: '1995/1996' + timerange: "1995/1996" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -1005,6 +1119,8 @@ get_input_filelist: - Tier3/ERA-Interim/OBS6_ERA-Interim_reanaly_42_Omon_deptho_199001-199912.nc - drs: default + dirname_template: "Tier{tier}/{dataset}" + filename_template: "{project}_{dataset}_{type}_{version}_{mip}_{short_name}[_.]*nc" variable: variable_group: test short_name: deptho @@ -1016,7 +1132,7 @@ get_input_filelist: tier: 3 type: reanaly version: 42 - timerange: '2050/2100' + 
timerange: "2050/2100" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -1032,6 +1148,8 @@ get_input_filelist: found_files: [] - drs: default + dirname_template: "Tier{tier}/{dataset}/{version}/{frequency}/{short_name}" + filename_template: "*.nc" variable: short_name: tas dataset: ERA5 @@ -1040,24 +1158,26 @@ get_input_filelist: mip: Amon tier: 3 type: reanaly - timerange: '2000/2010' + timerange: "2000/2010" available_files: - Tier3/ERA5/1/mon/tas/era5_2m_temperature_2000_monthly.nc - Tier3/ERA5/1/mon/tas/era5_2m_temperature_2001_monthly.nc dirs: - Tier3/ERA5/*/mon/tas file_patterns: - - '*.nc' + - "*.nc" found_files: - Tier3/ERA5/1/mon/tas/era5_2m_temperature_2000_monthly.nc - Tier3/ERA5/1/mon/tas/era5_2m_temperature_2001_monthly.nc available_symlinks: - link_name: Tier3/ERA5/latest - target: '1' + target: "1" # EMAC - drs: default + dirname_template: "{exp}/{channel}" + filename_template: "{exp}*{channel}{postproc_flag}.nc" variable: variable_group: test short_name: tas @@ -1068,8 +1188,8 @@ get_input_filelist: mip: Amon exp: amip channel: Amon - postproc_flag: '' - timerange: '200002/200003' + postproc_flag: "" + timerange: "200002/200003" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -1094,6 +1214,8 @@ get_input_filelist: - amip/Amon/amip___________200003_Amon.nc - drs: default + dirname_template: "{exp}/{channel}" + filename_template: "{exp}*{channel}{postproc_flag}.nc" variable: variable_group: test short_name: tas @@ -1105,7 +1227,7 @@ get_input_filelist: exp: amip channel: rad postproc_flag: -p-mm - timerange: '200001/200002' + timerange: "200001/200002" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -1142,7 +1264,7 @@ get_input_filelist: mip: Amon exp: amip var_type: atm_2d_ml - timerange: '200002/200003' + timerange: "200002/200003" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -1162,6 +1284,33 @@ get_input_filelist: - amip/amip_atm_2d_ml_20000201T000000Z.nc - 
amip/amip_atm_2d_ml_20000301T000000Z.nc + - dirname_template: "{exp}" + filename_template: "{exp}_{var_type}*.nc" + variable: + variable_group: test + short_name: tas + original_short_name: tas + dataset: ICON + project: ICON + frequency: mon + mip: Amon + exp: amip + var_type: atm_2d_ml + timerange: "200002/200003" + diagnostic: test_diag + preprocessor: test_preproc + available_files: + - amip/amip_atm_2d_ml_20000101T000000Z.nc + - amip/amip_atm_2d_ml_20000201T000000Z.nc + - amip/amip_atm_2d_ml_20000301T000000Z.nc + dirs: + - amip + file_patterns: + - amip_atm_2d_ml*.nc + found_files: + - amip/amip_atm_2d_ml_20000201T000000Z.nc + - amip/amip_atm_2d_ml_20000301T000000Z.nc + - drs: default variable: variable_group: test @@ -1173,7 +1322,7 @@ get_input_filelist: mip: Amon exp: amip var_type: var - timerange: '200003/200005' + timerange: "200003/200005" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -1194,6 +1343,32 @@ get_input_filelist: - amip/outdata/amip_var_20000401T000000Z.nc - amip/outdata/amip_var_20000501T000000Z.nc + - dirname_template: "{exp}" + filename_template: "{exp}_{var_type}*.nc" + variable: + variable_group: test + short_name: tas + original_short_name: tas + dataset: ICON + project: ICON + frequency: mon + mip: Amon + exp: amip + var_type: var + timerange: "200003/200005" + diagnostic: test_diag + preprocessor: test_preproc + available_files: + - amip/amip_var_20000101T000000Z.nc + - amip/amip_var_20000201T000000Z.nc + - amip/amip_var_20000301T000000Z.nc + dirs: + - amip + file_patterns: + - amip_var*.nc + found_files: + - amip/amip_var_20000301T000000Z.nc + # CESM2 - drs: default @@ -1209,10 +1384,10 @@ get_input_filelist: gcomp: atm scomp: cam type: h0 - tdir: '' - tperiod: '' - string: '' - timerange: '2000/2002' + tdir: "" + tperiod: "" + string: "" + timerange: "2000/2002" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -1220,7 +1395,7 @@ get_input_filelist: - 
f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.2001.nc - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/atm/hist/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.2002.nc dirs: - - '' + - "" - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/atm/proc - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/atm/hist file_patterns: @@ -1230,6 +1405,37 @@ get_input_filelist: - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.2001.nc - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/atm/hist/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.2002.nc + - dirname_template: "/" + filename_template: "{case}.{scomp}.{type}.{string}*nc" + variable: + variable_group: test + short_name: tas + original_short_name: tas + dataset: CESM2 + project: CESM + frequency: mon + mip: Amon + case: f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1 + gcomp: atm + scomp: cam + type: h0 + tdir: "" + tperiod: "" + string: "" + timerange: "2000/2002" + diagnostic: test_diag + preprocessor: test_preproc + available_files: + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.2000.nc + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.2001.nc + dirs: + - "" + file_patterns: + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.*nc + found_files: + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.2000.nc + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.cam.h0.2001.nc + - drs: default variable: variable_group: test @@ -1246,7 +1452,7 @@ get_input_filelist: tdir: tseries tperiod: month_1 string: TREFHT - timerange: '2015/2025' + timerange: "2015/2025" diagnostic: test_diag preprocessor: test_preproc available_files: @@ -1257,7 +1463,7 @@ get_input_filelist: - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.202001-202912.nc - 
f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.203001-203912.nc dirs: - - '' + - "" - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/hist - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1 file_patterns: @@ -1266,9 +1472,43 @@ get_input_filelist: - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.201001-201912.nc - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.202001-202912.nc + - dirname_template: "{case}/{gcomp}/proc/{tdir}/{tperiod}" + filename_template: "{case}.{scomp}.{type}.{string}*nc" + variable: + variable_group: test + short_name: tas + original_short_name: tas + dataset: CESM2 + project: CESM + frequency: mon + mip: Amon + case: f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1 + gcomp: gcomp + scomp: scomp + type: h1 + tdir: tseries + tperiod: month_1 + string: TREFHT + timerange: "2015/2025" + diagnostic: test_diag + preprocessor: test_preproc + available_files: + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.201001-201912.nc + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.202001-202912.nc + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.203001-203912.nc + dirs: + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1 + file_patterns: + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT*nc + found_files: + - 
f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.201001-201912.nc + - f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1/gcomp/proc/tseries/month_1/f.e21.FHIST_BGC.f09_f09_mg17.CMIP6-AMIP.001_cosp1.scomp.h1.TREFHT.202001-202912.nc + # CORDEX - drs: ESGF + dirname_template: "{project.lower}/output/{domain}/{institute}/{driver}/{exp}/{ensemble}/{dataset}/{rcm_version}/{frequency}/{short_name}/{version}" + filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc" variable: short_name: tas frequency: mon @@ -1281,8 +1521,8 @@ get_input_filelist: domain: EUR-11 exp: historical ensemble: r12i1p1 - timerange: '1998/2002' - version: 'v20190502' + timerange: "1998/2002" + version: "v20190502" available_files: - cordex/output/EUR-11/ICTP/ICHEC-EC-EARTH/historical/r12i1p1/RegCM4-6/v1/mon/tas/v20190502/tas_EUR-11_ICHEC-EC-EARTH_historical_r12i1p1_ICTP-RegCM4-6_v1_mon_197001-198012.nc - cordex/output/EUR-11/ICTP/ICHEC-EC-EARTH/historical/r12i1p1/RegCM4-6/v1/mon/tas/v20190502/tas_EUR-11_ICHEC-EC-EARTH_historical_r12i1p1_ICTP-RegCM4-6_v1_mon_198101-199012.nc @@ -1297,6 +1537,8 @@ get_input_filelist: - cordex/output/EUR-11/ICTP/ICHEC-EC-EARTH/historical/r12i1p1/RegCM4-6/v1/mon/tas/v20190502/tas_EUR-11_ICHEC-EC-EARTH_historical_r12i1p1_ICTP-RegCM4-6_v1_mon_200101-200512.nc - drs: BADC + dirname_template: "{domain}/{institute}/{driver}/{exp}/{ensemble}/{institute}-{dataset}/{rcm_version}/{mip}/{short_name}/{version}" + filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc" variable: short_name: tas frequency: mon @@ -1309,8 +1551,8 @@ get_input_filelist: domain: EUR-11 exp: rcp26 ensemble: r1i1p1 - timerange: '2008/2010' - version: 'v20160525' + timerange: "2008/2010" + version: "v20160525" available_files: - 
EUR-11/MPI-CSC/MPI-M-MPI-ESM-LR/rcp26/r1i1p1/MPI-CSC-REMO2009/v1/mon/tas/v20160525/tas_EUR-11_MPI-M-MPI-ESM-LR_rcp26_r1i1p1_MPI-CSC-REMO2009_v1_mon_200601-201012.nc - EUR-11/MPI-CSC/MPI-M-MPI-ESM-LR/rcp26/r1i1p1/MPI-CSC-REMO2009/v1/mon/tas/v20160525/tas_EUR-11_MPI-M-MPI-ESM-LR_rcp26_r1i1p1_MPI-CSC-REMO2009_v1_mon_201101-202012.nc @@ -1322,6 +1564,8 @@ get_input_filelist: - EUR-11/MPI-CSC/MPI-M-MPI-ESM-LR/rcp26/r1i1p1/MPI-CSC-REMO2009/v1/mon/tas/v20160525/tas_EUR-11_MPI-M-MPI-ESM-LR_rcp26_r1i1p1_MPI-CSC-REMO2009_v1_mon_200601-201012.nc - drs: DKRZ + dirname_template: "{domain}/{institute}/{driver}/{exp}/{ensemble}/{institute}-{dataset}/{rcm_version}/{mip}/{short_name}/{version}" + filename_template: "{short_name}_{domain}_{driver}_{exp}_{ensemble}_{institute}-{dataset}_{rcm_version}_{mip}*.nc" variable: short_name: tas frequency: mon @@ -1334,8 +1578,8 @@ get_input_filelist: domain: EUR-11 exp: historical ensemble: r12i1p1 - timerange: '1998/2002' - version: 'v20190502' + timerange: "1998/2002" + version: "v20190502" available_files: - EUR-11/ICTP/ICHEC-EC-EARTH/historical/r12i1p1/ICTP-RegCM4-6/v1/mon/tas/v20190502/tas_EUR-11_ICHEC-EC-EARTH_historical_r12i1p1_ICTP-RegCM4-6_v1_mon_197001-198012.nc - EUR-11/ICTP/ICHEC-EC-EARTH/historical/r12i1p1/ICTP-RegCM4-6/v1/mon/tas/v20190502/tas_EUR-11_ICHEC-EC-EARTH_historical_r12i1p1_ICTP-RegCM4-6_v1_mon_198101-199012.nc diff --git a/tests/integration/esgf/__init__.py b/tests/integration/io/esgf/__init__.py similarity index 100% rename from tests/integration/esgf/__init__.py rename to tests/integration/io/esgf/__init__.py diff --git a/tests/integration/esgf/search_results/Amon_r1i1p1_historical,rcp85_INM-CM4_CMIP5_tas.json b/tests/integration/io/esgf/search_results/Amon_r1i1p1_historical,rcp85_INM-CM4_CMIP5_tas.json similarity index 100% rename from tests/integration/esgf/search_results/Amon_r1i1p1_historical,rcp85_INM-CM4_CMIP5_tas.json rename to 
tests/integration/io/esgf/search_results/Amon_r1i1p1_historical,rcp85_INM-CM4_CMIP5_tas.json diff --git a/tests/integration/esgf/search_results/Amon_r1i1p1_historical_FIO-ESM_CMIP5_tas.json b/tests/integration/io/esgf/search_results/Amon_r1i1p1_historical_FIO-ESM_CMIP5_tas.json similarity index 100% rename from tests/integration/esgf/search_results/Amon_r1i1p1_historical_FIO-ESM_CMIP5_tas.json rename to tests/integration/io/esgf/search_results/Amon_r1i1p1_historical_FIO-ESM_CMIP5_tas.json diff --git a/tests/integration/esgf/search_results/Amon_r1i1p1_rcp85_HadGEM2-CC_CMIP5_tas.json b/tests/integration/io/esgf/search_results/Amon_r1i1p1_rcp85_HadGEM2-CC_CMIP5_tas.json similarity index 100% rename from tests/integration/esgf/search_results/Amon_r1i1p1_rcp85_HadGEM2-CC_CMIP5_tas.json rename to tests/integration/io/esgf/search_results/Amon_r1i1p1_rcp85_HadGEM2-CC_CMIP5_tas.json diff --git a/tests/integration/esgf/search_results/EUR-11_MOHC-HadGEM2-ES_r1i1p1_historical_CORDEX_RACMO22E_mon_tas.json b/tests/integration/io/esgf/search_results/EUR-11_MOHC-HadGEM2-ES_r1i1p1_historical_CORDEX_RACMO22E_mon_tas.json similarity index 100% rename from tests/integration/esgf/search_results/EUR-11_MOHC-HadGEM2-ES_r1i1p1_historical_CORDEX_RACMO22E_mon_tas.json rename to tests/integration/io/esgf/search_results/EUR-11_MOHC-HadGEM2-ES_r1i1p1_historical_CORDEX_RACMO22E_mon_tas.json diff --git a/tests/integration/esgf/search_results/expected.yml b/tests/integration/io/esgf/search_results/expected.yml similarity index 100% rename from tests/integration/esgf/search_results/expected.yml rename to tests/integration/io/esgf/search_results/expected.yml diff --git a/tests/integration/esgf/search_results/historical_gn_r4i1p1f1_CMIP6_CESM2_Amon_tas.json b/tests/integration/io/esgf/search_results/historical_gn_r4i1p1f1_CMIP6_CESM2_Amon_tas.json similarity index 100% rename from tests/integration/esgf/search_results/historical_gn_r4i1p1f1_CMIP6_CESM2_Amon_tas.json rename to 
tests/integration/io/esgf/search_results/historical_gn_r4i1p1f1_CMIP6_CESM2_Amon_tas.json diff --git a/tests/integration/esgf/search_results/inmcm4_CMIP5_tas.json b/tests/integration/io/esgf/search_results/inmcm4_CMIP5_tas.json similarity index 100% rename from tests/integration/esgf/search_results/inmcm4_CMIP5_tas.json rename to tests/integration/io/esgf/search_results/inmcm4_CMIP5_tas.json diff --git a/tests/integration/esgf/search_results/obs4MIPs_CERES-EBAF_mon_rsutcs.json b/tests/integration/io/esgf/search_results/obs4MIPs_CERES-EBAF_mon_rsutcs.json similarity index 100% rename from tests/integration/esgf/search_results/obs4MIPs_CERES-EBAF_mon_rsutcs.json rename to tests/integration/io/esgf/search_results/obs4MIPs_CERES-EBAF_mon_rsutcs.json diff --git a/tests/integration/esgf/search_results/obs4MIPs_GPCP-V2.3_pr.json b/tests/integration/io/esgf/search_results/obs4MIPs_GPCP-V2.3_pr.json similarity index 100% rename from tests/integration/esgf/search_results/obs4MIPs_GPCP-V2.3_pr.json rename to tests/integration/io/esgf/search_results/obs4MIPs_GPCP-V2.3_pr.json diff --git a/tests/integration/esgf/search_results/run1_historical_cccma_cgcm3_1_CMIP3_mon_tas.json b/tests/integration/io/esgf/search_results/run1_historical_cccma_cgcm3_1_CMIP3_mon_tas.json similarity index 100% rename from tests/integration/esgf/search_results/run1_historical_cccma_cgcm3_1_CMIP3_mon_tas.json rename to tests/integration/io/esgf/search_results/run1_historical_cccma_cgcm3_1_CMIP3_mon_tas.json diff --git a/tests/integration/esgf/test_search_download.py b/tests/integration/io/esgf/test_search_download.py similarity index 99% rename from tests/integration/esgf/test_search_download.py rename to tests/integration/io/esgf/test_search_download.py index 685e55c937..d3a9673e17 100644 --- a/tests/integration/esgf/test_search_download.py +++ b/tests/integration/io/esgf/test_search_download.py @@ -6,7 +6,7 @@ import yaml from pyesgf.search.results import FileResult -from esmvalcore.esgf import 
_search, download, find_files +from esmvalcore.io.esgf import _search, download, find_files VARIABLES = [ { diff --git a/tests/integration/test_local.py b/tests/integration/io/test_local.py similarity index 70% rename from tests/integration/test_local.py rename to tests/integration/io/test_local.py index 633d7b45da..08e217de4c 100644 --- a/tests/integration/test_local.py +++ b/tests/integration/io/test_local.py @@ -1,4 +1,6 @@ -"""Tests for `esmvalcore.local`.""" +"""Tests for `esmvalcore.io.local`.""" + +from __future__ import annotations import os import pprint @@ -8,13 +10,13 @@ import yaml from esmvalcore.config import CFG -from esmvalcore.local import ( +from esmvalcore.io.local import ( + LocalDataSource, LocalFile, _get_output_file, _parse_period, - _select_drs, - find_files, ) +from esmvalcore.local import _select_drs, find_files # Load test configuration with open( @@ -84,6 +86,11 @@ def root(tmp_path): @pytest.mark.parametrize("cfg", CONFIG["get_input_filelist"]) def test_find_files(monkeypatch, root, cfg): """Test retrieving input filelist.""" + if "drs" not in cfg: + pytest.skip( + "Skipping test that depends on multiple patterns; this is intentionally not " + "supported for `LocalDataSource`. Create multiple data sources if you need this.", + ) print( f"Testing DRS {cfg['drs']} with variable:\n", pprint.pformat(cfg["variable"]), @@ -132,6 +139,44 @@ def test_find_files_with_facets(monkeypatch, root): assert input_filelist[0].facets +@pytest.mark.parametrize("cfg", CONFIG["get_input_filelist"]) +def test_find_data(root, cfg): + """Test retrieving input filelist.""" + if "dirname_template" not in cfg: + pytest.skip( + "Skipping test that depends on multiple patterns; this is intentionally not " + "supported for `LocalDataSource`.
Create multiple data sources if you need this.", + ) + data_source = LocalDataSource( + name="test-data-source", + project=cfg["variable"]["project"], + rootpath=root, + priority=1, + dirname_template=cfg["dirname_template"], + filename_template=cfg["filename_template"], + ) + print( + f"Testing {data_source} with variable:\n", + pprint.pformat(cfg["variable"]), + ) + create_tree( + root, + cfg.get("available_files"), + cfg.get("available_symlinks"), + ) + + # Find files + input_filelist = data_source.find_data(**cfg["variable"]) + # Test result + ref_files = [Path(root, file) for file in cfg["found_files"]] + ref_globs = [ + Path(root, d, f) for d in cfg["dirs"] for f in cfg["file_patterns"] + ] + assert [Path(f) for f in input_filelist] == sorted(ref_files) + for pattern in ref_globs: + assert str(pattern) in data_source.debug_info + + def test_select_invalid_drs_structure(): msg = ( r"drs _INVALID_STRUCTURE_ for CMIP6 project not specified in " diff --git a/tests/integration/preprocessor/test_preprocessing_task.py b/tests/integration/preprocessor/test_preprocessing_task.py index 20e43a9029..3340a6c5eb 100644 --- a/tests/integration/preprocessor/test_preprocessing_task.py +++ b/tests/integration/preprocessor/test_preprocessing_task.py @@ -9,7 +9,7 @@ import esmvalcore.preprocessor from esmvalcore.dataset import Dataset -from esmvalcore.local import LocalFile +from esmvalcore.io.local import LocalFile from esmvalcore.preprocessor import PreprocessingTask, PreprocessorFile diff --git a/tests/integration/recipe/test_check.py b/tests/integration/recipe/test_check.py index 19fd6b01ca..a9cf809a92 100644 --- a/tests/integration/recipe/test_check.py +++ b/tests/integration/recipe/test_check.py @@ -11,11 +11,11 @@ import pytest import esmvalcore._recipe.check -import esmvalcore.esgf +import esmvalcore.io.esgf from esmvalcore._recipe import check from esmvalcore.dataset import Dataset from esmvalcore.exceptions import RecipeError -from esmvalcore.local import LocalFile 
+from esmvalcore.io.local import LocalFile from esmvalcore.preprocessor import PreprocessorFile if TYPE_CHECKING: @@ -313,7 +313,9 @@ def test_data_availability_nonexistent(tmp_path): context=None, ) dest_folder = tmp_path - input_files = [esmvalcore.esgf.ESGFFile([result]).local_file(dest_folder)] + input_files = [ + esmvalcore.io.esgf.ESGFFile([result]).local_file(dest_folder), + ] dataset = Dataset(**var) dataset.files = input_files check.data_availability(dataset) diff --git a/tests/integration/recipe/test_recipe.py b/tests/integration/recipe/test_recipe.py index 10801a671e..40a5c09205 100644 --- a/tests/integration/recipe/test_recipe.py +++ b/tests/integration/recipe/test_recipe.py @@ -18,8 +18,9 @@ from nested_lookup import get_occurrence_of_value from PIL import Image -import esmvalcore import esmvalcore._task +import esmvalcore.io.esgf +import esmvalcore.io.local from esmvalcore._recipe.recipe import ( _get_input_datasets, _representative_datasets, @@ -30,7 +31,7 @@ from esmvalcore.config._diagnostics import TAGS from esmvalcore.dataset import Dataset from esmvalcore.exceptions import RecipeError -from esmvalcore.local import _get_output_file +from esmvalcore.io.local import _get_output_file from esmvalcore.preprocessor import DEFAULT_ORDER, PreprocessingTask from tests.integration.test_provenance import check_provenance @@ -2513,12 +2514,12 @@ def test_recipe_run(tmp_path, patched_datafinder, session, mocker): """) mocker.patch.object( - esmvalcore.esgf, + esmvalcore.io.esgf, "download", create_autospec=True, ) mocker.patch.object( - esmvalcore.local.LocalFile, + esmvalcore.io.local.LocalFile, "prepare", create_autospec=True, ) @@ -2530,8 +2531,8 @@ def test_recipe_run(tmp_path, patched_datafinder, session, mocker): recipe.write_html_summary = mocker.Mock() recipe.run() - esmvalcore.esgf.download.assert_called() - esmvalcore.local.LocalFile.prepare.assert_called() + esmvalcore.io.esgf.download.assert_called() + 
esmvalcore.io.local.LocalFile.prepare.assert_called() recipe.tasks.run.assert_called_once_with( max_parallel_tasks=session["max_parallel_tasks"], ) diff --git a/tests/sample_data/experimental/test_run_recipe.py b/tests/sample_data/experimental/test_run_recipe.py index 7259526815..03e290ea21 100644 --- a/tests/sample_data/experimental/test_run_recipe.py +++ b/tests/sample_data/experimental/test_run_recipe.py @@ -89,7 +89,7 @@ def test_run_recipe( session["remove_preproc_dir"] = False session["projects"]["CMIP6"]["data"] = { "local": { - "type": "esmvalcore.local.LocalDataSource", + "type": "esmvalcore.io.local.LocalDataSource", "rootpath": sample_data_config["rootpath"]["CMIP6"][0], "dirname_template": "{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}", "filename_template": "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc", diff --git a/tests/unit/esgf/__init__.py b/tests/unit/io/esgf/__init__.py similarity index 100% rename from tests/unit/esgf/__init__.py rename to tests/unit/io/esgf/__init__.py diff --git a/tests/unit/esgf/test_download.py b/tests/unit/io/esgf/test_download.py similarity index 97% rename from tests/unit/esgf/test_download.py rename to tests/unit/io/esgf/test_download.py index 886e76404e..931f4fc531 100644 --- a/tests/unit/esgf/test_download.py +++ b/tests/unit/io/esgf/test_download.py @@ -1,4 +1,4 @@ -"""Test `esmvalcore.esgf._download`.""" +"""Test `esmvalcore.io.esgf._download`.""" from __future__ import annotations @@ -15,8 +15,8 @@ import yaml from pyesgf.search.results import FileResult -import esmvalcore.esgf -from esmvalcore.esgf import _download +import esmvalcore.io.esgf +from esmvalcore.io.esgf import _download if TYPE_CHECKING: from pytest_mock import MockerFixture @@ -291,12 +291,12 @@ def test_to_iris(mocker: MockerFixture, esgf_file: _download.ESGFFile) -> None: """Test `ESGFFile.prepare`.""" prepare = mocker.patch.object(_download.ESGFFile, "prepare") local_file_to_iris = 
mocker.patch.object( - esmvalcore.esgf._download.LocalFile, + esmvalcore.io.esgf._download.LocalFile, "to_iris", return_value=mocker.sentinel.iris_cubes, ) mocker.patch.object( - esmvalcore.esgf._download.LocalFile, + esmvalcore.io.esgf._download.LocalFile, "attributes", new_callable=mocker.PropertyMock, return_value={"attribute": "value"}, @@ -637,10 +637,10 @@ def test_get_download_message(): def test_download(mocker, tmp_path, caplog): - """Test `esmvalcore.esgf.download`.""" + """Test `esmvalcore.io.esgf.download`.""" dest_folder = tmp_path test_files = [ - mocker.create_autospec(esmvalcore.esgf.ESGFFile, instance=True) + mocker.create_autospec(esmvalcore.io.esgf.ESGFFile, instance=True) for _ in range(5) ] for i, file in enumerate(test_files): @@ -650,7 +650,7 @@ def test_download(mocker, tmp_path, caplog): file.__lt__.return_value = False caplog.set_level(logging.INFO) - esmvalcore.esgf.download(test_files, dest_folder) + esmvalcore.io.esgf.download(test_files, dest_folder) for file in test_files: file.download.assert_called_with(dest_folder) @@ -660,10 +660,10 @@ def test_download(mocker, tmp_path, caplog): def test_download_fail(mocker, tmp_path, caplog): - """Test `esmvalcore.esgf.download`.""" + """Test `esmvalcore.io.esgf.download`.""" dest_folder = tmp_path test_files = [ - mocker.create_autospec(esmvalcore.esgf.ESGFFile, instance=True) + mocker.create_autospec(esmvalcore.io.esgf.ESGFFile, instance=True) for _ in range(5) ] for i, file in enumerate(test_files): @@ -683,7 +683,7 @@ def test_download_fail(mocker, tmp_path, caplog): error messages for third file """).strip() with pytest.raises(_download.DownloadError, match=re.escape(msg)): - esmvalcore.esgf.download(test_files, dest_folder) + esmvalcore.io.esgf.download(test_files, dest_folder) assert error0 in caplog.text assert error1 in caplog.text for file in test_files: @@ -693,5 +693,5 @@ def test_download_fail(mocker, tmp_path, caplog): def test_download_noop(mocker: MockerFixture) -> None: """Test 
downloading no files.""" mock_download = mocker.patch.object(_download.ESGFFile, "_download") - esmvalcore.esgf.download([], dest_folder="/does/not/exist") + esmvalcore.io.esgf.download([], dest_folder="/does/not/exist") mock_download.assert_not_called() diff --git a/tests/unit/esgf/test_facet.py b/tests/unit/io/esgf/test_facet.py similarity index 89% rename from tests/unit/esgf/test_facet.py rename to tests/unit/io/esgf/test_facet.py index 2d27c23867..50b3a5d5b7 100644 --- a/tests/unit/esgf/test_facet.py +++ b/tests/unit/io/esgf/test_facet.py @@ -1,12 +1,12 @@ -"""Test `esmvalcore.esgf.facets`.""" +"""Test `esmvalcore.io.esgf.facets`.""" import pyesgf.search -from esmvalcore.esgf import facets +from esmvalcore.io.esgf import facets def test_create_dataset_map(monkeypatch, mocker): - """Test `esmvalcore.esgf.facets.create_dataset_map`.""" + """Test `esmvalcore.io.esgf.facets.create_dataset_map`.""" monkeypatch.setattr(facets, "FACETS", {"CMIP5": facets.FACETS["CMIP5"]}) conn = mocker.create_autospec( diff --git a/tests/unit/esgf/test_search.py b/tests/unit/io/esgf/test_search.py similarity index 97% rename from tests/unit/esgf/test_search.py rename to tests/unit/io/esgf/test_search.py index a1b8147e89..eca1510966 100644 --- a/tests/unit/esgf/test_search.py +++ b/tests/unit/io/esgf/test_search.py @@ -1,4 +1,4 @@ -"""Test 1esmvalcore.esgf._search`.""" +"""Test `esmvalcore.io.esgf._search`.""" from __future__ import annotations @@ -13,7 +13,7 @@ from pyesgf.search.results import FileResult import esmvalcore.io.protocol -from esmvalcore.esgf import ESGFDataSource, ESGFFile, _search, find_files +from esmvalcore.io.esgf import ESGFDataSource, ESGFFile, _search, find_files if TYPE_CHECKING: from pytest_mock import MockerFixture @@ -437,14 +437,14 @@ def test_search_unknown_project(): project = "Unknown" msg = ( f"Unable to download from ESGF, because project {project} is not on" - " it or is not supported by the esmvalcore.esgf module."
+ " it or is not supported by the esmvalcore.io.esgf module." ) with pytest.raises(ValueError, match=msg): find_files(project=project, dataset="", short_name="") class TestESGFDataSource: - """Test `esmvalcore.esgf.ESGFDataSource`.""" + """Test `esmvalcore.io.esgf.ESGFDataSource`.""" def test_init(self) -> None: """Test initialization.""" @@ -467,7 +467,7 @@ def test_find_data(self, mocker: MockerFixture) -> None: mock_result = [mocker.create_autospec(ESGFFile, instance=True)] mock_find_files = mocker.patch( - "esmvalcore.esgf._search.find_files", + "esmvalcore.io.esgf._search.find_files", return_value=mock_result, ) diff --git a/tests/unit/local/__init__.py b/tests/unit/io/local/__init__.py similarity index 100% rename from tests/unit/local/__init__.py rename to tests/unit/io/local/__init__.py diff --git a/tests/unit/local/test_facets.py b/tests/unit/io/local/test_facets.py similarity index 99% rename from tests/unit/local/test_facets.py rename to tests/unit/io/local/test_facets.py index 0332f27693..0db061292e 100644 --- a/tests/unit/local/test_facets.py +++ b/tests/unit/io/local/test_facets.py @@ -2,7 +2,7 @@ import pytest -from esmvalcore.local import LocalDataSource, LocalFile +from esmvalcore.io.local import LocalDataSource, LocalFile @pytest.mark.parametrize( diff --git a/tests/unit/local/test_get_data_sources.py b/tests/unit/io/local/test_get_data_sources.py similarity index 95% rename from tests/unit/local/test_get_data_sources.py rename to tests/unit/io/local/test_get_data_sources.py index 67057562df..8f19709b8a 100644 --- a/tests/unit/local/test_get_data_sources.py +++ b/tests/unit/io/local/test_get_data_sources.py @@ -7,7 +7,8 @@ from esmvalcore.config import CFG from esmvalcore.config._config_validators import validate_config_developer -from esmvalcore.local import DataSource, LocalDataSource, _get_data_sources +from esmvalcore.io.local import LocalDataSource +from esmvalcore.local import DataSource, _get_data_sources if TYPE_CHECKING: import 
pytest_mock diff --git a/tests/unit/local/test_replace_tags.py b/tests/unit/io/local/test_replace_tags.py similarity index 86% rename from tests/unit/local/test_replace_tags.py rename to tests/unit/io/local/test_replace_tags.py index f8704a4997..8adb8a83dc 100644 --- a/tests/unit/local/test_replace_tags.py +++ b/tests/unit/io/local/test_replace_tags.py @@ -1,11 +1,11 @@ -"""Tests for `_replace_tags` in `esmvalcore.local`.""" +"""Tests for `_replace_tags` in `esmvalcore.io.local`.""" from pathlib import Path import pytest from esmvalcore.exceptions import RecipeError -from esmvalcore.local import _replace_tags +from esmvalcore.io.local import _replace_tags VARIABLE = { "project": "CMIP6", @@ -46,6 +46,17 @@ def test_replace_tags(): ] +def test_replace_tags_with_caps(): + """Test for `_replace_tags` function with .lower and .upper feature.""" + input_file = _replace_tags( + "{short_name.upper}_{mip}_{dataset.lower}_{exp}_{ensemble}_{grid}*.nc", + VARIABLE, + ) + assert input_file == [ + Path("TAS_Amon_accurate-model_experiment_r1i1p1f1_gr*.nc"), + ] + + def test_replace_tags_missing_facet(): """Check that a RecipeError is raised if a required facet is missing.""" paths = ["{short_name}_{missing}_*.nc"] diff --git a/tests/unit/local/test_select_files.py b/tests/unit/io/local/test_select_files.py similarity index 99% rename from tests/unit/local/test_select_files.py rename to tests/unit/io/local/test_select_files.py index 7ecc571ab2..5ab90e05de 100644 --- a/tests/unit/local/test_select_files.py +++ b/tests/unit/io/local/test_select_files.py @@ -1,6 +1,6 @@ import pytest -from esmvalcore.local import _select_files +from esmvalcore.io.local import _select_files def test_select_files(): diff --git a/tests/unit/local/test_time.py b/tests/unit/io/local/test_time.py similarity index 98% rename from tests/unit/local/test_time.py rename to tests/unit/io/local/test_time.py index 5548dfe254..19f0fc4d82 100644 --- a/tests/unit/local/test_time.py +++ 
b/tests/unit/io/local/test_time.py @@ -1,4 +1,4 @@ -"""Unit tests for time related functions in `esmvalcore.local`.""" +"""Unit tests for time related functions in `esmvalcore.io.local`.""" from pathlib import Path @@ -7,8 +7,8 @@ import pytest from cf_units import Unit -from esmvalcore.esgf import ESGFFile -from esmvalcore.local import ( +from esmvalcore.io.esgf import ESGFFile +from esmvalcore.io.local import ( LocalFile, _dates_to_timerange, _get_start_end_date, diff --git a/tests/unit/local/test_to_iris.py b/tests/unit/io/local/test_to_iris.py similarity index 94% rename from tests/unit/local/test_to_iris.py rename to tests/unit/io/local/test_to_iris.py index cae3db2599..0d7b1b18e7 100644 --- a/tests/unit/local/test_to_iris.py +++ b/tests/unit/io/local/test_to_iris.py @@ -5,7 +5,7 @@ import iris.cube import pytest -from esmvalcore.local import LocalFile, _get_attr_from_field_coord +from esmvalcore.io.local import LocalFile, _get_attr_from_field_coord if TYPE_CHECKING: from pathlib import Path diff --git a/tests/unit/main/test_esmvaltool.py b/tests/unit/main/test_esmvaltool.py index 087c9b091b..1463564046 100644 --- a/tests/unit/main/test_esmvaltool.py +++ b/tests/unit/main/test_esmvaltool.py @@ -10,7 +10,7 @@ import esmvalcore.config import esmvalcore.config._config_object import esmvalcore.config._logging -import esmvalcore.esgf +import esmvalcore.io.esgf from esmvalcore import __version__ from esmvalcore._main import HEADER, ESMValTool from esmvalcore.exceptions import InvalidConfigParameter, RecipeError diff --git a/tests/unit/provenance/test_trackedfile.py b/tests/unit/provenance/test_trackedfile.py index 8035962b49..2a8f8c1ed4 100644 --- a/tests/unit/provenance/test_trackedfile.py +++ b/tests/unit/provenance/test_trackedfile.py @@ -8,8 +8,8 @@ from prov.model import ProvDocument from esmvalcore._provenance import ESMVALTOOL_URI_PREFIX, TrackedFile +from esmvalcore.io.local import LocalFile from esmvalcore.io.protocol import DataElement -from 
esmvalcore.local import LocalFile if TYPE_CHECKING: import iris.cube diff --git a/tests/unit/recipe/test_recipe.py b/tests/unit/recipe/test_recipe.py index 87c3884846..be5f98a427 100644 --- a/tests/unit/recipe/test_recipe.py +++ b/tests/unit/recipe/test_recipe.py @@ -11,8 +11,8 @@ import esmvalcore.config import esmvalcore.experimental.recipe_output from esmvalcore.dataset import Dataset -from esmvalcore.esgf._download import ESGFFile from esmvalcore.exceptions import RecipeError +from esmvalcore.io.esgf._download import ESGFFile from tests import PreprocessorFile diff --git a/tests/unit/recipe/test_to_datasets.py b/tests/unit/recipe/test_to_datasets.py index 4369a815fa..6e081c8fc3 100644 --- a/tests/unit/recipe/test_to_datasets.py +++ b/tests/unit/recipe/test_to_datasets.py @@ -10,7 +10,7 @@ from esmvalcore._recipe import to_datasets from esmvalcore.dataset import Dataset from esmvalcore.exceptions import RecipeError -from esmvalcore.local import LocalFile +from esmvalcore.io.local import LocalFile if TYPE_CHECKING: import pytest_mock diff --git a/tests/unit/task/test_print.py b/tests/unit/task/test_print.py index 53aad046d3..c033eadf09 100644 --- a/tests/unit/task/test_print.py +++ b/tests/unit/task/test_print.py @@ -8,7 +8,7 @@ from esmvalcore._task import DiagnosticTask from esmvalcore.dataset import Dataset -from esmvalcore.local import LocalFile +from esmvalcore.io.local import LocalFile from esmvalcore.preprocessor import PreprocessingTask, PreprocessorFile diff --git a/tests/unit/test_dataset.py b/tests/unit/test_dataset.py index 63ebd22b91..07cf485be8 100644 --- a/tests/unit/test_dataset.py +++ b/tests/unit/test_dataset.py @@ -12,13 +12,13 @@ import yaml import esmvalcore.dataset -import esmvalcore.esgf -import esmvalcore.local +import esmvalcore.io.esgf +import esmvalcore.io.local from esmvalcore.cmor.check import CheckLevels from esmvalcore.config import CFG, Session from esmvalcore.dataset import Dataset -from esmvalcore.esgf import ESGFFile from 
esmvalcore.exceptions import InputFilesNotFound, RecipeError +from esmvalcore.io.esgf import ESGFFile if TYPE_CHECKING: from esmvalcore.typing import Facets @@ -562,7 +562,7 @@ def test_from_recipe_with_automatic_supplementary( ): def _find_files(self): if self.facets["short_name"] == "areacello": - file = esmvalcore.local.LocalFile() + file = esmvalcore.io.local.LocalFile() file.facets = { "short_name": "areacello", "mip": "fx", @@ -663,7 +663,7 @@ def find_files(self): def test_from_files(session, monkeypatch): rootpath = Path("/path/to/data") - file1 = esmvalcore.local.LocalFile( + file1 = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "CMIP", @@ -688,7 +688,7 @@ def test_from_files(session, monkeypatch): "grid": "gn", "version": "v20190827", } - file2 = esmvalcore.local.LocalFile( + file2 = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "CMIP", @@ -703,7 +703,7 @@ def test_from_files(session, monkeypatch): "tas_Amon_FGOALS-g3_historical_r3i1p1f1_gn_200001-200912.nc", ) file2.facets = dict(file1.facets) - file3 = esmvalcore.local.LocalFile( + file3 = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "CMIP", @@ -763,7 +763,7 @@ def test_from_files(session, monkeypatch): def test_from_files_with_supplementary(session, monkeypatch): rootpath = Path("/path/to/data") - file1 = esmvalcore.local.LocalFile( + file1 = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "CMIP", @@ -788,7 +788,7 @@ def test_from_files_with_supplementary(session, monkeypatch): "grid": "gn", "version": "v20190827", } - file2 = esmvalcore.local.LocalFile( + file2 = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "CMIP", @@ -813,7 +813,7 @@ def test_from_files_with_supplementary(session, monkeypatch): "grid": "gn", "version": "v20210615", } - file3 = esmvalcore.local.LocalFile( + file3 = esmvalcore.io.local.LocalFile( rootpath, "CMIP5", "CMIP", @@ -897,7 +897,7 @@ def test_from_files_with_supplementary(session, monkeypatch): def test_from_files_with_globs(monkeypatch, session): """Test 
`from_files` with wildcards in dataset and supplementary.""" rootpath = Path("/path/to/data") - file1 = esmvalcore.local.LocalFile( + file1 = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "CMIP", @@ -924,7 +924,7 @@ def test_from_files_with_globs(monkeypatch, session): "timerange": "185001/201412", "version": "v20181126", } - file2 = esmvalcore.local.LocalFile( + file2 = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "GMMIP", @@ -1009,7 +1009,7 @@ def test_from_files_with_globs_and_missing_facets(monkeypatch, session): Tests a combination of files with complete facets and missing facets. """ rootpath = Path("/path/to/data") - file1 = esmvalcore.local.LocalFile( + file1 = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "CMIP", @@ -1036,7 +1036,7 @@ def test_from_files_with_globs_and_missing_facets(monkeypatch, session): "timerange": "185001/201412", "version": "v20181126", } - file2 = esmvalcore.local.LocalFile( + file2 = esmvalcore.io.local.LocalFile( rootpath, "tas", "tas_Amon_BCC-CSM2-MR_historical_r1i1p1f1_gn_185001-201412.nc", @@ -1094,7 +1094,7 @@ def test_from_files_with_globs_and_automatic_missing(monkeypatch, session): added. 
""" rootpath = Path("/path/to/data") - file = esmvalcore.local.LocalFile( + file = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "BCC-CSM2-MR", @@ -1163,7 +1163,7 @@ def test_from_files_with_globs_and_automatic_missing(monkeypatch, session): def test_from_files_with_globs_and_only_missing_facets(monkeypatch, session): """Test `from_files` with wildcards and only files with missing facets.""" rootpath = Path("/path/to/data") - file = esmvalcore.local.LocalFile( + file = esmvalcore.io.local.LocalFile( rootpath, "CMIP6", "CMIP", @@ -1482,14 +1482,14 @@ def dataset(): "CMIP6": { "data": { "local": { - "type": "esmvalcore.local.LocalDataSource", + "type": "esmvalcore.io.local.LocalDataSource", "rootpath": Path("/local_dir"), "dirname_template": "{project}/{activity}/{institute}/{dataset}/{exp}/{ensemble}/{mip}/{short_name}/{grid}/{version}", "filename_template": "{short_name}_{mip}_{dataset}_{exp}_{ensemble}_{grid}*.nc", "priority": 1, }, "esgf": { - "type": "esmvalcore.esgf.ESGFDataSource", + "type": "esmvalcore.io.esgf.ESGFDataSource", "download_dir": Path("/download_dir"), "priority": 2, }, @@ -1525,13 +1525,13 @@ def test_find_files(mocker, dataset, local_availability): ) mocker.patch.object( - esmvalcore.local.LocalDataSource, + esmvalcore.io.local.LocalDataSource, "find_data", autospec=True, return_value=list(local_files), ) mocker.patch.object( - esmvalcore.esgf.ESGFDataSource, + esmvalcore.io.esgf.ESGFDataSource, "find_data", autospec=True, return_value=list(esgf_files), @@ -1562,13 +1562,13 @@ def test_find_files_wildcard_timerange(mocker, dataset): ) mocker.patch.object( - esmvalcore.local.LocalDataSource, + esmvalcore.io.local.LocalDataSource, "find_data", autospec=True, return_value=list(local_files), ) mocker.patch.object( - esmvalcore.esgf.ESGFDataSource, + esmvalcore.io.esgf.ESGFDataSource, "find_data", autospec=True, return_value=list(esgf_files), @@ -1599,13 +1599,13 @@ def test_find_files_outdated_local(mocker, dataset): ) mocker.patch.object( - 
esmvalcore.local.LocalDataSource, + esmvalcore.io.local.LocalDataSource, "find_data", autospec=True, return_value=list(local_files), ) mocker.patch.object( - esmvalcore.esgf.ESGFDataSource, + esmvalcore.io.esgf.ESGFDataSource, "find_data", autospec=True, return_value=list(esgf_files), @@ -1617,11 +1617,11 @@ def test_find_files_outdated_local(mocker, dataset): def test_set_version(): dataset = Dataset(short_name="tas") dataset.add_supplementary(short_name="areacella") - file_v1 = esmvalcore.local.LocalFile("/path/to/v1/tas.nc") + file_v1 = esmvalcore.io.local.LocalFile("/path/to/v1/tas.nc") file_v1.facets["version"] = "v1" - file_v2 = esmvalcore.local.LocalFile("/path/to/v2/tas.nc") + file_v2 = esmvalcore.io.local.LocalFile("/path/to/v2/tas.nc") file_v2.facets["version"] = "v2" - areacella_file = esmvalcore.local.LocalFile("/path/to/v3/areacella.nc") + areacella_file = esmvalcore.io.local.LocalFile("/path/to/v3/areacella.nc") areacella_file.facets["version"] = "v3" dataset.files = [file_v2, file_v1] dataset.supplementaries[0].files = [areacella_file] @@ -1760,7 +1760,9 @@ def mock_preprocess( mocker.patch.object(esmvalcore.dataset, "preprocess", mock_preprocess) - items = [mocker.create_autospec(esmvalcore.local.LocalFile, instance=True)] + items = [ + mocker.create_autospec(esmvalcore.io.local.LocalFile, instance=True), + ] dataset.files = items cube = dataset.load() @@ -2208,7 +2210,7 @@ def test_derivation_necessary_no_force_derivation(tmp_path, session): input_dir = tmp_path / "Tier2" / "SAT" input_dir.mkdir(parents=True, exist_ok=True) - lwcre_file = esmvalcore.local.LocalFile( + lwcre_file = esmvalcore.io.local.LocalFile( input_dir / "OBS6_SAT_sat_1_Amon_lwcre_1980-2000.nc", ) lwcre_file.touch() @@ -2227,7 +2229,7 @@ def test_derivation_necessary_force_derivation(tmp_path, session): input_dir = tmp_path / "Tier2" / "SAT" input_dir.mkdir(parents=True, exist_ok=True) - lwcre_file = esmvalcore.local.LocalFile( + lwcre_file = esmvalcore.io.local.LocalFile( 
input_dir / "OBS6_SAT_sat_1_Amon_lwcre_1980-2000.nc", ) lwcre_file.touch() diff --git a/tests/unit/test_esgf_facets.py b/tests/unit/test_esgf_facets.py new file mode 100644 index 0000000000..15e4659bdc --- /dev/null +++ b/tests/unit/test_esgf_facets.py @@ -0,0 +1,15 @@ +"""Test the `esmvalcore.esgf.facets` module.""" + +# Note that the esmvalcore.esgf module has been moved to esmvalcore.io.esgf +# and support for importing it as esmvalcore.esgf will be removed in v2.16. +# These test can be removed in v2.16 too. + +import esmvalcore.esgf.facets + + +def test_facets(): + assert isinstance(esmvalcore.esgf.facets.FACETS, dict) + + +def test_dataset_map(): + assert isinstance(esmvalcore.esgf.facets.DATASET_MAP, dict)