Not working - using earthkit to download CMIP6 data #781

@nicoler-23

No longer a bug report, instead a suggested improvement

What happened?

I am trying to download some NetCDF data from the CDS using Earthkit. When downloading the CMIP6 data I need, I get a folder with 3 files:

  1. provenance.json
  2. provenance.png
  3. tasmax_day_CMCC-ESM2_ssp585_r1i1p1f1_gn_20800101-20801231.nc

Earthkit then tries to read all these files, resulting in a warning message "Unknown file type, no reader available." followed by a file path to a .png file. It does correctly load the NetCDF file from the ZIP into memory and I can use it, but it seems like undesired behavior for it to try to load a PNG file (and potentially others) when I am specifying NetCDF.
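The behavior suggested here could look roughly like the sketch below: given the names of the files in the downloaded folder, keep only those whose extension matches the requested format. The helper `select_by_format` and the `FORMAT_SUFFIXES` mapping are hypothetical names made up for illustration; this is not how earthkit-data is implemented.

```python
from pathlib import PurePath

# Hypothetical mapping from a requested format to accepted file extensions
FORMAT_SUFFIXES = {
    "netcdf": {".nc", ".nc4"},
    "grib": {".grib", ".grib2", ".grb"},
}


def select_by_format(names, fmt):
    """Return only the file names whose extension matches the requested format."""
    suffixes = FORMAT_SUFFIXES[fmt]
    return [name for name in names if PurePath(name).suffix.lower() in suffixes]


# The three files from the CMIP6 download described above
files = [
    "provenance.json",
    "provenance.png",
    "tasmax_day_CMCC-ESM2_ssp585_r1i1p1f1_gn_20800101-20801231.nc",
]
print(select_by_format(files, "netcdf"))
```

With a filter like this applied before reading, the provenance files would be skipped and no "Unknown file type" warning would be raised for them.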

I have tested this for multiple cases. Below are the example cases I have tried:

These examples threw the same warning

import earthkit.data

# Define request
dataset = "projections-cmip6"
request = {
    'format': 'netcdf',
    'temporal_resolution': 'daily',
    'variable': "daily_maximum_near_surface_air_temperature",
    'experiment': "ssp5_8_5",
    'model': 'cmcc_esm2',
    'year': "2080",
    'month': "01",
    'day': "01",
}

# Download data
ds = earthkit.data.from_source("cds", dataset, request)

Warning: this is 700 MB

import earthkit.data

# Define request
dataset = "multi-origin-c3s-atlas"  # no trailing comma, which would make this a tuple
request = {
    "origin": "cmip5",
    "experiment": "historical",
    "domain": "global",
    "period": "1850-2005",
    "variable": "monthly_heavy_precipitation_days",
    "bias_adjustment": "no_bias_adjustment",
}

# Download data
ds = earthkit.data.from_source("cds", dataset, request)

This did work without the warning

import earthkit.data

# Define request
dataset = "projections-cmip5-daily-pressure-levels"  # no trailing comma, which would make this a tuple
request = {
    "experiment": "historical",
    "variable": ["geopotential_height"],
    "model": "access1_0",
    "ensemble_member": "r1i1p1",
    "period": ["19900101-19941231"],
}

# Download data
ds = earthkit.data.from_source("cds", dataset, request)

Note: The example in this link gives the same warning: https://github.com/ecmwf/earthkit-plots/blob/develop/docs/examples/gallery/time-series/cmip6.ipynb

Note: the following block of code requests the same data with cdsapi, but extracts and reads only the NetCDF file.

# This code works 
from pathlib import Path
import cdsapi
import os
import xarray as xr
import zipfile

c = cdsapi.Client()  # Key set up in .cdsapirc file

# Define request
dataset = "projections-cmip6"
request = {
    'format': 'zip',
    'temporal_resolution': 'daily',
    'variable': "daily_maximum_near_surface_air_temperature",
    'experiment': "ssp5_8_5",
    'model': 'cmcc_esm2',
    'year': "2080",
    'month': "01",
    'day': "01",
}

# Define filenames
dest = Path('./data')
os.makedirs(dest, exist_ok=True)
zip_path = dest / f"{dataset}_example.zip"
extract_path = zip_path.with_suffix("")

# Download data
c.retrieve(dataset, request, zip_path)

# Manually extract NetCDF file from ZIP
# Based on https://github.com/ecmwf-projects/c3s-atlas/blob/main/c3s_atlas/utils.py
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    # Get filename inside ZIP
    names = zip_ref.namelist()  # Only one NetCDF file is expected in this example
    name_nc = [filename for filename in names if filename.endswith(".nc")][0]

    # Extract
    zip_ref.extract(name_nc, extract_path)

# Open dataset
ds = xr.open_dataset(extract_path/name_nc)
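
The manual workaround above assumes the ZIP contains exactly one NetCDF file. A minimal sketch that generalizes it to any number of `.nc` members, using only the standard library, might look like this. The function name `extract_netcdf_members` is hypothetical and chosen for illustration.

```python
import zipfile
from pathlib import Path


def extract_netcdf_members(zip_path, extract_dir):
    """Extract only the .nc members of a ZIP and return their paths."""
    extract_dir = Path(extract_dir)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        # Keep every member whose name ends in .nc (case-insensitive)
        members = [n for n in zip_ref.namelist() if n.lower().endswith(".nc")]
        # Extract just those members; provenance files stay in the archive
        zip_ref.extractall(extract_dir, members=members)
    return [extract_dir / name for name in members]
```

Each returned path can then be opened with `xr.open_dataset`, as in the block above.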
 

What are the steps to reproduce the bug?

import earthkit.data

# Define request
dataset = "projections-cmip6"
request = {
    'format': 'netcdf',
    'temporal_resolution': 'daily',
    'variable': "daily_maximum_near_surface_air_temperature",
    'experiment': "ssp5_8_5",
    'model': 'cmcc_esm2',
    'year': "2080",
    'month': "01",
    'day': "01",
}

# Download data
ds = earthkit.data.from_source("cds", dataset, request)

Version

v0.11.2

Platform (OS and architecture)

Microsoft Windows 11 Enterprise 64-bit operating system, x64-based processor

Relevant log output

2025-08-12 13:36:44,084 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-08-12 13:36:44,356 INFO [2024-09-26T00:00:00] Watch our [Forum](https://forum.ecmwf.int/) for Announcements, news and other discussed topics.
2025-08-12 13:36:45,168 INFO Request ID is 148d75e9-1a82-405b-89b5-caa1b1cd1208
2025-08-12 13:36:45,324 INFO status has been updated to accepted
2025-08-12 13:37:07,031 INFO status has been updated to successful
Unknown file type, no reader available. path=C:\Users\nr2\AppData\Local\Temp\tmpmeer6lxa\cds-d1477e4d87135c5a2a0dd362385cfa389fcf1e9d27c7d2c11c7d82684b59b703.d\provenance.png magic=b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\n\xd3\x00\x00\x03"\x08\x02\x00\x00\x00\x99\xec9+\x00\x00\x00\x06bKGD\x00\xff\x00\xff\x00\xff\xa0\xbd\xa7\x93\x00\x00 \x00IDATx\x9c\xec\xddw' content_type=None
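
The `magic=` value in the log line above is the start of the file's contents, and it begins with the standard 8-byte PNG signature. As a rough illustration of how leading bytes can identify a file type, here is a small sketch. The function `sniff_kind` is a hypothetical helper, not part of earthkit-data, and it only distinguishes the few signatures relevant to this report.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # PNG signature
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"  # HDF5 signature (NetCDF-4 files are HDF5 containers)
CDF_MAGIC = b"CDF"                 # classic NetCDF files start with these bytes


def sniff_kind(magic: bytes) -> str:
    """Classify a file by its leading bytes (illustrative only)."""
    if magic.startswith(PNG_MAGIC):
        return "png"
    if magic.startswith(HDF5_MAGIC) or magic.startswith(CDF_MAGIC):
        return "netcdf-or-hdf5"
    return "unknown"


# First bytes reported in the warning above
print(sniff_kind(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
```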

Accompanying data

No response

Organisation

No response

Labels

bug
