Concatenating files in download_merra2.ipynb raised an error #8

@Tjarke

I was using the notebook download_merra2.ipynb. In the section "Setting up the DataFrame", the following call raised an error:

 xr.open_mfdataset(file_path, concat_dim='date', preprocess=extract_date)

raised:

xarray ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation

I solved it (for Germany) by changing the code in the following way:


import re


def extract_date(data_set):
    """
    Extracts the date from the filename before merging the datasets.
    """
    try:
        # The attribute name changed during the development of this script
        # from HDF5_GLOBAL.Filename to Filename.
        if 'HDF5_GLOBAL.Filename' in data_set.attrs:
            f_name = data_set.attrs['HDF5_GLOBAL.Filename']
        elif 'Filename' in data_set.attrs:
            f_name = data_set.attrs['Filename']
        else:
            raise AttributeError('The attribute name has changed again!')

        # Find the dot-free token between a "." and ".nc4" (the date stamp).
        exp = r'(?<=\.)[^\.]*(?=\.nc4)'
        res = re.search(exp, f_name).group(0)
        # Extract the date.
        y, m, d = res[0:4], res[4:6], res[6:8]
        date_str = ('%s-%s-%s' % (y, m, d))
        data_set = data_set.assign(date=date_str)
        # Added: make "date" a dimension and set explicit coordinates
        # (values are specific to the Germany subset).
        data_set = data_set.expand_dims("date")
        data_set.coords["lat"] = [47.5, 48.0, 48.5, 49.0, 49.5, 50.0, 50.5, 51.0, 51.5, 52.0, 52.5, 53.0, 53.5, 54.0, 54.5, 55.0]
        data_set.coords["lon"] = [5.625, 6.25, 6.875, 7.5, 8.125, 8.75, 9.375, 10.0, 10.625, 11.25, 11.875, 12.5, 13.125, 13.75, 14.375, 15.0]
        data_set.coords["time"] = list(range(24))

        return data_set

    except KeyError:
        # The last dataset is the one all the other sets will be merged into.
        # Therefore, no date can be extracted.
        data_set.coords["lat"] = [47.5, 48.0, 48.5, 49.0, 49.5, 50.0, 50.5, 51.0, 51.5, 52.0, 52.5, 53.0, 53.5, 54.0, 54.5, 55.0]
        data_set.coords["lon"] = [5.625, 6.25, 6.875, 7.5, 8.125, 8.75, 9.375, 10.0, 10.625, 11.25, 11.875, 12.5, 13.125, 13.75, 14.375, 15.0]
        data_set.coords["time"] = list(range(24))
        return data_set
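
For anyone who wants to check the date extraction on its own, here is a minimal sketch of just the regex step from extract_date, without xarray. The helper name date_from_filename and the example filename are my own assumptions for illustration; the actual filenames in your download may look different.

```python
import re

# Same pattern as in extract_date: the dot-free token that sits
# between a "." and the ".nc4" extension.
DATE_PATTERN = re.compile(r'(?<=\.)[^\.]*(?=\.nc4)')


def date_from_filename(f_name):
    """Return 'YYYY-MM-DD' built from the token before '.nc4'.

    Hypothetical helper, not part of the notebook.
    """
    match = DATE_PATTERN.search(f_name)
    if match is None:
        # Fail with a clear message instead of calling .group() on None.
        raise ValueError('No date token found in %r' % f_name)
    res = match.group(0)
    y, m, d = res[0:4], res[4:6], res[6:8]
    return '%s-%s-%s' % (y, m, d)


# Assumed example filename, only for illustration.
print(date_from_filename('MERRA2_400.tavg1_2d_slv_Nx.20170101.nc4'))
```

Note that the pattern only looks at the token directly before ".nc4", so a filename with an extra suffix in that position would not yield a date.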

and by commenting out the two lat/lon lines at the end of the following cell:

df.drop('DISPH', axis=1, inplace=True)
df.drop(['time', 'date'], axis=1, inplace=True)
df.drop(['U2M', 'U10M', 'U50M', 'V2M', 'V10M', 'V50M'], axis=1, inplace=True)

# df['lat'] = df['lat'].apply(lambda x: lat_array[int(x)])
# df['lon'] = df['lon'].apply(lambda x: lon_array[int(x)])
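
As a side note, the hard-coded lat and lon lists in extract_date look like evenly spaced grids (0.5 degree steps in latitude, 0.625 degree steps in longitude, 16 points each). A rough sketch of generating them instead of typing them out, assuming that spacing holds; regular_grid is a hypothetical helper, and the start values and counts only fit the Germany subset from this issue:

```python
def regular_grid(start, step, count):
    """Return `count` evenly spaced values beginning at `start`."""
    return [start + step * i for i in range(count)]


# Values taken from the hard-coded lists above (Germany subset only).
lat_values = regular_grid(47.5, 0.5, 16)     # 47.5 ... 55.0
lon_values = regular_grid(5.625, 0.625, 16)  # 5.625 ... 15.0

print(lat_values[0], lat_values[-1])
print(lon_values[0], lon_values[-1])
```

For another region the start, step and count would need to be adjusted to match the downloaded files.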

I could not check whether the same error occurred on another machine.

Apart from that, thank you for writing this awesome notebook!
