dealing with updates to GBIF #23

@LevanBokeria

As mentioned in our meeting, I discovered that some image URLs are being deleted from the GBIF database.

On Baskerville, in the folder /bask/projects/v/vjgo8416-amber/data/gbif_download_standalone/dwca_files/, you will find two dwca files for Sesiidae: one downloaded in August 2023 and an updated one downloaded in October 2023.

Those files are also uploaded here on our shared drive, for those without Baskerville access:

Also attached is a CSV file for one UK species, "Pyropteron chrysidiformis", for which I noticed (by chance) that images are no longer downloaded when I point to the October dwca file instead of the August one. I presume a similar issue might have occurred with other species too.

@KatrionaGoldmann the result can be easily reproduced with the 03_download_images/fetch_images_whole_dwca_wrapper.ipynb notebook, by changing the dwca_dir argument to point to the folder containing the extracted files from either the August or the October Sesiidae dwca file. Images are downloaded when pointing to the August file, but not when pointing to the October file.

This cannot be caused by broken URLs, because the URLs in the August dwca file still work. So either the URL entries themselves have been deleted from the October file, or the whole occurrence records, including their URLs, have been deleted.
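One way to check which of these two explanations holds would be to diff the media tables of the two extracted archives. Below is a minimal sketch, not code from this repo. It assumes each extracted dwca folder contains a tab-separated multimedia.txt with gbifID and identifier (image URL) columns, which is my understanding of the standard GBIF download layout. The function names load_media_urls and compare_media are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def load_media_urls(dwca_dir, filename="multimedia.txt"):
    """Map each gbifID to the set of media URLs listed for it.

    Assumes a tab-separated file with 'gbifID' and 'identifier' columns
    (an assumption about the GBIF export layout, not verified here).
    """
    urls = defaultdict(set)
    with open(Path(dwca_dir) / filename, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row.get("identifier"):
                urls[row["gbifID"]].add(row["identifier"])
    return urls


def compare_media(old_dir, new_dir):
    """Return (no_media_left, urls_lost) for records present in the old archive.

    no_media_left: gbifIDs with media rows in the old archive but none in the new one.
    urls_lost:     gbifIDs that still have media rows but lost some specific URLs.
    """
    old = load_media_urls(old_dir)
    new = load_media_urls(new_dir)
    no_media_left, urls_lost = {}, {}
    for gbif_id, old_urls in old.items():
        if gbif_id not in new:
            no_media_left[gbif_id] = old_urls
        else:
            missing = old_urls - new[gbif_id]
            if missing:
                urls_lost[gbif_id] = missing
    return no_media_left, urls_lost


# Example call (placeholder paths, replace with the extracted dwca folders):
# no_media_left, urls_lost = compare_media("path/to/august_dwca", "path/to/october_dwca")
```

IDs that end up in no_media_left could then be looked up in the October occurrence.txt (if present) to see whether the whole occurrence record is gone or only its media rows.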

uksi-moths-keys-nodup-small-Pyropteron-chrysidiformis.csv

Status: For discussion or future consideration
