Fail to load the imagenet2012_multilabel dataset #11123

@Yi-Chung-Chen

Short description
tfds.load('imagenet2012_multilabel') fails during the download phase. The helper file
https://storage.googleapis.com/brain-car-datasets/imagenet-mistakes/human_accuracy_v3.0.0.json returns HTTP 403, raising tensorflow_datasets.core.download.downloader.DownloadError.

I have already manually downloaded the ImageNet validation set, so the failure is not caused by missing manual data.

Environment information
Operating System: Ubuntu 22.04
Python version: 3.9.21
tensorflow-datasets/tfds-nightly version: tensorflow-datasets 4.9.3
tensorflow/tf-nightly version: tensorflow 2.19.0

Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes

Reproduction instructions

import tensorflow_datasets as tfds

# Fails with HTTP 403 on a helper JSON file
ds = tfds.load('imagenet2012_multilabel')

Link to logs

ds = tfds.load('imagenet2012_multilabel')
2025-08-25 23:01:29.655977: W external/local_xla/xla/tsl/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Could not resolve hostname', error details: Could not resolve host: metadata.google.internal".
Downloading and preparing dataset 191.13 MiB (download: 191.13 MiB, generated: 2.50 GiB, total: 2.69 GiB) to /home/.../tensorflow_datasets/imagenet2012_multilabel/3.0.0...
Dl Size...: 0 MiB [00:00, ? MiB/s] | 0/1 [00:00<?, ? url/s]
Dl Completed...: 0%| | 0/1 [00:00<?, ? url/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../tensorflow_datasets/core/load.py", line 639, in load
    _download_and_prepare_builder(dbuilder, download, download_and_prepare_kwargs)
  File ".../tensorflow_datasets/core/load.py", line 498, in _download_and_prepare_builder
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File ".../tensorflow_datasets/core/dataset_builder.py", line 1547, in _download_and_prepare
    split_generators = self._split_generators(
  File ".../imagenet2012_multilabel_dataset_builder.py", line 115, in _split_generators
    _get_multi_labels_and_problematic_images(dl_manager)
  File ".../imagenet2012_multilabel_dataset_builder.py", line 48, in _get_multi_labels_and_problematic_images
    with tf.io.gfile.GFile(dl_manager.download(_MULTI_LABELS_URL), 'r') as f:
  File ".../download_manager.py", line 601, in download
    return _map_promise(self._download, url_or_urls)
  File ".../downloader.py", line 230, in _sync_download
    with _open_url(url, verify=verify) as (response, iter_content):
  File ".../downloader.py", line 330, in _assert_status
    raise DownloadError(
tensorflow_datasets.core.download.downloader.DownloadError: Failed to get url https://storage.googleapis.com/brain-car-datasets/imagenet-mistakes/human_accuracy_v3.0.0.json. HTTP code: 403
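
For anyone triaging this from logs, the failing URL and status code can be pulled out of the final line with a small regex. This is only a sketch: the pattern is inferred from the single message shown above and may not match other TFDS versions, and `parse_download_error` is a hypothetical helper name, not part of TFDS.

```python
import re

# Pattern inferred from the one DownloadError message in this report
# (assumption: other TFDS versions may word the message differently).
_DOWNLOAD_ERROR_RE = re.compile(
    r"Failed to get url (?P<url>\S+)\. HTTP code: (?P<code>\d{3})"
)


def parse_download_error(line):
    """Return (url, http_code) from a DownloadError line, or None if no match."""
    match = _DOWNLOAD_ERROR_RE.search(line)
    if match is None:
        return None
    return match.group("url"), int(match.group("code"))
```

Passing the last line of the traceback above should yield the helper JSON URL together with the integer 403.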

Expected behavior
Dataset downloads and prepares without access-denied errors.

Additional context
It looks like the auxiliary JSON file used to build the multilabel annotations is no longer publicly readable.
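
To check this independently of TFDS, the URL can be requested directly with the standard library. The sketch below is a minimal example, not how TFDS itself performs the download; `http_status` and its `opener` parameter are hypothetical names introduced here so the function can be exercised without network access.

```python
import urllib.error
import urllib.request

# URL copied from the DownloadError message in the log.
MULTI_LABELS_URL = (
    "https://storage.googleapis.com/brain-car-datasets/"
    "imagenet-mistakes/human_accuracy_v3.0.0.json"
)


def http_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for a GET request to `url`.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    that a stub can be substituted (an assumption of this sketch).
    """
    request = urllib.request.Request(url)
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is on the exception.
        return err.code
```

Calling `http_status(MULTI_LABELS_URL)` from the affected machine shows whether the 403 comes from the storage bucket itself rather than from TFDS.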


Labels: bug
