Cannot read datapackage from s3 #1596

@barbuz

Overview

I want to use Frictionless datapackages to provide metadata about some collections hosted on S3, but I'm encountering issues when trying to read these files.
I can load the data fine as a Resource, and I can even validate it against a local tableschema, but if I try loading the datapackage directly from its s3:// URL I get the following error:

>>> pak = frictionless.Package('s3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/datapackage.json')
Traceback (most recent call last):
  File "/home/leo/miniconda3/lib/python3.10/site-packages/frictionless/metadata.py", line 306, in metadata_retrieve
    response = session.get(descriptor, stream=True)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 695, in send
    adapter = self.get_adapter(url=request.url)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/requests/sessions.py", line 792, in get_adapter
    raise InvalidSchema(f"No connection adapters were found for {url!r}")
requests.exceptions.InvalidSchema: No connection adapters were found for 's3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/datapackage.json'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/leo/miniconda3/lib/python3.10/site-packages/frictionless/package/factory.py", line 38, in __call__
    cls.from_descriptor(source, basepath=basepath, **options),  # type: ignore
  File "/home/leo/miniconda3/lib/python3.10/site-packages/frictionless/metadata.py", line 162, in from_descriptor
    descriptor = cls.metadata_retrieve(descriptor)
  File "/home/leo/miniconda3/lib/python3.10/site-packages/frictionless/metadata.py", line 324, in metadata_retrieve
    raise FrictionlessException(Error(note=note)) from exception
frictionless.exception.FrictionlessException: [package-error] The data package has an error: cannot retrieve metadata "s3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/datapackage.json" because "No connection adapters were found for 's3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/datapackage.json'"
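
For context on the first traceback: as far as I can tell, a default requests session only mounts adapters for the http:// and https:// prefixes and picks one by matching the start of the URL, so an s3:// URL matches nothing. The snippet below is a stdlib-only imitation of that lookup to illustrate the failure mode; `MOUNTED_PREFIXES` and `find_adapter_prefix` are hypothetical names, and this is not the actual requests or frictionless code.

```python
# Illustrative imitation of a prefix-based adapter lookup.
# Assumption: a default requests session mounts only these two prefixes.
MOUNTED_PREFIXES = ("https://", "http://")


def find_adapter_prefix(url: str) -> str:
    """Return the mounted prefix that matches url, or raise ValueError."""
    for prefix in MOUNTED_PREFIXES:
        if url.lower().startswith(prefix):
            return prefix
    # Mirrors the wording of the error message seen in the traceback.
    raise ValueError(f"No connection adapters were found for {url!r}")


if __name__ == "__main__":
    print(find_adapter_prefix("https://example.com/datapackage.json"))
    try:
        find_adapter_prefix("s3://some-bucket/datapackage.json")
    except ValueError as error:
        print(error)
```

If that reading is right, the descriptor path goes through plain HTTP retrieval and never reaches whatever S3 support is used when the same file is loaded as a Resource.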

I have also tried opening a local copy of the datapackage with its resource path pointing to s3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/data.parquet/part.0.parquet, but then the validation fails with:

>>> pak.validate()
{'valid': False,
 'stats': {'tasks': 1, 'errors': 1, 'warnings': 0, 'seconds': 0.057},
 'warnings': [],
 'errors': [],
 'tasks': [{'name': 'data',
            'type': 'table',
            'valid': False,
            'place': 's3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/data.parquet/part.0.parquet',
            'labels': [],
            'stats': {'errors': 1, 'warnings': 0, 'seconds': 0.026},
            'warnings': [],
            'errors': [{'type': 'source-error',
                        'title': 'Source Error',
                        'description': 'Data reading error because of not '
                                       'supported or inconsistent contents.',
                        'message': 'The data source has not supported or has '
                                   'inconsistent contents: '
                                   's3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/data.parquet/part.0.parquet',
                        'tags': [],
                        'note': 's3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/data.parquet/part.0.parquet'}]}]}

Finally, I've done some experiments with the CLI but encountered the same errors there too. In particular, validating the remote data against a local tableschema.json file worked, but when the tableschema was also hosted on S3 I got the error "No connection adapters were found for 's3://rimrep-data-public-development/csiro-seltmp-baseline-surveys-jul22/tableschema.json'".

All the files used here should be public, so you can try replicating the issue. Please let me know if I'm doing something wrong or if this is an actual bug.
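
Since the files are public, one possible stopgap might be to address them over HTTPS instead of s3://. The sketch below rewrites an s3:// URL into the virtual-hosted-style HTTPS form; `s3_to_https` is a hypothetical helper, and it assumes the bucket allows anonymous reads through the global `s3.amazonaws.com` endpoint, which I have not verified for this bucket.

```python
from urllib.parse import urlsplit


def s3_to_https(s3_url: str) -> str:
    """Rewrite s3://bucket/key as https://bucket.s3.amazonaws.com/key.

    Assumption: the object is publicly readable via the global S3 endpoint.
    """
    parts = urlsplit(s3_url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    # parts.netloc is the bucket name; parts.path is the object key with a leading "/".
    return f"https://{parts.netloc}.s3.amazonaws.com{parts.path}"


if __name__ == "__main__":
    print(
        s3_to_https(
            "s3://rimrep-data-public-development/"
            "csiro-seltmp-baseline-surveys-jul22/datapackage.json"
        )
    )
```

The resulting URL could then be passed to `frictionless.Package(...)` in place of the s3:// one. This only sidesteps the problem for public buckets and does not answer whether s3:// descriptors are meant to be supported.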

Metadata

Labels: bug, good first issue
