Skip to content

Conversation

TomAugspurger
Copy link
Contributor

Closes #34626

This works in 1.0.4 I think, so no whatsnew.

I'd like to wait for #34266 to get in first. I'll need to update things.

And in the future, I'd like to deprecate this behavior in favor of something like

pd.read_csv("s3://path/to/key", storage_options={"anon": True})

It should be easy to detect the cases when we need to do that. We just don't have the storage_options kwarg yet.

Closes pandas-dev#34626

This works in 1.0.4 I think, so no whatsnew.
@TomAugspurger TomAugspurger added this to the 1.1 milestone Jun 18, 2020
@TomAugspurger TomAugspurger added the IO CSV read_csv, to_csv label Jun 18, 2020
@TomAugspurger TomAugspurger changed the title REGR: Fixed reading from public S3 buckets with credentials [WIP]REGR: Fixed reading from public S3 buckets with credentials Jun 18, 2020
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment otherwise lgtm

@pytest.mark.slow
def test_read_s3_public():
# ensure we can read from a public bucket with credentials
pytest.importorskip("s3fs")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the skip_if_no decorator from pandas/util/_test_decorators here instead?

@alimcmaster1
Copy link
Member

Ahh just realised we are looking at the same thing @TomAugspurger . I think the case #34626 is trying to address is reading without AWS Credentials right? I've put a test together for this: #34877

@jorisvandenbossche
Copy link
Member

The credentials @TomAugspurger is adding here are dummy (non-working) credentials to avoid a moto failure. But: since this is reading from an actual s3, and nothing is being mocked, moto should not be involved here?

@alimcmaster1
Copy link
Member

The credentials @TomAugspurger is adding here are dummy (non-working) credentials to avoid a moto failure. But: since this is reading from an actual s3, and nothing is being mocked, moto should not be involved here?

Agree - think it should be as simple as: https://github.com/pandas-dev/pandas/pull/34877/files#diff-681243e864617f600083a270b329d17dR36 - unless i'm missing something?

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jun 20, 2020 via email

@alimcmaster1
Copy link
Member

@TomAugspurger - my understanding was:

import pandas as pd
res = pd.read_csv("s3://gdelt-open-data/events/1981.csv", nrows=3)
  1. With AWS Credentials in my Env:
    pandas 1.0.3 & 1.0.4 -> Works

  2. Without AWS Credentials in Env
    1.0.3 - Works
    1.0.4 - NoCredentialsError: Unable to locate credentials

I think (2) was the regression in #34626 - hence my PR here

But also think we should merge the case you are testing here ~ Dummy AWS Creds since maybe that tests (1) here?

@TomAugspurger
Copy link
Contributor Author

Sounds good. Can you add the tests from my PR and I'll just close mine?

Regardless, I think we should wait till #34266 is in.

@alimcmaster1
Copy link
Member

alimcmaster1 commented Jun 22, 2020

Sounds good. Can you add the tests from my PR and I'll just close mine?

Regardless, I think we should wait till #34266 is in.

Sure and apologies for the mix up. I've added this test to the PR here: #34877

Agree re waiting for fsspec.

@TomAugspurger
Copy link
Contributor Author

Closing in favor of #34877.

@TomAugspurger TomAugspurger deleted the s3-test branch June 23, 2020 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

IO CSV read_csv, to_csv

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: s3 reads from public buckets not working

4 participants