Skip to content

Conversation

@troyraen
Copy link
Contributor

@troyraen troyraen commented Nov 4, 2025

Text in the AllWISE parquet notebook makes it sound like AWS credentials are required, which they're not. This PR corrects that and also improves related languaging in the cloud access intro notebook. It also changes the AllWISE ODR link to point directly to the wise-allwise page (which didn't exist yet when the notebook was written) and removes a "Terminology" heading since the word tends to scare people off and the heading isn't necessary here anyway.

@troyraen troyraen requested a review from jkrick November 4, 2025 23:33
@troyraen troyraen added documentation Improvements or additions to documentation content: cloud Content related issues/PRs for notebooks with cloud hosted data relevance labels Nov 4, 2025
Copy link
Contributor

@jkrick jkrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggestions, thanks for working on this.

To connect to an S3 bucket we just need to point the reader at S3 instead of the local filesystem.
(Here, a "reader" is a python library that reads parquet files.)
We'll use [pyarrow.fs.S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) for this because it is recognized by every reader in examples below, and we're already using pyarrow.
[s3fs](https://s3fs.readthedocs.io/en/latest/index.html) is another common option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's covered in the cloud access notebook that I added a link to and I felt that it interrupted the flow of this paragraph much more than it added. Do you think it's particularly valuable here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not critical at all, but seemed small enough and useful enough to leave here, since many users won't follow the link.

[s3fs](https://s3fs.readthedocs.io/en/latest/index.html) is another common option.
The call to `S3FileSystem` will look for AWS credentials in environment variables and/or the file ~/.aws/credentials.
Credentials can also be passed as keyword arguments.
To access without credentials, we'll use the keyword argument `anonymous=True`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To access without credentials, we'll use the keyword argument `anonymous=True`.
To avoid an error about credentials, we'll use the keyword argument `anonymous=True`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for saying the argument helps avoid an error? The same could be said for most arguments. Rereading what I changed above, I see that this mirrors the "To avoid this" that I put in the other notebook. It makes it sound like the keyword argument is a workaround for some problem, but a lack of credentials isn't a problem. Maybe I'll change that a bit to phrase it more positively.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not super important, I just thought we should justify the keyword argument as being associated with not needing credentials, and I assume people won't read all the text, so repetition seems appropriate.

@troyraen troyraen merged commit 252f6df into main Nov 6, 2025
8 checks passed
@troyraen troyraen deleted the raen/improve-credentials-info branch November 6, 2025 02:07
github-actions bot pushed a commit that referenced this pull request Nov 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

content: cloud Content related issues/PRs for notebooks with cloud hosted data relevance documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants