Clarify that credentials are not required #169
Changes from 3 commits
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -31,7 +31,8 @@ kernelspec: | |||||
| ## Introduction | ||||||
|
|
||||||
| This notebook demonstrates access to the [HEALPix](https://ui.adsabs.harvard.edu/abs/2005ApJ...622..759G/abstract)-partitioned (order 5), [Apache Parquet](https://parquet.apache.org/) version of the [AllWISE Source Catalog](https://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec1_3.html#src_cat). | ||||||
| The catalog is available through the [AWS Open Data](https://aws.amazon.com/opendata) program, as part of the [NASA Open-Source Science Initiative](https://science.nasa.gov/open-science-overview). | ||||||
| The catalog is available through the [AWS Open Data](https://registry.opendata.aws/wise-allwise/) program, as part of the [NASA Open-Source Science Initiative](https://science.nasa.gov/open-science-overview). | ||||||
| Access is free and no special permissions or credentials are required. | ||||||
|
|
||||||
| Parquet is convenient for large astronomical catalogs in part because the storage format supports efficient database-style queries on the files themselves, without having to load the catalog into a database (or into memory) first. | ||||||
| The AllWISE catalog is fairly large at 340 GB. | ||||||
|
|
@@ -65,13 +66,12 @@ from pyarrow.fs import S3FileSystem | |||||
|
|
||||||
| +++ | ||||||
|
|
||||||
| This AllWISE catalog is stored in an [AWS S3](https://aws.amazon.com/s3/) bucket. | ||||||
| To connect to an S3 bucket we just need to point the reader at S3 instead of the local filesystem, and pass in AWS credentials. | ||||||
| This AllWISE catalog is stored in an [AWS S3](https://aws.amazon.com/s3/) cloud storage bucket. | ||||||
| To connect to an S3 bucket we just need to point the reader at S3 instead of the local filesystem. | ||||||
| (Here, a "reader" is a python library that reads parquet files.) | ||||||
| We'll use [pyarrow.fs.S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) for this because it is recognized by every reader in examples below, and we're already using pyarrow. | ||||||
| [s3fs](https://s3fs.readthedocs.io/en/latest/index.html) is another common option. | ||||||
| The call to `S3FileSystem` will look for AWS credentials in environment variables and/or the file ~/.aws/credentials. | ||||||
| Credentials can also be passed as keyword arguments. | ||||||
| To access without credentials, we'll use the keyword argument `anonymous=True`. | ||||||
|
Contributor
Suggested change
Contributor
Author
What's the motivation for saying the argument helps avoid an error? The same could be said for most arguments. Rereading what I changed above, I see that this mirrors the "To avoid this" that I put in the other notebook. It makes it sound like the keyword argument is a workaround for some problem, but a lack of credentials isn't a problem. Maybe I'll change that a bit to phrase it more positively.
Contributor
Not super important, I just thought we should justify the keyword argument as being associated with not needing credentials, and I assume people won't read all the text, so repetition seems appropriate.
||||||
| More information about accessing S3 buckets can be found at [](#cloud-access-intro). | ||||||
|
|
||||||
| ```{code-cell} ipython3 | ||||||
| bucket = "nasa-irsa-wise" | ||||||
|
|
||||||
why remove this line?
Because it's covered in the cloud access notebook that I added a link to and I felt that it interrupted the flow of this paragraph much more than it added. Do you think it's particularly valuable here?
not critical at all, but seemed small enough and useful enough to leave here, since many users won't follow the link.
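For reference, the credential-free access discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cell: the helper names `anonymous_s3_kwargs` and `list_top_level` are hypothetical, and the default region `us-west-2` is an assumption that is not stated in the diff. Only the bucket name and the `anonymous=True` keyword come from the text above.

```python
BUCKET = "nasa-irsa-wise"  # bucket name taken from the notebook's code cell


def anonymous_s3_kwargs(region="us-west-2"):
    """Build keyword arguments for an unsigned (credential-free) S3 connection.

    The default region is an assumption made for this sketch.
    """
    return {"anonymous": True, "region": region}


def list_top_level(bucket=BUCKET, region="us-west-2"):
    """Return the paths found at the top level of a public bucket.

    Hypothetical helper; it needs pyarrow and network access when called.
    """
    # Imported inside the function so the module can load without pyarrow.
    from pyarrow.fs import FileSelector, S3FileSystem

    # anonymous=True tells pyarrow not to look up any AWS credentials.
    fs = S3FileSystem(**anonymous_s3_kwargs(region))
    selector = FileSelector(bucket, recursive=False)
    return [info.path for info in fs.get_file_info(selector)]
```

Calling `list_top_level()` needs an installed pyarrow and network access. With `anonymous=True`, pyarrow does not attempt to look up credentials through the standard AWS configuration methods, which fits the point made in the review that a lack of credentials is not a problem to work around.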