tutorials/parquet-catalog-demos/wise-allwise-catalog-demo.md
This notebook demonstrates access to the [HEALPix](https://ui.adsabs.harvard.edu/abs/2005ApJ...622..759G/abstract)-partitioned (order 5), [Apache Parquet](https://parquet.apache.org/) version of the [AllWISE Source Catalog](https://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec1_3.html#src_cat).
The catalog is available through the [AWS Open Data](https://aws.amazon.com/opendata) program, as part of the [NASA Open-Source Science Initiative](https://science.nasa.gov/open-science-overview).
Access is free and no special permissions or credentials are required.
Parquet is convenient for large astronomical catalogs in part because the storage format supports efficient database-style queries on the files themselves, without having to load the catalog into a database (or into memory) first.
The AllWISE catalog is fairly large at 340 GB.
+++
This AllWISE catalog is stored in an [AWS S3](https://aws.amazon.com/s3/) cloud storage bucket.
To connect to an S3 bucket we just need to point the reader at S3 instead of the local filesystem.
(Here, a "reader" is a Python library that reads parquet files.)
We'll use [pyarrow.fs.S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) for this because it is recognized by every reader in examples below, and we're already using pyarrow.
To access without credentials, we'll use the keyword argument `anonymous=True`.
More information about accessing S3 buckets can be found at [](#cloud-access-intro).