diff --git a/tutorials/cloud_access/cloud-access-intro.md b/tutorials/cloud_access/cloud-access-intro.md
index 2db3169e..455af30a 100644
--- a/tutorials/cloud_access/cloud-access-intro.md
+++ b/tutorials/cloud_access/cloud-access-intro.md
@@ -27,8 +27,6 @@ Learning Goals:
 
 ## 1. Cloud basics
 
-### 1.1 Terminology
-
 AWS S3 is an [object store](https://en.wikipedia.org/wiki/Object_storage) where the fundamental entities are "buckets" and "objects".
 Buckets are containers for objects, and objects are blobs of data.
 Users may be more familiar with [filesystem](https://en.wikipedia.org/wiki/File_system) "files" and "directories".
@@ -43,7 +41,7 @@ The following S3 terms are also used in this notebook:
 
 +++
 
-### 1.2 General access
+### 1.1 General access
 
 Most of the common python methods used to read images and catalogs from a local disk can also be pointed at cloud storage buckets.
 This includes methods like Astropy `fits.open` and Pandas `read_parquet`.
@@ -51,7 +49,9 @@ The cloud connection is handled by a separate library, usually [s3fs](https://s3
 
 The IRSA buckets are public and access is free.
 Credentials are not required.
-Anonymous connections can be made, often by setting a keyword argument like `anon=True`.
+However, most tools will look for credentials by default and raise an error when none are found.
+To access without credentials, users can make an anonymous connection, usually with a keyword argument such as `anon=True`.
+This notebook demonstrates with the `s3fs`, `astropy`, and `pyarrow` libraries.
 
 +++
 
diff --git a/tutorials/parquet-catalog-demos/wise-allwise-catalog-demo.md b/tutorials/parquet-catalog-demos/wise-allwise-catalog-demo.md
index b1f19d00..8d259f67 100644
--- a/tutorials/parquet-catalog-demos/wise-allwise-catalog-demo.md
+++ b/tutorials/parquet-catalog-demos/wise-allwise-catalog-demo.md
@@ -31,7 +31,8 @@ kernelspec:
 ## Introduction
 
 This notebook demonstrates access to the [HEALPix](https://ui.adsabs.harvard.edu/abs/2005ApJ...622..759G/abstract)-partitioned (order 5), [Apache Parquet](https://parquet.apache.org/) version of the [AllWISE Source Catalog](https://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec1_3.html#src_cat).
-The catalog is available through the [AWS Open Data](https://aws.amazon.com/opendata) program, as part of the [NASA Open-Source Science Initiative](https://science.nasa.gov/open-science-overview).
+The catalog is available through the [AWS Open Data](https://registry.opendata.aws/wise-allwise/) program, as part of the [NASA Open-Source Science Initiative](https://science.nasa.gov/open-science-overview).
+Access is free and no special permissions or credentials are required.
 
 Parquet is convenient for large astronomical catalogs in part because the storage format supports efficient database-style queries on the files themselves, without having to load the catalog into a database (or into memory) first.
 The AllWISE catalog is fairly large at 340 GB.
@@ -65,13 +66,13 @@ from pyarrow.fs import S3FileSystem
 
 +++
 
-This AllWISE catalog is stored in an [AWS S3](https://aws.amazon.com/s3/) bucket.
-To connect to an S3 bucket we just need to point the reader at S3 instead of the local filesystem, and pass in AWS credentials.
+This AllWISE catalog is stored in an [AWS S3](https://aws.amazon.com/s3/) cloud storage bucket.
+To connect to an S3 bucket we just need to point the reader at S3 instead of the local filesystem.
 (Here, a "reader" is a python library that reads parquet files.)
 We'll use [pyarrow.fs.S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) for this because it is recognized by every reader in examples below, and we're already using pyarrow.
-[s3fs](https://s3fs.readthedocs.io/en/latest/index.html) is another common option.
-The call to `S3FileSystem` will look for AWS credentials in environment variables and/or the file ~/.aws/credentials.
-Credentials can also be passed as keyword arguments.
+([s3fs](https://s3fs.readthedocs.io/en/latest/index.html) is another common option.)
+To access without credentials, we'll use the keyword argument `anonymous=True`.
+More information about accessing S3 buckets can be found at [](#cloud-access-intro).
 
 ```{code-cell} ipython3
 bucket = "nasa-irsa-wise"