`tutorials/cloud_access/cloud-access-intro.md` (4 additions, 4 deletions)
```diff
@@ -27,8 +27,6 @@ Learning Goals:
 
 ## 1. Cloud basics
 
-### 1.1 Terminology
-
 AWS S3 is an [object store](https://en.wikipedia.org/wiki/Object_storage) where the fundamental entities are "buckets" and "objects".
 Buckets are containers for objects, and objects are blobs of data.
 Users may be more familiar with [filesystem](https://en.wikipedia.org/wiki/File_system) "files" and "directories".
```
```diff
@@ -43,15 +41,17 @@ The following S3 terms are also used in this notebook:
 
 +++
 
-### 1.2 General access
+### 1.1 General access
 
 Most of the common python methods used to read images and catalogs from a local disk can also be pointed at cloud storage buckets.
 This includes methods like Astropy `fits.open` and Pandas `read_parquet`.
 The cloud connection is handled by a separate library, usually [s3fs](https://s3fs.readthedocs.io), [fsspec](https://filesystem-spec.readthedocs.io), or [pyarrow.fs](https://arrow.apache.org/docs/python/api/filesystems.html).
 
 The IRSA buckets are public and access is free.
 Credentials are not required.
-Anonymous connections can be made, often by setting a keyword argument like `anon=True`.
+However, most tools will look for credentials by default and raise an error when none are found.
+To access without credentials, users can make an anonymous connection, usually with a keyword argument such as `anon=True`.
+This notebook demonstrates access with the `s3fs`, `astropy`, and `pyarrow` libraries.
```
`tutorials/parquet-catalog-demos/wise-allwise-catalog-demo.md` (7 additions, 6 deletions)
```diff
@@ -31,7 +31,8 @@ kernelspec:
 ## Introduction
 
 This notebook demonstrates access to the [HEALPix](https://ui.adsabs.harvard.edu/abs/2005ApJ...622..759G/abstract)-partitioned (order 5), [Apache Parquet](https://parquet.apache.org/) version of the [AllWISE Source Catalog](https://wise2.ipac.caltech.edu/docs/release/allwise/expsup/sec1_3.html#src_cat).
-The catalog is available through the [AWS Open Data](https://aws.amazon.com/opendata) program, as part of the [NASA Open-Source Science Initiative](https://science.nasa.gov/open-science-overview).
+The catalog is available through the [AWS Open Data](https://registry.opendata.aws/wise-allwise/) program, as part of the [NASA Open-Source Science Initiative](https://science.nasa.gov/open-science-overview).
+Access is free and no special permissions or credentials are required.
 
 Parquet is convenient for large astronomical catalogs in part because the storage format supports efficient database-style queries on the files themselves, without having to load the catalog into a database (or into memory) first.
 The AllWISE catalog is fairly large at 340 GB.
```
```diff
@@ -65,13 +66,13 @@ from pyarrow.fs import S3FileSystem
 
 +++
 
-This AllWISE catalog is stored in an [AWS S3](https://aws.amazon.com/s3/) bucket.
-To connect to an S3 bucket we just need to point the reader at S3 instead of the local filesystem, and pass in AWS credentials.
+This AllWISE catalog is stored in an [AWS S3](https://aws.amazon.com/s3/) cloud storage bucket.
+To connect to an S3 bucket we just need to point the reader at S3 instead of the local filesystem.
 (Here, a "reader" is a python library that reads parquet files.)
 We'll use [pyarrow.fs.S3FileSystem](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) for this because it is recognized by every reader in examples below, and we're already using pyarrow.
-[s3fs](https://s3fs.readthedocs.io/en/latest/index.html) is another common option.
-The call to `S3FileSystem` will look for AWS credentials in environment variables and/or the file ~/.aws/credentials.
-Credentials can also be passed as keyword arguments.
+([s3fs](https://s3fs.readthedocs.io/en/latest/index.html) is another common option.)
+To access without credentials, we'll use the keyword argument `anonymous=True`.
+More information about accessing S3 buckets can be found at [](#cloud-access-intro).
```