Commit 41dc527

Address comments
1 parent d430140 commit 41dc527

1 file changed: +16 -8 lines changed

modules/04-data-in-the-cloud/index.ipynb

Lines changed: 16 additions & 8 deletions
@@ -76,7 +76,7 @@
 "There are a couple implications that you should be aware of when working with data on the cloud:\n",
 "\n",
 "- Pay-as-you-go - Most cloud providers use pay-as-you-go pricing, where you only pay for the storage and services that you use. This can potentially reduce costs, especially upfront costs (e.g., you never need to buy a hard drive). However, **it can be easy to forget about data in storage and continue to pay for it indefinitely**.\n",
-"- Time and cost of bringing data to your computer - Hosting the data on the cloud naturally means it's no longer already near your computer's processing resources. Transporting data from the cloud to your computer is expensive, since most cloud providers charge for any data leaving their network, and slow, since the data needs to travel large distances. The primary solution for this is \"data-proximate computing\" which involves running your code on computing resources in the same cloud location as your data. In with \"data-proximate computing\", there are many other ways to make working with data on the cloud cheaper and easier. Let's take a look!"
+"- Time and cost of bringing data to your computer - Hosting the data on the cloud naturally means it's no longer already near your computer's processing resources. Transporting data from the cloud to your computer is expensive, since most cloud providers charge for any data leaving their network, and slow, since the data needs to travel large distances. The primary solution for this is \"data-proximate computing\" which involves running your code on computing resources in the same cloud location as your data. For example, I commonly use NASA data products that are hosted on AWS servers in the 'us-west-2' region, which corresponds to Oregon in the figure above. Following the \"data-proximate computing\" paradigm, I use AWS compute resources that are also in Oregon when working with those data, rather than downloading data to use the computing resources on my laptop in North Carolina. In addition to \"data-proximate computing\", there are many other ways to make working with data on the cloud cheaper and easier. Let's take a look!"
 ]
 },
 {
@@ -86,7 +86,7 @@
 "source": [
 "## What is cloud-native data?\n",
 "\n",
-"Cloud native data are structured for efficient querying across the network. You can learn more about these data in the [CNG data formats guide](https://guide.cloudnativegeo.org/), but here we'll just explore working with data that is, compared to data that isn't, optimized for cloud usage."
+"Cloud native data are structured for efficient querying across the network. For this 101 tutorial, you can think of \"the network\" as synonymous with \"the internet\". You can learn more about these data in the [CNG data formats guide](https://guide.cloudnativegeo.org/), but here we'll just explore working with data that is, compared to data that isn't, optimized for cloud usage."
 ]
 },
 {
@@ -141,7 +141,15 @@
 "id": "e7ba92b7",
 "metadata": {},
 "source": [
-"Open a file without any configuration and find the maximum value."
+"Open a file without any configuration and find the maximum value.\n",
+"\n",
+":::{tip}\n",
+"The `%%time` at the start of the next Jupyter cell is a \"magic command\" that reports the time to run all the commands inside the cell. The [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) provides a nice overview of [magic commands for Jupyter/IPython](https://jakevdp.github.io/PythonDataScienceHandbook/01.03-magic-commands.html) if you'd like to dive in more!\n",
+":::\n",
+"\n",
+":::{tip}\n",
+"The `with` code block is called a context manager, and is very useful for making sure that opened files are closed after usage, which prevents bugs and memory issues.\n",
+":::"
 ]
 },
 {
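
Illustrative note on the two tips added in the hunk above (not part of the commit): a minimal sketch, using only the standard library, of what the `with` context manager guarantees and of a script-level stand-in for `%%time`. The temporary file and its contents are made up for the example; the notebook itself opens cloud-hosted data instead.

```python
import os
import tempfile
import time

# Create a small throwaway file so the example is self-contained.
# (Hypothetical data; the notebook reads a cloud-hosted file instead.)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\n5\n3\n")
    path = tmp.name

start = time.perf_counter()  # rough script-level stand-in for %%time
with open(path) as f:
    # The file is open only inside this block.
    max_value = max(int(line) for line in f)
elapsed = time.perf_counter() - start

# After the block exits, the context manager has closed the file,
# even if an exception had been raised inside it.
file_closed = f.closed

os.remove(path)
print(f"max={max_value}, closed={file_closed}, took {elapsed:.6f} s")
```

Outside a notebook, `time.perf_counter()` is one reasonable way to get a similar wall-clock measurement, although `%%time` also reports CPU time.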
@@ -197,7 +205,7 @@
 "id": "0c19a70e",
 "metadata": {},
 "source": [
-"Obstore is another library for interacting with cloud data. It's more verbose than fsspec because explicit usage patterns are core to its design patterns. You may want to use obstore if performance matters a lot to you, since it's very fast."
+"Obstore is another library for interacting with cloud data. It's more verbose than fsspec because its design favors explicit usage patterns. You may want to use obstore if performance matters a lot to you, since it's very fast."
 ]
 },
 {
@@ -218,7 +226,7 @@
 "id": "b3fdbf0e",
 "metadata": {},
 "source": [
-"List the files available following at this prefix on AWS S3 storage. Currently there's no globbing functionality but that will be added in the future to obspec_utils."
+"List the files available at this prefix on AWS S3 storage. Currently there's no globbing (i.e., [file pattern matching](https://en.wikipedia.org/wiki/Glob_(programming))) functionality but that will be added in the future to `obspec_utils`."
 ]
 },
 {
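
Illustrative note on the globbing remark in the hunk above (not part of the commit): until pattern matching is available, one possible stopgap is to list everything under a prefix and filter the returned keys client-side. The sketch below assumes the listing has already been collected into a plain list of key strings; the keys and the `filter_keys` helper are hypothetical and are not part of obstore or `obspec_utils`.

```python
from fnmatch import fnmatchcase


def filter_keys(keys, pattern):
    """Return the keys that match a glob-style pattern (case-sensitive).

    Note: unlike a shell glob, fnmatch's `*` also matches `/`, so a
    pattern can span several path segments.
    """
    return [key for key in keys if fnmatchcase(key, pattern)]


# Hypothetical object keys, standing in for the result of a prefix listing.
keys = [
    "data/2024/jan.nc",
    "data/2024/feb.nc",
    "data/2024/readme.txt",
    "data/2025/jan.nc",
]

matches = filter_keys(keys, "data/2024/*.nc")
print(matches)
```

This still lists every object under the prefix, so it saves typing rather than requests.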
@@ -262,7 +270,7 @@
 "id": "c5aa9837",
 "metadata": {},
 "source": [
-"We can instead cache the entire file in memory, if we know that we'll need to read most of file (as is needed for finding the mean of a variable)."
+"We can instead cache the entire file in memory, if we know that we'll need to read most of the file (as is needed for finding the mean of a variable). We do this by using the `ObstoreMemCacheReader` rather than the `ObstoreReader`."
 ]
 },
 {
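
Illustrative note on the caching remark in the hunk above (not part of the commit): a toy sketch of why caching a whole file in memory can help when most of it will be read anyway. `CountingRangeSource` is a hypothetical stand-in for a remote object that counts range requests; it is not the obstore API and does not model how `ObstoreReader` or `ObstoreMemCacheReader` are implemented.

```python
import io


class CountingRangeSource:
    """Hypothetical stand-in for a remote object; counts range requests."""

    def __init__(self, payload: bytes):
        self._payload = payload
        self.requests = 0

    def size(self) -> int:
        return len(self._payload)

    def get_range(self, start: int, length: int) -> bytes:
        self.requests += 1  # each call models one network round trip
        return self._payload[start:start + length]


source = CountingRangeSource(bytes(range(256)) * 4)  # 1024 bytes
chunk = 64

# Strategy 1: read the whole object through many small range requests.
total_ranged = 0
for offset in range(0, source.size(), chunk):
    total_ranged += sum(source.get_range(offset, chunk))
range_requests = source.requests

# Strategy 2: fetch everything once, then read from memory.
source.requests = 0
cached = io.BytesIO(source.get_range(0, source.size()))
total_cached = 0
while block := cached.read(chunk):
    total_cached += sum(block)
cached_requests = source.requests

print(range_requests, cached_requests, total_ranged == total_cached)
```

The trade-off runs the other way when only a small slice of a large file is needed, since the cached approach transfers every byte up front.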
@@ -357,7 +365,7 @@
 ],
 "metadata": {
 "kernelspec": {
-"display_name": "Python 3 (ipykernel)",
+"display_name": "default",
 "language": "python",
 "name": "python3"
 },
@@ -371,7 +379,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.13.0"
+"version": "3.14.0"
 }
 },
 "nbformat": 4,
