Commit dc7e858

nit
1 parent 8ca57a3 commit dc7e858

1 file changed: +28 -27 lines changed

doc/demo.ipynb

Lines changed: 28 additions & 27 deletions
@@ -2,10 +2,10 @@
  "cells": [
  {
  "cell_type": "markdown",
- "id": "registered-bahrain",
+ "id": "vocal-florida",
  "metadata": {},
  "source": [
- "# xbatcher demo\n",
+ "# demo\n",
  "\n",
  "Author: Cindy Chiao\n",
  "Last Modified: Nov 16, 2021\n",
@@ -21,7 +21,7 @@
  {
  "cell_type": "code",
  "execution_count": 1,
- "id": "engaged-nicaragua",
+ "id": "stable-worst",
  "metadata": {},
  "outputs": [],
  "source": [
@@ -32,7 +32,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "violent-walker",
+ "id": "stopped-syndication",
  "metadata": {},
  "source": [
  "## Example data\n",
@@ -43,7 +43,7 @@
  {
  "cell_type": "code",
  "execution_count": 2,
- "id": "furnished-vanilla",
+ "id": "involved-insulin",
  "metadata": {},
  "outputs": [
  {
@@ -566,7 +566,7 @@
  {
  "cell_type": "code",
  "execution_count": 3,
- "id": "motivated-sustainability",
+ "id": "extended-springfield",
  "metadata": {},
  "outputs": [
  {
@@ -599,7 +599,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "muslim-policy",
+ "id": "incorrect-spank",
  "metadata": {},
  "source": [
  "## Batch generation\n",
@@ -614,7 +614,7 @@
  {
  "cell_type": "code",
  "execution_count": 4,
- "id": "brown-danish",
+ "id": "superb-riverside",
  "metadata": {},
  "outputs": [
  {
@@ -1040,7 +1040,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "supported-equipment",
+ "id": "czech-covering",
  "metadata": {},
  "source": [
  "We can verify that the outputs have the expected shapes. \n",
@@ -1051,7 +1051,7 @@
  {
  "cell_type": "code",
  "execution_count": 5,
- "id": "equal-profile",
+ "id": "brilliant-period",
  "metadata": {},
  "outputs": [
  {
@@ -1069,7 +1069,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "approximate-hurricane",
+ "id": "polished-change",
  "metadata": {},
  "source": [
  "There are 145 lat points and 192 lon points, thus we're expecting 145 * 192 = 27840 samples in a batch."
@@ -1078,7 +1078,7 @@
  {
  "cell_type": "code",
  "execution_count": 6,
- "id": "identified-prototype",
+ "id": "tamil-wagon",
  "metadata": {},
  "outputs": [
  {
@@ -1096,7 +1096,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "tropical-danish",
+ "id": "seeing-straight",
  "metadata": {},
  "source": [
  "## Controlling the size/shape of batches\n",
@@ -1107,7 +1107,7 @@
  {
  "cell_type": "code",
  "execution_count": 7,
- "id": "circular-array",
+ "id": "confused-consent",
  "metadata": {},
  "outputs": [
  {
@@ -1559,7 +1559,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "broadband-romance",
+ "id": "regulated-double",
  "metadata": {},
  "source": [
  "## Last batch behavior\n",
@@ -1570,7 +1570,7 @@
  {
  "cell_type": "code",
  "execution_count": 8,
- "id": "funny-garbage",
+ "id": "fatty-satellite",
  "metadata": {},
  "outputs": [
  {
@@ -2005,7 +2005,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "affecting-preview",
+ "id": "sought-democracy",
  "metadata": {},
  "source": [
  "## Overlapping inputs\n",
@@ -2017,7 +2017,7 @@
  {
  "cell_type": "code",
  "execution_count": 9,
- "id": "improved-coating",
+ "id": "improved-avatar",
  "metadata": {},
  "outputs": [
  {
@@ -2473,7 +2473,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "contrary-throat",
+ "id": "raising-acrylic",
  "metadata": {},
  "source": [
  "We can inspect the samples in a batch for a lat/lon pixel, noting that the overlap only applies within a batch and not across. Thus, within the 20 time points in a batch, we can get 11 samples each with 10 time points and 9 time points allowed to overlap."
@@ -2482,7 +2482,7 @@
  {
  "cell_type": "code",
  "execution_count": 10,
- "id": "accepting-hundred",
+ "id": "colored-recipe",
  "metadata": {},
  "outputs": [
  {
@@ -2944,7 +2944,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "enclosed-investing",
+ "id": "democratic-horse",
  "metadata": {},
  "source": [
  "## Example applications\n",
@@ -2957,7 +2957,7 @@
  {
  "cell_type": "code",
  "execution_count": 11,
- "id": "vulnerable-terminology",
+ "id": "fatal-reflection",
  "metadata": {},
  "outputs": [
  {
@@ -3005,7 +3005,7 @@
  },
  {
  "cell_type": "markdown",
- "id": "primary-dance",
+ "id": "minus-sphere",
  "metadata": {},
  "source": [
  "We can also use the Xarray's \"stack\" method to transform these into 2D inputs (n_samples, n_features) suitable for other machine learning algorithms implemented in libraries such as [sklearn](https://scikit-learn.org/stable/) and [xgboost](https://xgboost.readthedocs.io/en/stable/). In this case, we are expecting 9 x 9 x 9 = 729 features total."
@@ -3014,7 +3014,7 @@
  {
  "cell_type": "code",
  "execution_count": 12,
- "id": "numerous-computer",
+ "id": "future-honey",
  "metadata": {},
  "outputs": [
  {
@@ -3055,14 +3055,15 @@
  },
  {
  "cell_type": "markdown",
- "id": "addressed-collapse",
+ "id": "russian-transaction",
  "metadata": {},
  "source": [
  "## What's next?\n",
  "\n",
  "There are many additional useful features that were yet to be implemented in the context of batch generation for downstream machine learning model training purposes. One of the current efforts is adding a set of data loaders (see [working PR for PyTorch data loader](https://github.com/pangeo-data/xbatcher/pull/25)). \n",
  "\n",
  "Additional features of interest can include: \n",
+ "\n",
  "1. Handling overlaps across batches. The common use case of batching in machine learning training involves generating all samples, then group them into batches. When overlap is enabled, this yields different results compared to first generating batches then creating possible samples within each batch. \n",
  "\n",
  "2. Shuffling/randomization of samples across batches. It is often desirable for each batch to be grouped randomly instead of along a specific dimension. \n",
@@ -3072,13 +3073,13 @@
  "4. Handling preprocessing steps. For example, data augmentation, scaling/normalization, outlier detection, etc. \n",
  "\n",
  "\n",
- "Interested users are welcomed to submit an issue in GitHub. "
+ "More thoughts on 1. and 2. can be found in [this issue](https://github.com/pangeo-data/xbatcher/issues/30). Interested users are welcomed to comment or submit other issues in GitHub. "
  ]
  },
  {
  "cell_type": "code",
  "execution_count": null,
- "id": "continent-property",
+ "id": "analyzed-saudi",
  "metadata": {},
  "outputs": [],
  "source": []

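The sample counts quoted in the notebook's markdown cells (27840 samples per batch, 11 overlapping samples within 20 time points, 729 stacked features) can be checked with a short sketch. This is plain Python arithmetic, not the xbatcher API; the helper names `window_starts` and `count_windows` are hypothetical and exist only for illustration, and the sketch assumes that only full-length windows are kept.

```python
def window_starts(n_points, window, overlap):
    """Start indices of full windows of length `window` along an axis of
    `n_points`, where consecutive windows share `overlap` points.

    Hypothetical helper for illustration; not part of xbatcher.
    """
    stride = window - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than window")
    # Only windows that fit entirely inside the axis are kept.
    return list(range(0, n_points - window + 1, stride))


def count_windows(n_points, window, overlap):
    """Number of full windows produced by `window_starts`."""
    return len(window_starts(n_points, window, overlap))


# One sample per lat/lon pixel: 145 lat points times 192 lon points.
samples_per_batch = 145 * 192

# 20 time points per batch, 10-point inputs, 9 points of overlap.
overlapping_samples = count_windows(20, 10, 9)

# Stacking three dimensions of length 9 each into one feature axis.
n_features = 9 * 9 * 9

print(samples_per_batch, overlapping_samples, n_features)
```

With an overlap of 9 on a window of 10, the stride is 1, so the windows start at time indices 0 through 10. Because the overlap applies only within a batch, no window spans two batches, which is the limitation item 1 of the "What's next?" list describes.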