|
2 | 2 | "cells": [
|
3 | 3 | {
|
4 | 4 | "cell_type": "markdown",
|
5 |
| - "id": "vocal-florida", |
| 5 | + "id": "liquid-finland", |
6 | 6 | "metadata": {},
|
7 | 7 | "source": [
|
8 | 8 | "# demo\n",
|
|
21 | 21 | {
|
22 | 22 | "cell_type": "code",
|
23 | 23 | "execution_count": 1,
|
24 |
| - "id": "stable-worst", |
| 24 | + "id": "guided-johnston", |
25 | 25 | "metadata": {},
|
26 | 26 | "outputs": [],
|
27 | 27 | "source": [
|
|
32 | 32 | },
|
33 | 33 | {
|
34 | 34 | "cell_type": "markdown",
|
35 |
| - "id": "stopped-syndication", |
| 35 | + "id": "incorporated-november", |
36 | 36 | "metadata": {},
|
37 | 37 | "source": [
|
38 | 38 | "## Example data\n",
|
|
43 | 43 | {
|
44 | 44 | "cell_type": "code",
|
45 | 45 | "execution_count": 2,
|
46 |
| - "id": "involved-insulin", |
| 46 | + "id": "substantial-puzzle", |
47 | 47 | "metadata": {},
|
48 | 48 | "outputs": [
|
49 | 49 | {
|
|
566 | 566 | {
|
567 | 567 | "cell_type": "code",
|
568 | 568 | "execution_count": 3,
|
569 |
| - "id": "extended-springfield", |
| 569 | + "id": "imported-circus", |
570 | 570 | "metadata": {},
|
571 | 571 | "outputs": [
|
572 | 572 | {
|
|
599 | 599 | },
|
600 | 600 | {
|
601 | 601 | "cell_type": "markdown",
|
602 |
| - "id": "incorrect-spank", |
| 602 | + "id": "portuguese-decline", |
603 | 603 | "metadata": {},
|
604 | 604 | "source": [
|
605 | 605 | "## Batch generation\n",
|
|
614 | 614 | {
|
615 | 615 | "cell_type": "code",
|
616 | 616 | "execution_count": 4,
|
617 |
| - "id": "superb-riverside", |
| 617 | + "id": "rocky-dealer", |
618 | 618 | "metadata": {},
|
619 | 619 | "outputs": [
|
620 | 620 | {
|
|
1040 | 1040 | },
|
1041 | 1041 | {
|
1042 | 1042 | "cell_type": "markdown",
|
1043 |
| - "id": "czech-covering", |
| 1043 | + "id": "smoking-acrobat", |
1044 | 1044 | "metadata": {},
|
1045 | 1045 | "source": [
|
1046 | 1046 | "We can verify that the outputs have the expected shapes. \n",
|
|
1051 | 1051 | {
|
1052 | 1052 | "cell_type": "code",
|
1053 | 1053 | "execution_count": 5,
|
1054 |
| - "id": "brilliant-period", |
| 1054 | + "id": "looking-journalism", |
1055 | 1055 | "metadata": {},
|
1056 | 1056 | "outputs": [
|
1057 | 1057 | {
|
|
1069 | 1069 | },
|
1070 | 1070 | {
|
1071 | 1071 | "cell_type": "markdown",
|
1072 |
| - "id": "polished-change", |
| 1072 | + "id": "cellular-designer", |
1073 | 1073 | "metadata": {},
|
1074 | 1074 | "source": [
|
1075 | 1075 | "There are 145 lat points and 192 lon points, so we expect 145 * 192 = 27840 samples in a batch."
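The sample count quoted in this cell is just the product of the spatial dimension sizes. A minimal sketch of that arithmetic, assuming one sample per (lat, lon) pixel; the `dim_sizes` dictionary is illustrative and not part of the notebook:

```python
from math import prod

# Dimension sizes quoted in the markdown cell; the dict is a hypothetical stand-in
# for the dataset's dimension sizes.
dim_sizes = {"lat": 145, "lon": 192}

# One sample per (lat, lon) pixel, so the count is the product of the sizes.
n_samples = prod(dim_sizes.values())
print(n_samples)  # 27840
```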
|
|
1078 | 1078 | {
|
1079 | 1079 | "cell_type": "code",
|
1080 | 1080 | "execution_count": 6,
|
1081 |
| - "id": "tamil-wagon", |
| 1081 | + "id": "accurate-arthur", |
1082 | 1082 | "metadata": {},
|
1083 | 1083 | "outputs": [
|
1084 | 1084 | {
|
|
1096 | 1096 | },
|
1097 | 1097 | {
|
1098 | 1098 | "cell_type": "markdown",
|
1099 |
| - "id": "seeing-straight", |
| 1099 | + "id": "fewer-transfer", |
1100 | 1100 | "metadata": {},
|
1101 | 1101 | "source": [
|
1102 | 1102 | "## Controlling the size/shape of batches\n",
|
|
1107 | 1107 | {
|
1108 | 1108 | "cell_type": "code",
|
1109 | 1109 | "execution_count": 7,
|
1110 |
| - "id": "confused-consent", |
| 1110 | + "id": "charming-drive", |
1111 | 1111 | "metadata": {},
|
1112 | 1112 | "outputs": [
|
1113 | 1113 | {
|
|
1559 | 1559 | },
|
1560 | 1560 | {
|
1561 | 1561 | "cell_type": "markdown",
|
1562 |
| - "id": "regulated-double", |
| 1562 | + "id": "specialized-realtor", |
1563 | 1563 | "metadata": {},
|
1564 | 1564 | "source": [
|
1565 | 1565 | "## Last batch behavior\n",
|
|
1570 | 1570 | {
|
1571 | 1571 | "cell_type": "code",
|
1572 | 1572 | "execution_count": 8,
|
1573 |
| - "id": "fatty-satellite", |
| 1573 | + "id": "broadband-solid", |
1574 | 1574 | "metadata": {},
|
1575 | 1575 | "outputs": [
|
1576 | 1576 | {
|
|
2005 | 2005 | },
|
2006 | 2006 | {
|
2007 | 2007 | "cell_type": "markdown",
|
2008 |
| - "id": "sought-democracy", |
| 2008 | + "id": "boring-slide", |
2009 | 2009 | "metadata": {},
|
2010 | 2010 | "source": [
|
2011 | 2011 | "## Overlapping inputs\n",
|
|
2017 | 2017 | {
|
2018 | 2018 | "cell_type": "code",
|
2019 | 2019 | "execution_count": 9,
|
2020 |
| - "id": "improved-avatar", |
| 2020 | + "id": "fossil-wonder", |
2021 | 2021 | "metadata": {},
|
2022 | 2022 | "outputs": [
|
2023 | 2023 | {
|
|
2473 | 2473 | },
|
2474 | 2474 | {
|
2475 | 2475 | "cell_type": "markdown",
|
2476 |
| - "id": "raising-acrylic", |
| 2476 | + "id": "direct-mason", |
2477 | 2477 | "metadata": {},
|
2478 | 2478 | "source": [
|
2479 | 2479 | "We can inspect the samples in a batch for a single lat/lon pixel, noting that the overlap applies only within a batch and not across batches. Thus, from the 20 time points in a batch, we get 11 samples, each with 10 time points, with 9 time points allowed to overlap between consecutive samples."
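The count of 11 follows from ordinary sliding-window arithmetic: a window of 10 points with 9 points of overlap advances by 1 point each step. The sketch below only mirrors that arithmetic; `window_starts` is a hypothetical helper and is not meant to describe how xbatcher computes its slices internally.

```python
def window_starts(n_points: int, window: int, overlap: int) -> list[int]:
    """Return the start index of every full window along one dimension."""
    stride = window - overlap  # how far each window advances
    if stride <= 0:
        raise ValueError("overlap must be smaller than the window length")
    # Only windows that fit entirely within n_points are kept.
    return list(range(0, n_points - window + 1, stride))


# Numbers from the markdown cell: 20 time points, 10-point samples, 9 overlapping.
starts = window_starts(n_points=20, window=10, overlap=9)
print(len(starts))  # 11
```

With these numbers the start indices run from 0 to 10, so the last sample covers time points 10 through 19.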
|
|
2482 | 2482 | {
|
2483 | 2483 | "cell_type": "code",
|
2484 | 2484 | "execution_count": 10,
|
2485 |
| - "id": "colored-recipe", |
| 2485 | + "id": "instructional-criticism", |
2486 | 2486 | "metadata": {},
|
2487 | 2487 | "outputs": [
|
2488 | 2488 | {
|
|
2944 | 2944 | },
|
2945 | 2945 | {
|
2946 | 2946 | "cell_type": "markdown",
|
2947 |
| - "id": "democratic-horse", |
| 2947 | + "id": "ranging-cologne", |
2948 | 2948 | "metadata": {},
|
2949 | 2949 | "source": [
|
2950 | 2950 | "## Example applications\n",
|
|
2957 | 2957 | {
|
2958 | 2958 | "cell_type": "code",
|
2959 | 2959 | "execution_count": 11,
|
2960 |
| - "id": "fatal-reflection", |
| 2960 | + "id": "premier-syria", |
2961 | 2961 | "metadata": {},
|
2962 | 2962 | "outputs": [
|
2963 | 2963 | {
|
|
3005 | 3005 | },
|
3006 | 3006 | {
|
3007 | 3007 | "cell_type": "markdown",
|
3008 |
| - "id": "minus-sphere", |
| 3008 | + "id": "raised-breakfast", |
3009 | 3009 | "metadata": {},
|
3010 | 3010 | "source": [
|
3011 | 3011 | "We can also use Xarray's \"stack\" method to transform these into 2D inputs (n_samples, n_features) suitable for other machine learning algorithms implemented in libraries such as [sklearn](https://scikit-learn.org/stable/) and [xgboost](https://xgboost.readthedocs.io/en/stable/). In this case, we expect 9 x 9 x 9 = 729 features in total."
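To make the target shape concrete, here is a rough NumPy analogue of the flattening step. It is a sketch under stated assumptions: the batch of 4 samples and the (sample, time, lat, lon) axis order are made up for illustration, and a plain `reshape` stands in for Xarray's `stack`, which does the equivalent on labeled dimensions.

```python
import numpy as np

# Hypothetical batch: 4 samples, each a 9 x 9 x 9 (time, lat, lon) patch.
n_samples, n_time, n_lat, n_lon = 4, 9, 9, 9
batch = np.arange(n_samples * n_time * n_lat * n_lon, dtype=float).reshape(
    n_samples, n_time, n_lat, n_lon
)

# Collapse the three patch axes into a single feature axis.
features = batch.reshape(n_samples, -1)
print(features.shape)  # (4, 729)
```

An array of this shape is what estimators in libraries such as sklearn typically expect as `X`.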
|
|
3014 | 3014 | {
|
3015 | 3015 | "cell_type": "code",
|
3016 | 3016 | "execution_count": 12,
|
3017 |
| - "id": "future-honey", |
| 3017 | + "id": "protecting-aside", |
3018 | 3018 | "metadata": {},
|
3019 | 3019 | "outputs": [
|
3020 | 3020 | {
|
|
3055 | 3055 | },
|
3056 | 3056 | {
|
3057 | 3057 | "cell_type": "markdown",
|
3058 |
| - "id": "russian-transaction", |
| 3058 | + "id": "vocal-roots", |
3059 | 3059 | "metadata": {},
|
3060 | 3060 | "source": [
|
3061 | 3061 | "## What's next?\n",
|
|
3075 | 3075 | "\n",
|
3076 | 3076 | "More thoughts on 1. and 2. can be found in [this issue](https://github.com/pangeo-data/xbatcher/issues/30). Interested users are welcome to comment or open other issues on GitHub. "
|
3077 | 3077 | ]
|
3078 |
| - }, |
3079 |
| - { |
3080 |
| - "cell_type": "code", |
3081 |
| - "execution_count": null, |
3082 |
| - "id": "analyzed-saudi", |
3083 |
| - "metadata": {}, |
3084 |
| - "outputs": [], |
3085 |
| - "source": [] |
3086 | 3078 | }
|
3087 | 3079 | ],
|
3088 | 3080 | "metadata": {
|
|