|
61 | 61 | "source": [
|
62 | 62 | "To account for Feb-29 being present in some years, we'll construct a time vector to group by as a \"mmm-dd\" string.\n",
|
63 | 63 | "\n",
|
64 |
| - "For more options, see https://strftime.org/" |
| 64 | + "```{seealso}\n", |
| 65 | + "For more options, see [this great website](https://strftime.org/).\n", |
| 66 | + "```" |
65 | 67 | ]
|
66 | 68 | },
|
67 | 69 | {
|
|
80 | 82 | "id": "6",
|
81 | 83 | "metadata": {},
|
82 | 84 | "source": [
|
83 |
| - "## map-reduce\n", |
| 85 | + "## First, `method=\"map-reduce\"`\n", |
84 | 86 | "\n",
|
85 | 87 | "The default\n",
|
86 | 88 | "[method=\"map-reduce\"](https://flox.readthedocs.io/en/latest/implementation.html#method-map-reduce)\n",
|
|
110 | 112 | "id": "8",
|
111 | 113 | "metadata": {},
|
112 | 114 | "source": [
|
113 |
| - "## Rechunking for map-reduce\n", |
| 115 | + "### Rechunking for map-reduce\n", |
114 | 116 | "\n",
|
115 | 117 | "We can split each chunk along the `lat`, `lon` dimensions to make sure the\n",
|
116 | 118 | "output chunk sizes are more reasonable\n"
|
|
139 | 141 | "But what if we didn't want to rechunk the dataset so drastically (note the 10x\n",
|
140 | 142 | "increase in tasks)? For that, let's try `method=\"cohorts\"`.\n",
|
141 | 143 | "\n",
|
142 |
| - "## method=cohorts\n", |
| 144 | + "## `method=\"cohorts\"`\n", |
143 | 145 | "\n",
|
144 | 146 | "We can take advantage of patterns in the groups, here \"day of year\".\n",
|
145 | 147 | "Specifically:\n",
|
|
271 | 273 | "id": "21",
|
272 | 274 | "metadata": {},
|
273 | 275 | "source": [
|
274 |
| - "And now our cohorts contain more than one group\n" |
| 276 | + "And now our cohorts contain more than one group, *and* there is a substantial reduction in the number of cohorts: **162 -> 12**\n" |
275 | 277 | ]
|
276 | 278 | },
|
277 | 279 | {
|
|
281 | 283 | "metadata": {},
|
282 | 284 | "outputs": [],
|
283 | 285 | "source": [
|
284 |
| - "preferrd_method, new_cohorts = flox.core.find_group_cohorts(\n", |
| 286 | + "preferred_method, new_cohorts = flox.core.find_group_cohorts(\n", |
285 | 287 | " labels=codes,\n",
|
286 | 288 | " chunks=(rechunked.chunksizes[\"time\"],),\n",
|
287 | 289 | ")\n",
|
|
295 | 297 | "id": "23",
|
296 | 298 | "metadata": {},
|
297 | 299 | "outputs": [],
|
| 300 | + "source": [ |
| 301 | + "preferred_method" |
| 302 | + ] |
| 303 | + }, |
| 304 | + { |
| 305 | + "cell_type": "code", |
| 306 | + "execution_count": null, |
| 307 | + "id": "24", |
| 308 | + "metadata": {}, |
| 309 | + "outputs": [], |
298 | 310 | "source": [
|
299 | 311 | "new_cohorts.values()"
|
300 | 312 | ]
|
301 | 313 | },
|
302 | 314 | {
|
303 | 315 | "cell_type": "markdown",
|
304 |
| - "id": "24", |
| 316 | + "id": "25", |
305 | 317 | "metadata": {},
|
306 | 318 | "source": [
|
307 | 319 | "Now the groupby reduction **looks OK** in terms of number of tasks but remember\n",
|
|
311 | 323 | {
|
312 | 324 | "cell_type": "code",
|
313 | 325 | "execution_count": null,
|
314 |
| - "id": "25", |
| 326 | + "id": "26", |
315 | 327 | "metadata": {},
|
316 | 328 | "outputs": [],
|
317 | 329 | "source": [
|
|
320 | 332 | },
|
321 | 333 | {
|
322 | 334 | "cell_type": "markdown",
|
323 |
| - "id": "26", |
| 335 | + "id": "27", |
| 336 | + "metadata": {}, |
| 337 | + "source": [ |
| 338 | + "flox's heuristics will choose `\"cohorts\"` automatically!" |
| 339 | + ] |
| 340 | + }, |
| 341 | + { |
| 342 | + "cell_type": "code", |
| 343 | + "execution_count": null, |
| 344 | + "id": "28", |
| 345 | + "metadata": {}, |
| 346 | + "outputs": [], |
| 347 | + "source": [ |
| 348 | + "flox.xarray.xarray_reduce(rechunked, day, func=\"mean\")" |
| 349 | + ] |
| 350 | + }, |
| 351 | + { |
| 352 | + "cell_type": "markdown", |
| 353 | + "id": "29", |
324 | 354 | "metadata": {},
|
325 | 355 | "source": [
|
326 | 356 | "## How about other climatologies?\n",
|
|
331 | 361 | {
|
332 | 362 | "cell_type": "code",
|
333 | 363 | "execution_count": null,
|
334 |
| - "id": "27", |
| 364 | + "id": "30", |
335 | 365 | "metadata": {},
|
336 | 366 | "outputs": [],
|
337 | 367 | "source": [
|
|
340 | 370 | },
|
341 | 371 | {
|
342 | 372 | "cell_type": "markdown",
|
343 |
| - "id": "28", |
| 373 | + "id": "31", |
344 | 374 | "metadata": {},
|
345 | 375 | "source": [
|
346 | 376 | "This looks great. Why?\n",
|
|
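Note on the "mmm-dd" grouping described in the first markdown cell of this diff: the following is a minimal sketch of why a string label handles Feb-29 better than an integer day of year. It uses only the standard library rather than the notebook's xarray objects, and the variable names (`days`, `labels`, `doy_mar01`, `label_mar01`, `n_groups`) are hypothetical, not taken from the notebook.

```python
from datetime import date, timedelta

# Two consecutive years of daily dates: 2000 (leap year) and 2001 (non-leap).
start = date(2000, 1, 1)
days = [start + timedelta(days=i) for i in range(366 + 365)]

# Group label as a "mmm-dd" string, for example "Feb-29" or "Mar-01".
labels = [d.strftime("%b-%d") for d in days]

# The integer day of year for March 1 differs between the two years ...
doy_mar01 = {
    d.year: d.timetuple().tm_yday for d in days if (d.month, d.day) == (3, 1)
}

# ... while the string label is identical, so both dates fall in one group.
label_mar01 = {
    d.year: d.strftime("%b-%d") for d in days if (d.month, d.day) == (3, 1)
}

# Number of distinct groups across both years (Feb-29 gets its own group).
n_groups = len(set(labels))
```

With string labels, Feb-29 forms its own group that is populated only in leap years, and every other calendar day keeps the same label in every year. Note that `%b` is locale-dependent; the sketch assumes the default C/English locale.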