
feat: Create chunk-strategy skill #64

@cdcore09

Description

Create the skill covering chunk size heuristics, formulas, decision trees, and real-world optimization case studies for the zarr-chunk-optimization plugin.

Directory: plugins/zarr-chunk-optimization/skills/chunk-strategy/

Research Reference

Full research document: .agents/research-zarr-chunk-optimization-and-zarr-plugin.md

Files to Create

1. SKILL.md (300+ lines)

Frontmatter:

name: chunk-strategy
description: |
  Use this skill when the user asks to "choose chunk sizes", "optimize zarr chunking",
  "determine chunk dimensions", "analyze access patterns for chunking", "calculate optimal
  chunks", or needs guidance on Zarr chunk size selection, access pattern analysis, chunk
  alignment with Dask, sharding strategy, or the trade-offs between temporal and spatial
  chunking approaches.

Content must include:

  • Quick Reference Card — chunk size formula table:

    | Metric | Value | Source |
    |---|---|---|
    | Minimum uncompressed chunk | 1 MB | Zarr docs |
    | Optimal range (cloud) | 100 MB - 1 GB | Dask best practices |
    | S3 byte-range sweet spot | 8-16 MB | AWS S3 best practices |
    | Max task graph | 10K-100K chunks | Dask guidelines |
    | Parallelism target | chunks >= 2 * workers | Dask guidelines |
    | Total concurrency | dask_threads * zarr_async_concurrency | Zarr docs |
    | Dask alignment | Dask chunks = N * Zarr chunks | Dask docs |
    | Shard reduction | shard_volume / chunk_volume | Zarr v3 spec |
  • Decision Tree for Chunk Strategy Selection — flowchart covering:

    • Cloud vs local storage?
    • Primary access pattern (temporal/spatial/mixed)?
    • Using Dask? → alignment rules
    • Zarr v3 available? → sharding option
    • Data sparse? → write_empty_chunks=False
  • Core Concepts:

    • Chunk alignment with access patterns (the 63x performance evidence)
    • The fundamental trade-off: time-series vs spatial access (Nguyen et al. 1405x/713x)
    • Versatile middle-range strategies
    • Why minimum 1 MB matters for cloud (HTTP request overhead 10-100ms)
  • Sharding Section (Zarr v3):

    • When to use: many small logical chunks needed but object count must be low
    • Sizing: 100GB array / 1MB chunks = 100K objects; with 1GB shards = 100 objects
    • Memory constraint: entire shard must fit in writer memory
    • Code example:
      z = zarr.create_array(store={}, shape=(10000, 10000, 1000),
                            shards=(1000, 1000, 1000),
                            chunks=(100, 100, 100), dtype='uint8')
  • Access Pattern Taxonomy with Recommendations:

    | Access Pattern | Chunk Strategy | Example |
    |---|---|---|
    | Single lat/lon time-series | Maximize time dim, minimize spatial | (T, 10, 10) |
    | Spatial maps at single time | Minimize time dim, maximize spatial | (1, Y, X) |
    | Mixed access | Balanced across all dims | (30, 90, 180) |
    | Ensemble/scenario queries | Include scenario in fast dims | (1, 1, 2, 3, 1, 1) |
  • Empty Chunk Optimization:

    • write_empty_chunks=False (default): skips fill-value chunks — benchmark: 0.25s
    • write_empty_chunks=True: writes all chunks — benchmark: 0.48s (nearly 2x slower for sparse data)
  • Memory Layout:

    • config={'order': 'C'} (row-major) vs config={'order': 'F'} (column-major / Fortran)
    • Different layouts provide different compression ratios depending on data correlation structure
  • Concurrency Configuration:

    zarr.config.set({'async.concurrency': 128})
    # Default: 10 (conservative). Increase for cloud, decrease for local.
    # WARNING: total_concurrency = dask_threads * zarr_async_concurrency
  • Links to references/PATTERNS.md, references/EXAMPLES.md, references/COMMON_ISSUES.md
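The quick-reference numbers above can be sanity-checked in a few lines of numpy. A minimal sketch (`chunk_size_mb` is a hypothetical helper name, not part of the skill spec):

```python
import numpy as np

def chunk_size_mb(chunk_shape, dtype):
    """Uncompressed size of one chunk in MB (element count * itemsize)."""
    return np.prod(chunk_shape) * np.dtype(dtype).itemsize / 1e6

# Temporal-first chunk for a 10-year daily dataset: (3650, 10, 10) float32
mb = chunk_size_mb((3650, 10, 10), "float32")
assert mb >= 1.0, "below the 1 MB cloud minimum"
print(f"{mb:.2f} MB per chunk")  # → 1.46 MB per chunk
```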

2. assets/chunk-calculator.py

Python script that calculates recommended chunk sizes given:

  • Inputs:
    • shape: tuple — array shape (e.g., (3650, 721, 1440))
    • dtype: string — data type (e.g., 'float32')
    • access_pattern: string — 'temporal', 'spatial', or 'balanced'
    • target_chunk_mb: float — target chunk size in MB (default: 100)
    • min_chunk_mb: float — minimum chunk size in MB (default: 1)
    • num_workers: int — number of Dask workers (default: 4)
  • Outputs:
    • Recommended chunk shape (tuple)
    • Estimated chunk size in MB
    • Estimated number of chunks
    • Estimated task graph size
    • Whether sharding is recommended
  • Must be a working, runnable Python script using only numpy for calculations
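One possible shape for the calculator's core logic, as a sketch only: it assumes a 3-D (time, y, x) array with time on axis 0, and the name `recommend_chunks` is illustrative. The real script should add CLI argument parsing and input validation on top of this.

```python
import math
import numpy as np

def recommend_chunks(shape, dtype, access_pattern="balanced",
                     target_chunk_mb=100.0, min_chunk_mb=1.0, num_workers=4):
    """Return (chunk_shape, chunk_mb, n_chunks, sharding_recommended)."""
    itemsize = np.dtype(dtype).itemsize
    target_elems = target_chunk_mb * 1e6 / itemsize

    if access_pattern == "temporal":
        # Full time axis; spend the remaining budget on a square spatial tile.
        t = shape[0]
        side = max(1, int(math.sqrt(target_elems / t)))
        chunks = (t, min(shape[1], side), min(shape[2], side))
    elif access_pattern == "spatial":
        # Full spatial plane; thin time axis.
        plane = shape[1] * shape[2]
        t = max(1, int(target_elems // plane))
        chunks = (min(shape[0], t), shape[1], shape[2])
    else:  # balanced: scale every axis by the same factor
        f = (target_elems / math.prod(shape)) ** (1 / len(shape))
        chunks = tuple(max(1, min(s, round(s * f))) for s in shape)

    chunk_mb = math.prod(chunks) * itemsize / 1e6
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    sharding = n_chunks > 100_000  # too many objects for object storage
    if n_chunks < 2 * num_workers:
        print(f"warning: only {n_chunks} chunks for {num_workers} workers")
    if chunk_mb < min_chunk_mb:
        print(f"warning: {chunk_mb:.2f} MB is below the {min_chunk_mb} MB floor")
    return chunks, chunk_mb, n_chunks, sharding
```

For the ERA5-like shape from the issue, `recommend_chunks((3650, 721, 1440), "float32", "temporal")` yields chunks of (3650, 82, 82) at roughly 98 MB each, 162 chunks total.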

3. assets/chunk-decision-tree.md

A visual decision tree rendered in ASCII within markdown:

Is this for cloud or local storage?
├── Cloud → Minimum 1 MB chunks, target 100 MB+
│   ├── Primary access pattern?
│   │   ├── Temporal (time-series at locations)
│   │   │   → Maximize time dimension, minimize spatial
│   │   ├── Spatial (maps at time steps)
│   │   │   → Minimize time dimension, maximize spatial
│   │   └── Mixed/Unknown
│   │       → Balanced chunks across all dimensions
│   ├── Using Dask?
│   │   ├── Yes → Ensure Dask chunks are integer multiples of Zarr chunks
│   │   │       → Watch total_concurrency = dask_threads * zarr_async_concurrency
│   │   └── No → Focus on Zarr-level chunk optimization
│   ├── Zarr v3 available?
│   │   ├── Yes → Consider sharding if need small logical chunks
│   │   └── No → Optimize chunk count directly
│   └── Data sparse?
│       ├── Yes → Set write_empty_chunks=False (2x write speedup)
│       └── No → Default behavior
└── Local → More flexible sizing, 1-10 MB often sufficient

4. references/PATTERNS.md

Must include 6+ patterns, each with: description, when to use, chunk formula, code example, trade-offs:

  1. Temporal-First Chunking — climate/weather time-series (e.g., (3650, 10, 10))
  2. Spatial-First Chunking — map generation, regional analysis (e.g., (1, 721, 1440))
  3. Balanced Spatio-Temporal — mixed workloads (e.g., (30, 90, 180))
  4. Ensemble/Scenario Chunking — multi-scenario datasets (e.g., (1, 1, 2, 3, 1, 1))
  5. Sharded Chunks (Zarr v3) — many small logical chunks within larger shards
  6. Dask-Aligned Chunking — ensuring Dask chunks are multiples of Zarr chunks
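Pattern 6 reduces to a divisibility check: a Dask chunk that is not an integer multiple of the underlying Zarr chunk straddles Zarr chunk boundaries and forces redundant reads and decompression. A sketch with hypothetical helper names:

```python
def dask_aligned(zarr_chunks, multiple):
    """Build Dask chunks as an integer multiple of the Zarr chunks per axis."""
    return tuple(z * m for z, m in zip(zarr_chunks, multiple))

def is_aligned(dask_chunks, zarr_chunks):
    """True if every Dask chunk edge falls on a Zarr chunk boundary."""
    return all(d % z == 0 for d, z in zip(dask_chunks, zarr_chunks))

zarr_chunks = (100, 100, 100)
dask_chunks = dask_aligned(zarr_chunks, (5, 5, 5))   # (500, 500, 500)
assert is_aligned(dask_chunks, zarr_chunks)
assert not is_aligned((150, 100, 100), zarr_chunks)  # 150 straddles two Zarr chunks
```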

5. references/EXAMPLES.md

Must include 4+ complete case studies:

  1. Climate Dataset Optimization — shape (3650, 721, 1440), temporal vs spatial access, benchmarked results showing 63x difference
  2. Satellite Imagery — high-resolution spatial data, tile-aligned chunks
  3. Ensemble Weather Forecasts — scenario dimension chunking
  4. Pluvial Flooding Dataset — shape (4, 1, 6, 3, 6000, 6000), pinned-location queries, recommended chunks [1, 1, 2, 3, 1, 1]

Each example: dataset description, shape, access patterns, chunking strategy, rationale, code snippet.

6. references/COMMON_ISSUES.md

Must include 6+ issues:

  1. Chunks too small for cloud storage (< 1 MB) → excessive HTTP requests, 10-100ms overhead per request
  2. Chunks too large → excessive memory usage and unnecessary data transfer
  3. Chunk orientation mismatched with access pattern → orders of magnitude slower (up to 1405x)
  4. Dask chunks not aligned with Zarr chunks → redundant decompression
  5. Total concurrency overflow → cloud storage throttling (dask_threads * zarr_async_concurrency)
  6. Sparse data with write_empty_chunks=True → 2x slower writes

Each issue: symptoms, cause, solution with code, prevention.
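Issue 6 can be illustrated without touching storage at all: the win from `write_empty_chunks=False` is simply that all-fill-value chunks are detected and never written. A pure-numpy sketch of that detection (`blocks_to_write` is a hypothetical name, not a Zarr API):

```python
import numpy as np

def blocks_to_write(arr, chunk, fill_value=0):
    """Count 2-D chunks that contain data vs. total chunks in the grid."""
    written = total = 0
    for i in range(0, arr.shape[0], chunk):
        for j in range(0, arr.shape[1], chunk):
            total += 1
            if np.any(arr[i:i + chunk, j:j + chunk] != fill_value):
                written += 1
    return written, total

sparse = np.zeros((400, 400), dtype="float32")
sparse[:100, :100] = 1.0              # data only in one corner
print(blocks_to_write(sparse, 100))   # → (1, 16): 15 of 16 writes skipped
```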

Acceptance Criteria

  • SKILL.md is 300+ lines with complete chunk sizing knowledge
  • chunk-calculator.py is a working, runnable Python script
  • chunk-decision-tree.md provides clear visual decision guidance
  • PATTERNS.md covers 6+ distinct chunking patterns with code examples
  • EXAMPLES.md has 4+ real-world case studies with concrete numbers
  • COMMON_ISSUES.md covers 6+ common mistakes with solutions
  • Follows the skill pattern from plugins/scientific-domain-applications/skills/xarray-for-multidimensional-data/

Dependencies

Metadata

Labels

enhancement (New feature or request), skill (Skill creation or modification)
