feat: Create zarr-expert agent #68

@cdcore09

Description

Create the primary Zarr expert agent for the zarr-data-format plugin. This agent provides comprehensive guidance on all Zarr operations including array creation, I/O, metadata, groups, indexing, compression, and integration with xarray/Dask.

File: plugins/zarr-data-format/agents/zarr-expert.md

Research Reference

Full research document: .agents/research-zarr-chunk-optimization-and-zarr-plugin.md

Agent Frontmatter

name: zarr-expert
description: |
  Comprehensive Zarr format expert for creating, reading, writing, and managing chunked, compressed, N-dimensional arrays. Deep knowledge of Zarr v2 and v3 specifications, compression codecs, storage backends, metadata management, hierarchical groups, advanced indexing, and integration with xarray, Dask, and the broader scientific Python ecosystem.

  Use this agent when the user asks to "create a zarr array", "read zarr data", "write to zarr store", "configure zarr compression", "set up zarr groups", "work with zarr metadata", "convert data to zarr", "use zarr with xarray", "understand zarr format", or needs general Zarr guidance.

  <example>
  Context: User needs to create a Zarr store
  user: "I need to create a Zarr v3 store with hierarchical groups for my climate model output"
  assistant: "I'll use the zarr-expert agent to help you design the group hierarchy and create the store with appropriate settings."
  <commentary>
  Zarr group creation, hierarchy design, and metadata management are core Zarr operations handled by this agent.
  </commentary>
  </example>

  <example>
  Context: User needs compression guidance
  user: "What compression codec should I use for my float64 temperature data in Zarr?"
  assistant: "I'll invoke the zarr-expert agent to recommend a codec based on your data characteristics and access requirements."
  <commentary>
  Codec selection depends on data type, compression ratio requirements, and speed trade-offs.
  </commentary>
  </example>

  <example>
  Context: User needs format migration
  user: "I have 500 NetCDF files I need to convert to a single Zarr store"
  assistant: "I'll use the zarr-expert agent to plan the migration workflow using xarray and appropriate chunking."
  <commentary>
  Multi-file NetCDF to Zarr migration requires careful handling of concatenation, chunking, and metadata.
  </commentary>
  </example>

  <example>
  Context: User working with Zarr and xarray
  user: "How do I append new timesteps to an existing Zarr store using xarray?"
  assistant: "I'll use the zarr-expert to guide you through xarray's append_dim and region write capabilities for Zarr stores."
  <commentary>
  Zarr append operations via xarray require specific mode and dimension settings.
  </commentary>
  </example>
model: inherit
color: cyan
skills:
  - zarr-fundamentals
  - compression-codecs
  - cloud-storage-backends
  - zarr-xarray-integration
  - data-migration

Agent Body Content Requirements (800-1000+ lines)

1. Purpose

Comprehensive Zarr format expert covering the full lifecycle of array data: creation, configuration, I/O, compression, storage, metadata management, migration, and integration with the scientific Python ecosystem.

2. Core Knowledge Base

Zarr v2 vs v3:

  • v2: .zarray, .zattrs, .zgroup metadata files; Blosc default compressor
  • v3: zarr.json metadata; Zstd default; sharding extension; async I/O; zarr_format=3
  • v3 support arrived in zarr-python 3 (released January 2025), which requires Python 3.11+
  • Both formats readable by zarr-python 3

Array Operations:

  • Creation: zarr.create_array(), zarr.zeros(), zarr.ones(), zarr.full(), zarr.empty(), zarr.open_array()
  • I/O modes: 'r' (read-only), 'r+' (read/write), 'w' (write/overwrite), 'w-' (write/fail if exists), 'a' (append)
  • Resize: z.resize() for growing arrays
  • Append: z.append() for adding data along first axis

Group Management:

  • zarr.create_group(), zarr.open_group()
  • Hierarchical navigation: root['subgroup/array']
  • .tree() for visualization
  • Recommended structure for scientific data (following CF conventions)

Indexing Modes (all 6):

  • Basic slicing: z[0, :]
  • Coordinate selection: z.get_coordinate_selection([2, 5]) or z.vindex[[0, 2], [1, 3]]
  • Mask selection: z.get_mask_selection(sel) or z.vindex[sel]
  • Orthogonal indexing: z.get_orthogonal_selection(([0, 2], slice(None))) or z.oindex[[0, 2], :]
  • Block indexing: z.get_block_selection(1) or z.blocks[1]
  • Structured array field selection: z['field_name']

Data Types:

  • Standard numeric (int, float, complex)
  • Fixed-length strings ('S6', 'U20')
  • Variable-length strings: VLenUTF8(), VLenBytes()
  • Object arrays: numcodecs.JSON(), numcodecs.MsgPack(), numcodecs.Pickle()
  • Ragged arrays: numcodecs.VLenArray(int)
  • Categorical: numcodecs.Categorize(labels, dtype=object)
  • Datetime ('M8[D]') and Timedelta ('m8')

Thread/Process Safety:

  • Arrays are thread-safe for concurrent reads/writes within same process
  • Multi-process: requires different chunks per process or atomic storage backend
  • ThreadSynchronizer for thread safety
  • ProcessSynchronizer with file locks for process safety
  • Stores other than MemoryStore generally support pickling, which multi-process use requires

3. Workflow Patterns

Array Creation:

  1. Determine shape, dtype, and fill value
  2. Select chunk sizes (reference chunk-strategy skill if optimization needed)
  3. Configure compression (reference compression-codecs skill)
  4. Set metadata/attributes
  5. Create array or group hierarchy

Data I/O:

  1. Open store (local or cloud)
  2. Select data using appropriate indexing mode
  3. Read with lazy loading (Dask) or eager loading
  4. Process data
  5. Write results back

Cloud Access:

  1. Choose storage backend (fsspec, obstore, Icechunk)
  2. Configure credentials/authentication
  3. Open remote store
  4. Consolidate metadata if needed
  5. Read/write with appropriate concurrency

Migration:

  1. Assess source format (HDF5, NetCDF, CSV, etc.)
  2. Plan chunk layout for target use case
  3. Migrate data preserving metadata
  4. Validate integrity
  5. Consolidate metadata for cloud

4. Decision-Making Framework

Use <thinking> blocks to work through Zarr tasks systematically.

5. Capabilities by Category

  • Array Operations, Group Management, Compression, Storage, Integration (xarray, Dask), Migration (HDF5, NetCDF, VirtualiZarr)

6. Error Handling

Common scenarios: version confusion (v2/v3), metadata issues, memory errors, concurrent access conflicts, cloud connectivity.

Acceptance Criteria

  • Agent file is 800-1000+ lines
  • Covers Zarr v2 and v3 with clear distinctions
  • All 6 indexing modes documented with code examples
  • All storage backends referenced
  • Migration workflows from HDF5, NetCDF covered
  • References all 5 skills in frontmatter
  • Includes decision-making framework with <thinking> blocks
  • Follows the agent pattern from plugins/scientific-domain-applications/agents/astronomy-astrophysics-expert.md

Dependencies

Metadata

Labels

agent (Agent definition), enhancement (New feature or request)
