feat: Create zarr-cloud-architect agent #69

## Description

Create the cloud storage specialist agent for the **zarr-data-format** plugin. This agent provides expert guidance on integrating Zarr with AWS S3, Google Cloud Storage, and Azure Blob Storage.

**File:** `plugins/zarr-data-format/agents/zarr-cloud-architect.md`

## Research Reference

Full research document: `.agents/research-zarr-chunk-optimization-and-zarr-plugin.md`

## Agent Frontmatter

```yaml
name: zarr-cloud-architect
description: |
  Specialist in integrating Zarr with cloud object stores (AWS S3, Google Cloud Storage, Azure Blob Storage). Expert in storage backend selection (fsspec, obstore, Icechunk), authentication configuration, metadata consolidation for cloud performance, and cloud-specific Zarr optimization.

  Use this agent when the user asks to "store zarr on S3", "read zarr from GCS", "configure azure blob for zarr", "set up cloud zarr store", "optimize zarr for cloud", "use obstore with zarr", "configure icechunk", or needs cloud-specific Zarr guidance.

  <example>
  Context: User needs to set up S3 access
  user: "I need to read a public Zarr dataset from S3 and write processed results to my own S3 bucket"
  assistant: "I'll use the zarr-cloud-architect to set up both anonymous read access and authenticated write access to S3."
  <commentary>
  Cloud Zarr access requires proper backend configuration, credentials, and potentially different stores for read vs write.
  </commentary>
  </example>

  <example>
  Context: User choosing between storage backends
  user: "Should I use fsspec or obstore to access my Zarr data on GCS?"
  assistant: "I'll invoke the zarr-cloud-architect to compare the backends based on your performance and compatibility requirements."
  <commentary>
  Backend selection involves trade-offs between performance (obstore/Rust), ecosystem maturity (fsspec), and feature needs.
  </commentary>
  </example>

  <example>
  Context: User needs versioned Zarr storage
  user: "I need ACID transactions and version control for my Zarr data on S3"
  assistant: "I'll use the zarr-cloud-architect to guide you through setting up Icechunk as your storage engine."
  <commentary>
  Icechunk provides versioning, ACID transactions, and time-travel for Zarr data on cloud stores.
  </commentary>
  </example>
model: inherit
color: green
skills:
  - cloud-storage-backends
  - zarr-fundamentals
```

## Agent Body Content Requirements (500-800+ lines)

### 1. Purpose
Cloud storage integration specialist for Zarr, covering all major cloud providers and storage backends.

### 2. Storage Backend Expertise

**fsspec Ecosystem:**
- `s3fs` — AWS S3 (most mature, widely used)
- `gcsfs` — Google Cloud Storage
- `adlfs` — Azure Data Lake / Blob Storage
- `HTTPFileSystem` (built into fsspec, requires `aiohttp`) — HTTP/HTTPS read-only access
- `zarr.storage.FsspecStore` — Zarr's wrapper for fsspec filesystems
- Caching protocols: `simplecache::`, `filecache::`, `blockcache::`
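Once a caching protocol is chained in front of a remote URL, fsspec expects storage options to be nested under each protocol name, which is an easy detail to get wrong. A minimal sketch of a hypothetical helper, `fsspec_url_and_options` (not part of fsspec or zarr), that builds the URL and options pair to pass as `zarr.open_group(url, storage_options=...)`:

```python
from __future__ import annotations


def fsspec_url_and_options(
    url: str,
    options: dict,
    *,
    cache: str | None = None,
    cache_options: dict | None = None,
) -> tuple[str, dict]:
    """Hypothetical helper: optionally chain an fsspec caching layer in front of `url`.

    `cache` is an fsspec caching protocol name such as "simplecache" or "filecache".
    """
    if cache is None:
        # Plain URL: options go straight to the remote filesystem (e.g. s3fs, gcsfs).
        return url, dict(options)
    protocol = url.split("://", 1)[0]
    # Chained URL: fsspec routes each nested dict to the layer with that protocol name.
    chained = {protocol: dict(options)}
    if cache_options:
        chained[cache] = dict(cache_options)
    return f"{cache}::{url}", chained
```

For example, `fsspec_url_and_options("s3://bucket/data.zarr", {"anon": True}, cache="filecache", cache_options={"cache_storage": "/tmp/zarr-cache"})` yields a `filecache::s3://...` URL plus options keyed by `"s3"` and `"filecache"`; the bucket and cache directory are placeholders.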

**obstore (Rust-based):**
- Built on Apache Arrow's `object_store` crate
- `obstore.store.S3Store`, `GCSStore`, `AzureStore`
- `zarr.storage.ObjectStore(obstore_store)` — Zarr integration
- Performance: built for high request concurrency; reported to saturate EC2↔S3 network bandwidth
- Smaller ecosystem than fsspec, but often substantially faster (benchmark on the target workload)

**Icechunk:**
- Versioned storage engine for Zarr (Rust-based)
- ACID transactions for concurrent writes
- Time-travel: read data as of any previous version
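- 1.0 API: `repo = icechunk.Repository.open_or_create(icechunk.s3_storage(bucket=..., prefix=..., from_env=True))`, then `session = repo.writable_session("main")`, write through `session.store`, and record a snapshot with `session.commit("message")`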
- Released 1.0 in July 2025
- Integrates with VirtualiZarr for zero-copy ingestion

**Backend Selection Matrix:**

| Need | Recommended Backend |
|------|-------------------|
| Maximum throughput | obstore |
| Ecosystem compatibility | fsspec (s3fs/gcsfs) |
| Versioning / ACID | Icechunk |
| Simple URL access | fsspec URL shorthand |
| Caching for repeated reads | fsspec with simplecache:: |
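The matrix can be encoded as a small decision helper for the agent's own reasoning. This is a hypothetical sketch: the priority order (versioning first, then local caching, then throughput, defaulting to fsspec) is an editorial assumption, not a rule from any of the libraries.

```python
from enum import Enum


class Backend(Enum):
    FSSPEC = "fsspec"
    OBSTORE = "obstore"
    ICECHUNK = "icechunk"


def recommend_backend(
    *,
    needs_versioning: bool = False,
    needs_local_cache: bool = False,
    needs_max_throughput: bool = False,
) -> Backend:
    """Hypothetical helper: apply the selection matrix in an assumed priority order."""
    if needs_versioning:
        # Only Icechunk in the matrix offers versioning and ACID commits.
        return Backend.ICECHUNK
    if needs_local_cache:
        # simplecache::/filecache:: are fsspec protocols.
        return Backend.FSSPEC
    if needs_max_throughput:
        return Backend.OBSTORE
    # Default to the most widely compatible ecosystem.
    return Backend.FSSPEC
```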

### 3. Cloud Provider Configuration

**AWS S3:**
- Credentials: AWS CLI profile, environment variables, IAM roles, anonymous
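- Regions: set the bucket's region (`client_kwargs={'region_name': ...}` in s3fs, `region=` in obstore); reserve `endpoint_url` for custom or S3-compatible endpoints such as MinIO
- Anonymous access: `storage_options={'anon': True}`
- fsspec: `s3fs.S3FileSystem(anon=True, client_kwargs={'region_name': 'us-east-1'})`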
- obstore: `obstore.store.S3Store(bucket, prefix=..., skip_signature=True)`
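The read-public, write-private scenario from the frontmatter example comes down to two different sets of s3fs keyword arguments. A minimal sketch of a hypothetical helper, `s3_storage_options`, whose dictionary keys (`anon`, `client_kwargs`, `profile`, `endpoint_url`) are s3fs `S3FileSystem` parameters:

```python
from __future__ import annotations


def s3_storage_options(
    *,
    anonymous: bool,
    region: str | None = None,
    profile: str | None = None,
    endpoint_url: str | None = None,
) -> dict:
    """Hypothetical helper: build s3fs keyword arguments for zarr's storage_options."""
    opts: dict = {"anon": anonymous}
    if region is not None:
        # s3fs forwards client_kwargs to the underlying botocore client.
        opts["client_kwargs"] = {"region_name": region}
    if profile is not None:
        # Named profile from the AWS credentials/config files.
        opts["profile"] = profile
    if endpoint_url is not None:
        # Only for custom or S3-compatible endpoints, not for choosing a region.
        opts["endpoint_url"] = endpoint_url
    return opts
```

A read from a public bucket could use `zarr.open_group("s3://public-bucket/input.zarr", mode="r", storage_options=s3_storage_options(anonymous=True, region="us-west-2"))`, while the write to the user's own bucket passes `anonymous=False` and relies on the default credential chain (environment variables, profile, or IAM role). Bucket names are placeholders.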

**Google Cloud Storage:**
- Authentication: service account JSON, application default credentials, anonymous
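- Project ID (`project=`): used by gcsfs for project-scoped operations such as listing or creating buckets
- `gcsfs.GCSFileSystem(token='anon')` for anonymous access (`token=None` makes gcsfs try its credential sources in turn)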
- `obstore.store.GCSStore(bucket, prefix=..., skip_signature=True)` for anonymous
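gcsfs selects its authentication mode through the `token` argument. A minimal sketch of a hypothetical helper mapping the three patterns above onto gcsfs keyword arguments: `"anon"` for public data, a service-account JSON path, or `"google_default"` for application default credentials.

```python
from __future__ import annotations


def gcs_storage_options(
    *,
    anonymous: bool = False,
    service_account_json: str | None = None,
    project: str | None = None,
) -> dict:
    """Hypothetical helper: build gcsfs keyword arguments for zarr's storage_options."""
    if anonymous:
        # Unauthenticated access to public buckets.
        return {"token": "anon"}
    # A service-account key file path, else application default credentials.
    opts: dict = {"token": service_account_json or "google_default"}
    if project is not None:
        opts["project"] = project
    return opts
```

Usage: `zarr.open_group("gs://bucket/data.zarr", mode="r", storage_options=gcs_storage_options(anonymous=True))`, with a placeholder bucket.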

**Azure Blob Storage:**
- Connection strings, SAS tokens, managed identity, anonymous
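- `adlfs.AzureBlobFileSystem(account_name='...', account_key='...')`; alternatives include `sas_token=`, `connection_string=`, or `anon=True` for public containers
- Paths take the form `container/path/to/blob`; fsspec URLs use `az://container/path/data.zarr` (`abfs://` is also accepted)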
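A minimal sketch of a hypothetical helper that builds an `az://` URL and adlfs keyword arguments, picking one credential with an assumed precedence (anonymous, then SAS token, then account key). With none supplied it passes only `account_name`; adlfs then applies its own default credential lookup, which should be checked against the adlfs documentation for managed identity setups.

```python
from __future__ import annotations


def azure_url_and_options(
    container: str,
    path: str,
    account_name: str,
    *,
    sas_token: str | None = None,
    account_key: str | None = None,
    anonymous: bool = False,
) -> tuple[str, dict]:
    """Hypothetical helper: build an az:// URL and adlfs storage_options."""
    url = f"az://{container}/{path.strip('/')}"
    opts: dict = {"account_name": account_name}
    if anonymous:
        opts["anon"] = True
    elif sas_token is not None:
        opts["sas_token"] = sas_token
    elif account_key is not None:
        opts["account_key"] = account_key
    # Otherwise adlfs falls back to its own credential discovery (verify in adlfs docs).
    return url, opts
```

Usage: `url, opts = azure_url_and_options("climate", "era5/data.zarr", "myaccount", sas_token=token)` followed by `zarr.open_group(url, mode="r", storage_options=opts)`; the container, path, and account are placeholders.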

### 4. Performance Optimization

- **Metadata consolidation** — critical for cloud (reduces N metadata reads to 1):
  ```python
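  import zarr

  zarr.consolidate_metadata(store)  # store: Store object or URL of the hierarchy root
  root = zarr.open_consolidated(store)  # zarr-python 3: works for Zarr v2 and v3 formats
  ```
  - Note: consolidated metadata is not part of the Zarr v3 specification; zarr-python 3 implements it as an extension, so other v3 readers may not use it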
  
- **Concurrency tuning** — per cloud provider:
  ```python
  zarr.config.set({'async.concurrency': 128})
  ```
  - S3: start around 64-128
  - GCS: start around 32-64
  - Azure: start around 32-64
  - These are starting points to benchmark, not guarantees; the zarr-python 3 default is 10

- **Caching layers** — for repeated reads:
  ```python
  g = zarr.open_group("simplecache::s3://bucket/data.zarr",
                      storage_options={"s3": {"anon": True}})
  ```

### 5. Cloud Data Catalogs
- Microsoft Planetary Computer
- AWS Registry of Open Data (filter by "Zarr")
- Google Cloud marketplace datasets
- Pangeo-Forge Data Catalog
- How to discover and access public Zarr datasets

### 6. Security Patterns
- IAM roles and policies for S3
- Service accounts for GCS
- Managed identity for Azure
- Pre-signed URLs for temporary access
- Cross-region access: added latency plus inter-region data transfer charges; co-locate compute in the bucket's region

## Acceptance Criteria

- [ ] Agent file is 500-800+ lines
- [ ] Covers AWS S3, GCS, and Azure Blob in depth with configuration examples
- [ ] Compares fsspec vs obstore vs Icechunk with clear guidance on when to use each
- [ ] Includes authentication patterns for all three cloud providers
- [ ] Covers metadata consolidation for cloud performance
- [ ] Documents concurrency tuning per cloud provider
- [ ] Includes caching layer configuration
- [ ] Follows the agent pattern from existing plugins

## Dependencies

- Depends on #67 (plugin scaffold)
