Open
Labels: enhancement (New feature or request), skill (Skill creation or modification)
Description
Create the core Zarr operations skill covering array creation, I/O, metadata, groups, indexing, data types, and synchronization for the zarr-data-format plugin.
Directory: plugins/zarr-data-format/skills/zarr-fundamentals/
Research Reference
Full research document: .agents/research-zarr-chunk-optimization-and-zarr-plugin.md
Files to Create
1. SKILL.md (400+ lines)
Frontmatter:
    name: zarr-fundamentals
    description: |
      Use this skill when the user asks to "create a zarr array", "open a zarr store",
      "read zarr data", "write zarr arrays", "manage zarr groups", "set zarr metadata",
      "use zarr indexing", "understand zarr format", "work with zarr v3", or needs guidance
      on core Zarr operations including array creation, hierarchical groups, metadata/attributes,
      advanced indexing modes, data types, thread/process safety, and Zarr v2 vs v3 differences.
Content must include:
-
Quick Reference: Essential Imports and Operations
    import zarr
    import numpy as np

    # Create array (v3)
    z = zarr.create_array(store="data.zarr", shape=(10000, 10000), chunks=(1000, 1000), dtype='float32', zarr_format=3)

    # Open existing
    z = zarr.open_array("data.zarr", mode='r')

    # Create group hierarchy (separate path, because mode='w' overwrites whatever is already there)
    root = zarr.open_group("groups.zarr", mode='w')
    grp = root.create_group("temperature")
    # dtype added so the call is complete; float32 is an assumption
    arr = grp.create_array("t2m", shape=(365, 721, 1440), chunks=(30, 90, 180), dtype='float32')

    # Metadata
    arr.attrs['units'] = 'K'
    arr.attrs['standard_name'] = 'air_temperature'

    # Inspect
    print(z.info)       # Quick summary
    print(root.tree())  # Group hierarchy
-
Installation:
    # Using pixi (recommended)
    pixi add zarr numpy numcodecs

    # Using pip (quoted so the shell does not expand the brackets)
    pip install "zarr[extra]"
-
Zarr v2 vs v3 Differences:
| Feature | v2 | v3 |
|---|---|---|
| Metadata files | .zarray, .zattrs, .zgroup | zarr.json |
| Default compressor | Blosc | Zstd |
| Sharding | Not available | Supported |
| I/O model | Synchronous | Async (asyncio) |
| Python requirement | 3.8+ | 3.11+ |
| Format parameter | zarr_format=2 | zarr_format=3 |
| Consolidated metadata | Supported | Not in spec (functionally works) |
| Store API | Legacy store classes | New Store ABC |
-
Array Creation: All creation functions with parameters:
- zarr.create_array(): primary creation function
- zarr.zeros(), zarr.ones(), zarr.full(), zarr.empty()
- zarr.open_array(): open existing or create new
- Key parameters: shape, chunks, dtype, fill_value, compressor/compressors, shards, zarr_format
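The shape, chunks, and dtype parameters together fix how many chunks an array has and how large each one is before compression. The following is a minimal sketch of that arithmetic in plain Python, not a call into Zarr itself. The helper names are hypothetical, and the 4-byte item size assumes float32 for the t2m array from the quick reference.

```python
import math

def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (the last chunk may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))

def chunk_nbytes(chunks, itemsize):
    """Uncompressed size in bytes of one full chunk."""
    return math.prod(chunks) * itemsize

shape = (365, 721, 1440)
chunks = (30, 90, 180)
itemsize = 4  # assumption: float32

grid = chunk_grid(shape, chunks)
total_chunks = math.prod(grid)
one_chunk_bytes = chunk_nbytes(chunks, itemsize)

print(grid)             # (13, 9, 8)
print(total_chunks)     # 936
print(one_chunk_bytes)  # 1944000
```

Neither 365 nor 721 is a multiple of its chunk length, so the grid rounds up and the edge chunks are only partly filled.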
-
Group Management:
- zarr.create_group(), zarr.open_group()
- Nested navigation: root['subgroup/array']
- .tree() for visualization
- Recommended hierarchy for scientific data
-
Metadata and Attributes:
- .attrs dictionary interface
- CF conventions for scientific data (long_name, units, standard_name, coordinates, grid_mapping)
- .info for quick summary
- .info_complete() for detailed metadata (slow for large arrays)
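Zarr persists attributes as JSON, so a value that cannot be serialized to JSON cannot be stored in .attrs. The sketch below shows one possible pre-flight check on an attribute dictionary before assigning it. Both helpers and the RECOMMENDED_CF_KEYS tuple are hypothetical illustrations, and the tuple covers only a subset of the CF attributes listed above.

```python
import json

# Hypothetical subset of the CF attribute names listed above.
RECOMMENDED_CF_KEYS = ("long_name", "units", "standard_name")

def missing_cf_keys(attrs):
    """Return the recommended CF keys that are absent from attrs."""
    return [key for key in RECOMMENDED_CF_KEYS if key not in attrs]

def is_json_serializable(attrs):
    """True if the whole attribute dict can be encoded as JSON."""
    try:
        json.dumps(attrs)
        return True
    except (TypeError, ValueError):
        return False

attrs = {"units": "K", "standard_name": "air_temperature"}
print(missing_cf_keys(attrs))                 # ['long_name']
print(is_json_serializable(attrs))            # True
print(is_json_serializable({"units": b"K"}))  # False (bytes are not JSON)
```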
-
All 6 Indexing Modes with Examples:
- Basic slicing: z[0, :], z[10:20, :]
- Coordinate selection: z.get_coordinate_selection([2, 5]) or z.vindex[[0, 2], [1, 3]]
- Mask selection: z.get_mask_selection(mask_array) or z.vindex[boolean_mask]
- Orthogonal indexing: z.get_orthogonal_selection(([0, 2], slice(None))) or z.oindex[[0, 2], :]
- Block indexing: z.get_block_selection(1) or z.blocks[1] (chunk-aligned)
- Structured field selection: z['field_name'], z.get_coordinate_selection([0, 2], fields=['foo'])
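Coordinate (vindex) and orthogonal (oindex) selection take the same index lists but return different results, which is a common source of confusion. The sketch below emulates the two semantics on a plain nested list so it runs without Zarr installed. The function names are hypothetical stand-ins and are not part of the Zarr API.

```python
# 3 x 4 grid where each value encodes its position: 10 * row + col.
data = [[10 * r + c for c in range(4)] for r in range(3)]

def orthogonal_select(grid, rows, cols):
    """Outer-product selection: every chosen row crossed with every chosen column."""
    return [[grid[r][c] for c in cols] for r in rows]

def coordinate_select(grid, rows, cols):
    """Pointwise selection: pairs rows[i] with cols[i]."""
    return [grid[r][c] for r, c in zip(rows, cols)]

outer = orthogonal_select(data, [0, 2], [1, 3])
points = coordinate_select(data, [0, 2], [1, 3])

print(outer)   # [[1, 3], [21, 23]]
print(points)  # [1, 23]
```

With two rows and two columns, the orthogonal form yields a 2 x 2 block, while the coordinate form yields the two points (0, 1) and (2, 3).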
-
Supported Data Types:
- Standard numeric: int8-64, uint8-64, float16-64, complex64-128
- Fixed-length strings: 'S6', 'U20'
- Variable-length: VLenUTF8(), VLenBytes()
- Object arrays: numcodecs.JSON(), MsgPack(), Pickle()
- Ragged arrays: VLenArray(int)
- Categorical: Categorize(labels, dtype=object)
- Datetime/Timedelta: 'M8[D]', 'm8'
-
Thread/Process Safety:
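    # Synchronizer classes come from the zarr-python 2.x API; check the installed version before relying on them

    # Thread-safe
    z = zarr.open_array('data.zarr', synchronizer=zarr.ThreadSynchronizer())

    # Process-safe (file locking)
    sync = zarr.ProcessSynchronizer('data.sync')
    z = zarr.open_array('data.zarr', synchronizer=sync)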
- Arrays are safe for concurrent reads and writes within the same process when each writer touches separate chunks; otherwise use ThreadSynchronizer
- Multi-process: requires ProcessSynchronizer or separate chunks per process
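The "separate chunks per process" option works only if each worker's region starts and ends on chunk boundaries. The sketch below shows one way to split an axis into such regions. The helper chunk_aligned_ranges is hypothetical, it covers a single dimension only, and it does not perform any writes.

```python
import math

def chunk_aligned_ranges(length, chunk, n_workers):
    """Split [0, length) into at most n_workers contiguous (start, stop)
    ranges whose internal boundaries all fall on multiples of chunk."""
    n_chunks = math.ceil(length / chunk)
    per_worker = math.ceil(n_chunks / n_workers)
    ranges = []
    for first in range(0, n_chunks, per_worker):
        start = first * chunk
        stop = min((first + per_worker) * chunk, length)
        ranges.append((start, stop))
    return ranges

even = chunk_aligned_ranges(10000, 1000, 4)
ragged = chunk_aligned_ranges(365, 30, 4)

print(even)    # [(0, 3000), (3000, 6000), (6000, 9000), (9000, 10000)]
print(ragged)  # [(0, 120), (120, 240), (240, 360), (360, 365)]
```

Each worker would then write only z[start:stop] along that axis, so no two workers modify the same chunk.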
-
Sharding (v3):
    z = zarr.create_array(
        store={},
        shape=(10000, 10000, 1000),
        shards=(1000, 1000, 1000),
        chunks=(100, 100, 100),
        dtype='uint8',
    )
- Shards group multiple chunks into single storage objects
- A shard is the minimum unit of writing
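To make the trade-off concrete, the sketch below works out how many chunks land in each shard and how many shard objects the example array produces. It is plain arithmetic under the assumption that every shard dimension is a whole multiple of the matching chunk dimension. The helper names are hypothetical.

```python
import math

def chunks_per_shard(shards, chunks):
    """Number of chunks stored inside one shard."""
    for s, c in zip(shards, chunks):
        if s % c != 0:
            raise ValueError(f"shard length {s} is not a multiple of chunk length {c}")
    return math.prod(s // c for s, c in zip(shards, chunks))

def shard_count(shape, shards):
    """Number of shard objects needed to cover the whole array."""
    return math.prod(math.ceil(n / s) for n, s in zip(shape, shards))

shape = (10000, 10000, 1000)
shards = (1000, 1000, 1000)
chunks = (100, 100, 100)

inner = chunks_per_shard(shards, chunks)
objects = shard_count(shape, shards)
shard_bytes = math.prod(shards) * 1  # uint8 is 1 byte per element, uncompressed

print(inner)        # 1000
print(objects)      # 100
print(shard_bytes)  # 1000000000
```

Without sharding the same array would need 100,000 chunk objects. With it there are 100 shard objects, but updating any single chunk means rewriting a shard of up to 1 GB uncompressed.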
-
v2 → v3 Migration Notes
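A first migration step is often finding out which format an existing store uses. The sketch below infers it from the metadata file names in the v2 vs v3 table. The function detect_zarr_format is a hypothetical helper, and it assumes a local directory store and inspects only the top-level node.

```python
import tempfile
from pathlib import Path

V2_FILES = (".zarray", ".zgroup")  # v2 array / group metadata files
V3_FILE = "zarr.json"              # v3 metadata file

def detect_zarr_format(node_dir):
    """Return 3, 2, or None depending on which metadata file is present."""
    node = Path(node_dir)
    if (node / V3_FILE).is_file():
        return 3
    if any((node / name).is_file() for name in V2_FILES):
        return 2
    return None

with tempfile.TemporaryDirectory() as tmp:
    v2_dir = Path(tmp) / "old.zarr"
    v2_dir.mkdir()
    (v2_dir / ".zgroup").write_text('{"zarr_format": 2}')

    v3_dir = Path(tmp) / "new.zarr"
    v3_dir.mkdir()
    (v3_dir / "zarr.json").write_text('{"zarr_format": 3, "node_type": "group"}')

    found = (
        detect_zarr_format(v2_dir),
        detect_zarr_format(v3_dir),
        detect_zarr_format(tmp),
    )

print(found)  # (2, 3, None)
```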
2. assets/zarr-quickstart.py
Complete quickstart script demonstrating:
- Creating a Zarr v3 array
- Writing data
- Reading data back
- Creating group hierarchy
- Setting metadata
- Inspecting with .info and .tree()
- Basic and advanced indexing
- Compression configuration
3. references/PATTERNS.md (6+ patterns)
- Creating a Hierarchical Scientific Data Store — group hierarchy with CF-compliant metadata
- Opening and Reading Remote Zarr Data — URL access patterns
- Appending Data to Existing Arrays — resize + write, append_dim via xarray
- Advanced Indexing Patterns — orthogonal, block, coordinate selection use cases
- Using Shards in Zarr v3 — configuration and trade-offs
- Concurrent Access with Synchronizers — thread and process safety patterns
4. references/EXAMPLES.md (4+ examples)
- Creating a Scientific Dataset from Scratch — climate-like data with dimensions, coords, metadata
- Reading and Querying a Remote Zarr Store — public data from Pangeo/AWS
- Building a Hierarchical Store with Groups — multi-variable, multi-level
- Working with Structured and Ragged Arrays — custom dtypes, variable-length data
5. references/COMMON_ISSUES.md (5+ issues)
- Zarr v2 vs v3 API confusion: common API differences and how to handle them
- Metadata not persisting: needs explicit .attrs assignment
- Memory errors with large arrays: chunking not configured
- Concurrent write corruption: missing synchronizer
- .info_complete() slow on large arrays: use .info for quick checks
Acceptance Criteria
- SKILL.md is 400+ lines with comprehensive Zarr fundamentals
- Covers both v2 and v3 APIs with clear distinctions
- All 6 indexing modes documented with complete code examples
- All data types documented
- Thread/process safety documented
- Quickstart script works end-to-end
- Follows the skill pattern from plugins/scientific-domain-applications/skills/xarray-for-multidimensional-data/
Dependencies
- Depends on feat: Create zarr-data-format plugin scaffold #67 (plugin scaffold)