This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a Python package for training and evaluating SLEAP (Social LEAP Estimates Animal Poses) models for root tracking, with integrated Weights & Biases (W&B) logging. The codebase provides a wrapper around SLEAP and W&B for model training, evaluation, and experiment management.
- `sleap` - Core pose estimation library
- `wandb` - Experiment tracking and model management
- `jupyterlab` - For interactive notebooks
- `matplotlib`, `seaborn` - Visualization
- `pandas`, `numpy` - Data manipulation
Windows/Linux:

```shell
conda create -y -n sleap -c conda-forge -c nvidia -c sleap/label/dev -c sleap -c anaconda sleap=1.4.1
conda activate sleap
```

macOS:

```shell
conda create -y -n sleap -c conda-forge -c anaconda -c sleap sleap=1.4.1
conda activate sleap
```

PyPI (alternative):

```shell
pip install sleap[pypi]==1.4.1
```

```shell
# Install development dependencies
pip install -e .[dev]

# Login to W&B
wandb login
```

Before running commands, set these environment variables based on your system:
Windows:

```shell
REM Set these variables to match your system
REM CONDA_PATH: miniforge3, Anaconda3, Miniconda3, etc.
REM SLEAP_ENV_NAME: sleap, sleap_v1.4.1, or your custom env name
set SLEAP_REPO_PATH=C:\path\to\sleap-roots-training
set CONDA_PATH=C:\path\to\miniforge3
set SLEAP_ENV_NAME=sleap

REM Example with typical values:
set SLEAP_REPO_PATH=C:\Users\%USERNAME%\repos\sleap-roots-training
set CONDA_PATH=C:\Users\%USERNAME%\miniforge3
set SLEAP_ENV_NAME=sleap_v1.4.1
```

Linux/macOS:

```shell
# Set these variables to match your system
export SLEAP_REPO_PATH=/path/to/sleap-roots-training
export CONDA_PATH=/path/to/miniforge3   # or anaconda3, miniconda3, etc.
export SLEAP_ENV_NAME=sleap             # or sleap_v1.4.1, or your custom env name

# Example with typical values:
export SLEAP_REPO_PATH=$HOME/repos/sleap-roots-training
export CONDA_PATH=$HOME/miniforge3
export SLEAP_ENV_NAME=sleap_v1.4.1
```
Windows:

```shell
cd "%SLEAP_REPO_PATH%" && call "%CONDA_PATH%\Scripts\activate.bat" %SLEAP_ENV_NAME%
```

Linux/macOS:

```shell
cd "$SLEAP_REPO_PATH" && source "$CONDA_PATH/etc/profile.d/conda.sh" && conda activate "$SLEAP_ENV_NAME"
```

Note: Adjust paths based on your conda installation:
- `miniforge3` for Miniforge users
- `anaconda3` for Anaconda users
- `miniconda3` for Miniconda users
- Custom path if installed elsewhere
- Work from repository root so `sleap_roots_training` imports work correctly
- Use separate branches for different experiments
- Follow the testing guidelines in this document
```shell
pip install -e .[dev]   # Install in development mode

make test          # Run all tests with coverage
make test-fast     # Run tests without coverage (faster)
make test-unit     # Run only unit tests
make test-imports  # Test imports only
pytest tests/test_config.py -v  # Test specific module

make format  # Format code with black
make lint    # Check code formatting
make clean   # Clean build artifacts
make build   # Build package
make ci      # Run full CI pipeline locally

# Manual formatting (when make is not available)
python -m black <file_paths>  # Format specific files
python -m black tests/        # Format all test files
```
- `sleap_roots_training/config.py`: Configuration management with YAML file support. Handles W&B project settings, experiment names, and registry configuration.
- `sleap_roots_training/train.py`: Main training orchestration. Contains the primary `main()` function that processes training runs, handles W&B logging, and manages model artifacts. Supports both single training runs and parameter sweeps.
- `sleap_roots_training/models.py`: Model artifact management. Functions for fetching, linking, and promoting models in W&B registries.
- `sleap_roots_training/evaluate.py`: Model evaluation and visualization. Contains functions for generating predictions, creating visualizations, and evaluating model performance against test datasets.
- `sleap_roots_training/datasets.py`: Dataset artifact creation and management for W&B.
The configuration is managed through `config.yaml` in the main module directory. Key configuration parameters:

- `project_name`: W&B project name
- `entity_name`: W&B entity/organization
- `experiment_name`: Current experiment identifier
- `registry`: W&B model registry name
- `collection_name`: Registry collection name
Configuration can be updated programmatically using functions in config.py.
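For reference, a `config.yaml` with the keys above might look like the following; all values are illustrative examples, not defaults shipped with the repository:

```yaml
project_name: sleap-roots-training        # W&B project name
entity_name: my-lab                       # W&B entity/organization
experiment_name: 20250717_plate_primary   # current experiment identifier
registry: model-registry                  # W&B model registry name
collection_name: plate-primary-models     # registry collection name
```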
- Data Preparation: Train/test splits are managed via CSV files containing paths to configuration files
- Configuration: Each training version has an `initial_config_modified_v00{version}.json` file
- Training Execution: Uses the `sleap-train` command with configuration files
- Artifact Logging: Models are logged to W&B with evaluation metrics and visualizations
- Registry Management: Models can be automatically linked to W&B model registries
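As a quick illustration of the versioned-config naming, a small helper like the one below (hypothetical, not part of the package) reproduces the `initial_config_modified_v00{version}.json` pattern:

```python
def config_filename(version: int) -> str:
    """Build the versioned training-config filename, e.g. v001 for version 1."""
    return f"initial_config_modified_v{version:03d}.json"

print(config_filename(1))  # initial_config_modified_v001.json
```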
The repository contains numerous Jupyter notebooks following naming patterns:
- `YYYYMMDD_experiment_description.ipynb` - Main experiment notebooks
- `helper_notebooks/` - Reusable notebook templates
Always save copies of helper notebooks with experiment-specific names and work on separate branches.
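A minimal sketch of the dated naming convention (the helper is hypothetical, stdlib only):

```python
from datetime import date

def notebook_name(description: str, day: date) -> str:
    """Build a notebook filename following YYYYMMDD_experiment_description.ipynb."""
    return f"{day:%Y%m%d}_{description}.ipynb"

print(notebook_name("medicago_sweep", date(2025, 7, 17)))  # 20250717_medicago_sweep.ipynb
```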
`train.py`:
- `main()`: Main entry point for training runs
- `run_single_training()`: Execute single training run
- `run_sweep_training()`: Execute W&B parameter sweeps
- `log_model_artifact_with_evals()`: Log trained models with evaluations

`evaluate.py`:
- `evaluate_model()`: Evaluate model against test dataset
- `fetch_sweep_metrics()`: Retrieve metrics from W&B sweeps
- `predictions_viz()`: Generate prediction visualizations
- `fetch_metrics_from_sweep_pattern()`: [NEW] Find and fetch metrics from sweeps by name pattern
- `group_sweep_runs_retroactively()`: [NEW] Retroactively group sweep runs for organization

`config.py`:
- `load_config()`: Load configuration from YAML
- `update_config()`: Update specific configuration values
- `CONFIG`: Global configuration dictionary
Comprehensive test suite with high code coverage (target: 80%+) using pytest:

```shell
# Run all tests with coverage
pytest --cov=sleap_roots_training --cov-report=term-missing --cov-report=html

# Run tests without coverage (faster)
pytest -v

# Run specific test file
pytest tests/test_config.py -v

# Run tests with specific markers
pytest -m "unit" -v
pytest -m "integration" -v

# Using Makefile shortcuts
make test          # Run all tests with coverage
make test-fast     # Run tests without coverage
make test-unit     # Run only unit tests
make test-imports  # Test imports only
```

Test Organization Guidelines:
- One-to-one mapping: For every module `sleap_roots_training/<module>.py`, there is a corresponding test file `tests/test_<module>.py`
- Centralized fixtures: All fixtures are defined in `tests/fixtures.py` and imported by test modules
- Real test data: Test data is stored in the `tests/data/` directory with actual SLEAP experiment files
Test Files:
- `tests/test_config.py` - Configuration management tests
- `tests/test_train.py` - Training workflow tests (unit tests with mocking)
- `tests/test_evaluate.py` - Evaluation and metrics tests
- `tests/test_models.py` - Model artifact management tests
- `tests/test_datasets.py` - Dataset artifact tests
- `tests/test_sweep_integration.py` - Sweep integration tests with real data
- `tests/test_imports.py` - Basic import verification
- `tests/conftest.py` - Shared fixtures and test configuration
- `tests/fixtures.py` - Reusable test fixtures for real data
- `tests/data/` - Real test data including SLEAP experiment files
Reusable fixtures are defined in `tests/fixtures.py` for use across all test modules:

- `sweep_experiment_data` - Real SLEAP experiment data with CSV, config, and SLEAP files
- `temp_experiment_dir` - Temporary copy of experiment data for safe testing
- `realistic_sweep_config` - Full W&B sweep configuration with multiple parameters
- `small_sweep_config` - Minimal sweep configuration for faster testing
- `mock_models_dir` - Mock directory structure for testing model discovery
- `environment_config` - Test environment configuration values
Usage in tests:

```python
# Import fixtures at top of test file
from tests.fixtures import sweep_experiment_data, temp_experiment_dir

# Use fixtures in test functions
def test_my_function(sweep_experiment_data, temp_experiment_dir):
    # Access real SLEAP data
    config = sweep_experiment_data["config"]
    df = sweep_experiment_data["df"]

    # Use temporary directory for safe testing
    temp_config = temp_experiment_dir["config"]
    temp_csv = temp_experiment_dir["csv_path"]
```

Cross-platform compatibility:
- All fixtures handle Windows/Linux/macOS path differences
- Use forward slashes in paths to avoid Windows backslash issues
- Temporary directories are automatically cleaned up after tests
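One stdlib way to follow the forward-slash rule when building paths in fixtures (a sketch, not code from the repository):

```python
from pathlib import PureWindowsPath

def to_posix(path_str: str) -> str:
    """Normalize a path string (possibly with backslashes) to forward slashes."""
    return PureWindowsPath(path_str).as_posix()

print(to_posix(r"tests\data\experiment\config.json"))  # tests/data/experiment/config.json
```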
- Code coverage is measured and reported for all modules
- Minimum coverage threshold: 80%
- Coverage reports generated in HTML format (`htmlcov/`)
- XML coverage reports for CI integration (`coverage.xml`)
When developing or modifying tests, follow this workflow:

1. Activate environment: Use the correct conda environment activation

   ```shell
   # Windows:
   cd "%SLEAP_REPO_PATH%" && call "%CONDA_PATH%\Scripts\activate.bat" %SLEAP_ENV_NAME%
   # Linux/macOS:
   cd "$SLEAP_REPO_PATH" && source "$CONDA_PATH/etc/profile.d/conda.sh" && conda activate "$SLEAP_ENV_NAME"
   ```

2. Run tests: Execute tests to check current status

   ```shell
   python -m pytest --cov=sleap_roots_training --cov-report=term-missing tests/test_<module>.py
   ```

3. Format code: Always format test files before committing

   ```shell
   python -m black tests/test_<module>.py tests/fixtures.py
   ```

4. Verify formatting: Ensure code follows project standards

   ```shell
   make lint  # or python -m black --check tests/
   ```
Unit Tests (`test_train.py`):
- Comprehensive mocking of external dependencies
- Fast execution with isolated testing
- Tests individual function behavior

Integration Tests (`test_sweep_integration.py`):
- Uses real SLEAP experiment data from `tests/data/`
- Two classes: `TestSweepIntegrationWithMocks` and `TestPureIntegration`
- Tests actual workflow with minimal or no mocking
- Verifies cross-platform compatibility and path handling
Import Management:
- Always import at module level: Place all imports at the top of test files, not inside test functions
- Example: Import `matplotlib.pyplot as plt` at the top rather than importing it inside each test
- Benefits: Cleaner code, follows Python conventions, better maintainability

Figure Management in Tests:
- Close matplotlib figures: Always call `plt.close('all')` after tests that create visualizations
- Prevent test hangs: Unclosed figures can cause tests to hang or run slowly
- Mock when possible: Use `@patch` decorators to mock matplotlib functions for faster tests
Example of proper test structure:

```python
import matplotlib.pyplot as plt
from unittest.mock import patch

class TestVisualization:
    @patch("module.plt.savefig")
    def test_visualization_function(self, mock_savefig):
        # Test code here
        visualization_function()
        # Clean up any figures
        plt.close('all')
```

Multiple GitHub Actions workflows run automatically:
Test Imports (`test-imports.yml`):
- Triggers: Push to all branches + daily schedule (02:00 UTC)
- Platforms: Ubuntu, Windows, macOS
- Purpose: Cross-platform import validation
- Features:
  - Python 3.8 compatibility testing
  - Lightweight without full SLEAP installation
  - Daily monitoring for dependency issues
CI (`ci.yml`):
- Triggers: Pull requests (opened, reopened, synchronize)
- Platform: Ubuntu
- Purpose: Complete integration testing
- Features:
  - Full SLEAP installation via pip
  - Comprehensive test suite
  - Code coverage reporting
  - Package building verification
Workflow Priority:
- `test-imports.yml` - Must pass (cross-platform compatibility)
- `ci.yml` - Must pass for PRs (full validation)
- Training data is stored in SLEAP packages with embedded images
- Labels are stored as SLEAP files (`.slp`)
- Models are stored in timestamped directories under `models/`
- All artifacts are tracked in W&B with comprehensive metadata
For most use cases, this is what you need:

```python
from sleap_roots_training.evaluate import fetch_metrics_from_sweep_pattern

# Define your target metrics
TARGET_METRICS = ["dist_p50", "dist_p90", "dist_p95", "dist_avg", "vis_prec", "vis_recall"]

# Get all metrics from sweeps matching your experiment pattern
sweep_df = fetch_metrics_from_sweep_pattern(
    name_pattern="20250717_plate_medicago_primary_sweep_receptive_field",
    target_metrics=TARGET_METRICS,
    earliest_time="2025-07-17T00:00:00Z",
    include_config=True,
)

print(f"Found {len(sweep_df)} runs from {sweep_df['sweep_id'].nunique()} sweeps")
print(f"Columns: {list(sweep_df.columns)}")

# Analyze results
summary = sweep_df.groupby('sweep_name')[TARGET_METRICS].agg(['mean', 'std', 'count'])
print(summary)
```

Group runs for future organization:
```python
# This will also retroactively group runs with proper names
sweep_df = fetch_metrics_from_sweep_pattern(
    name_pattern="medicago_primary_sweep",
    target_metrics=TARGET_METRICS,
    group_runs=True,  # Automatically group runs
    group_name_base="medicago_receptive_field",
)
```

Find recent experiments by prefix:
```python
from sleap_roots_training.evaluate import find_and_evaluate_recent_sweeps

# Get all medicago experiments from the last 7 days
df = find_and_evaluate_recent_sweeps(
    experiment_prefix="medicago",
    days_back=7,
)
```

Retroactively group existing ungrouped runs:
```python
from sleap_roots_training.evaluate import group_sweep_runs_retroactively

# Group all runs from a specific sweep ID
updated_runs = group_sweep_runs_retroactively(
    sweep_id="4zkofrue",
    group_name="20250717_plate_medicago_primary_sweep_receptive_field",
)
print(f"Updated {len(updated_runs)} runs")
```

- No manual sweep ID tracking - Finds sweeps automatically by name pattern
- Multi-sweep support - Handles multiple train/test splits in one dataframe
- Automatic grouping - Can organize runs retroactively or during fetch
- Cross-platform compatibility - Works on Windows, macOS, and Linux
- Integration ready - Works with existing evaluation and visualization functions
Before (manual sweep IDs):

```python
sweep_ids = ["4zkofrue", "abc123", "xyz789"]  # Manual tracking
sweep_df = fetch_sweep_metrics(sweep_ids=sweep_ids, ...)
```

After (automatic discovery):

```python
sweep_df = fetch_metrics_from_sweep_pattern(
    name_pattern="your_experiment_name",
    target_metrics=TARGET_METRICS,
    earliest_time="2025-07-17T00:00:00Z",
)
```

This automatically finds all matching sweeps and combines metrics from all runs across all train/test splits.
- Always run notebooks from repository root for proper imports
- Use separate branches for different experiments
- Model evaluation uses 17.0 px/mm as default scaling factor
- W&B runs are automatically tagged and grouped by experiment names
- Configuration files are timestamped to maintain experiment reproducibility
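For reference, the pixel-to-millimeter conversion implied by the default scaling factor (a sketch; the 17.0 px/mm constant comes from the note above, the helper name is hypothetical):

```python
PX_PER_MM = 17.0  # default scaling factor used in model evaluation

def px_to_mm(distance_px: float) -> float:
    """Convert a pixel distance to millimeters using the default scale."""
    return distance_px / PX_PER_MM

print(px_to_mm(34.0))  # 2.0
```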