This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a Python package for training and evaluating SLEAP (Social LEAP Estimates Animal Poses) models for root tracking, with integrated Weights & Biases (W&B) logging. The codebase provides a wrapper around SLEAP and W&B for model training, evaluation, and experiment management.
- `sleap` - Core pose estimation library
- `wandb` - Experiment tracking and model management
- `jupyterlab` - For interactive notebooks
- `matplotlib`, `seaborn` - Visualization
- `pandas`, `numpy` - Data manipulation
Windows/Linux:

```shell
conda create -y -n sleap -c conda-forge -c nvidia -c sleap/label/dev -c sleap -c anaconda sleap=1.4.1
conda activate sleap
```

macOS:

```shell
conda create -y -n sleap -c conda-forge -c anaconda -c sleap sleap=1.4.1
conda activate sleap
```

PyPI (alternative):

```shell
pip install sleap[pypi]==1.4.1
```

```shell
# Install development dependencies
pip install -e .[dev]

# Login to W&B
wandb login
```

Before running commands, set these environment variables based on your system:
Windows:

```shell
REM Set these variables to match your system
REM CONDA_PATH: miniforge3, Anaconda3, Miniconda3, etc.
REM SLEAP_ENV_NAME: sleap, sleap_v1.4.1, or your custom env name
set SLEAP_REPO_PATH=C:\path\to\sleap-roots-training
set CONDA_PATH=C:\path\to\miniforge3
set SLEAP_ENV_NAME=sleap

REM Example with typical values:
set SLEAP_REPO_PATH=C:\Users\%USERNAME%\repos\sleap-roots-training
set CONDA_PATH=C:\Users\%USERNAME%\miniforge3
set SLEAP_ENV_NAME=sleap_v1.4.1
```

Linux/macOS:

```shell
# Set these variables to match your system
export SLEAP_REPO_PATH=/path/to/sleap-roots-training
export CONDA_PATH=/path/to/miniforge3   # or anaconda3, miniconda3, etc.
export SLEAP_ENV_NAME=sleap             # or sleap_v1.4.1, or your custom env name

# Example with typical values:
export SLEAP_REPO_PATH=$HOME/repos/sleap-roots-training
export CONDA_PATH=$HOME/miniforge3
export SLEAP_ENV_NAME=sleap_v1.4.1
```
Windows:

```shell
cd "%SLEAP_REPO_PATH%" && call "%CONDA_PATH%\Scripts\activate.bat" %SLEAP_ENV_NAME%
```

Linux/macOS:

```shell
cd "$SLEAP_REPO_PATH" && source "$CONDA_PATH/etc/profile.d/conda.sh" && conda activate "$SLEAP_ENV_NAME"
```

Note: Adjust paths based on your conda installation:
- `miniforge3` for Miniforge users
- `anaconda3` for Anaconda users
- `miniconda3` for Miniconda users
- Custom path if installed elsewhere
- Work from repository root so `sleap_roots_training` imports work correctly
- Use separate branches for different experiments
- Follow the testing guidelines in this document
```shell
pip install -e .[dev]   # Install in development mode

make test          # Run all tests with coverage
make test-fast     # Run tests without coverage (faster)
make test-unit     # Run only unit tests
make test-imports  # Test imports only
pytest tests/test_config.py -v  # Test specific module

make format  # Format code with black
make lint    # Check code formatting
make clean   # Clean build artifacts
make build   # Build package
make ci      # Run full CI pipeline locally

# Manual formatting (when make is not available)
python -m black <file_paths>  # Format specific files
python -m black tests/        # Format all test files
```
- `sleap_roots_training/config.py`: Configuration management with YAML file support. Handles W&B project settings, experiment names, and registry configuration.
- `sleap_roots_training/train.py`: Main training orchestration. Contains the primary `main()` function that processes training runs, handles W&B logging, and manages model artifacts. Supports both single training runs and parameter sweeps.
- `sleap_roots_training/models.py`: Model artifact management. Functions for fetching, linking, and promoting models in W&B registries.
- `sleap_roots_training/evaluate.py`: Model evaluation and visualization. Contains functions for generating predictions, creating visualizations, and evaluating model performance against test datasets.
- `sleap_roots_training/datasets.py`: Dataset artifact creation and management for W&B.
The configuration is managed through `config.yaml` in the main module directory. Key configuration parameters:

- `project_name`: W&B project name
- `entity_name`: W&B entity/organization
- `experiment_name`: Current experiment identifier
- `registry`: W&B model registry name
- `collection_name`: Registry collection name
Configuration can be updated programmatically using functions in config.py.
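For reference, a `config.yaml` with the keys above might look like the following; all values are illustrative examples, not defaults shipped with the repository:

```yaml
project_name: sleap-roots-training        # W&B project name
entity_name: my-lab                       # W&B entity/organization
experiment_name: 20250717_plate_primary   # current experiment identifier
registry: model-registry                  # W&B model registry name
collection_name: plate-primary-models     # registry collection name
```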
- Data Preparation: Train/test splits are managed via CSV files containing paths to configuration files
- Configuration: Each training version has an `initial_config_modified_v00{version}.json` file
- Training Execution: Uses the `sleap-train` command with configuration files
- Artifact Logging: Models are logged to W&B with evaluation metrics and visualizations
- Registry Management: Models can be automatically linked to W&B model registries
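As a quick illustration of the versioned-config naming, a small helper like the one below (hypothetical, not part of the package) reproduces the `initial_config_modified_v00{version}.json` pattern:

```python
def config_filename(version: int) -> str:
    """Build the versioned training-config filename, e.g. v001 for version 1."""
    return f"initial_config_modified_v{version:03d}.json"

print(config_filename(1))  # initial_config_modified_v001.json
```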
The repository contains numerous Jupyter notebooks following naming patterns:
- `YYYYMMDD_experiment_description.ipynb` - Main experiment notebooks
- `helper_notebooks/` - Reusable notebook templates
Always save copies of helper notebooks with experiment-specific names and work on separate branches.
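A minimal sketch of the dated naming convention (the helper is hypothetical, stdlib only):

```python
from datetime import date

def notebook_name(description: str, day: date) -> str:
    """Build a notebook filename following YYYYMMDD_experiment_description.ipynb."""
    return f"{day:%Y%m%d}_{description}.ipynb"

print(notebook_name("medicago_sweep", date(2025, 7, 17)))  # 20250717_medicago_sweep.ipynb
```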
`train.py`:
- `main()`: Main entry point for training runs
- `run_single_training()`: Execute single training run
- `run_sweep_training()`: Execute W&B parameter sweeps
- `log_model_artifact_with_evals()`: Log trained models with evaluations

`evaluate.py`:
- `evaluate_model()`: Evaluate model against test dataset
- `fetch_sweep_metrics()`: Retrieve metrics from W&B sweeps
- `predictions_viz()`: Generate prediction visualizations
- `fetch_metrics_from_sweep_pattern()`: [NEW] Find and fetch metrics from sweeps by name pattern
- `group_sweep_runs_retroactively()`: [NEW] Retroactively group sweep runs for organization

`config.py`:
- `load_config()`: Load configuration from YAML
- `update_config()`: Update specific configuration values
- `CONFIG`: Global configuration dictionary
Comprehensive test suite with high code coverage (target: 80%+) using pytest:

```shell
# Run all tests with coverage
pytest --cov=sleap_roots_training --cov-report=term-missing --cov-report=html

# Run tests without coverage (faster)
pytest -v

# Run specific test file
pytest tests/test_config.py -v

# Run tests with specific markers
pytest -m "unit" -v
pytest -m "integration" -v

# Using Makefile shortcuts
make test          # Run all tests with coverage
make test-fast     # Run tests without coverage
make test-unit     # Run only unit tests
make test-imports  # Test imports only
```

Test Organization Guidelines:
- One-to-one mapping: For every module `sleap_roots_training/<module>.py`, there is a corresponding test file `tests/test_<module>.py`
- Centralized fixtures: All fixtures are defined in `tests/fixtures.py` and imported by test modules
- Real test data: Test data is stored in the `tests/data/` directory with actual SLEAP experiment files
Test Files:
- `tests/test_config.py` - Configuration management tests
- `tests/test_train.py` - Training workflow tests (unit tests with mocking)
- `tests/test_evaluate.py` - Evaluation and metrics tests
- `tests/test_models.py` - Model artifact management tests
- `tests/test_datasets.py` - Dataset artifact tests
- `tests/test_sweep_integration.py` - Sweep integration tests with real data
- `tests/test_imports.py` - Basic import verification
- `tests/conftest.py` - Shared fixtures and test configuration
- `tests/fixtures.py` - Reusable test fixtures for real data
- `tests/data/` - Real test data including SLEAP experiment files
Reusable fixtures are defined in `tests/fixtures.py` for use across all test modules:

- `sweep_experiment_data` - Real SLEAP experiment data with CSV, config, and SLEAP files
- `temp_experiment_dir` - Temporary copy of experiment data for safe testing
- `realistic_sweep_config` - Full W&B sweep configuration with multiple parameters
- `small_sweep_config` - Minimal sweep configuration for faster testing
- `mock_models_dir` - Mock directory structure for testing model discovery
- `environment_config` - Test environment configuration values
Usage in tests:

```python
# Import fixtures at top of test file
from tests.fixtures import sweep_experiment_data, temp_experiment_dir

# Use fixtures in test functions
def test_my_function(sweep_experiment_data, temp_experiment_dir):
    # Access real SLEAP data
    config = sweep_experiment_data["config"]
    df = sweep_experiment_data["df"]

    # Use temporary directory for safe testing
    temp_config = temp_experiment_dir["config"]
    temp_csv = temp_experiment_dir["csv_path"]
```

Cross-platform compatibility:
- All fixtures handle Windows/Linux/macOS path differences
- Use forward slashes in paths to avoid Windows backslash issues
- Temporary directories are automatically cleaned up after tests
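One stdlib way to follow the forward-slash rule when building paths in fixtures (a sketch, not code from the repository):

```python
from pathlib import PureWindowsPath

def to_posix(path_str: str) -> str:
    """Normalize a path string (possibly with backslashes) to forward slashes."""
    return PureWindowsPath(path_str).as_posix()

print(to_posix(r"tests\data\experiment\config.json"))  # tests/data/experiment/config.json
```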
- Code coverage is measured and reported for all modules
- Minimum coverage threshold: 80%
- Coverage reports generated in HTML format (`htmlcov/`)
- XML coverage reports for CI integration (`coverage.xml`)
When developing or modifying tests, follow this workflow:

1. Activate environment: Use the correct conda environment activation

   ```shell
   # Windows:
   cd "%SLEAP_REPO_PATH%" && call "%CONDA_PATH%\Scripts\activate.bat" %SLEAP_ENV_NAME%
   # Linux/macOS:
   cd "$SLEAP_REPO_PATH" && source "$CONDA_PATH/etc/profile.d/conda.sh" && conda activate "$SLEAP_ENV_NAME"
   ```

2. Run tests: Execute tests to check current status

   ```shell
   python -m pytest --cov=sleap_roots_training --cov-report=term-missing tests/test_<module>.py
   ```

3. Format code: Always format test files before committing

   ```shell
   python -m black tests/test_<module>.py tests/fixtures.py
   ```

4. Verify formatting: Ensure code follows project standards

   ```shell
   make lint  # or python -m black --check tests/
   ```
Unit Tests (`test_train.py`):
- Comprehensive mocking of external dependencies
- Fast execution with isolated testing
- Tests individual function behavior

Integration Tests (`test_sweep_integration.py`):
- Uses real SLEAP experiment data from `tests/data/`
- Two classes: `TestSweepIntegrationWithMocks` and `TestPureIntegration`
- Tests actual workflow with minimal or no mocking
- Verifies cross-platform compatibility and path handling
Import Management:
- Always import at module level: Place all imports at the top of test files, not inside test functions
- Example: Import `matplotlib.pyplot as plt` at the top rather than importing it inside each test
- Benefits: Cleaner code, follows Python conventions, better maintainability

Figure Management in Tests:
- Close matplotlib figures: Always call `plt.close('all')` after tests that create visualizations
- Prevent test hangs: Unclosed figures can cause tests to hang or run slowly
- Mock when possible: Use `@patch` decorators to mock matplotlib functions for faster tests
Example of proper test structure:

```python
import matplotlib.pyplot as plt
from unittest.mock import patch

class TestVisualization:
    @patch("module.plt.savefig")
    def test_visualization_function(self, mock_savefig):
        # Test code here
        visualization_function()
        # Clean up any figures
        plt.close('all')
```

Multiple GitHub Actions workflows run automatically:
Test Imports (`test-imports.yml`):
- Triggers: Push to all branches + daily schedule (02:00 UTC)
- Platforms: Ubuntu, Windows, macOS
- Purpose: Cross-platform import validation
- Features:
  - Python 3.8 compatibility testing
  - Lightweight without full SLEAP installation
  - Daily monitoring for dependency issues
CI (`ci.yml`):
- Triggers: Pull requests (opened, reopened, synchronize)
- Platform: Ubuntu
- Purpose: Complete integration testing
- Features:
  - Full SLEAP installation via pip
  - Comprehensive test suite
  - Code coverage reporting
  - Package building verification
Workflow Priority:
- `test-imports.yml` - Must pass (cross-platform compatibility)
- `ci.yml` - Must pass for PRs (full validation)
- Training data is stored in SLEAP packages with embedded images
- Labels are stored as SLEAP files (`.slp`)
- Models are stored in timestamped directories under `models/`
- All artifacts are tracked in W&B with comprehensive metadata
For most use cases, this is what you need:

```python
from sleap_roots_training.evaluate import fetch_metrics_from_sweep_pattern

# Define your target metrics
TARGET_METRICS = ["dist_p50", "dist_p90", "dist_p95", "dist_avg", "vis_prec", "vis_recall"]

# Get all metrics from sweeps matching your experiment pattern
sweep_df = fetch_metrics_from_sweep_pattern(
    name_pattern="20250717_plate_medicago_primary_sweep_receptive_field",
    target_metrics=TARGET_METRICS,
    earliest_time="2025-07-17T00:00:00Z",
    include_config=True,
)

print(f"Found {len(sweep_df)} runs from {sweep_df['sweep_id'].nunique()} sweeps")
print(f"Columns: {list(sweep_df.columns)}")

# Analyze results
summary = sweep_df.groupby('sweep_name')[TARGET_METRICS].agg(['mean', 'std', 'count'])
print(summary)
```

Group runs for future organization:
```python
# This will also retroactively group runs with proper names
sweep_df = fetch_metrics_from_sweep_pattern(
    name_pattern="medicago_primary_sweep",
    target_metrics=TARGET_METRICS,
    group_runs=True,  # Automatically group runs
    group_name_base="medicago_receptive_field",
)
```

Find recent experiments by prefix:
```python
from sleap_roots_training.evaluate import find_and_evaluate_recent_sweeps

# Get all medicago experiments from the last 7 days
df = find_and_evaluate_recent_sweeps(
    experiment_prefix="medicago",
    days_back=7,
)
```

Retroactively group existing ungrouped runs:
```python
from sleap_roots_training.evaluate import group_sweep_runs_retroactively

# Group all runs from a specific sweep ID
updated_runs = group_sweep_runs_retroactively(
    sweep_id="4zkofrue",
    group_name="20250717_plate_medicago_primary_sweep_receptive_field",
)
print(f"Updated {len(updated_runs)} runs")
```

- No manual sweep ID tracking - Finds sweeps automatically by name pattern
- Multi-sweep support - Handles multiple train/test splits in one dataframe
- Automatic grouping - Can organize runs retroactively or during fetch
- Cross-platform compatibility - Works on Windows, macOS, and Linux
- Integration ready - Works with existing evaluation and visualization functions
Before (manual sweep IDs):

```python
sweep_ids = ["4zkofrue", "abc123", "xyz789"]  # Manual tracking
sweep_df = fetch_sweep_metrics(sweep_ids=sweep_ids, ...)
```

After (automatic discovery):

```python
sweep_df = fetch_metrics_from_sweep_pattern(
    name_pattern="your_experiment_name",
    target_metrics=TARGET_METRICS,
    earliest_time="2025-07-17T00:00:00Z",
)
```

This automatically finds all matching sweeps and combines metrics from all runs across all train/test splits.
- Always run notebooks from repository root for proper imports
- Use separate branches for different experiments
- Model evaluation uses 17.0 px/mm as default scaling factor
- W&B runs are automatically tagged and grouped by experiment names
- Configuration files are timestamped to maintain experiment reproducibility
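For reference, the pixel-to-millimeter conversion implied by the default scaling factor (a sketch; the 17.0 px/mm constant comes from the note above, the helper name is hypothetical):

```python
PX_PER_MM = 17.0  # default scaling factor used in model evaluation

def px_to_mm(distance_px: float) -> float:
    """Convert a pixel distance to millimeters using the default scale."""
    return distance_px / PX_PER_MM

print(px_to_mm(34.0))  # 2.0
```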