# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Nexify is a workflow orchestration engine featuring FlowForge, a YAML-driven system for building data pipelines and automating complex workflows. The engine executes self-registering actions through a central registry, providing type-safe execution with Pydantic validation and flexible context handling.
This codebase was extracted from the BiOMapper project to serve as a general-purpose orchestration engine while BiOMapper continues as a specialized biological data harmonization toolkit.
```bash
# Install dependencies (base installation)
poetry install

# Install with API support (FastAPI + Uvicorn)
poetry install -E api

# Install all extras
poetry install -E full
```

```bash
# Run all tests with coverage
poetry run pytest

# Run specific test categories
poetry run pytest tests/unit/
poetry run pytest tests/integration/

# Run with coverage report
poetry run pytest --cov=nexify --cov-report=html

# Run specific test file
poetry run pytest -xvs tests/unit/path/to/test_file.py

# Run tests matching a pattern
poetry run pytest -k "test_pattern_name"
```

```bash
# Format code with Ruff
poetry run ruff format .

# Lint and auto-fix issues
poetry run ruff check . --fix

# Type checking with mypy
poetry run mypy src/nexify/

# Run all quality checks (recommended before commits)
poetry run ruff format . && poetry run ruff check . --fix && poetry run mypy src/nexify/
```

```bash
# Execute a workflow via Python
python -c "
from nexify.core.minimal_strategy_service import MinimalStrategyService
import asyncio

async def run():
    service = MinimalStrategyService('examples/strategies')
    result = await service.execute_strategy('strategy_name', input_identifiers=['id1', 'id2'])
    print(result)

asyncio.run(run())
"
```

### FlowForge Orchestration Engine (src/nexify/core/)
- `minimal_strategy_service.py`: Main orchestration service that loads and executes YAML workflows
- `exceptions.py`: Standardized error handling with error codes
- `models/execution_context.py`: Pydantic models for execution state
- `infrastructure/parameter_resolver.py`: Resolves `${parameters.key}` placeholders in workflows
### Action Registry System (src/nexify/actions/)

- `registry.py`: Central `ACTION_REGISTRY` dict mapping action names to classes
- `typed_base.py`: `TypedStrategyAction` base class for type-safe actions
- `base.py`: Legacy base class for backward compatibility
- Actions auto-register via the `@register_action("ACTION_NAME")` decorator
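The registration pattern described above can be pictured as a decorator that records each class in a shared dict as soon as its module is imported. The following is a minimal illustrative sketch, not the actual `registry.py` implementation:

```python
# Hypothetical sketch of the auto-registration pattern; the real
# registry.py may differ in detail (validation, duplicate checks, etc.).
ACTION_REGISTRY: dict = {}

def register_action(name: str):
    """Decorator that maps an action name to its class in the registry."""
    def decorator(cls):
        ACTION_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged, just recorded
    return decorator

@register_action("MY_ACTION")
class MyAction:
    pass
```

Because registration happens at import time, an action that is never imported never appears in the registry, which is why the `__init__.py` imports described later matter.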
### Standards Layer (src/nexify/standards/)

- `context_handler.py`: `UniversalContext` wrapper for dict/object context compatibility
- `base_models.py`: Pydantic base classes (`ActionParamsBase`, `FlexibleBaseModel`, etc.)
- `file_loader.py`: Robust file loading with format detection
- `debug_tracer.py`: Debug tracing for specific identifiers through pipelines
- `known_issues.py`: Registry of documented edge cases and workarounds
### API Layer (src/nexify/api/) - Optional

- `main.py`: FastAPI application entry point
- `routes/`: API endpoints for strategy execution
- `services/`: Service layer wrapping core functionality
### Client (src/nexify/client/)

- `client_v2.py`: Python client for API interactions
- `models.py`: Client-side data models
- `progress.py`: Progress tracking utilities
```
YAML Workflow → MinimalStrategyService → Parameter Resolver
                        ↓
              Load from ACTION_REGISTRY
                        ↓
            TypedStrategyAction.execute()
                        ↓
         Pydantic validation (ActionParamsBase)
                        ↓
        UniversalContext (datasets/stats/files)
                        ↓
           execute_typed() implementation
                        ↓
           ActionResult → Context updates
```
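The flow above boils down to a loop: for each step, look up the action class, instantiate it, and execute it against the shared context. A toy sketch under simplified assumptions (`run_strategy`, the step shape, and the registry here are illustrative stand-ins, not the MinimalStrategyService API):

```python
import asyncio

# Illustrative sketch of the execution flow; not the real nexify code.
ACTION_REGISTRY = {}

def register_action(name):
    def decorator(cls):
        ACTION_REGISTRY[name] = cls
        return cls
    return decorator

@register_action("UPPERCASE")
class UppercaseAction:
    async def execute(self, params, context):
        # Read the input dataset, transform it, write the output dataset.
        data = context["datasets"][params["input_key"]]
        context["datasets"][params["output_key"]] = [s.upper() for s in data]

async def run_strategy(steps, context):
    # For each step: registry lookup, instantiate, execute against context.
    for step in steps:
        action_cls = ACTION_REGISTRY[step["type"]]
        await action_cls().execute(step["params"], context)
    return context

context = {"datasets": {"raw": ["a", "b"]}}
steps = [{"type": "UPPERCASE", "params": {"input_key": "raw", "output_key": "up"}}]
result = asyncio.run(run_strategy(steps, context))
```

Each action only sees the context and its own validated params, which is what lets steps be rearranged freely in YAML.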
Actions receive context in two forms:

- **Dict context**: Legacy format, used by MVP actions
- **Pydantic context**: `StrategyExecutionContext` for type safety

Always use `UniversalContext` to handle both:

```python
from nexify.standards.context_handler import UniversalContext

async def execute_typed(self, params: MyParams, context: Dict) -> ActionResult:
    ctx = UniversalContext.wrap(context)
    input_data = ctx.get("datasets", {}).get(params.input_key)
    # ... process data ...
    ctx.set("datasets", {params.output_key: output_data})
```

The orchestration service (MinimalStrategyService) automatically:

- Creates dual contexts (dict + Pydantic)
- Syncs data between them after each action
- Chooses appropriate context based on action compatibility
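The wrapping idea can be sketched in a few lines: one `get`/`set` API that dispatches to dict access or attribute access depending on what it wraps. This is a simplified stand-in, not the actual `context_handler.py` implementation:

```python
# Simplified sketch of the UniversalContext idea; the real class
# in nexify.standards.context_handler is more featureful.
class UniversalContextSketch:
    def __init__(self, inner):
        self._inner = inner

    @classmethod
    def wrap(cls, context):
        # Idempotent: wrapping an already-wrapped context is a no-op.
        return context if isinstance(context, cls) else cls(context)

    def get(self, key, default=None):
        if isinstance(self._inner, dict):
            return self._inner.get(key, default)
        return getattr(self._inner, key, default)

    def set(self, key, value):
        if isinstance(self._inner, dict):
            self._inner[key] = value
        else:
            setattr(self._inner, key, value)

# Works the same over a plain dict...
dict_ctx = UniversalContextSketch.wrap({"datasets": {"raw": [1, 2]}})

# ...and over an attribute-style object.
class ObjContext:
    pass

obj_ctx = UniversalContextSketch.wrap(ObjContext())
obj_ctx.set("datasets", {"x": 1})
```

Actions written against this one interface never need to know which context form the orchestrator handed them.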
Actions are organized by domain:

- `entities/`: Entity-specific actions (proteins, metabolites, chemistry)
  - `proteins/`: UniProt, Ensembl, gene symbol processing
  - `metabolites/`: HMDB, InChIKey, CHEBI handling
  - `chemistry/`: LOINC, clinical test matching
- `algorithms/`: Reusable computational algorithms
- `utils/`: General-purpose utilities
- `io/`: Data input/output operations
- `workflows/`: Composite multi-step actions
- `reports/`: Analysis and reporting actions
Workflows support dynamic parameter substitution:

```yaml
parameters:
  input_file: "/data/proteins.tsv"
  output_dir: "/results"

steps:
  - action:
      type: LOAD_DATASET
      params:
        file_path: "${parameters.input_file}"  # Resolved at runtime
        output_key: "raw_data"
```

Supports:

- Parameter references: `${parameters.key}`
- Metadata references: `${metadata.key}`
- Nested resolution in dicts, lists, and strings
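Nested resolution can be approximated with a small recursive substitution function. The sketch below is illustrative only; the real `infrastructure/parameter_resolver.py` handles more cases (e.g. `${metadata.key}` references):

```python
import re

# Illustrative recursive resolver for ${parameters.key} placeholders;
# not the actual parameter_resolver.py code.
PLACEHOLDER = re.compile(r"\$\{parameters\.([A-Za-z0-9_]+)\}")

def resolve(value, parameters):
    """Substitute placeholders in strings, recursing into dicts and lists."""
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: str(parameters[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: resolve(v, parameters) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, parameters) for v in value]
    return value  # numbers, None, etc. pass through unchanged

params = {"input_file": "/data/proteins.tsv"}
step = {"params": {"file_path": "${parameters.input_file}",
                   "keys": ["${parameters.input_file}"]}}
resolved = resolve(step, params)
```

The recursion is what makes placeholders work at any depth of the YAML structure.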
Steps can include conditions:

```yaml
steps:
  - name: optional_step
    condition: "1 in ${parameters.stages_to_run}"
    action:
      type: SOME_ACTION
```

Conditions are Python expressions evaluated safely with parameter substitution.
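One way such evaluation can work, sketched under the assumption that placeholders are substituted first and the resulting expression is then evaluated with builtins stripped (the engine's actual sandboxing may differ):

```python
# Hypothetical sketch: substitute parameter values, then evaluate the
# resulting Python expression with no builtins available.
def evaluate_condition(condition: str, parameters: dict) -> bool:
    for key, value in parameters.items():
        # repr() turns [1, 2] into "[1, 2]", keeping the expression valid Python
        condition = condition.replace("${parameters.%s}" % key, repr(value))
    return bool(eval(condition, {"__builtins__": {}}, {}))

run_step = evaluate_condition("1 in ${parameters.stages_to_run}",
                              {"stages_to_run": [1, 2]})
skip_step = evaluate_condition("3 in ${parameters.stages_to_run}",
                               {"stages_to_run": [1, 2]})
```

Stripping `__builtins__` blocks casual use of functions like `open` inside conditions, though a production engine would typically go further (e.g. AST whitelisting).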
```python
from nexify.actions.typed_base import TypedStrategyAction
from nexify.actions.registry import register_action
from nexify.standards.base_models import ActionParamsBase
from nexify.standards.context_handler import UniversalContext
from nexify.core.exceptions import ActionResult
from pydantic import Field
from typing import Dict, Any


class MyActionParams(ActionParamsBase):
    """Parameters for MyAction.

    Inherits common fields: debug, trace, timeout, continue_on_error, etc.
    """

    input_key: str = Field(..., description="Input dataset key from context")
    output_key: str = Field(..., description="Output dataset key to store in context")
    threshold: float = Field(0.8, ge=0.0, le=1.0, description="Processing threshold")


@register_action("MY_CUSTOM_ACTION")
class MyCustomAction(TypedStrategyAction[MyActionParams, ActionResult]):
    """Brief description of what this action does.

    Detailed explanation of:
    - Input expectations
    - Processing logic
    - Output format
    - Edge cases handled
    """

    def get_params_model(self) -> type[MyActionParams]:
        return MyActionParams

    async def execute_typed(self, params: MyActionParams, context: Dict) -> ActionResult:
        """Execute the action with validated parameters.

        Args:
            params: Validated action parameters
            context: Execution context (dict or StrategyExecutionContext)

        Returns:
            ActionResult with success status and details
        """
        # Wrap context for safe access
        ctx = UniversalContext.wrap(context)

        # Retrieve input data
        datasets = ctx.get("datasets", {})
        if params.input_key not in datasets:
            return ActionResult(
                success=False,
                message=f"Dataset '{params.input_key}' not found",
                details={"available_keys": list(datasets.keys())},
            )
        input_data = datasets[params.input_key]

        # Process data
        output_data = self._process_data(input_data, params.threshold)

        # Store result in context
        datasets[params.output_key] = output_data
        ctx.set("datasets", datasets)

        return ActionResult(
            success=True,
            message=f"Processed {len(output_data)} items",
            details={
                "input_count": len(input_data),
                "output_count": len(output_data),
                "threshold": params.threshold,
            },
        )

    def _process_data(self, data: Any, threshold: float) -> Any:
        """Private helper method for processing logic."""
        # Implementation here
        return data
```

Actions auto-register when imported. Ensure your action module is imported in the appropriate `__init__.py`:
```python
# src/nexify/actions/entities/proteins/__init__.py
from .my_action import *  # Triggers @register_action decorator
```

Create tests following the three-level pattern:
```python
# tests/unit/core/actions/test_my_action.py
import pytest
from nexify.actions.registry import ACTION_REGISTRY


@pytest.mark.asyncio
async def test_my_action_minimal():
    """Level 1: Minimal unit test (<1s)"""
    action_class = ACTION_REGISTRY["MY_CUSTOM_ACTION"]
    action = action_class()
    params = {"input_key": "test_data", "output_key": "result", "threshold": 0.8}
    context = {"datasets": {"test_data": [1, 2, 3]}}
    result = await action.execute(
        current_identifiers=[],
        current_ontology_type="protein",
        action_params=params,
        source_endpoint=None,
        target_endpoint=None,
        context=context,
    )
    assert "result" in context["datasets"]
    assert result.get("details", {}).get("success") is not False


@pytest.mark.asyncio
async def test_my_action_integration():
    """Level 2: Integration test with realistic data (<10s)"""
    # Test with larger dataset, edge cases, etc.
    pass


@pytest.mark.integration
@pytest.mark.asyncio
async def test_my_action_production_subset():
    """Level 3: Production-like subset test (<60s)"""
    # Test with real data patterns, performance validation
    pass
```

Always wrap context to handle both dict and object types:
```python
ctx = UniversalContext.wrap(context)
datasets = ctx.get("datasets", {})
ctx.set("datasets", updated_datasets)
```

Use `ActionParamsBase` for parameters that should accept extra fields:

```python
class MyParams(ActionParamsBase):  # Inherits extra='allow'
    required_field: str
    # Unknown fields won't cause validation errors
```

For strict validation, use `StrictBaseModel` instead.
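The extra-field behavior can be demonstrated with plain Pydantic v2, independent of nexify's base classes (`PermissiveParams`/`StrictParams` here are standalone illustrations, not the actual `ActionParamsBase`/`StrictBaseModel`):

```python
from pydantic import BaseModel, ConfigDict

# Standalone illustration of extra='allow' vs extra='forbid' in
# Pydantic v2; ActionParamsBase itself lives in base_models.py.
class PermissiveParams(BaseModel):
    model_config = ConfigDict(extra="allow")
    required_field: str

class StrictParams(BaseModel):
    model_config = ConfigDict(extra="forbid")
    required_field: str

ok = PermissiveParams(required_field="x", unknown_field=123)  # accepted

try:
    StrictParams(required_field="x", unknown_field=123)
    rejected = False
except Exception:
    rejected = True  # strict model raises a ValidationError
```

Permissive params let a workflow YAML carry forward-compatible extra keys; strict models catch typos in parameter names early.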
Return structured errors via `ActionResult`:

```python
if error_condition:
    return ActionResult(
        success=False,
        message="Clear error description",
        details={"error_code": "VALIDATION_ERROR", "field": "problematic_field"},
    )
```

Store tabular data in `context["datasets"]`:
```python
ctx.set("datasets", {
    "dataset_key": df,  # pandas DataFrame or list of dicts
    "another_key": processed_data,
})
```

Other context keys:

- `statistics`: Dict of computed statistics
- `output_files`: List of file paths generated
- `provenance`: List of processing history records
- `custom_action_data`: Free-form action-specific data
Enable identifier tracing through pipelines:

```python
debug_config = {
    'trace_identifiers': ['P12345', 'Q6EMK4'],
    'save_trace': '/tmp/debug_trace.json',
    'check_known_issues': True,
}

result = await service.execute_strategy(
    'my_workflow',
    debug_config=debug_config,
)
```

Traces log each action for specified identifiers to help debug data transformations.
```yaml
name: workflow_name
description: Brief description of workflow purpose

parameters:
  input_file: "/default/path.tsv"
  threshold: 0.8

metadata:
  version: "1.0"
  author: "Your Name"

steps:
  - name: load_data
    action:
      type: LOAD_DATASET_IDENTIFIERS
      params:
        file_path: "${parameters.input_file}"
        identifier_column: id
        output_key: raw_data

  - name: process
    condition: "${parameters.threshold} > 0.5"  # Optional
    action:
      type: MY_CUSTOM_ACTION
      params:
        input_key: raw_data
        output_key: processed_data
        threshold: "${parameters.threshold}"

  - name: export
    action:
      type: EXPORT_DATASET_V2
      params:
        input_key: processed_data
        file_path: "${parameters.output_dir}/results.tsv"
        format: tsv
```

- `LOAD_DATASET_IDENTIFIERS`: Load data from file
- `MERGE_DATASETS`: Merge multiple datasets
- `EXPORT_DATASET_V2`: Export dataset to file
- `CUSTOM_TRANSFORM_EXPRESSION`: Apply pandas transformations
- See `ACTION_REGISTRY` keys for the full list
```python
from nexify.core.minimal_strategy_service import MinimalStrategyService
import asyncio


async def main():
    service = MinimalStrategyService(strategies_dir="./workflows")
    result = await service.execute_strategy(
        strategy_name="my_workflow",
        input_identifiers=["id1", "id2", "id3"],
        context={"parameters": {"threshold": 0.9}},  # Override defaults
    )
    print(f"Processed {len(result['current_identifiers'])} identifiers")
    print(f"Output datasets: {list(result['datasets'].keys())}")

asyncio.run(main())
```

```python
from nexify.actions.registry import ACTION_REGISTRY

# List all registered actions
print(f"Available actions: {list(ACTION_REGISTRY.keys())}")

# Get action class
action_class = ACTION_REGISTRY["MY_ACTION"]
action = action_class()
```

```python
# Reading datasets
ctx = UniversalContext.wrap(context)
datasets = ctx.get("datasets", {})
my_data = datasets.get("key_name")

# Writing datasets
datasets["new_key"] = processed_df
ctx.set("datasets", datasets)

# Or set a merged copy in one call
ctx.set("datasets", {**datasets, "new_key": processed_df})
```

- `tests/unit/`: Fast unit tests (<1s each)
- `tests/integration/`: Integration tests (<10s each)
- `tests/performance/`: Performance benchmarks
- `tests/test_edge_cases.py`: Known edge case validation
```bash
# Test specific action
poetry run pytest tests/unit/core/actions/test_my_action.py -v

# Test with debugging
poetry run pytest tests/unit/core/actions/test_my_action.py -xvs

# Test with coverage for specific module
poetry run pytest tests/unit/core/actions/ --cov=nexify.actions --cov-report=term
```

- Minimum coverage: 75% (enforced in pyproject.toml)
- New actions should have >80% coverage
- Critical paths (orchestration, registry) should have >90% coverage
For datasets >1000 rows, consider chunking:

```python
def process_large_dataset(df: pd.DataFrame, chunk_size: int = 10000):
    for i in range(0, len(df), chunk_size):
        chunk = df.iloc[i:i + chunk_size]
        yield process_chunk(chunk)
```

Avoid O(n²) operations on large datasets. Prefer:

- Vectorized pandas operations
- Set-based lookups instead of nested loops
- Pre-computed indexes/mappings
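For example, replacing a linear membership scan with a set lookup turns an O(n·m) comparison into roughly O(n + m) (the identifiers below are synthetic illustration data):

```python
# Toy comparison: find which query IDs exist in a reference collection.
queries = [f"P{i:05d}" for i in range(1000)]
reference = [f"P{i:05d}" for i in range(500, 1500)]

# O(n*m): "q in reference" scans the whole list for every query
slow_hits = [q for q in queries if q in reference]

# O(n + m): build the set once, then each lookup is O(1) on average
reference_set = set(reference)
fast_hits = [q for q in queries if q in reference_set]
```

Both produce the same result; on real datasets with millions of identifiers the difference is the gap between seconds and hours.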
Use `@lru_cache` for expensive computations called repeatedly:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_lookup(identifier: str) -> str:
    # Computation here
    return result
```

- Action classes: `PascalCase` (e.g., `LoadDatasetIdentifiersAction`)
- Action registration names: `UPPER_SNAKE_CASE` (e.g., `LOAD_DATASET_IDENTIFIERS`)
- Functions/methods: `snake_case` (e.g., `process_identifiers`)
- Constants: `UPPER_SNAKE_CASE` (e.g., `DEFAULT_TIMEOUT`)
Group imports in this order:

- Standard library imports
- Third-party imports
- Local application imports

Typing:

- Always include type hints for function signatures
- Use `from typing import ...` for generic types
- Mypy strict mode is enabled for `src/nexify/`
- Use Google-style docstrings
- Required for public functions/classes
- Include Args, Returns, Raises sections as applicable
While Nexify is now independent, it maintains compatibility with BiOMapper patterns:
- Action naming conventions align with BiOMapper standards
- Context handling supports both legacy and modern patterns
- Biological data actions (proteins, metabolites) originated from BiOMapper
When working on biological data features, refer to BiOMapper documentation for domain-specific patterns.