A benchmark and development framework for case-based agents.
Replay historical decision points, compare agent policies against baselines, and evaluate operational outcomes before production deployment.
```shell
pip install system-arena

# With process mining integrations
pip install "system-arena[integrations]"
```

- Event Log — Canonical schema for process events (`entity.created`, `state.changed`, `interaction.occurred`, etc.)
- Decision Point — A moment where a policy could intervene
- Policy — Interface for agent decision-making (threshold, rule-based, or custom)
- Replay — Offline evaluation with no-future-leakage guarantee
- Metrics — Precision@K, conversion, policy violations
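The no-future-leakage guarantee means that at a decision point, a policy can only observe events whose timestamps are at or before the decision time. As a standalone illustration of the idea (not the library's implementation — the `Event` dataclass here is a stand-in):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str
    event_type: str
    timestamp: datetime


def visible_events(events: list[Event], decision_time: datetime) -> list[Event]:
    """Events a policy may observe at a decision point: nothing after decision_time."""
    return [e for e in events if e.timestamp <= decision_time]


log = [
    Event("lead_1", "entity.created", datetime(2024, 1, 1)),
    Event("lead_1", "interaction.occurred", datetime(2024, 1, 3)),
    Event("lead_1", "state.changed", datetime(2024, 1, 9)),  # after the decision point
]

# Replaying a decision made on Jan 5 hides the Jan 9 event from the policy.
snapshot = visible_events(log, datetime(2024, 1, 5))
print([e.event_type for e in snapshot])  # ['entity.created', 'interaction.occurred']
```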
Each workflow/use-case should be a separate repository that depends on system-arena.
```
my-workflow/
├── manifest.yaml          # Workflow definition
├── data/
│   └── events.parquet     # Historical event log
├── policies/
│   └── my_policy.py       # Custom policies
└── analysis/
    └── notebooks/         # Analysis notebooks
```
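A custom policy in `policies/my_policy.py` implements the policy interface from `arena.policy`. The exact base class and method signature may differ from this sketch, which assumes a `decide` method that receives a snapshot of case feature values and returns one of the allowed actions:

```python
# policies/my_policy.py — a hypothetical custom policy.
# NOTE: the real interface lives in arena.policy; the `decide` name and
# dict-shaped snapshot here are assumptions for illustration.


class MyFollowupPolicy:
    """Escalate stale leads, remind recent ones, otherwise wait."""

    def decide(self, snapshot: dict) -> str:
        days = snapshot.get("days_since_last_contact", 0)
        if days > 7:
            return "escalate"
        if days > 3:
            return "send_reminder"
        return "wait"


policy = MyFollowupPolicy()
print(policy.decide({"days_since_last_contact": 5}))  # send_reminder
```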
```yaml
name: my_workflow
version: "1.0"

case:
  id_field: lead_id
  timestamp_field: event_timestamp

decision_points:
  - name: followup_due
    trigger: "timer.elapsed"
    condition: "days_since_last_contact > 3"
    allowed_actions:
      - action: wait
      - action: send_reminder
      - action: escalate
    outcomes:
      positive: [converted, meeting_booked]
      negative: [went_cold, lost]
```

```python
from arena.datasets import load_event_log, load_manifest
from arena.replay import BenchmarkRunner, BenchmarkConfig
from arena.policy import ThresholdPolicy, ThresholdConfig

# Load data
manifest = load_manifest("manifest.yaml")
event_log = load_event_log("data/events.parquet", manifest)

# Configure policy
policy = ThresholdPolicy(
    thresholds=[
        ThresholdConfig(
            field="days_since_last_contact",
            threshold=3.0,
            action_above="send_reminder",
            action_below="wait",
        )
    ]
)

# Run benchmark
config = BenchmarkConfig(
    decision_rules=manifest.get_decision_rules(),
    constraints=manifest.get_constraints(),
)
runner = BenchmarkRunner(config)
result = runner.run(event_log, policy)

# View results
print(f"Decisions: {result.total_decisions}")
print(f"Violations: {result.total_violations}")
```

Events can reference multiple objects (not just a single `case_id`):
```python
from arena.core.types import Event, EventType, ObjectRef

event = Event(
    case_id=lead_id,
    event_type=EventType.INTERACTION_OCCURRED,
    object_refs=(
        ObjectRef("lead", "lead_123"),
        ObjectRef("clinic", "clinic_456", role="target"),
        ObjectRef("assistant", "asst_789", role="owner"),
    ),
)

# Filter by object
clinic_events = event_log.filter_by_object("clinic", "clinic_456")
```

```python
from arena.integrations import export_to_pm4py, discover_process

# Export for PM4Py analysis
pm4py_log = export_to_pm4py(event_log)

# Discover a process model
net, im, fm = discover_process(event_log, algorithm="inductive")
```

```python
from arena.integrations import compare_variants

# Compare successful vs. unsuccessful trajectories
comparison = compare_variants(event_log)
print(comparison.summary())
```

```python
from arena.integrations import PerformanceAnalyzer

analyzer = PerformanceAnalyzer()
report = analyzer.analyze(event_log)
print(report.summary())
```

| Module | Purpose |
|---|---|
| `arena.core` | `Event`, `EventLog`, `CaseSnapshot`, `ObjectRef` |
| `arena.decision` | `DecisionPoint` extraction, labeling |
| `arena.policy` | `Policy` interface, baselines (Random, Rule, Threshold) |
| `arena.replay` | `BenchmarkRunner`, no-future-leakage timeline |
| `arena.eval` | Metrics, comparison, reports |
| `arena.datasets` | Parquet/JSONL loading, manifest parsing |
| `arena.integrations` | PM4Py, variant analysis, performance overlays |
| `arena.cli` | Command-line interface |
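Among the metrics in `arena.eval`, precision@K ranks decisions by score and asks what fraction of the top K led to a positive outcome. The metric itself is standard; a self-contained version (not the library's code, and the pair-based input shape is an assumption):

```python
def precision_at_k(scored_outcomes: list[tuple[float, bool]], k: int) -> float:
    """Fraction of the k highest-scored decisions that had a positive outcome.

    scored_outcomes: (score, is_positive) pairs, one per decision.
    """
    top_k = sorted(scored_outcomes, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(1 for _, positive in top_k if positive) / k


decisions = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.2, True)]
print(precision_at_k(decisions, 3))  # 2 of the top 3 are positive
```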
System Arena builds on ideas from these open-source projects:
| Project | Purpose |
|---|---|
| Temporal | Workflow orchestration and durable execution |
| PM4Py | Process mining algorithms and analysis |
| Retentioneering | User behavior and clickstream analysis |
| Langfuse | LLM observability and tracing |
| LangGraph | Agent orchestration with state machines |
| SimPy | Discrete-event process simulation |
| Camunda/Zeebe | BPMN workflow engine |
Recommended reading for understanding the architectural patterns used:
- awesome-software-architecture — Comprehensive architecture patterns
- architecture-decision-record — ADR templates and examples
- domain-driven-design-roadmap — DDD learning path
- awesome-cqrs-event-sourcing — CQRS and event sourcing resources
Contributions are welcome! Here's how to get started:
- Fork the repository
- Clone your fork: `git clone https://github.com/YOUR_USERNAME/system-arena.git`
- Install dev dependencies: `pip install -e ".[dev]"`
- Create a branch: `git checkout -b feature/your-feature`
- Make changes and add tests
- Run checks: `ruff check . && mypy src/`
- Submit a pull request
Please ensure your PR:
- Follows existing code style
- Includes tests for new functionality
- Updates documentation if needed
MIT