Project Owner: Lewis Smith
Last Updated: 2026-03-16
HARBOR is an asset management algorithm that evolved into an empirical research platform for studying how autonomous trading agents reshape financial markets. The core thesis: autonomous agents (LLM-powered, RL-based, tool-using) are qualitatively different from traditional systematic algorithms — they reason, adapt, and interact strategically — and when deployed at scale, they create emergent coordination, manufactured regimes, and adversarial dynamics that traditional finance cannot explain.
HARBOR's triple role:
- Infrastructure — agents use HARBOR's risk models, portfolio construction, and data pipeline
- Participant — HARBOR-as-agent competes against autonomous agents, testing whether regime-awareness provides edge
- Origin story — the asset management algorithm that sparked the research question
ABF (Artificial Behavioral Finance) is the research track that uses HARBOR's infrastructure and the agent simulation framework to test whether autonomous trading agents create novel market dynamics.
Outputs:
- A modular Python library (HARBOR) with quant, ML, and agent simulation internals.
- A reproducible ABF research pipeline + working-paper-style draft on autonomous agent market dynamics.
High-level HARBOR modules:
- `harbor.data` – SP500 data, universe, features, Massive API client.
- `harbor.risk` – HRP, covariance estimation, Monte Carlo VaR/CVaR, regime detection, scenarios, decomposition.
- `harbor.portfolio` – optimization, constraints, allocation logic.
- `harbor.backtest` – engine, metrics, experiment runners.
- `harbor.ml.volatility` – NN volatility forecasters (experimental) + classical baselines (GARCH, EWMA).
- `harbor.ml.behavior_agents` – deep RL behavioral portfolio agents (experimental).
- `harbor.agents` – agent simulation framework: market environment, autonomous agents, population experiments.
- `harbor.abf` – research experiments and analysis utilities.
- `harbor.retail` – retail impact metrics and analysis (planned).
The system has two arms: the asset management stack (data → risk → portfolio → backtest) and the research stack (agents + abf + retail). They share infrastructure and feed results into each other.
Goal: Solid, math-heavy portfolio/risk core with clean APIs.
Deliverables:
- `harbor.data`
  - SP500 survivorship-bias-free loaders (CRSP/WRDS if available; fallback acceptable but documented).
- `harbor.risk`
  - Covariance estimators (sample, shrinkage).
  - Hierarchical Risk Parity (HRP) implementation.
  - Basic Monte Carlo return generator + portfolio VaR/CVaR.
- `harbor.portfolio`
  - Mean–variance / risk-parity / HRP allocation interfaces.
- `harbor.backtest`
  - Cross-sectional backtest loop with transaction costs and standard performance/risk metrics.
Exit criteria:
- One end-to-end script: "load SP500 → build HRP portfolio → backtest → report risk metrics".
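The HRP step in that end-to-end script can be sketched compactly. The snippet below is a minimal illustration of López de Prado-style recursive bisection, not HARBOR's actual implementation; the function name `hrp_weights` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def hrp_weights(cov: np.ndarray) -> np.ndarray:
    """Hierarchical Risk Parity sketch: cluster assets by correlation
    distance, then split risk top-down via recursive bisection."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    # Correlation-based distance matrix.
    dist = np.sqrt(0.5 * (1.0 - corr))
    np.fill_diagonal(dist, 0.0)
    order = leaves_list(linkage(squareform(dist, checks=False), method="single"))

    def cluster_var(items):
        # Variance of the inverse-variance portfolio within a cluster.
        sub = cov[np.ix_(items, items)]
        ivp = 1.0 / np.diag(sub)
        ivp /= ivp.sum()
        return float(ivp @ sub @ ivp)

    weights = np.ones(len(order))
    clusters = [list(order)]
    while clusters:
        cluster = clusters.pop(0)
        if len(cluster) < 2:
            continue
        left, right = cluster[: len(cluster) // 2], cluster[len(cluster) // 2 :]
        v_left, v_right = cluster_var(left), cluster_var(right)
        alpha = 1.0 - v_left / (v_left + v_right)  # lower-variance side gets more
        weights[left] *= alpha
        weights[right] *= 1.0 - alpha
        clusters += [left, right]
    return weights
```

Because every split allocates complementary fractions, the final weights sum to one by construction.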
Phase H1 checklist (execution status):
- `harbor.data`: point-in-time membership loader with documented fallback, price loader, risk-free loader, local cache scaffolding.
- `harbor.risk`: sample/shrinkage covariance estimators, HRP implementation, Monte Carlo VaR/CVaR utilities.
- `harbor.portfolio`: mean-variance, risk parity, and HRP allocation interfaces.
- `harbor.backtest`: cross-sectional backtest engine with transaction costs and core metrics.
- Exit-criteria script added: `experiments/h1_end_to_end_hrp_backtest.py`.
- Phase H1 unit-test suite added for core data/risk/portfolio/backtest pathways.
Goal: Deepen risk modeling and scenario analysis.
Deliverables:
- `harbor.risk`
  - Regime-aware covariance (regime switches and shrinkage based on vol state).
  - Robust Monte Carlo engines (Student-t marginals, factor-driven simulation).
  - Stress testing:
    - Shock scenarios (vol spikes, correlation spikes, sector crashes).
  - Risk decomposition by factor/cluster.
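The Student-t Monte Carlo deliverable can be sketched via the classic normal/chi-square mixture. This is a sketch only: `mc_var_cvar` and its defaults are illustrative, and the covariance input is used directly as the t scale matrix rather than being rescaled to a target variance.

```python
import numpy as np

def mc_var_cvar(mean, cov, weights, n_sims=100_000, alpha=0.95, df=5, seed=0):
    """Monte Carlo VaR/CVaR sketch with multivariate Student-t returns.

    Gaussian draws are divided by a chi-square mixing variable, which
    fattens the tails relative to a purely normal model."""
    rng = np.random.default_rng(seed)
    n = len(mean)
    z = rng.multivariate_normal(np.zeros(n), cov, size=n_sims)
    g = rng.chisquare(df, size=n_sims) / df          # mixing variable
    returns = np.asarray(mean) + z / np.sqrt(g)[:, None]
    pnl = returns @ np.asarray(weights)
    var = -np.quantile(pnl, 1 - alpha)               # loss at the alpha level
    cvar = -pnl[pnl <= -var].mean()                  # mean loss beyond VaR
    return var, cvar
```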
Exit criteria:
- Config-driven scenario runs with clear outputs (risk decomposition, stress-test reports).
- Pluggable risk engine used by ABF experiments.
Phase H2 checklist (execution status):
- Regime-aware covariance estimators.
- Student-t and factor-driven Monte Carlo engines.
- Config-driven stress scenario runner.
- Risk decomposition by factor and cluster.
- Pluggable risk engine interface.
- 48 H2 tests passing.
- Demo script: `experiments/h2_risk_engine_demo.py`.
Goal: Build the market simulation environment and agent interface — the primary research instrument.
Status: Next. This is the foundation for all agent-first research.
Depends on: H2 (complete).
Module: harbor/agents/
harbor/agents/
environment.py # Market sim: price-impact model, order matching, state
base_agent.py # Abstract interface: observe() → decide() → act()
rule_agents.py # Momentum, vol-targeting, mean-reversion (simple baselines)
config.py # Population configs, parameter distributions
metrics.py # Market-level: crowding index, flow imbalance, regime labels
Key design decisions:
| Decision | Choice | Rationale |
|---|---|---|
| Price formation | Impact model (linear temporary + square-root permanent) | Captures dynamics we care about; standard in agent-based finance lit |
| Agent interface | observe() → decide() → act() | Clean separation; works for all agent types |
| Rule-based agents | First implementation | Validates environment before adding complexity |
| Output format | Same DataFrame structure as real data | Enables reuse of entire ABF analysis pipeline |
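The observe() → decide() → act() interface in the table might look like the following sketch; `Observation`, `BaseAgent`, and `MomentumAgent` are illustrative names, not the module's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Observation:
    prices: list      # recent price history for one asset
    positions: dict   # this agent's current holdings

class BaseAgent(ABC):
    """Abstract agent: observe() -> decide() -> act()."""

    @abstractmethod
    def decide(self, obs: Observation) -> dict:
        """Map an observation to target positions {asset: weight}."""

    def observe(self, market_state: dict) -> Observation:
        # Shared observation logic; subclasses only implement decide().
        return Observation(prices=market_state["prices"],
                           positions=market_state.get("positions", {}))

    def act(self, market_state: dict) -> dict:
        return self.decide(self.observe(market_state))

class MomentumAgent(BaseAgent):
    """Minimal rule-based baseline: long if last price > trailing mean."""
    def decide(self, obs: Observation) -> dict:
        mean = sum(obs.prices) / len(obs.prices)
        return {"asset": 1.0 if obs.prices[-1] > mean else -1.0}
```

The clean separation means LLM, RL, and HARBOR agents later only override `decide()`.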
Deliverables:
- `harbor.agents.environment`: market environment with price-impact model
- `harbor.agents.base_agent`: abstract agent interface
- `harbor.agents.rule_agents`: momentum, vol-targeting, mean-reversion baselines
- `harbor.agents.config`: population configuration system
- `harbor.agents.metrics`: crowding index, flow imbalance, regime labels
Exit criteria:
- Market environment runs end-to-end with rule-based agents
- Price-impact model produces realistic price dynamics (no drift explosion, reasonable vol)
- Metrics module computes crowding, flow imbalance, regime labels
- Output format compatible with ABF analysis pipelines
- Unit tests for environment, agents, metrics
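The price-formation choice above (linear temporary + square-root permanent impact) can be written as a one-step sketch; `eta` and `gamma` are illustrative, uncalibrated coefficients, and the function name is hypothetical.

```python
import numpy as np

def execute_order(price, signed_flow, adv, eta=0.1, gamma=0.05):
    """One-step price-impact sketch: linear temporary + sqrt permanent.

    signed_flow: net order flow (signed); adv: average daily volume."""
    participation = signed_flow / adv
    temporary = eta * participation                          # decays next step
    permanent = gamma * np.sign(participation) * np.sqrt(abs(participation))
    exec_price = price * (1.0 + temporary + permanent)       # price paid now
    new_mid = price * (1.0 + permanent)                      # lasting mid move
    return exec_price, new_mid
```

The temporary term penalizes aggressive execution this step; only the concave permanent term moves the mid, which helps keep simulated prices from exploding.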
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| H3-S1 | 2 weeks | Market environment + price-impact model + state management |
| H3-S2 | 2 weeks | Base agent interface + rule-based agents (momentum, vol-target, mean-rev) |
| H3-S3 | 2 weeks | Metrics module + config system + end-to-end simulation runs |
Goal: Implement the autonomous agent types that make this project distinctive — LLM agents, RL agents, and HARBOR-as-agent.
Depends on: H3 (simulation core running).
harbor/agents/
llm_agents.py # LLM-based agents (Claude/GPT via API, prompted with market data)
rl_agents.py # RL agents (wraps harbor.ml.behavior_agents, trains in environment)
harbor_agent.py # HARBOR-as-agent: uses harbor.risk + harbor.portfolio to trade
adaptation.py # Agent learning/adaptation between rounds
Agent types:
| Type | Source | Distinguishing Feature |
|---|---|---|
| LLM-based | `llm_agents.py` | Reasons about market state via API call, adapts via prompting |
| RL-based | `rl_agents.py` | Learns optimal policy through environment interaction (experimental baseline — not validated against classical benchmarks) |
| HARBOR | `harbor_agent.py` | Uses HARBOR's own risk/portfolio logic — the system tests itself |
Key design decisions:
| Decision | Choice | Rationale |
|---|---|---|
| LLM agents | Real API calls (Claude/GPT) | Mocking loses the point — real reasoning is what makes agents different |
| RL agents | Wrap existing harbor.ml.behavior_agents | Reuses validated code |
| HARBOR-as-agent | Wraps harbor.risk + harbor.portfolio | Tests the system against itself |
| Adaptation | Between-round strategy updates | Agents learn from past performance |
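A minimal shape for the LLM agent, consistent with the "mocked API for CI" exit criterion below: the model call is injected, so production can pass a real Claude/GPT wrapper while tests pass a stub. Class and method names are hypothetical.

```python
import json

class LLMAgent:
    """Sketch: serialize market state into a prompt, parse the model's
    JSON reply into target weights. call_model is an injected callable."""

    PROMPT = ("You are a trading agent. Given market state {state}, "
              "reply with JSON {{\"weights\": {{asset: weight}}}}.")

    def __init__(self, call_model):
        self.call_model = call_model  # real API wrapper, or a stub in CI

    def decide(self, market_state: dict) -> dict:
        prompt = self.PROMPT.format(state=json.dumps(market_state))
        reply = self.call_model(prompt)
        try:
            return json.loads(reply)["weights"]
        except (json.JSONDecodeError, KeyError):
            return {}  # fail safe: hold no position on unparseable output
```

The fail-safe branch matters in practice: LLM output is not guaranteed to be valid JSON, and a malformed reply should never crash a simulation round.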
Exit criteria:
- LLM agents receive market state, output trading decisions via API
- RL agents train within the simulation environment
- HARBOR agent uses existing portfolio logic to trade
- Adaptation mechanism allows strategy updates between rounds
- Unit tests for each agent type (mocked API for CI)
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| H4-S1 | 2 weeks | LLM agent implementation + prompt engineering + API integration |
| H4-S2 | 2 weeks | RL agent wrapper + HARBOR-as-agent + adaptation mechanism |
| H4-S3 | 2 weeks | Integration testing: all agent types in shared simulation |
Goal: Multi-agent population management and predefined experiment configurations for causal testing.
Depends on: H4 (all agent types working).
harbor/agents/
population.py # Population manager: spawn, configure, mix agent types
experiments.py # Predefined experiment configs
analysis.py # Bridge: convert sim output → ABF-compatible DataFrames
Experiment configurations:
- Homogeneous LLM: 50 LLM agents, same base prompt → emergent coordination test
- Mixed autonomous: 20 LLM + 20 RL + 10 HARBOR → interaction dynamics
- Stress injection: External vol shock + mixed population → cascading behavior
- Adaptation test: Agents learn between rounds → convergence or divergence
- HARBOR resilience: HARBOR agent in autonomous-agent-dominated market
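These configurations could be encoded as plain dataclasses; the field names below are illustrative, not the actual `harbor.agents.config` schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PopulationConfig:
    """Sketch of a population experiment config (names are illustrative)."""
    name: str
    n_llm: int = 0
    n_rl: int = 0
    n_harbor: int = 0
    shock: Optional[str] = None   # e.g. "vol_spike" for stress injection
    adapt: bool = False           # agents update strategies between rounds

EXPERIMENTS = {
    "homogeneous_llm": PopulationConfig("homogeneous_llm", n_llm=50),
    "mixed_autonomous": PopulationConfig("mixed_autonomous",
                                         n_llm=20, n_rl=20, n_harbor=10),
    "stress_injection": PopulationConfig("stress_injection",
                                         n_llm=20, n_rl=20, shock="vol_spike"),
    "adaptation_test": PopulationConfig("adaptation_test",
                                        n_llm=20, n_rl=20, adapt=True),
}
```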
Exit criteria:
- Population manager configures and runs multi-agent simulations end-to-end
- At least 3 experiment configurations produce non-trivial results
- Analysis module converts simulation output to ABF-compatible DataFrames
- Results reproducible via config files and `make` targets
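The analysis bridge's job, producing the same DataFrame shape as real data, might look like this sketch (`sim_to_panel` and the snapshot schema are assumptions).

```python
import pandas as pd

def sim_to_panel(sim_results: dict) -> pd.DataFrame:
    """Flatten per-round simulation snapshots into the long (date, asset)
    panel shape the ABF pipeline expects from real market data."""
    rows = []
    for round_id, snapshot in sim_results.items():
        for asset, price in snapshot["prices"].items():
            rows.append({"date": round_id,
                         "asset": asset,
                         "price": price,
                         "flow": snapshot["flows"].get(asset, 0.0)})
    return pd.DataFrame(rows).set_index(["date", "asset"]).sort_index()
```

Keeping the simulated panel schema identical to the real-data panel is what lets the downstream ABF analysis code run unchanged on both.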
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| H5-S1 | 2 weeks | Population manager + experiment config system |
| H5-S2 | 2 weeks | Run all 5 experiment configs + analysis bridge |
| H5-S3 | 2 weeks | Results analysis, reproducibility, documentation |
ABF = research layer that uses HARBOR infrastructure and the agent simulation framework.
Goal: Lock questions, universe, and primary metrics.
Deliverables:
- PRD (`docs/abf-prd.md`) v1 done.
- SP500 universe with historical membership + clean daily panel.
- Config files for shock definitions and regime definitions.
Goal: Test whether vol-targeting algorithms create momentum and reversal patterns.
Status: Complete but deprecated. Results were weak and time-dependent (post-2020 only, ~1.3bps, economically insignificant). This investigation motivated the pivot to autonomous-agent-specific research. Code remains in harbor/abf/q1/ as historical work. See docs/abf-q1-research-summary.md.
Goal: Test whether independently-built autonomous agents converge on similar strategies without explicit coordination.
Depends on: H4 (autonomous agent types working).
Module: harbor/abf/q1_coordination/
Method:
- Run H4 agent populations with diverse initial configurations (different LLM prompts, different RL training seeds, different HARBOR configs)
- Measure convergence metrics:
| Metric | Definition |
|---|---|
| Position correlation | Cross-agent correlation of portfolio weights over time |
| Strategy similarity | Rolling cosine similarity of agent trade signals |
| Herding intensity | Fraction of agents on the same side of each trade |
| Convergence speed | Time (in rounds) for similarity metrics to plateau |
- Key test: do agents that start different become more similar over time?
- Empirical complement: post-2023 real-market signatures (did cross-asset correlations change character after LLM agent adoption?)
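Two of the table's metrics are simple to state precisely. A sketch under the assumption that simulation output arrives as (rounds × agents) arrays; function names are illustrative.

```python
import numpy as np

def herding_intensity(trades: np.ndarray) -> np.ndarray:
    """Per-round fraction of active agents on the majority side.

    trades: (n_rounds, n_agents) array of signed trade directions."""
    signs = np.sign(trades)
    buys = (signs > 0).sum(axis=1)
    sells = (signs < 0).sum(axis=1)
    active = buys + sells
    # Guard against rounds with no active agents.
    return np.where(active > 0,
                    np.maximum(buys, sells) / np.maximum(active, 1), 0.0)

def position_correlation(weights: np.ndarray) -> float:
    """Mean pairwise correlation of agents' weight trajectories.

    weights: (n_rounds, n_agents) array for a single asset."""
    corr = np.corrcoef(weights.T)          # agent-by-agent matrix
    n = corr.shape[0]
    return float(corr[~np.eye(n, dtype=bool)].mean())
```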
Target figures:
- Convergence trajectories: similarity metrics over simulation rounds for each experiment config
- Herding intensity heatmap: agent-by-agent position correlation matrix at start vs end
- Empirical comparison: real-market correlation structure pre-2020 vs post-2023
- Robustness: sensitivity to LLM prompt variation, RL seed variation
Exit criteria:
- Statistical evidence that independent agents converge (or clear null result)
- At least one convergence metric shows clear temporal trend in simulation
- Empirical comparison with real-market patterns
- Draft text for paper section on emergent coordination
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| A3-S1 | 2 weeks | Convergence metric computation + simulation runs |
| A3-S2 | 2 weeks | Empirical complement (real-market analysis) |
| A3-S3 | 2 weeks | Figures, robustness, write-up |
Goal: Test whether autonomous agent populations CREATE market regimes that wouldn't exist without them.
Depends on: H5 (population experiments) and A3 (convergence findings inform regime analysis).
Module: harbor/abf/q2_regimes/
Method:
- Ablation experiments: run simulation with and without agent populations, compare regime structure using H2's regime detection tools
- Dose-response: vary agent concentration from 10% to 80% of market volume, measure regime intensity
- Regime detection: apply existing `harbor.risk` regime tools to synthetic data
- Empirical complement: compare regime frequency and intensity pre-2020 vs post-2023
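The dose-response step could be driven by a small harness like the one below; `regime_intensity` here is a crude proxy (coefficient of variation of rolling volatility) standing in for H2's actual regime-detection tools, and all names are illustrative.

```python
import numpy as np

def regime_intensity(returns: np.ndarray, window: int = 20) -> float:
    """Crude regime-intensity proxy: dispersion of rolling volatility.

    A market that flips between calm and turbulent states shows more
    variation in rolling vol than one with a single stable regime."""
    n = len(returns) - window + 1
    rolling_vol = np.array([returns[i:i + window].std() for i in range(n)])
    return float(rolling_vol.std() / rolling_vol.mean())

def dose_response(simulate, concentrations=(0.1, 0.3, 0.5, 0.8)):
    """Run simulate(concentration) -> returns at each agent concentration
    and report the regime-intensity proxy per level."""
    return {c: regime_intensity(simulate(c)) for c in concentrations}
```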
Target figures:
- Ablation comparison: regime structure with vs without agents
- Dose-response curves: agent concentration → regime intensity
- Regime type identification: which regimes are agent-manufactured?
- Empirical comparison: regime frequency pre-2020 vs post-2023
Exit criteria:
- Ablation shows statistically significant regime differences with/without agents
- Dose-response shows monotonic relationship between agent concentration and regime intensity
- At least one manufactured regime type clearly identified
- Draft text for paper section
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| A4-S1 | 2 weeks | Ablation experiments + regime detection on synthetic data |
| A4-S2 | 2 weeks | Dose-response curves + empirical complement |
| A4-S3 | 2 weeks | Figures, statistical tests, write-up |
Goal: Test whether autonomous agents learn to exploit each other and whether this destabilizes prices.
Depends on: A4 (regime findings inform adaptation analysis).
Module: harbor/abf/q3_adversarial/
Method:
- Multi-round simulation where agents adapt strategies between rounds
- Measure: price stability over time, strategy divergence/convergence, Sharpe ratio decay
- Test whether adaptation leads to arms race (increasing instability), equilibrium (stability), or cycles
- Retail impact: how does a naive buy-and-hold investor fare as agents adapt?
- HARBOR impact: does regime-awareness mitigate the damage?
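The Sharpe-decay measurement reduces to a slope estimate over rounds; a sketch assuming per-round return series, with illustrative function names.

```python
import numpy as np

def sharpe_by_round(pnl: np.ndarray, periods_per_year: int = 252) -> np.ndarray:
    """Annualized Sharpe ratio per adaptation round.

    pnl: (n_rounds, n_periods) array of per-period returns for one agent."""
    mean = pnl.mean(axis=1)
    std = pnl.std(axis=1, ddof=1)
    return np.sqrt(periods_per_year) * mean / std

def sharpe_decay(pnl: np.ndarray) -> float:
    """Slope of Sharpe across rounds: a clearly negative slope suggests an
    arms race eroding each agent's edge; near-zero suggests equilibrium."""
    sharpes = sharpe_by_round(pnl)
    rounds = np.arange(len(sharpes))
    slope, _ = np.polyfit(rounds, sharpes, 1)
    return float(slope)
```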
Target figures:
- Price stability over adaptation rounds
- Strategy divergence/convergence trajectories
- Sharpe decay curves for different agent types
- Retail portfolio impact during adversarial periods
- HARBOR-as-agent performance vs naive strategies
Exit criteria:
- Clear characterization of adaptation dynamics (arms race, equilibrium, or cycles)
- Quantified retail portfolio impact during adversarial periods
- HARBOR-as-agent performance comparison
- Draft text for paper section
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| A5-S1 | 2 weeks | Multi-round adaptation experiments |
| A5-S2 | 2 weeks | Retail + HARBOR impact analysis |
| A5-S3 | 2 weeks | Figures, characterization, write-up |
Goal: Complete paper draft with HARBOR-as-participant results and retail impact quantification.
Depends on: A5.
Deliverables:
- HARBOR-as-agent performance across all experiment configs
- Quantified regime-awareness benefit for retail portfolios
- Paper-quality draft: Q1 (coordination) + Q2 (regimes) + Q3 (adaptation) + retail impact
- Reproducible pipeline via `make` targets
- Public or semi-public repo with docs and notebooks aligned to the paper
Exit criteria:
- Complete paper draft with figures, tables, and statistical tests
- All experiments reproducible in <5 commands
- At least one expert review (faculty or practitioner)
- Concise writeup suitable for applications
- Versioning & Changelog
  - Semantic Versioning: `MAJOR.MINOR.PATCH`. `CHANGELOG.md` updated for each tagged release.
- Documentation
  - `README.md` for overview and quickstart.
  - `docs/abf-prd.md` for research spec.
  - `docs/plan.md` (this file) for roadmap and phases.
  - `docs/PROJECT_EVOLUTION.md` for the full project story.
- Git Hygiene
  - Feature branches per module/experiment.
  - Descriptive, technical commit messages and PRs.
- Reproducibility
  - Config-driven experiments.
  - Minimal "one-command" scripts to reproduce key results.
| Timeframe | HARBOR Focus | ABF Focus | Status |
|---|---|---|---|
| 0–3 months | H1: Core quant stack | A1: Spec + data | ✅ Complete |
| 3–6 months | H2: Advanced risk | A2: Old Q1 (deprecated) | ✅ Complete |
| 6–9 months | H3: Agent simulation core | Empirical insurance track (lightweight) | Next |
| 9–12 months | H4: Autonomous agent types | — | Planned |
| 12–14 months | — | A3: Emergent coordination | Planned |
| 14–16 months | H5: Population dynamics | A4: Regime manufacturing | Planned |
| 16–18 months | — | A5: Adversarial adaptation | Planned |
| 18+ months | — | A6: Retail impact & paper | Planned |
Parallelism: H5 can begin with rule-based agents (from H3) before H4 is fully complete. The empirical insurance track runs independently. A3 starts after H4 delivers working autonomous agents.
- HARBOR H1 (Core Quant Stack): ✅ Complete — data loaders, risk models, portfolio optimization, backtest engine all implemented and tested.
- HARBOR H2 (Advanced Risk & Simulation): ✅ Complete — regime-aware covariance, Student-t/factor Monte Carlo, config-driven stress scenarios, risk decomposition, pluggable risk engine interface. 48 H2 tests passing.
- ABF A2 (Old Q1 Pipeline): ✅ Complete, now deprecated — local projections, CAR computation, robustness sweep. Results were weak/time-dependent, motivating the pivot to autonomous-agent research.
- ML Extensions: Experimental scaffolding — LSTM/GRU vol forecasters and deep RL behavioral agents with unit tests. Classical baselines (GARCH, EWMA) implemented. These serve as infrastructure for RL agent types in H4.
- Agent Simulation (H3): Next — `harbor/agents/` stub in place. Building the market environment and agent interface is the immediate priority.
Key scope change (March 2026): Project pivoted from "AI-driven trading broadly" to "autonomous and agentic trading" specifically. Old Q1 deprecated. New three-question research arc: emergent coordination → regime manufacturing → adversarial adaptation. The simulation framework moved from late-stage (old H5) to immediate priority (new H3) as the primary research instrument. See docs/PROJECT_EVOLUTION.md for the full narrative and docs/superpowers/specs/2026-03-16-autonomous-agent-pivot-design.md for the design spec.
Test suite: 205 tests passing.