Project Owner: Lewis Smith
Last Updated: 2026-03-16
HARBOR is an asset management algorithm that evolved into an empirical research platform for studying how autonomous trading agents reshape financial markets. The core thesis: autonomous agents (LLM-powered, RL-based, tool-using) are qualitatively different from traditional systematic algorithms — they reason, adapt, and interact strategically — and when deployed at scale, they create emergent coordination, manufactured regimes, and adversarial dynamics that traditional finance cannot explain.
HARBOR's triple role:
- Infrastructure — agents use HARBOR's risk models, portfolio construction, and data pipeline
- Participant — HARBOR-as-agent competes against autonomous agents, testing whether regime-awareness provides edge
- Origin story — the asset management algorithm that sparked the research question
ABF (Artificial Behavioral Finance) is the research track that uses HARBOR's infrastructure and the agent simulation framework to test whether autonomous trading agents create novel market dynamics.
Outputs:
- A modular Python library (HARBOR) with quant, ML, and agent simulation internals.
- A reproducible ABF research pipeline + working-paper-style draft on autonomous agent market dynamics.
High-level HARBOR modules:
- `harbor.data` – SP500 data, universe, features, Massive API client.
- `harbor.risk` – HRP, covariance estimation, Monte Carlo VaR/CVaR, regime detection, scenarios, decomposition.
- `harbor.portfolio` – optimization, constraints, allocation logic.
- `harbor.backtest` – engine, metrics, experiment runners.
- `harbor.ml.volatility` – NN volatility forecasters (experimental) + classical baselines (GARCH, EWMA).
- `harbor.ml.behavior_agents` – deep RL behavioral portfolio agents (experimental).
- `harbor.agents` – agent simulation framework: market environment, autonomous agents, population experiments.
- `harbor.abf` – research experiments and analysis utilities.
- `harbor.retail` – retail impact metrics and analysis (planned).
The system has two arms: the asset management stack (data → risk → portfolio → backtest) and the research stack (agents + abf + retail). They share infrastructure and feed results into each other.
Goal: Solid, math-heavy portfolio/risk core with clean APIs.
Deliverables:
- `harbor.data`
  - SP500 survivorship-bias-free loaders (CRSP/WRDS if available; fallback acceptable but documented).
- `harbor.risk`
  - Covariance estimators (sample, shrinkage).
  - Hierarchical Risk Parity (HRP) implementation.
  - Basic Monte Carlo return generator + portfolio VaR/CVaR.
- `harbor.portfolio`
  - Mean–variance / risk-parity / HRP allocation interfaces.
- `harbor.backtest`
  - Cross-sectional backtest loop with transaction costs and standard performance/risk metrics.
Exit criteria:
- One end-to-end script: "load SP500 → build HRP portfolio → backtest → report risk metrics".
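The HRP step in that end-to-end script can be sketched compactly. The snippet below is a minimal illustration of López de Prado-style recursive bisection, not HARBOR's actual implementation; the function name `hrp_weights` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def hrp_weights(cov: np.ndarray) -> np.ndarray:
    """Hierarchical Risk Parity sketch: cluster assets by correlation
    distance, then split risk top-down via recursive bisection."""
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    # Correlation-based distance matrix.
    dist = np.sqrt(0.5 * (1.0 - corr))
    np.fill_diagonal(dist, 0.0)
    order = leaves_list(linkage(squareform(dist, checks=False), method="single"))

    def cluster_var(items):
        # Variance of the inverse-variance portfolio within a cluster.
        sub = cov[np.ix_(items, items)]
        ivp = 1.0 / np.diag(sub)
        ivp /= ivp.sum()
        return float(ivp @ sub @ ivp)

    weights = np.ones(len(order))
    clusters = [list(order)]
    while clusters:
        cluster = clusters.pop(0)
        if len(cluster) < 2:
            continue
        left, right = cluster[: len(cluster) // 2], cluster[len(cluster) // 2 :]
        v_left, v_right = cluster_var(left), cluster_var(right)
        alpha = 1.0 - v_left / (v_left + v_right)  # lower-variance side gets more
        weights[left] *= alpha
        weights[right] *= 1.0 - alpha
        clusters += [left, right]
    return weights
```

Because every split allocates complementary fractions, the final weights sum to one by construction.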
Phase H1 checklist (execution status):
- `harbor.data`: point-in-time membership loader with documented fallback, price loader, risk-free loader, local cache scaffolding.
- `harbor.risk`: sample/shrinkage covariance estimators, HRP implementation, Monte Carlo VaR/CVaR utilities.
- `harbor.portfolio`: mean-variance, risk parity, and HRP allocation interfaces.
- `harbor.backtest`: cross-sectional backtest engine with transaction costs and core metrics.
- Exit-criteria script added: `experiments/h1_end_to_end_hrp_backtest.py`.
- Phase H1 unit-test suite added for core data/risk/portfolio/backtest pathways.
Goal: Deepen risk modeling and scenario analysis.
Deliverables:
- `harbor.risk`
  - Regime-aware covariance (regime switches and shrinkage based on vol state).
  - Robust Monte Carlo engines (Student-t marginals, factor-driven simulation).
  - Stress testing:
    - Shock scenarios (vol spikes, correlation spikes, sector crashes).
  - Risk decomposition by factor/cluster.
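The Student-t Monte Carlo deliverable can be sketched via the classic normal/chi-square mixture. This is a sketch only: `mc_var_cvar` and its defaults are illustrative, and the covariance input is used directly as the t scale matrix rather than being rescaled to a target variance.

```python
import numpy as np

def mc_var_cvar(mean, cov, weights, n_sims=100_000, alpha=0.95, df=5, seed=0):
    """Monte Carlo VaR/CVaR sketch with multivariate Student-t returns.

    Gaussian draws are divided by a chi-square mixing variable, which
    fattens the tails relative to a purely normal model."""
    rng = np.random.default_rng(seed)
    n = len(mean)
    z = rng.multivariate_normal(np.zeros(n), cov, size=n_sims)
    g = rng.chisquare(df, size=n_sims) / df          # mixing variable
    returns = np.asarray(mean) + z / np.sqrt(g)[:, None]
    pnl = returns @ np.asarray(weights)
    var = -np.quantile(pnl, 1 - alpha)               # loss at the alpha level
    cvar = -pnl[pnl <= -var].mean()                  # mean loss beyond VaR
    return var, cvar
```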
Exit criteria:
- Config-driven scenario runs with clear outputs (risk decomposition, stress-test reports).
- Pluggable risk engine used by ABF experiments.
Phase H2 checklist (execution status):
- Regime-aware covariance estimators.
- Student-t and factor-driven Monte Carlo engines.
- Config-driven stress scenario runner.
- Risk decomposition by factor and cluster.
- Pluggable risk engine interface.
- 48 H2 tests passing.
- Demo script: `experiments/h2_risk_engine_demo.py`.
Goal: Build the market simulation environment and agent interface — the primary research instrument.
Status: Next. This is the foundation for all agent-first research.
Depends on: H2 (complete).
Module: harbor/agents/
harbor/agents/
environment.py # Market sim: price-impact model, order matching, state
base_agent.py # Abstract interface: observe() → decide() → act()
rule_agents.py # Momentum, vol-targeting, mean-reversion (simple baselines)
config.py # Population configs, parameter distributions
metrics.py # Market-level: crowding index, flow imbalance, regime labels
Key design decisions:
| Decision | Choice | Rationale |
|---|---|---|
| Price formation | Impact model (linear temporary + square-root permanent) | Captures dynamics we care about; standard in agent-based finance lit |
| Agent interface | observe() → decide() → act() | Clean separation; works for all agent types |
| Rule-based agents | First implementation | Validates environment before adding complexity |
| Output format | Same DataFrame structure as real data | Enables reuse of entire ABF analysis pipeline |
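The observe() → decide() → act() interface in the table might look like the following sketch; `Observation`, `BaseAgent`, and `MomentumAgent` are illustrative names, not the module's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Observation:
    prices: list      # recent price history for one asset
    positions: dict   # this agent's current holdings

class BaseAgent(ABC):
    """Abstract agent: observe() -> decide() -> act()."""

    @abstractmethod
    def decide(self, obs: Observation) -> dict:
        """Map an observation to target positions {asset: weight}."""

    def observe(self, market_state: dict) -> Observation:
        # Shared observation logic; subclasses only implement decide().
        return Observation(prices=market_state["prices"],
                           positions=market_state.get("positions", {}))

    def act(self, market_state: dict) -> dict:
        return self.decide(self.observe(market_state))

class MomentumAgent(BaseAgent):
    """Minimal rule-based baseline: long if last price > trailing mean."""
    def decide(self, obs: Observation) -> dict:
        mean = sum(obs.prices) / len(obs.prices)
        return {"asset": 1.0 if obs.prices[-1] > mean else -1.0}
```

The clean separation means LLM, RL, and HARBOR agents later only override `decide()`.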
Deliverables:
- `harbor.agents.environment`: market environment with price-impact model
- `harbor.agents.base_agent`: abstract agent interface
- `harbor.agents.rule_agents`: momentum, vol-targeting, mean-reversion baselines
- `harbor.agents.config`: population configuration system
- `harbor.agents.metrics`: crowding index, flow imbalance, regime labels
Exit criteria:
- Market environment runs end-to-end with rule-based agents
- Price-impact model produces realistic price dynamics (no drift explosion, reasonable vol)
- Metrics module computes crowding, flow imbalance, regime labels
- Output format compatible with ABF analysis pipelines
- Unit tests for environment, agents, metrics
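The price-formation choice above (linear temporary + square-root permanent impact) can be written as a one-step sketch; `eta` and `gamma` are illustrative, uncalibrated coefficients, and the function name is hypothetical.

```python
import numpy as np

def execute_order(price, signed_flow, adv, eta=0.1, gamma=0.05):
    """One-step price-impact sketch: linear temporary + sqrt permanent.

    signed_flow: net order flow (signed); adv: average daily volume."""
    participation = signed_flow / adv
    temporary = eta * participation                          # decays next step
    permanent = gamma * np.sign(participation) * np.sqrt(abs(participation))
    exec_price = price * (1.0 + temporary + permanent)       # price paid now
    new_mid = price * (1.0 + permanent)                      # lasting mid move
    return exec_price, new_mid
```

The temporary term penalizes aggressive execution this step; only the concave permanent term moves the mid, which helps keep simulated prices from exploding.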
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| H3-S1 | 2 weeks | Market environment + price-impact model + state management |
| H3-S2 | 2 weeks | Base agent interface + rule-based agents (momentum, vol-target, mean-rev) |
| H3-S3 | 2 weeks | Metrics module + config system + end-to-end simulation runs |
Goal: Implement the autonomous agent types that make this project distinctive — LLM agents, RL agents, and HARBOR-as-agent.
Depends on: H3 (simulation core running).
harbor/agents/
llm_agents.py # LLM-based agents (Claude/GPT via API, prompted with market data)
rl_agents.py # RL agents (wraps harbor.ml.behavior_agents, trains in environment)
harbor_agent.py # HARBOR-as-agent: uses harbor.risk + harbor.portfolio to trade
adaptation.py # Agent learning/adaptation between rounds
Agent types:
| Type | Source | Distinguishing Feature |
|---|---|---|
| LLM-based | `llm_agents.py` | Reasons about market state via API call, adapts via prompting |
| RL-based | `rl_agents.py` | Learns optimal policy through environment interaction (experimental baseline — not validated against classical benchmarks) |
| HARBOR | `harbor_agent.py` | Uses HARBOR's own risk/portfolio logic — the system tests itself |
Key design decisions:
| Decision | Choice | Rationale |
|---|---|---|
| LLM agents | Real API calls (Claude/GPT) | Mocking loses the point — real reasoning is what makes agents different |
| RL agents | Wrap existing harbor.ml.behavior_agents | Reuses validated code |
| HARBOR-as-agent | Wraps harbor.risk + harbor.portfolio | Tests the system against itself |
| Adaptation | Between-round strategy updates | Agents learn from past performance |
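A minimal shape for the LLM agent, consistent with the "mocked API for CI" exit criterion below: the model call is injected, so production can pass a real Claude/GPT wrapper while tests pass a stub. Class and method names are hypothetical.

```python
import json

class LLMAgent:
    """Sketch: serialize market state into a prompt, parse the model's
    JSON reply into target weights. call_model is an injected callable."""

    PROMPT = ("You are a trading agent. Given market state {state}, "
              "reply with JSON {{\"weights\": {{asset: weight}}}}.")

    def __init__(self, call_model):
        self.call_model = call_model  # real API wrapper, or a stub in CI

    def decide(self, market_state: dict) -> dict:
        prompt = self.PROMPT.format(state=json.dumps(market_state))
        reply = self.call_model(prompt)
        try:
            return json.loads(reply)["weights"]
        except (json.JSONDecodeError, KeyError):
            return {}  # fail safe: hold no position on unparseable output
```

The fail-safe branch matters in practice: LLM output is not guaranteed to be valid JSON, and a malformed reply should never crash a simulation round.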
Exit criteria:
- LLM agents receive market state, output trading decisions via API
- RL agents train within the simulation environment
- HARBOR agent uses existing portfolio logic to trade
- Adaptation mechanism allows strategy updates between rounds
- Unit tests for each agent type (mocked API for CI)
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| H4-S1 | 2 weeks | LLM agent implementation + prompt engineering + API integration |
| H4-S2 | 2 weeks | RL agent wrapper + HARBOR-as-agent + adaptation mechanism |
| H4-S3 | 2 weeks | Integration testing: all agent types in shared simulation |
Goal: Multi-agent population management and predefined experiment configurations for causal testing.
Depends on: H4 (all agent types working).
harbor/agents/
population.py # Population manager: spawn, configure, mix agent types
experiments.py # Predefined experiment configs
analysis.py # Bridge: convert sim output → ABF-compatible DataFrames
Experiment configurations:
- Homogeneous LLM: 50 LLM agents, same base prompt → emergent coordination test
- Mixed autonomous: 20 LLM + 20 RL + 10 HARBOR → interaction dynamics
- Stress injection: External vol shock + mixed population → cascading behavior
- Adaptation test: Agents learn between rounds → convergence or divergence
- HARBOR resilience: HARBOR agent in autonomous-agent-dominated market
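These configurations could be encoded as plain dataclasses; the field names below are illustrative, not the actual `harbor.agents.config` schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PopulationConfig:
    """Sketch of a population experiment config (names are illustrative)."""
    name: str
    n_llm: int = 0
    n_rl: int = 0
    n_harbor: int = 0
    shock: Optional[str] = None   # e.g. "vol_spike" for stress injection
    adapt: bool = False           # agents update strategies between rounds

EXPERIMENTS = {
    "homogeneous_llm": PopulationConfig("homogeneous_llm", n_llm=50),
    "mixed_autonomous": PopulationConfig("mixed_autonomous",
                                         n_llm=20, n_rl=20, n_harbor=10),
    "stress_injection": PopulationConfig("stress_injection",
                                         n_llm=20, n_rl=20, shock="vol_spike"),
    "adaptation_test": PopulationConfig("adaptation_test",
                                        n_llm=20, n_rl=20, adapt=True),
}
```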
Exit criteria:
- Population manager configures and runs multi-agent simulations end-to-end
- At least 3 experiment configurations produce non-trivial results
- Analysis module converts simulation output to ABF-compatible DataFrames
- Results reproducible via config files and `make` targets
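The analysis bridge's job, producing the same DataFrame shape as real data, might look like this sketch (`sim_to_panel` and the snapshot schema are assumptions).

```python
import pandas as pd

def sim_to_panel(sim_results: dict) -> pd.DataFrame:
    """Flatten per-round simulation snapshots into the long (date, asset)
    panel shape the ABF pipeline expects from real market data."""
    rows = []
    for round_id, snapshot in sim_results.items():
        for asset, price in snapshot["prices"].items():
            rows.append({"date": round_id,
                         "asset": asset,
                         "price": price,
                         "flow": snapshot["flows"].get(asset, 0.0)})
    return pd.DataFrame(rows).set_index(["date", "asset"]).sort_index()
```

Keeping the simulated panel schema identical to the real-data panel is what lets the downstream ABF analysis code run unchanged on both.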
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| H5-S1 | 2 weeks | Population manager + experiment config system |
| H5-S2 | 2 weeks | Run all 5 experiment configs + analysis bridge |
| H5-S3 | 2 weeks | Results analysis, reproducibility, documentation |
ABF = research layer that uses HARBOR infrastructure and the agent simulation framework.
Goal: Lock questions, universe, and primary metrics.
Deliverables:
- PRD (`docs/abf-prd.md`) v1 done.
- SP500 universe with historical membership + clean daily panel.
- Config files for shock definitions and regime definitions.
Goal: Test whether vol-targeting algorithms create momentum and reversal patterns.
Status: Complete but deprecated. Results were weak and time-dependent (post-2020 only, ~1.3bps, economically insignificant). This investigation motivated the pivot to autonomous-agent-specific research. Code remains in harbor/abf/q1/ as historical work. See docs/abf-q1-research-summary.md.
Goal: Test whether independently-built autonomous agents converge on similar strategies without explicit coordination.
Depends on: H4 (autonomous agent types working).
Module: harbor/abf/q1_coordination/
Method:
- Run H4 agent populations with diverse initial configurations (different LLM prompts, different RL training seeds, different HARBOR configs)
- Measure convergence metrics:
| Metric | Definition |
|---|---|
| Position correlation | Cross-agent correlation of portfolio weights over time |
| Strategy similarity | Rolling cosine similarity of agent trade signals |
| Herding intensity | Fraction of agents on the same side of each trade |
| Convergence speed | Time (in rounds) for similarity metrics to plateau |
- Key test: do agents that start different become more similar over time?
- Empirical complement: post-2023 real-market signatures (did cross-asset correlations change character after LLM agent adoption?)
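Two of the table's metrics are simple to state precisely. A sketch under the assumption that simulation output arrives as (rounds × agents) arrays; function names are illustrative.

```python
import numpy as np

def herding_intensity(trades: np.ndarray) -> np.ndarray:
    """Per-round fraction of active agents on the majority side.

    trades: (n_rounds, n_agents) array of signed trade directions."""
    signs = np.sign(trades)
    buys = (signs > 0).sum(axis=1)
    sells = (signs < 0).sum(axis=1)
    active = buys + sells
    # Guard against rounds with no active agents.
    return np.where(active > 0,
                    np.maximum(buys, sells) / np.maximum(active, 1), 0.0)

def position_correlation(weights: np.ndarray) -> float:
    """Mean pairwise correlation of agents' weight trajectories.

    weights: (n_rounds, n_agents) array for a single asset."""
    corr = np.corrcoef(weights.T)          # agent-by-agent matrix
    n = corr.shape[0]
    return float(corr[~np.eye(n, dtype=bool)].mean())
```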
Target figures:
- Convergence trajectories: similarity metrics over simulation rounds for each experiment config
- Herding intensity heatmap: agent-by-agent position correlation matrix at start vs end
- Empirical comparison: real-market correlation structure pre-2020 vs post-2023
- Robustness: sensitivity to LLM prompt variation, RL seed variation
Exit criteria:
- Statistical evidence that independent agents converge (or clear null result)
- At least one convergence metric shows clear temporal trend in simulation
- Empirical comparison with real-market patterns
- Draft text for paper section on emergent coordination
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| A3-S1 | 2 weeks | Convergence metric computation + simulation runs |
| A3-S2 | 2 weeks | Empirical complement (real-market analysis) |
| A3-S3 | 2 weeks | Figures, robustness, write-up |
Goal: Test whether autonomous agent populations CREATE market regimes that wouldn't exist without them.
Depends on: H5 (population experiments) and A3 (convergence findings inform regime analysis).
Module: harbor/abf/q2_regimes/
Method:
- Ablation experiments: run simulation with and without agent populations, compare regime structure using H2's regime detection tools
- Dose-response: vary agent concentration from 10% to 80% of market volume, measure regime intensity
- Regime detection: apply existing `harbor.risk` regime tools to synthetic data
- Empirical complement: compare regime frequency and intensity pre-2020 vs post-2023
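The dose-response step could be driven by a small harness like the one below; `regime_intensity` here is a crude proxy (coefficient of variation of rolling volatility) standing in for H2's actual regime-detection tools, and all names are illustrative.

```python
import numpy as np

def regime_intensity(returns: np.ndarray, window: int = 20) -> float:
    """Crude regime-intensity proxy: dispersion of rolling volatility.

    A market that flips between calm and turbulent states shows more
    variation in rolling vol than one with a single stable regime."""
    n = len(returns) - window + 1
    rolling_vol = np.array([returns[i:i + window].std() for i in range(n)])
    return float(rolling_vol.std() / rolling_vol.mean())

def dose_response(simulate, concentrations=(0.1, 0.3, 0.5, 0.8)):
    """Run simulate(concentration) -> returns at each agent concentration
    and report the regime-intensity proxy per level."""
    return {c: regime_intensity(simulate(c)) for c in concentrations}
```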
Target figures:
- Ablation comparison: regime structure with vs without agents
- Dose-response curves: agent concentration → regime intensity
- Regime type identification: which regimes are agent-manufactured?
- Empirical comparison: regime frequency pre-2020 vs post-2023
Exit criteria:
- Ablation shows statistically significant regime differences with/without agents
- Dose-response shows monotonic relationship between agent concentration and regime intensity
- At least one manufactured regime type clearly identified
- Draft text for paper section
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| A4-S1 | 2 weeks | Ablation experiments + regime detection on synthetic data |
| A4-S2 | 2 weeks | Dose-response curves + empirical complement |
| A4-S3 | 2 weeks | Figures, statistical tests, write-up |
Goal: Test whether autonomous agents learn to exploit each other and whether this destabilizes prices.
Depends on: A4 (regime findings inform adaptation analysis).
Module: harbor/abf/q3_adversarial/
Method:
- Multi-round simulation where agents adapt strategies between rounds
- Measure: price stability over time, strategy divergence/convergence, Sharpe ratio decay
- Test whether adaptation leads to arms race (increasing instability), equilibrium (stability), or cycles
- Retail impact: how does a naive buy-and-hold investor fare as agents adapt?
- HARBOR impact: does regime-awareness mitigate the damage?
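The Sharpe-decay measurement reduces to a slope estimate over rounds; a sketch assuming per-round return series, with illustrative function names.

```python
import numpy as np

def sharpe_by_round(pnl: np.ndarray, periods_per_year: int = 252) -> np.ndarray:
    """Annualized Sharpe ratio per adaptation round.

    pnl: (n_rounds, n_periods) array of per-period returns for one agent."""
    mean = pnl.mean(axis=1)
    std = pnl.std(axis=1, ddof=1)
    return np.sqrt(periods_per_year) * mean / std

def sharpe_decay(pnl: np.ndarray) -> float:
    """Slope of Sharpe across rounds: a clearly negative slope suggests an
    arms race eroding each agent's edge; near-zero suggests equilibrium."""
    sharpes = sharpe_by_round(pnl)
    rounds = np.arange(len(sharpes))
    slope, _ = np.polyfit(rounds, sharpes, 1)
    return float(slope)
```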
Target figures:
- Price stability over adaptation rounds
- Strategy divergence/convergence trajectories
- Sharpe decay curves for different agent types
- Retail portfolio impact during adversarial periods
- HARBOR-as-agent performance vs naive strategies
Exit criteria:
- Clear characterization of adaptation dynamics (arms race, equilibrium, or cycles)
- Quantified retail portfolio impact during adversarial periods
- HARBOR-as-agent performance comparison
- Draft text for paper section
Sprint plan:
| Sprint | Duration | Focus |
|---|---|---|
| A5-S1 | 2 weeks | Multi-round adaptation experiments |
| A5-S2 | 2 weeks | Retail + HARBOR impact analysis |
| A5-S3 | 2 weeks | Figures, characterization, write-up |
Goal: Complete paper draft with HARBOR-as-participant results and retail impact quantification.
Depends on: A5.
Deliverables:
- HARBOR-as-agent performance across all experiment configs
- Quantified regime-awareness benefit for retail portfolios
- Paper-quality draft: Q1 (coordination) + Q2 (regimes) + Q3 (adaptation) + retail impact
- Reproducible pipeline via `make` targets
- Public or semi-public repo with docs and notebooks aligned to the paper
Exit criteria:
- Complete paper draft with figures, tables, and statistical tests
- All experiments reproducible in <5 commands
- At least one expert review (faculty or practitioner)
- Concise writeup suitable for applications
- Versioning & Changelog
  - Semantic Versioning: `MAJOR.MINOR.PATCH`. `CHANGELOG.md` updated for each tagged release.
- Documentation
  - `README.md` for overview and quickstart.
  - `docs/abf-prd.md` for research spec.
  - `docs/plan.md` (this file) for roadmap and phases.
  - `docs/PROJECT_EVOLUTION.md` for the full project story.
- Git Hygiene
  - Feature branches per module/experiment.
  - Descriptive, technical commit messages and PRs.
- Reproducibility
  - Config-driven experiments.
  - Minimal "one-command" scripts to reproduce key results.
| Timeframe | HARBOR Focus | ABF Focus | Status |
|---|---|---|---|
| 0–3 months | H1: Core quant stack | A1: Spec + data | ✅ Complete |
| 3–6 months | H2: Advanced risk | A2: Old Q1 (deprecated) | ✅ Complete |
| 6–9 months | H3: Agent simulation core | Empirical insurance track (lightweight) | Next |
| 9–12 months | H4: Autonomous agent types | — | Planned |
| 12–14 months | — | A3: Emergent coordination | Planned |
| 14–16 months | H5: Population dynamics | A4: Regime manufacturing | Planned |
| 16–18 months | — | A5: Adversarial adaptation | Planned |
| 18+ months | — | A6: Retail impact & paper | Planned |
Parallelism: H5 can begin with rule-based agents (from H3) before H4 is fully complete. The empirical insurance track runs independently. A3 starts after H4 delivers working autonomous agents.
- HARBOR H1 (Core Quant Stack): ✅ Complete — data loaders, risk models, portfolio optimization, backtest engine all implemented and tested.
- HARBOR H2 (Advanced Risk & Simulation): ✅ Complete — regime-aware covariance, Student-t/factor Monte Carlo, config-driven stress scenarios, risk decomposition, pluggable risk engine interface. 48 H2 tests passing.
- ABF A2 (Old Q1 Pipeline): ✅ Complete, now deprecated — local projections, CAR computation, robustness sweep. Results were weak/time-dependent, motivating the pivot to autonomous-agent research.
- ML Extensions: Experimental scaffolding — LSTM/GRU vol forecasters and deep RL behavioral agents with unit tests. Classical baselines (GARCH, EWMA) implemented. These serve as infrastructure for RL agent types in H4.
- Agent Simulation (H3): Next — `harbor/agents/` stub in place. Building the market environment and agent interface is the immediate priority.
Key scope change (March 2026): Project pivoted from "AI-driven trading broadly" to "autonomous and agentic trading" specifically. Old Q1 deprecated. New three-question research arc: emergent coordination → regime manufacturing → adversarial adaptation. The simulation framework moved from late-stage (old H5) to immediate priority (new H3) as the primary research instrument. See docs/PROJECT_EVOLUTION.md for the full narrative and docs/superpowers/specs/2026-03-16-autonomous-agent-pivot-design.md for the design spec.
Test suite: 205 tests passing.