Document: docs/spec/SPEC_DRAFT.md
Date: 2026-02-17
Status: Implementation-ready. Zero-code baseline.
Inputs: SPEC.v3.md, SPEC_NOTES.md, SPEC_DECISIONS_LOCK.md, SPEC_REVISIONS.md
Canonical Location: projects/newton/spec/
- Executive Summary
- Scope and Non-Goals
- Architecture
- Data Specification and Extensible Feature Model
- Strategy Engine and Regime Subsystem
- Execution Model (Spot v1) and Reconciliation
- Risk Model
- Backtesting Framework and Realism Assumptions
- Performance Metrics
- Client UI Functional Specification
- Monitoring, Operations, and Security
- Documentation Strategy
- Governance
- Branching and Stage Gates
- Stage-by-Stage Implementation Plan
- Open Questions
- Decision Log
- Alternatives Considered
- Appendices
Develop a fully automated multi-instrument trading system that generates sustainable income using a hybrid machine-learning approach to identify and execute trades across forex and cryptocurrency markets.
This project is at a pre-development / zero-code baseline. No code has been written. No phases have been implemented. All specification content describes the target system to be built from scratch.
| Instrument | Broker / Exchange | Market Type | Contract Type | Trading Hours |
|---|---|---|---|---|
| EUR/USD | Oanda (v20 REST API) | Forex (Spot) | Spot FX | 24/5 (Sun 17:00 – Fri 17:00 ET) |
| BTC/USD | Binance (REST + WebSocket API) | Cryptocurrency (Spot) | Spot (BTCUSDT pair) | 24/7 |
Design Intent: Including two fundamentally different instruments forces a true multi-instrument, multi-market architecture from day one. The system must account for differing market microstructure, volatility profiles, liquidity characteristics, and trading hours.
v1 is spot-only. No futures, no leverage, no funding rates. The architecture is designed to support derivatives in future versions without major refactoring, but v1 scope is exclusively spot market execution. BTC/USD uses the Binance spot BTCUSDT pair with no leverage.
- Signal generation: 1h candles (primary). Optional 4h confirmation deferred to v1.1.
- Execution horizon: Intra-day to swing (holding periods of hours to days, max 48 hours).
- BTC/USD may require instrument-specific holding period tuning due to higher volatility.
Hybrid model: event-based Bayesian analysis for interpretability and risk framework, combined with a supervised ML model (initially XGBoost; CNN-LSTM evaluated in Stage 3) for pattern recognition. Each instrument uses a strategy configuration tailored to its behavior while sharing common infrastructure.
| Criterion | Target | Measurement Period | Gate Type | Per-Instrument |
|---|---|---|---|---|
| Profit Factor | > 1.3 | 3-month paper trade | Go/No-Go for live | Yes |
| Sharpe Ratio (annualized) | > 0.8 | 3-month paper trade | Go/No-Go for live | Yes |
| Max Drawdown | < 15% (default; configurable per strategy) | 3-month paper trade | Hard stop if breached | Yes |
| Win Rate | > 45% | 3-month paper trade | Informational | Yes |
| Backtest-to-Paper Deviation | < 20% on Sharpe | 3-month paper trade | Investigate if breached | Yes |
| Live Max Drawdown | < 20% (default; configurable per strategy) | Ongoing | Kill switch | Per-instrument + portfolio |
| System Availability | > 99.5% | Monthly | Investigate if breached | System-wide |
| Signal Calibration | Predicted vs. observed ±5 pp per decile | Per walk-forward fold | Retrain if breached | Yes |
Assumption [A1]: These targets are achievable for medium-frequency strategies on these instruments. If paper trading shows Sharpe < 0.5 on either instrument, the strategy hypothesis for that instrument should be revisited, not tuned.
Note: BTC/USD success criteria may require adjustment (e.g., wider drawdown tolerance) given higher baseline volatility. This will be evaluated during Stage 5 backtesting and explicitly decided before paper trading begins. Any adjustments must be configured via per-strategy overrides (see §7.4).
- Data ingestion from Oanda (EUR/USD spot) and Binance (BTC/USD spot, BTCUSDT pair) across 1m, 5m, 1h, 4h, 1d timeframes.
- Historical data backfill (minimum 3 years: 2023-01-01 to present) and validation for both instruments.
- Extensible feature/indicator computation: initial set includes RSI(14), MACD(12,26,9), Bollinger Bands(20,2.0), OBV, ATR(14). New indicators can be added via the feature provider interface without schema changes or disruptive refactoring.
- Bayesian inference engine for generating trade signals based on tokenized indicator events.
- Supervised ML model (XGBoost as default; CNN-LSTM as optional alternative) to provide a complementary probability score.
- Stacking meta-learner to combine Bayesian and ML signals into a calibrated probability.
- Backtesting engine with walk-forward validation and purged K-fold cross-validation.
- Paper trading module via Oanda practice account (EUR/USD) and Binance testnet (BTC/USD spot).
- Risk management: broker-side stops, Kelly-based position sizing, drawdown circuit breakers — all configurable per strategy with global defaults.
- Performance metrics: configurable per strategy with global defaults.
- Operational monitoring: structured logging, Prometheus metrics, Telegram alerts.
- Position reconciliation loop (per-broker).
- Regime detection subsystem with deterministic classification and strategy-aware behavior.
- Instrument-specific strategy configurations sharing common infrastructure.
- Client application (web UI) with strict separation from server, progressing each stage.
- Developer documentation, user/operator documentation, and in-app help.
- HFT (sub-second).
- Futures, leverage, margin trading, funding rates — v1 is spot-only. Architecture supports future derivatives without major refactor.
- Instruments beyond EUR/USD and BTC/USD (v2+).
- Non-technical data sources (sentiment, news, order book) — architecture supports future addition via FeatureProvider interface without major refactor (see §3.6).
- Short selling (v1 is long-only; SELL signal closes existing longs).
- Multi-timeframe confirmation logic (deferred to v1.1).
- Online/incremental learning (v2).
- Dynamic strategy generation beyond static configs (v1 uses static configs; see §5.8 for roadmap).
+==============================================================+
| Newton Server (Single Process) |
| +------------------+ +-------------------+ +-------------+ |
| | Data Module | | Analysis Module | | Trading | |
| | data/ | | analysis/ | | Module | |
| | - fetcher_oanda |->| - event_detector |->| trading/ | |
| | - fetcher_binance| | - tokenizer | | - signal | |
| | - indicators | | - bayesian | | - risk | |
| | - pipeline | | - ml_model | | - executor | |
| | - feature_store | | - meta_learner | | _oanda | |
| | - verifier | | - regime | | _binance | |
| | - db | | - feature_provider| | - reconciler| |
| +--------|---------+ +---------|---------+ +------|------+ |
| | | | |
+===========|======================|====================|========+
| | |
+--------v---------+ +------v-------+ +-------v--------+
| TimescaleDB | | Model | | Broker APIs |
| (PostgreSQL) | | Artifacts | | - Oanda v20 |
+------------------+ | (disk) | | - Binance |
+--------------+ | (Spot) |
+----------------+
+------------------+
| Client (Web UI) |--- REST/WS API only (strict boundary)
+------------------+
Key architectural principles:
- Monolith server — single developer, manageable complexity. Modules separated by directory and interface; extraction to services deferred to when needed.
- Multi-broker abstraction — the `BrokerAdapter` interface allows Oanda and Binance to be treated uniformly by upper layers.
- Strict client/server separation — UI communicates only via versioned REST/WebSocket APIs. No shared state, no direct DB access from client.
- Instrument-aware pipeline — every stage (data, analysis, trading) is parameterized by instrument with per-instrument configuration.
- Spot-first, derivatives-ready — v1 uses spot execution only. The broker adapter interface and order model are designed so that adding futures/margin support requires implementing new adapter methods, not restructuring the core.
Rules:
- The server exposes a versioned REST API (prefix: `/api/v1/`) and optional WebSocket channels as the sole interface for all clients.
- The client never accesses the database, file system, or model artifacts directly.
- API contracts (OpenAPI 3.1 schema) are the source of truth for client/server interaction.
- The client validates server behavior: all API responses include checksums, timestamps, and status codes that the client can verify and surface discrepancies.
- The UI is replaceable: any conforming client can substitute the default web UI without server modification.
API Versioning Strategy:
- URL-path versioning: `/api/v1/`, `/api/v2/`, etc.
- Breaking changes require a version bump. Non-breaking additions (new optional fields) are allowed within a version.
- Deprecated endpoints are marked with a `Sunset` header and removed no sooner than 2 minor releases later.
- OpenAPI schema is auto-generated from server code (FastAPI) and published as a build artifact.
Verification Observability:
- Client-side health panel shows: API connectivity, response latency (p50/p95), schema validation errors, data freshness.
- Any API response failing schema validation is flagged in the UI with details.
class BrokerAdapter(Protocol):
    """Abstract broker interface. Implemented per broker."""
    async def get_candles(self, instrument: str, interval: str,
                          start: datetime, end: datetime) -> list[Candle]: ...
    async def get_account(self) -> AccountInfo: ...
    async def get_positions(self) -> list[Position]: ...
    async def place_market_order(self, instrument: str, units: float,
                                 stop_loss: float,
                                 client_order_id: str) -> OrderResult: ...
    async def modify_stop_loss(self, trade_id: str,
                               new_stop: float) -> OrderResult: ...
    async def close_position(self, trade_id: str) -> OrderResult: ...
    async def get_order_status(self, client_order_id: str) -> OrderStatus: ...
Implementations: `OandaAdapter` (spot forex), `BinanceSpotAdapter` (spot crypto).
Rationale: Uniform interface allows the trading module to be broker-agnostic. Instrument configuration maps each instrument to its adapter. New brokers or market types (e.g., futures) can be added by implementing the interface.
Each instrument has an independent configuration that specifies behavior across all system layers:
// config/instruments/EUR_USD.json
{
"instrument_id": "EUR_USD",
"broker": "oanda",
"display_name": "EUR/USD",
"asset_class": "forex",
"market_type": "spot",
"base_currency": "EUR",
"quote_currency": "USD",
"pip_size": 0.0001,
"min_trade_size": 1,
"max_trade_size": 1000000,
"trading_hours": "24/5",
"intervals": ["1m", "5m", "1h", "4h", "1d"],
"signal_interval": "1h",
"typical_spread_pips": 1.5,
"default_slippage_pips": 1.0,
"strategy_config": "config/strategies/EUR_USD_strategy.json",
"risk_overrides": {}
}
// config/instruments/BTC_USD.json
{
"instrument_id": "BTC_USD",
"broker": "binance",
"display_name": "BTC/USDT",
"asset_class": "crypto",
"market_type": "spot",
"base_currency": "BTC",
"quote_currency": "USDT",
"symbol": "BTCUSDT",
"pip_size": 0.01,
"min_trade_size": 0.00001,
"max_trade_size": 100,
"trading_hours": "24/7",
"intervals": ["1m", "5m", "1h", "4h", "1d"],
"signal_interval": "1h",
"typical_spread_pct": 0.05,
"default_slippage_pct": 0.02,
"strategy_config": "config/strategies/BTC_USD_strategy.json",
"risk_overrides": {
"hard_stop_pct": 0.03,
"high_volatility_stop_pct": 0.05,
"max_drawdown_pct": 0.25
}
}
Key Design Points:
- Strategy configurations are instrument-specific (different event definitions, token sets, thresholds, risk parameters).
- Common infrastructure (pipeline orchestration, Bayesian engine logic, ML training framework, risk framework) is shared.
- Risk overrides per instrument allow tuning for different volatility profiles.
- `market_type: "spot"` is explicit for v1. Future instruments may use `"futures"` or `"margin"`.
| Failure | Detection | Response | Recovery |
|---|---|---|---|
| Oanda REST API down | HTTP error / timeout | Retry 3× with exponential backoff (2s, 4s, 8s). Halt EUR/USD signals. Alert. | Resume on next successful fetch. Reconcile positions. |
| Binance REST API down | HTTP error / timeout | Retry 3× with exponential backoff. Halt BTC/USD signals. Alert. | Resume on next successful fetch. Reconcile positions. |
| Oanda WebSocket disconnect | Heartbeat timeout (30s) | Reconnect with backoff. Fall back to REST polling (10s). | Auto-reconnect; alert if > 5 min. |
| Binance WebSocket disconnect | Heartbeat timeout (30s) | Reconnect with backoff. Fall back to REST polling (10s). | Auto-reconnect; alert if > 5 min. |
| Database unreachable | Connection error | Halt all operations. Alert. | Auto-retry every 30s. Resume on reconnect. |
| Application crash | Docker restart policy / systemd | Broker-side stops protect open positions. | On restart: reconcile all positions per broker, resume from latest complete candle. |
| Model file missing/corrupt | Hash check on load | Fall back to Bayesian-only mode per instrument. Alert. | Retrain or restore from backup. |
| Single broker down | Per-broker health check | Continue trading on healthy broker. Halt only the affected instrument. | Resume when broker recovers. |
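The retry policy in the first two rows (three retries with delays of 2s, 4s, 8s) could be sketched as follows. This is a minimal illustration, not the planned implementation: `fetch_with_backoff`, the injected `sleep` parameter, and the choice of which exception types count as retryable are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_with_backoff(
    fetch: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call `fetch`; after each failure wait 2s, 4s, 8s (defaults) and retry."""
    for attempt in range(retries + 1):
        try:
            return await fetch()
        except (ConnectionError, TimeoutError):
            if attempt == retries:
                # Retries exhausted: the caller is expected to halt signals
                # for the affected instrument and raise an alert.
                raise
            await sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the delays testable without real waiting; a production version would also need to map broker-specific HTTP errors onto the retryable set.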
The architecture allows non-technical data (sentiment, news, order book / market microstructure signals) to be added in future versions without major refactor.
Feature Provider Interface:
class FeatureProvider(Protocol):
    """Interface for pluggable feature sources."""
    @property
    def provider_name(self) -> str: ...
    @property
    def feature_namespace(self) -> str: ...
    async def get_features(self, instrument: str, timestamp: datetime,
                           lookback: int) -> dict[str, float]: ...
    def get_feature_metadata(self) -> list[FeatureMetadata]: ...
Built-in providers (v1):
- `TechnicalIndicatorProvider` — RSI, MACD, BB, OBV, ATR, and any additional technical indicators added via the extension mechanism.
Future providers (interface exists, not implemented in v1):
- `SentimentProvider` — social media / news sentiment scores.
- `OrderBookProvider` — depth imbalance, bid-ask pressure.
- `NewsProvider` — event flags, surprise metrics.
Adding a new indicator or feature provider:
- Implement the `FeatureProvider` protocol (or for technical indicators, extend the `TechnicalIndicatorProvider`).
- Register the provider in the configuration (`config/feature_providers.json`).
- The provider's features automatically flow into the feature store with its declared namespace.
- The tokenizer and ML feature engineering stages query all registered providers.
- No changes to core pipeline code, database schema, or existing providers are required.
Testing requirements for new indicators:
- Unit tests verifying calculation accuracy against a reference implementation.
- Integration test confirming features are stored and retrievable from the feature store.
- Backtest run demonstrating no regression to existing strategy performance.
- [A2]: Hybrid Bayesian + ML model is the desired approach.
- [A3]: Multi-instrument from v1: EUR/USD (Oanda spot) + BTC/USD (Binance spot).
- [A4]: No code has been written. This is a zero-code baseline.
- [A5]: Python 3.11+ is the implementation language.
- [A6]: RTX 5060ti 16GB available on host for GPU-accelerated training.
- [A7]: Developer is solo; system must be maintainable by one person.
- [A8]: Each instrument may exhibit different market microstructure requiring per-instrument strategy tuning.
- [A9]: v1 is spot-only. No futures, leverage, or margin trading.
| # | Contradiction | Resolution |
|---|---|---|
| 1 | Daily loss limit: SPEC.md says 3%, notes say 2% | Use 2% (more conservative, appropriate for unproven system) |
| 2 | Order type: notes suggest limit, spec says market | Market orders for v1 (simpler, guaranteed fill). Track savings for v2. |
| 3 | Redis: notes mention it; spec omits | Defer to v1.1. Current throughput doesn't justify added infra. |
| 4 | Binance: spec referenced futures | v1 uses Binance spot only (DL-002). No futures dependency. |
| 5 | Indicator schema: JSONB vs. relational | Feature store model (long-format) replaces both approaches. |
| 6 | Trailing stop: spec vs. notes conflict | Trailing stop activates at +1%, breakeven at +2% (notes version, more conservative). |
| 7 | Phase 1 status: earlier specs assumed implemented | Corrected to zero-code baseline. |
| Source | Instrument | API | Candle Confirmation | Auth |
|---|---|---|---|---|
| Oanda v20 | EUR/USD | REST + WebSocket | `complete: true` flag | API key (env var) |
| Binance | BTC/USDT (spot) | REST + WebSocket | Kline close event | API key + secret (env vars) |
Fetch Schedule: Poll every 10 seconds after expected candle close; accept only confirmed/complete candles.
Binance-Specific Considerations (Spot):
- BTC/USDT trades 24/7; no market close gaps.
- Volume denomination differs (base vs. quote currency). Normalize to quote currency (USDT).
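- Binance rate limits: implement a weight-based rate limiter (1200 requests/min weight assumed; confirm the limit in force from the `rateLimits` array returned by `GET /api/v3/exchangeInfo`). Track weight per request.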
- Spot API does not require futures-specific configuration (no funding rates, no contract type selection).
- Use Binance spot kline/candlestick endpoints for historical and real-time data.
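A weight-tracking limiter of the kind described above might look like the sketch below. `WeightRateLimiter`, its method names, and the sliding-window approach are assumptions for illustration; the default of 1200 simply mirrors the figure in this section and should be treated as configurable.

```python
import time
from collections import deque
from typing import Callable


class WeightRateLimiter:
    """Sliding-window limiter over request weight rather than request count."""

    def __init__(self, max_weight: int = 1200, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_weight = max_weight
        self.window_s = window_s
        self._clock = clock
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, weight)
        self._used = 0

    def _expire(self, now: float) -> None:
        # Drop requests that have left the window and release their weight.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, weight = self._events.popleft()
            self._used -= weight

    def try_acquire(self, weight: int = 1) -> bool:
        """Reserve `weight` if the window has room; return False otherwise."""
        now = self._clock()
        self._expire(now)
        if self._used + weight > self.max_weight:
            return False
        self._events.append((now, weight))
        self._used += weight
        return True
```

A caller that receives `False` would wait and retry before issuing the request; the injected `clock` exists only to make the window testable.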
-- OHLCV data (hypertable partitioned by time)
CREATE TABLE ohlcv (
time TIMESTAMPTZ NOT NULL,
instrument TEXT NOT NULL, -- EUR_USD, BTC_USD, etc.
interval TEXT NOT NULL,
open DOUBLE PRECISION NOT NULL,
high DOUBLE PRECISION NOT NULL,
low DOUBLE PRECISION NOT NULL,
close DOUBLE PRECISION NOT NULL,
volume DOUBLE PRECISION NOT NULL,
spread_avg DOUBLE PRECISION,
verified BOOLEAN DEFAULT FALSE,
source TEXT NOT NULL, -- oanda, binance
PRIMARY KEY (time, instrument, interval)
);
SELECT create_hypertable('ohlcv', 'time');
-- Feature store (long-format, extensible)
CREATE TABLE features (
time TIMESTAMPTZ NOT NULL,
instrument TEXT NOT NULL,
interval TEXT NOT NULL,
namespace TEXT NOT NULL, -- 'technical', 'sentiment', 'orderbook', etc.
feature_key TEXT NOT NULL, -- 'rsi:period=14', 'macd:fast=12,slow=26,signal=9:line', etc.
value DOUBLE PRECISION NOT NULL,
PRIMARY KEY (time, instrument, interval, namespace, feature_key)
);
SELECT create_hypertable('features', 'time');
CREATE INDEX idx_features_lookup ON features (instrument, interval, namespace, feature_key, time DESC);
-- Feature metadata (describes available features)
CREATE TABLE feature_metadata (
namespace TEXT NOT NULL,
feature_key TEXT NOT NULL,
display_name TEXT NOT NULL,
description TEXT,
unit TEXT,
params JSONB, -- {"period": 14} etc.
provider TEXT NOT NULL, -- provider_name from FeatureProvider
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (namespace, feature_key)
);
-- Events (detected target events)
CREATE TABLE events (
id BIGSERIAL,
time TIMESTAMPTZ NOT NULL,
instrument TEXT NOT NULL,
interval TEXT NOT NULL,
event_name TEXT NOT NULL,
event_value BOOLEAN NOT NULL,
lookforward_periods INTEGER NOT NULL,
price_at_signal DOUBLE PRECISION NOT NULL,
price_at_resolution DOUBLE PRECISION NOT NULL,
PRIMARY KEY (id)
);
CREATE INDEX idx_events_lookup ON events (instrument, interval, event_name, time);
-- Tokens (active indicator state tokens per candle)
CREATE TABLE tokens (
time TIMESTAMPTZ NOT NULL,
instrument TEXT NOT NULL,
interval TEXT NOT NULL,
tokens TEXT[] NOT NULL,
PRIMARY KEY (time, instrument, interval)
);
-- Trades (system trade log)
CREATE TABLE trades (
id BIGSERIAL PRIMARY KEY,
client_order_id TEXT UNIQUE NOT NULL,
broker_order_id TEXT,
instrument TEXT NOT NULL,
broker TEXT NOT NULL, -- oanda, binance
direction TEXT NOT NULL CHECK (direction IN ('BUY', 'SELL')),
signal_score DOUBLE PRECISION NOT NULL,
signal_type TEXT NOT NULL,
regime_label TEXT, -- active regime at signal time
entry_time TIMESTAMPTZ,
entry_price DOUBLE PRECISION,
exit_time TIMESTAMPTZ,
exit_price DOUBLE PRECISION,
quantity DOUBLE PRECISION NOT NULL,
stop_loss_price DOUBLE PRECISION,
status TEXT NOT NULL CHECK (status IN ('PENDING', 'OPEN', 'CLOSED', 'CANCELLED', 'REJECTED')),
pnl DOUBLE PRECISION,
commission DOUBLE PRECISION,
slippage DOUBLE PRECISION,
exit_reason TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Reconciliation log
CREATE TABLE reconciliation_log (
id BIGSERIAL PRIMARY KEY,
checked_at TIMESTAMPTZ DEFAULT NOW(),
broker TEXT NOT NULL,
status TEXT NOT NULL CHECK (status IN ('MATCH', 'SYSTEM_EXTRA', 'BROKER_EXTRA')),
details JSONB,
resolved BOOLEAN DEFAULT FALSE
);
-- Regime log (tracks regime state over time)
CREATE TABLE regime_log (
id BIGSERIAL PRIMARY KEY,
time TIMESTAMPTZ NOT NULL,
instrument TEXT NOT NULL,
regime_label TEXT NOT NULL,
confidence DOUBLE PRECISION,
vol_30d DOUBLE PRECISION,
adx_14 DOUBLE PRECISION,
trigger TEXT NOT NULL, -- 'automatic', 'manual_override'
details JSONB
);
CREATE INDEX idx_regime_lookup ON regime_log (instrument, time DESC);
-- Strategy configuration versions
CREATE TABLE strategy_versions (
id BIGSERIAL PRIMARY KEY,
instrument TEXT NOT NULL,
version INTEGER NOT NULL,
config JSONB NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
created_by TEXT NOT NULL, -- 'user', 'system', 'optimizer'
approved BOOLEAN DEFAULT FALSE,
approved_at TIMESTAMPTZ,
approval_evidence JSONB, -- backtest artifact references
notes TEXT,
UNIQUE (instrument, version)
);
-- Spec deviation log
CREATE TABLE spec_deviations (
id BIGSERIAL PRIMARY KEY,
deviation_id TEXT UNIQUE NOT NULL, -- DEV-001, DEV-002, etc.
spec_section TEXT NOT NULL,
description TEXT NOT NULL,
justification TEXT NOT NULL,
impact TEXT NOT NULL,
risk_assessment TEXT NOT NULL,
status TEXT NOT NULL CHECK (status IN ('PROPOSED', 'APPROVED', 'REJECTED', 'IMPLEMENTED')),
created_at TIMESTAMPTZ DEFAULT NOW(),
reviewed_at TIMESTAMPTZ,
reviewer TEXT
);
-- Configuration change audit log
CREATE TABLE config_changes (
id BIGSERIAL PRIMARY KEY,
changed_at TIMESTAMPTZ DEFAULT NOW(),
changed_by TEXT NOT NULL,
section TEXT NOT NULL, -- 'risk', 'strategy', 'instrument', etc.
instrument TEXT, -- NULL for global changes
old_value JSONB,
new_value JSONB NOT NULL,
reason TEXT
);
Problem: Column naming like `rsi_14` is rigid. Adding new indicators requires schema migration. Querying across arbitrary indicator sets requires knowing column names at code-time.
Selected approach: Long-format feature store with performance mitigations.
Rationale:
- Scalability: Adding new indicators (technical or otherwise) requires zero schema changes — just insert rows with a new feature key (and register in feature_metadata).
- Self-describing: The `feature_metadata` table provides discovery and documentation.
- Namespace isolation: `technical`, `sentiment`, `orderbook` namespaces prevent collisions and enable selective querying.
- Multi-instrument native: Same schema serves all instruments without instrument-specific columns.
- Extensibility mandate (R-03): New indicators can be added by implementing the `FeatureProvider` interface and registering — no disruptive refactor.
Feature Key Format:
{indicator}:{param1}={value1},{param2}={value2}:{component}
Examples:
- `rsi:period=14` → RSI with period 14
- `macd:fast=12,slow=26,signal=9:line` → MACD line value
- `macd:fast=12,slow=26,signal=9:signal` → MACD signal value
- `macd:fast=12,slow=26,signal=9:histogram` → MACD histogram
- `bb:period=20,std=2.0:upper` → Bollinger upper band
- `bb:period=20,std=2.0:middle` → Bollinger middle band
- `bb:period=20,std=2.0:lower` → Bollinger lower band
- `obv:` → On-Balance Volume (no params)
- `atr:period=14` → ATR with period 14
Process for adding a new indicator (e.g., Stochastic RSI):
- Implement calculation in `TechnicalIndicatorProvider` (or a new provider class implementing `FeatureProvider`).
- Register the feature key(s) in `feature_metadata` (e.g., `stochrsi:period=14,smooth_k=3,smooth_d=3:k`).
- Add unit tests comparing output against a reference library (TA-Lib or equivalent).
- Run indicator computation pipeline — new features are automatically stored in the feature store.
- Optionally update token classifications and strategy configs to use the new indicator.
- Run backtest to confirm no regression to existing strategies.
Performance Mitigations:
- Composite index on `(instrument, interval, namespace, feature_key, time DESC)` for point lookups.
- Materialized views for hot query patterns (e.g., latest values for all core indicators for a given instrument/interval).
- Batch insert using `COPY` or `execute_values` for bulk feature writes.
- TimescaleDB compression on older partitions (> 30 days) for storage efficiency.
- Benchmark requirement: Before finalizing Stage 1, benchmark reads of 2 years × 5 indicators × 1h interval. Target: < 500ms for a full feature vector retrieval for a 60-period lookback.
Query Example — Get latest RSI for EUR/USD 1h:
SELECT value FROM features
WHERE instrument = 'EUR_USD'
AND interval = '1h'
AND namespace = 'technical'
AND feature_key = 'rsi:period=14'
ORDER BY time DESC LIMIT 1;
Query Example — Get all features for a candle:
SELECT feature_key, value FROM features
WHERE instrument = 'EUR_USD'
AND interval = '1h'
AND namespace = 'technical'
AND time = '2025-01-15T14:00:00Z';
| Check | Frequency | Action on Failure |
|---|---|---|
| Gap detection (missing candles) | Every pipeline run, per instrument | Auto-backfill; flag verified = false until filled |
| Duplicate check | Every pipeline run | Deduplicate (keep latest) |
| OHLC logic (high ≥ open, close, low; low ≤ open, close, high) | Every pipeline run | Flag row as suspect; exclude from signal generation |
| Stale data (no new candle within 2× expected interval) | Continuous (watchdog), per instrument | Alert; halt new signals for that instrument |
| Outlier detection (candle range > 10× ATR(14)) | Every pipeline run | Flag; do not auto-exclude but alert for manual review |
| Cross-source sanity (if secondary feed available) | Per candle | Alert if primary and secondary prices diverge significantly |
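Two of the checks in the table, OHLC logic and gap detection, are simple enough to sketch directly. The function names are hypothetical, and the gap check assumes a continuous calendar: it does not know about the EUR/USD weekend close, which a real implementation would need to exclude.

```python
from datetime import datetime, timedelta


def ohlc_is_consistent(open_: float, high: float, low: float, close: float) -> bool:
    """True when high is the maximum and low is the minimum of the four prices."""
    return high >= max(open_, close, low) and low <= min(open_, close, high)


def find_gaps(times: list[datetime], interval: timedelta) -> list[datetime]:
    """Return expected candle times missing between the first and last candle."""
    if not times:
        return []
    present = set(times)
    missing: list[datetime] = []
    t, end = min(times), max(times)
    while t < end:
        if t not in present:
            missing.append(t)
        t += interval
    return missing
```

Rows failing `ohlc_is_consistent` would be flagged as suspect, and each timestamp returned by `find_gaps` would become a backfill request.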
- All timestamps are stored and processed in UTC.
- No timezone-naive `datetime` objects anywhere in the codebase. Enforced via linting rule.
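In addition to linting, a small runtime guard at module boundaries can reject naive values and normalize everything else to UTC. `require_utc` is a hypothetical helper shown only to illustrate the rule.

```python
from datetime import datetime, timedelta, timezone


def require_utc(dt: datetime) -> datetime:
    """Reject timezone-naive datetimes; convert aware ones to UTC."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("timezone-naive datetime is not allowed")
    return dt.astimezone(timezone.utc)
```

For example, a candle stamped 09:00 at UTC-5 is returned as 14:00 UTC, while `datetime(2025, 1, 15, 14, 0)` with no `tzinfo` raises `ValueError`.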
An event {INSTRUMENT}_UP_X_PCT_N_PERIODS is defined as:
At candle T (signal candle), the event is TRUE if:
(close[T + N] - close[T]) / close[T] >= X / 100
where close[T + N] is the closing price of the candle exactly N periods after T.
This is a close-to-close forward return measurement. NOT a high-watermark measurement.
Rationale: Close-to-close is unambiguous, reproducible, and conservative.
Test Hook: Unit test with a known price series.
v1 Event Catalog (DL-003):
The v1 event catalog is constrained per instrument. Each event has explicit definitions and acceptance criteria:
EUR/USD Events:
| Name | Direction | Threshold % | Lookforward Periods | Interval | Min Occurrences |
|---|---|---|---|---|---|
| `EURUSD_UP_1PCT_24H` | UP | 1.0 | 24 | 1h | 100 |
| `EURUSD_DOWN_1PCT_24H` | DOWN | 1.0 | 24 | 1h | 100 |
BTC/USD Events:
| Name | Direction | Threshold % | Lookforward Periods | Interval | Min Occurrences |
|---|---|---|---|---|---|
| `BTCUSD_UP_3PCT_24H` | UP | 3.0 | 24 | 1h | 100 |
| `BTCUSD_DOWN_3PCT_24H` | DOWN | 3.0 | 24 | 1h | 100 |
Acceptance criteria for event detection:
- Given a known price series, event labels must match hand-calculated expectations with zero discrepancy.
- Min occurrences threshold must be validated against the 3-year historical dataset. If an event has fewer occurrences than `min_occurrences`, alert and log — the event definition may need threshold adjustment.
- Events are stored in the `events` table with all required fields.
| Table | Retention | Compression | Notes |
|---|---|---|---|
| `ohlcv` | Indefinite (all historical data) | TimescaleDB compression after 90 days | Core dataset, never purge |
| `features` | Indefinite | TimescaleDB compression after 30 days | Recomputable but expensive |
| `events` | Indefinite | TimescaleDB compression after 90 days | Required for retraining |
| `tokens` | Indefinite | TimescaleDB compression after 90 days | Required for retraining |
| `trades` | Indefinite | None (small table) | Audit record, never purge |
| `reconciliation_log` | 1 year | None | Compress/archive older than 1 year |
| `regime_log` | Indefinite | None (small table) | Analytics record |
| `strategy_versions` | Indefinite | None | Audit trail |
| `config_changes` | Indefinite | None | Audit trail |
| `spec_deviations` | Indefinite | None | Governance record |
Backup Policy:
- Frequency: Daily automated backup (pg_dump) to local storage + weekly offsite copy.
- Retention: Keep daily backups for 30 days; weekly backups for 1 year.
- Restore test: Monthly restore-test to a temporary database, verifying row counts and data integrity checksums.
Each instrument has a strategy configuration file that defines its behavior:
// config/strategies/EUR_USD_strategy.json
{
"instrument": "EUR_USD",
"events": ["EURUSD_UP_1PCT_24H", "EURUSD_DOWN_1PCT_24H"],
"token_config": "config/classifications/EUR_USD_classifications.json",
"token_selection": {"method": "mutual_information", "top_n": 20, "jaccard_threshold": 0.85},
"bayesian": {"calibration": "isotonic", "posterior_cap": 0.90, "laplace_alpha": 1},
"ml_model": {"type": "xgboost", "lookback_periods": 24, "hyperparams": "auto"},
"meta_learner": {"type": "logistic_regression", "min_samples": 100},
"thresholds": {"strong_buy": 0.65, "buy": 0.55, "sell": 0.40},
"risk_overrides": {},
"performance_overrides": {}
}
// config/strategies/BTC_USD_strategy.json
{
"instrument": "BTC_USD",
"events": ["BTCUSD_UP_3PCT_24H", "BTCUSD_DOWN_3PCT_24H"],
"token_config": "config/classifications/BTC_USD_classifications.json",
"token_selection": {"method": "mutual_information", "top_n": 20, "jaccard_threshold": 0.85},
"bayesian": {"calibration": "isotonic", "posterior_cap": 0.90, "laplace_alpha": 1},
"ml_model": {"type": "xgboost", "lookback_periods": 24, "hyperparams": "auto"},
"meta_learner": {"type": "logistic_regression", "min_samples": 100},
"thresholds": {"strong_buy": 0.60, "buy": 0.50, "sell": 0.45},
"risk_overrides": {"hard_stop_pct": 0.03},
"performance_overrides": {"max_drawdown_pct": 0.25}
}
Shared infrastructure: Pipeline orchestration, Bayesian engine logic, ML training framework, meta-learner framework, risk framework, reporting.
Instrument-specific: Event definitions, token classifications, threshold values, risk parameter overrides, performance metric overrides, model artifacts.
Format: {INSTRUMENT}_{PREFIX}_{PARAM}_{DATAPOINT}_{TYPE}_{VALUE}
Examples:
- `EURUSD_RSI14_CL_BLW_30` — EUR/USD RSI(14) on Close is Below 30
- `BTCUSD_MACD12269_CL_XABV_0` — BTC/USD MACD(12,26,9) crosses Above 0
Token vocabularies are defined per instrument in config/classifications/{INSTRUMENT}_classifications.json.
Acceptance Test: Given known indicator values per instrument, verify exact token output.
- For each event type (per instrument), calculate mutual information `I(Token; Event)` for all tokens.
- Rank tokens by mutual information.
- Filter redundant tokens: if Jaccard similarity between two tokens' occurrence vectors > 0.85, keep only the higher-MI token.
- Select top N tokens (configurable per instrument, default N=20, max N=50).
- Log the selected token set, their MI scores, and the correlation matrix.
Acceptance Test: On synthetic data with a known informative token and a known noise token, verify the informative token is selected and the noise token is not.
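The selection steps can be sketched in pure Python for binary token and event vectors. This is one possible reading of the procedure: tokens are visited in descending MI order and dropped if they overlap too strongly with an already kept token, which keeps the higher-MI member of each redundant pair. All function names are hypothetical.

```python
import math


def mutual_information(x: list[bool], y: list[bool]) -> float:
    """I(X; Y) in nats for two equally long binary vectors."""
    n = len(x)
    mi = 0.0
    for xv in (False, True):
        for yv in (False, True):
            joint = sum(1 for a, b in zip(x, y) if a == xv and b == yv) / n
            px = sum(1 for a in x if a == xv) / n
            py = sum(1 for b in y if b == yv) / n
            if joint > 0:
                mi += joint * math.log(joint / (px * py))
    return mi


def jaccard(a: list[bool], b: list[bool]) -> float:
    """Jaccard similarity of two occurrence vectors."""
    inter = sum(1 for p, q in zip(a, b) if p and q)
    union = sum(1 for p, q in zip(a, b) if p or q)
    return inter / union if union else 0.0


def select_tokens(tokens: dict[str, list[bool]], event: list[bool],
                  top_n: int = 20, jaccard_threshold: float = 0.85) -> list[str]:
    """Rank by MI, skip tokens redundant with a kept one, return the top N."""
    ranked = sorted(tokens, key=lambda k: mutual_information(tokens[k], event),
                    reverse=True)
    kept: list[str] = []
    for name in ranked:
        if all(jaccard(tokens[name], tokens[k]) <= jaccard_threshold for k in kept):
            kept.append(name)
        if len(kept) == top_n:
            break
    return kept
```

The sketch does not apply a minimum MI cutoff, so a noise token is excluded only when it falls outside the top N; whether to add such a cutoff is left open here.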
Method: Naïve Bayes with calibration.
Process:
- Calculate prior: `P(Event) = count(Event=TRUE) / count(all)`
- Calculate likelihood: `P(Token_i | Event) = count(Token_i AND Event) / count(Event)` with Laplace smoothing (alpha=1, configurable per strategy).
- Calculate posterior using log-odds form (numerically stable):
    log_odds = log(P(Event) / P(~Event))
    for each token_i in active_tokens:
        log_odds += log(P(Token_i | Event) / P(Token_i | ~Event))
    posterior = sigmoid(log_odds)
- Calibration: Apply isotonic regression fitted on out-of-fold predictions.
- Cap: Maximum posterior capped at 0.90 (configurable per instrument strategy).
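The prior, likelihood and log-odds steps can be sketched as below. Several choices are assumptions made for the example: the function names, the smoothed estimate `(count + alpha) / (n + 2 * alpha)` for a binary token, and applying the cap directly to the uncalibrated value. The isotonic calibration step is omitted, and degenerate priors of exactly 0 or 1 are not handled.

```python
import math


def fit_likelihoods(token_rows: list[set[str]], events: list[bool],
                    vocab: list[str], alpha: float = 1.0,
                    ) -> tuple[float, dict[str, tuple[float, float]]]:
    """Return the prior P(Event) and, per token, (P(tok|Event), P(tok|~Event))."""
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    prior = n_pos / len(events)
    like: dict[str, tuple[float, float]] = {}
    for tok in vocab:
        c_pos = sum(1 for row, e in zip(token_rows, events) if e and tok in row)
        c_neg = sum(1 for row, e in zip(token_rows, events) if not e and tok in row)
        like[tok] = ((c_pos + alpha) / (n_pos + 2 * alpha),
                     (c_neg + alpha) / (n_neg + 2 * alpha))
    return prior, like


def posterior(prior: float, like: dict[str, tuple[float, float]],
              active_tokens: list[str], cap: float = 0.90) -> float:
    """Accumulate log-odds over active tokens, apply sigmoid, then cap."""
    log_odds = math.log(prior / (1 - prior))
    for tok in active_tokens:
        p_pos, p_neg = like[tok]
        log_odds += math.log(p_pos / p_neg)
    return min(1 / (1 + math.exp(-log_odds)), cap)
```

On four samples where token `A` appears in both positive rows and in neither negative row, the smoothed likelihoods are 3/4 and 1/4, so an even prior combined with `A` gives a posterior of about 0.75.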
Known Limitation: Naïve Bayes assumes token independence, which is violated. Calibration partially mitigates this. Future: full joint model (PyMC/MCMC).
Inter-token Correlation Check: At training time, compute pairwise phi coefficients. If |phi| > 0.7 for any pair, log a warning. If more than 3 pairs exceed the threshold, alert and recommend reducing the token set.
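The phi coefficient for two binary vectors follows from their 2×2 contingency table. The sketch below uses hypothetical function names and returns 0.0 when a token is constant (zero denominator), which is an assumption rather than a spec requirement.

```python
import math
from itertools import combinations


def phi_coefficient(a: list[bool], b: list[bool]) -> float:
    """Phi (Pearson correlation of two binary variables) from contingency counts."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def correlated_pairs(tokens: dict[str, list[bool]],
                     threshold: float = 0.7) -> list[tuple[str, str]]:
    """Return token pairs whose |phi| exceeds the threshold."""
    return [(p, q) for p, q in combinations(tokens, 2)
            if abs(phi_coefficient(tokens[p], tokens[q])) > threshold]
```

A warning would be logged for each returned pair, and an alert raised when the list has more than three entries.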
Default: XGBoost (v1).
Input Features: Last N periods (configurable per instrument strategy, default 24) of: OHLCV returns (not raw prices), indicator features from feature store (all registered providers), token presence flags. Feature count depends on registered feature providers.
Training: Walk-forward: train on rolling 2-year window, validate on next 6 months. Hyperparameter search via Optuna. Early stopping on validation loss.
Acceptance Criteria: Out-of-sample AUC-ROC > 0.55 per instrument. If AUC < 0.55: disable ML component for that instrument; fall back to Bayesian-only mode.
Optional: CNN-LSTM (Stage 3), evaluated if XGBoost fails to meet the AUC threshold. Uses the same feature set reshaped as temporal sequences. Requires GPU (RTX 5060 Ti). Regularization: dropout (0.2-0.5), early stopping, weight decay. Max parameter budget: 1M parameters.
Method: Logistic regression stacking.
Inputs: Bayesian posterior, ML probability, current regime confidence. Trained on out-of-fold predictions from walk-forward.
Signal Interpretation:
| Combined Probability | Signal | Action |
|---|---|---|
| > threshold_strong_buy (per strategy) | STRONG_BUY | Open long position (if no existing position for instrument) |
| > threshold_buy (per strategy) | BUY | Open long position (smaller size) |
| < threshold_sell (per strategy) | SELL | Close any open long position for instrument |
| else | NEUTRAL | No action |
Thresholds are per-instrument, defined in strategy configuration, and derived from walk-forward backtest.
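The signal table can be read as a simple ordered comparison. The sketch below is illustrative; the threshold values used in the example call are placeholders, since the real values come from the walk-forward backtest for each strategy.

```python
from enum import Enum


class Signal(Enum):
    STRONG_BUY = "STRONG_BUY"
    BUY = "BUY"
    SELL = "SELL"
    NEUTRAL = "NEUTRAL"


def interpret(
    combined_prob: float,
    threshold_strong_buy: float,
    threshold_buy: float,
    threshold_sell: float,
) -> Signal:
    """Map the meta-learner's combined probability to a signal, checked top-down."""
    if combined_prob > threshold_strong_buy:
        return Signal.STRONG_BUY
    if combined_prob > threshold_buy:
        return Signal.BUY
    if combined_prob < threshold_sell:
        return Signal.SELL
    return Signal.NEUTRAL


# Placeholder thresholds for illustration only.
example = interpret(0.65, threshold_strong_buy=0.70, threshold_buy=0.60, threshold_sell=0.40)
```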
Detect when market conditions shift, enabling strategy adaptation or protective halting. Regime detection operates independently per instrument.
Inputs (per instrument):
- vol_30d: Rolling 30-day annualized realized volatility (close-to-close returns, standard deviation × √252 for forex, √365 for crypto).
- ADX_14: Rolling 14-day Average Directional Index (trend strength).
- Hurst exponent: Deferred to v1.1.
Regime Classification:
| Regime Label | Condition | Typical Behavior |
|---|---|---|
| `LOW_VOL_TRENDING` | `vol_30d < vol_median AND ADX_14 > 25` | Trend-following strategies favored |
| `LOW_VOL_RANGING` | `vol_30d < vol_median AND ADX_14 ≤ 25` | Mean-reversion / range-bound strategies favored |
| `HIGH_VOL_TRENDING` | `vol_30d ≥ vol_median AND ADX_14 > 25` | Aggressive trends, higher risk |
| `HIGH_VOL_RANGING` | `vol_30d ≥ vol_median AND ADX_14 ≤ 25` | Choppy, high-risk; reduce position sizing |
vol_median is calculated over a 2-year rolling window, recalculated monthly (on the first day of each month at 00:00 UTC).
Confidence quantifies how clearly the current market state falls within a regime. It is computed as follows:
Step 1 — Compute normalized distances from classification boundaries:
d_vol = |vol_30d - vol_median| / vol_median
d_adx = |ADX_14 - 25| / 25
Both d_vol and d_adx represent the fractional distance from the classification boundary. A value of 0 means exactly on the boundary; higher values mean further into the regime zone.
Step 2 — Clamp distances to [0, 1]:
d_vol_clamped = min(d_vol, 1.0)
d_adx_clamped = min(d_adx, 1.0)
Step 3 — Compute confidence as geometric mean:
confidence = sqrt(d_vol_clamped × d_adx_clamped)
This produces a value in [0, 1].
Confidence Bands:
| Band | Confidence Range | Behavior |
|---|---|---|
| HIGH | confidence ≥ 0.5 | Normal trading with current strategy config |
| MEDIUM | 0.2 ≤ confidence < 0.5 | Normal trading; log regime as "soft" |
| LOW | confidence < 0.2 | Reduce position size by 50%; widen stops by 50% |
Recompute Cadence: Regime confidence is recomputed on every new 1h candle close, per instrument.
Example:
- vol_30d = 0.18, vol_median = 0.15, ADX_14 = 32
- d_vol = |0.18 - 0.15| / 0.15 = 0.20
- d_adx = |32 - 25| / 25 = 0.28
- confidence = sqrt(0.20 × 0.28) = sqrt(0.056) ≈ 0.237 → MEDIUM
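The classification rule, the three confidence steps, and the band lookup can be combined in a short sketch that reproduces the worked example. Function names are hypothetical; the ADX boundary of 25 and the band cut-offs are taken from the tables above.

```python
import math

ADX_BOUNDARY = 25.0


def classify_regime(vol_30d: float, vol_median: float, adx_14: float) -> str:
    """Return one of the four regime labels from the classification table."""
    vol = "LOW_VOL" if vol_30d < vol_median else "HIGH_VOL"
    trend = "TRENDING" if adx_14 > ADX_BOUNDARY else "RANGING"
    return f"{vol}_{trend}"


def regime_confidence(vol_30d: float, vol_median: float, adx_14: float) -> float:
    """Geometric mean of the clamped fractional distances from each boundary."""
    d_vol = min(abs(vol_30d - vol_median) / vol_median, 1.0)
    d_adx = min(abs(adx_14 - ADX_BOUNDARY) / ADX_BOUNDARY, 1.0)
    return math.sqrt(d_vol * d_adx)


def confidence_band(confidence: float) -> str:
    """Map a confidence value to HIGH / MEDIUM / LOW."""
    if confidence >= 0.5:
        return "HIGH"
    if confidence >= 0.2:
        return "MEDIUM"
    return "LOW"


label = classify_regime(0.18, 0.15, 32)
conf = regime_confidence(0.18, 0.15, 32)
band = confidence_band(conf)
```

Because the geometric mean is zero whenever either distance is zero, a market sitting exactly on one boundary always yields LOW confidence, no matter how far it is from the other boundary.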
| Regime State | System Behavior |
|---|---|
| High confidence, any regime | Normal trading with current strategy config |
| Medium confidence | Normal trading; regime logged as "soft" |
| Low confidence (transitioning) | Reduce position size by 50%; widen stops by 50% |
| Model degradation in current regime (rolling 30-trade Sharpe < 0) | Halt new entries for that instrument (circuit breaker) |
| Manual override active | Use operator-specified regime until override cleared |
- API endpoint: `PUT /api/v1/regime/{instrument}/override`
- Body: `{"regime_label": "HIGH_VOL_RANGING", "reason": "Operator assessment", "expires_at": "2025-03-01T00:00:00Z"}`
- Override automatically expires at `expires_at` or when manually cleared via `DELETE /api/v1/regime/{instrument}/override`.
- All overrides are logged to `regime_log` with `trigger = 'manual_override'`.
- While override is active, automatic regime detection continues to run and log but does not control behavior.
- Every trade record includes `regime_label` at signal time.
- Backtest and live reports include:
  - Performance breakdown by regime (Sharpe, PF, win rate per regime).
  - Regime timeline chart showing regime transitions overlaid on equity curve.
  - Regime duration statistics.
- Charts display regime as colored background bands.
- Walk-forward windows must be long enough to include multiple regime transitions (the 2-year minimum training window is intended to provide this).
- Backtest reports include per-regime performance and a "regime-adjusted" Sharpe that weights performance by time-in-regime.
- If a regime has < 20 trades in any fold, its performance estimate is flagged as "low sample" and excluded from go/no-go decisions.
Staged Capability:
| Stage | Capability | Version | Description |
|---|---|---|---|
| 1 | Static configs | v1 | Strategy parameters defined in JSON config files. Validated offline via backtest. Edited manually or via UI. |
| 2 | Assisted parameter search | v1.1+ | Offline parameter search over candidate spaces (e.g., threshold sweeps, indicator parameter ranges). Results presented to operator for review. Not auto-deployed. |
| 3 | Server-side generation | v2+ | Automated pipeline that explores parameter spaces, evaluates candidates via walk-forward backtest, and emits versioned strategy config files. Requires approval gate before activation. |
Governance Gates (all stages):
- Proposal: New/modified strategy config is generated (manually or automatically).
- Backtest validation: Config must pass minimum performance bars on walk-forward backtest.
- Review: Operator reviews backtest results, parameter choices, and rationale.
- Approval: Operator explicitly approves (sets `approved = true` in `strategy_versions`). Approval must include evidence bundle reference (backtest run ID, metrics summary, regime coverage).
- Activation: Approved config is promoted to active. Previous config is archived (never deleted).
- Monitoring: Post-activation, track performance vs. backtest expectations. Auto-halt if deviation exceeds thresholds (Sharpe deviation > 50% from backtest, or any circuit breaker triggered).
Acceptance Test: Verify that a new strategy config cannot be activated without explicit approval. Verify that unapproved configs are not used for live trading.
- All orders are market orders sent via the instrument's broker adapter.
- v1 is spot-only: no leverage, no margin, no funding rates.
- Every order includes broker-side stop-loss:
  - Oanda: `stopLossOnFill` parameter on the order.
  - Binance Spot: A separate OCO (One-Cancels-Other) order is placed immediately after the entry fill, containing the stop-loss. If OCO placement fails after entry: close position immediately and alert.
- Every order has a unique `client_order_id`, format: `NEWTON-{instrument}-{timestamp_ms}`.
Binance Spot-Specific:
- BTC/USDT orders use Binance Spot API (`POST /api/v3/order`).
- Minimum notional and lot size rules must be validated pre-submission.
- No funding rates apply (spot market).
- No contract type selection needed (spot, not futures).
- Commission is deducted from the received asset. Account for this in position sizing.
Locked assumptions for v1 spot backtesting:
| Parameter | EUR/USD (Oanda Spot) | BTC/USDT (Binance Spot) |
|---|---|---|
| Default spread | 1.5 pips | 0.05% of price |
| Default slippage | 1.0 pip | 0.02% of price |
| Pessimistic multiplier | 2× (applies to both spread and slippage) | 2× |
| Commission | Spread-inclusive (no separate commission) | 0.10% per trade (taker rate for spot) |
| Funding rate | N/A (spot) | N/A (spot) |
| Latency assumption | 100ms order-to-fill | 200ms order-to-fill |
| Fill model | Full fill at modeled price (market order) | Full fill at modeled price (market order) |
| Partial fill simulation | Not simulated in v1 (market orders assumed to fill fully) | Not simulated in v1 |
| Reject simulation | Not simulated in v1 (assumes sufficient liquidity) | Not simulated in v1 |
Fill price (backtest): open[T+1] + slippage + spread/2 for buys; open[T+1] - slippage - spread/2 for sells.
Pessimistic mode: All slippage and spread values are multiplied by the pessimistic multiplier (2×). Commission rates unchanged. This mode is available as a toggle when running backtests.
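The fill-price rule and the pessimistic toggle can be sketched as follows, using the locked defaults from the table. This is a minimal sketch: `CostModel` and `fill_price` are hypothetical names, and the EUR/USD values assume the conventional pip size of 0.0001. Commission is not included because it is charged separately from the fill price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    spread: float     # absolute price units, or fraction of price if relative
    slippage: float   # same convention as spread
    relative: bool    # True when costs are quoted as a fraction of price


PIP = 0.0001  # EUR/USD pip size (assumption: standard 4-decimal pip)
EURUSD = CostModel(spread=1.5 * PIP, slippage=1.0 * PIP, relative=False)
BTCUSDT = CostModel(spread=0.0005, slippage=0.0002, relative=True)


def fill_price(
    next_open: float,
    side: str,
    model: CostModel,
    pessimistic: bool = False,
    multiplier: float = 2.0,
) -> float:
    """Backtest fill at open[T+1], moved against the trader by slippage + half-spread."""
    scale = multiplier if pessimistic else 1.0
    cost = (model.slippage + model.spread / 2.0) * scale
    if model.relative:
        cost *= next_open
    return next_open + cost if side == "buy" else next_open - cost


eur_buy = fill_price(1.1000, "buy", EURUSD)
btc_sell_pessimistic = fill_price(50_000.0, "sell", BTCUSDT, pessimistic=True)
```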
- Orders submitted with `client_order_id` for idempotency.
- Retry up to 3× on 5xx/timeout errors.
- Before retry, check for existing order with same `client_order_id` to prevent duplicates.
- No retry on 4xx (client error; log and alert).
- All retry attempts logged with timestamps and error details.
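The retry rules above can be sketched against a generic broker adapter. Everything about the adapter here is an assumption for illustration: `submit` and `lookup` are hypothetical callables, and `TransientBrokerError` / `ClientOrderError` stand in for whatever exceptions the real adapters raise for 5xx/timeout and 4xx responses.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("newton.execution")


class TransientBrokerError(Exception):
    """Hypothetical: raised by an adapter on 5xx responses or timeouts."""


class ClientOrderError(Exception):
    """Hypothetical: raised by an adapter on 4xx responses (never retried)."""


def make_client_order_id(instrument: str, timestamp_ms: int) -> str:
    return f"NEWTON-{instrument}-{timestamp_ms}"


def submit_with_retry(
    submit: Callable[[dict], dict],
    lookup: Callable[[str], Optional[dict]],
    order: dict,
    max_retries: int = 3,
    backoff_s: float = 0.0,
) -> dict:
    """Submit an order, retrying transient failures without creating duplicates."""
    retries = 0
    while True:
        try:
            return submit(order)
        except TransientBrokerError as exc:
            logger.warning("submit failed (retries so far: %d): %s", retries, exc)
            # The order may have reached the broker despite the error.
            existing = lookup(order["client_order_id"])
            if existing is not None:
                return existing
            retries += 1
            if retries > max_retries:
                raise
            time.sleep(backoff_s)
        # ClientOrderError is deliberately not caught: 4xx errors propagate.


calls = {"n": 0}


def flaky_submit(order: dict) -> dict:
    """Fake adapter: fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientBrokerError("503")
    return {"client_order_id": order["client_order_id"], "status": "FILLED"}


order = {"client_order_id": make_client_order_id("EUR_USD", 1700000000000)}
result = submit_with_retry(flaky_submit, lambda cid: None, order)
```

Checking for an existing order before each retry is what makes the `client_order_id` useful: a timeout does not reveal whether the broker accepted the order.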
Frequency: Every 60 seconds, per broker.
Process:
- Fetch all open positions from broker API.
- Compare with internal `trades` table (`status = 'OPEN'`).
- Classify each position:
| State | Meaning | Action |
|---|---|---|
| MATCH | Internal and broker agree | Log OK |
| SYSTEM_EXTRA | System thinks position is open, broker does not | Alert (CRITICAL). Mark internal trade as CLOSED with exit_reason = 'RECONCILIATION'. |
| BROKER_EXTRA | Broker has position system doesn't know about | Alert (CRITICAL). Halt new entries for that instrument. Create internal record. Require manual review. |
- Log result to `reconciliation_log` table.
- Expose reconciliation status as Prometheus metric per broker: `newton_reconciliation_status{broker="oanda|binance", result="match|system_extra|broker_extra"}`.
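The classification step of the reconciliation loop reduces to a set comparison. The sketch below assumes positions can be keyed by instrument, which follows from the one-open-position-per-instrument constraint; the function name `reconcile` is hypothetical, and fetching, alerting, and logging are left out.

```python
from typing import Dict, Set


def reconcile(internal_open: Set[str], broker_open: Set[str]) -> Dict[str, str]:
    """Classify every instrument that has an open position on either side."""
    result: Dict[str, str] = {}
    for instrument in sorted(internal_open | broker_open):
        if instrument in internal_open and instrument in broker_open:
            result[instrument] = "MATCH"
        elif instrument in internal_open:
            result[instrument] = "SYSTEM_EXTRA"   # system thinks open, broker does not
        else:
            result[instrument] = "BROKER_EXTRA"   # broker has a position unknown to system
    return result


states = reconcile(internal_open={"EUR_USD", "BTC_USD"}, broker_open={"EUR_USD"})
```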
All risk management parameters are configurable per strategy, defaulting to global spec defaults. This enables instrument-specific risk tuning while maintaining safe baselines.
Configuration Precedence (highest to lowest):
- Instrument override (in `config/instruments/{INSTRUMENT}.json` → `risk_overrides`)
- Strategy override (in `config/strategies/{INSTRUMENT}_strategy.json` → `risk_overrides`)
- Global default (in `config/risk.json` → `defaults`)
When multiple levels specify the same parameter, the highest-precedence non-null value wins.
Global Defaults Location: config/risk.json
Per-Strategy Override Schema: The risk_overrides object in strategy configuration accepts any key from the global defaults. Only specified keys override; unspecified keys fall back to global defaults.
// Example: BTC_USD strategy risk overrides
{
"risk_overrides": {
"hard_stop_pct": 0.03,
"high_volatility_stop_pct": 0.05,
"max_drawdown_pct": 0.25,
"daily_loss_limit_pct": 0.03
}
}

Validation Constraints (preventing unsafe overrides):
| Parameter | Minimum | Maximum | Rationale |
|---|---|---|---|
| `hard_stop_pct` | 0.005 (0.5%) | 0.10 (10%) | Prevent negligible or catastrophic stops |
| `max_risk_per_trade_pct` | 0.001 (0.1%) | 0.05 (5%) | Cap single-trade risk |
| `max_position_pct` | 0.005 (0.5%) | 0.20 (20%) | Cap position sizing |
| `daily_loss_limit_pct` | 0.005 (0.5%) | 0.05 (5%) | Cap daily losses |
| `max_drawdown_pct` | 0.05 (5%) | 0.30 (30%) | Prevent reckless drawdown tolerance |
| `kelly_fraction` | 0.10 | 0.50 | Tenth-Kelly to half-Kelly range |
| `time_stop_hours` | 1 | 168 (7 days) | Reasonable holding period bounds |
Any override outside these bounds is rejected at load time with a clear error message. The system will not start with invalid risk configuration.
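The precedence rule and the load-time bounds check can be sketched together. This is a minimal sketch with hypothetical function names; the bounds are copied from the validation table, and reading the JSON files is omitted.

```python
from typing import Any, Dict, Optional

# (minimum, maximum) per parameter, from the validation constraints table.
BOUNDS = {
    "hard_stop_pct": (0.005, 0.10),
    "max_risk_per_trade_pct": (0.001, 0.05),
    "max_position_pct": (0.005, 0.20),
    "daily_loss_limit_pct": (0.005, 0.05),
    "max_drawdown_pct": (0.05, 0.30),
    "kelly_fraction": (0.10, 0.50),
    "time_stop_hours": (1, 168),
}


def validate_overrides(overrides: Dict[str, Optional[float]]) -> None:
    """Raise ValueError for any bounded parameter outside its allowed range."""
    for key, value in overrides.items():
        if value is None or key not in BOUNDS:
            continue
        low, high = BOUNDS[key]
        if not low <= value <= high:
            raise ValueError(f"{key}={value} is outside allowed range [{low}, {high}]")


def resolve_risk(
    global_defaults: Dict[str, Any],
    strategy_overrides: Dict[str, Any],
    instrument_overrides: Dict[str, Any],
) -> Dict[str, Any]:
    """Apply layers lowest to highest precedence; null values never override."""
    resolved = dict(global_defaults)
    for layer in (strategy_overrides, instrument_overrides):
        validate_overrides(layer)
        for key, value in layer.items():
            if value is not None:
                resolved[key] = value
    return resolved


resolved = resolve_risk(
    global_defaults={"hard_stop_pct": 0.02, "max_drawdown_pct": 0.20},
    strategy_overrides={"hard_stop_pct": 0.03, "max_drawdown_pct": 0.25},
    instrument_overrides={"hard_stop_pct": None},
)
```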
Audit Logging: Every change to risk parameters (via UI, API, or config file reload) is logged to the config_changes table with: timestamp, who changed it, section, old value, new value, and reason.
// config/risk.json
{
"defaults": {
"max_position_pct": 0.05,
"max_risk_per_trade_pct": 0.02,
"kelly_fraction": 0.25,
"kelly_min_trades": 30,
"kelly_window": 60,
"micro_size_pct": 0.005,
"hard_stop_pct": 0.02,
"trailing_activation_pct": 0.01,
"trailing_breakeven_pct": 0.02,
"time_stop_hours": 48,
"daily_loss_limit_pct": 0.02,
"max_drawdown_pct": 0.20,
"consecutive_loss_halt": 5,
"consecutive_loss_halt_hours": 24,
"gap_risk_multiplier": 2.0,
"volatility_threshold_multiplier": 2.0,
"high_volatility_size_reduction": 0.5,
"high_volatility_stop_pct": 0.03
},
"portfolio": {
"max_total_exposure_pct": 0.10,
"max_portfolio_drawdown_pct": 0.20
}
}

| Check | Rule | On Failure | Configurable Per Strategy |
|---|---|---|---|
| Position limit | Max 1 open position per instrument | Reject new order | No (architectural constraint) |
| Portfolio exposure | Max total exposure across all instruments: `portfolio.max_total_exposure_pct` of equity (default 10%) | Reject new order | Portfolio-level only |
| Position sizing | Kelly 1/4, minimum of: Kelly result, `max_position_pct` of equity, `max_risk_per_trade_pct` risk per trade | Use smallest of the three | Yes |
| Circuit breaker active | Check daily loss and drawdown flags (per instrument + portfolio) | Reject new order | Yes (thresholds) |
| Data freshness | Last verified candle < 2 × interval ago | Reject new order | No (safety) |
| Model freshness | Days since last retrain < 30 | Alert (warning, not blocking) | No |
| Regime confidence | If regime confidence = LOW (< 0.2) | Reduce position size by 50% | Yes (confidence threshold) |
Kelly Criterion Implementation:
- Rolling window of the last `kelly_window` trades (default 60) per instrument.
- First `kelly_min_trades` trades (default 30): fixed `micro_size_pct` (default 0.5% of equity) micro-sizing.
- Hard cap: never risk > `max_risk_per_trade_pct` per trade, never exceed `max_position_pct` position.
- Kelly formula: `f* = kelly_fraction × (p × b - q) / b` where p = win rate, q = loss rate, b = average win / average loss.
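A sizing sketch under the defaults from `config/risk.json` is shown below. Several choices are assumptions rather than spec: the risk-per-trade cap is converted to a position cap by dividing by the stop distance, q is taken as 1 - p, a negative Kelly result is floored at zero, and a window with no wins or no losses falls back to micro-sizing. The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def kelly_position_fraction(
    pnls: Sequence[float],
    kelly_fraction: float = 0.25,
    kelly_min_trades: int = 30,
    kelly_window: int = 60,
    micro_size_pct: float = 0.005,
    max_position_pct: float = 0.05,
    max_risk_per_trade_pct: float = 0.02,
    hard_stop_pct: float = 0.02,
) -> float:
    """Return position size as a fraction of equity for the next trade."""
    if len(pnls) < kelly_min_trades:
        return micro_size_pct  # not enough history: fixed micro-sizing
    window = list(pnls[-kelly_window:])
    wins = [x for x in window if x > 0]
    losses = [-x for x in window if x < 0]
    if not wins or not losses:
        return micro_size_pct  # assumption: payoff ratio b is undefined here
    p = len(wins) / len(window)
    q = 1.0 - p
    b = mean(wins) / mean(losses)
    kelly = kelly_fraction * (p * b - q) / b
    # Assumption: losing max_risk_per_trade_pct at the stop implies this position cap.
    risk_cap = max_risk_per_trade_pct / hard_stop_pct
    return max(0.0, min(kelly, max_position_pct, risk_cap))


history = [2.0, -1.0] * 15  # 30 trades, 50% win rate, payoff ratio 2
size = kelly_position_fraction(history)
```

For this history the scaled Kelly result is 0.25 × (0.5 × 2 - 0.5) / 2 = 0.0625, so the default `max_position_pct` of 0.05 is the binding constraint.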
| Control | Specification | Implementation | Configurable Per Strategy |
|---|---|---|---|
| Hard stop-loss | `hard_stop_pct` below entry (default -2%, BTC -3%) | Broker-side stop at entry time | Yes |
| Trailing stop activation | Position profit reaches `trailing_activation_pct` (default +1%) | Modify broker stop to entry price (breakeven) | Yes |
| Trailing stop advance | Position profit reaches `trailing_breakeven_pct` (default +2%) | Modify broker stop to +1% above entry | Yes |
| Time stop | Position open > `time_stop_hours` (default 48 hours) | Market close order; `exit_reason = 'TIME_STOP'` | Yes |
| Volatility check | ATR(14) > `volatility_threshold_multiplier` × 30-day average at signal time (default 2×) | Reduce size by `high_volatility_size_reduction` (default 50%); widen hard stop to `high_volatility_stop_pct` | Yes |
Stop Update Frequency: On every new candle close AND on WebSocket tick updates for positions with profit > 0.5%.
Gap Risk Mitigation: Size assuming worst-case gap of gap_risk_multiplier × stop distance (default 2×).
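The hard-stop and two trailing-stop rows of the table above can be sketched as a single stop-level function for a long position. This is an illustrative sketch: `locked_profit_pct` is a hypothetical name for the "+1% above entry" level, and the rule that a stop never moves back down once raised is an assumption, not something the table states.

```python
from typing import Optional


def stop_price_long(
    entry: float,
    current: float,
    previous_stop: Optional[float] = None,
    hard_stop_pct: float = 0.02,
    trailing_activation_pct: float = 0.01,
    trailing_breakeven_pct: float = 0.02,
    locked_profit_pct: float = 0.01,
) -> float:
    """Return the broker stop level implied by the current unrealized profit."""
    profit = current / entry - 1.0
    if profit >= trailing_breakeven_pct:
        candidate = entry * (1.0 + locked_profit_pct)   # advance: lock in +1%
    elif profit >= trailing_activation_pct:
        candidate = entry                               # activation: breakeven
    else:
        candidate = entry * (1.0 - hard_stop_pct)       # initial hard stop
    # Assumption: stops only ratchet upward for a long position.
    if previous_stop is not None:
        candidate = max(candidate, previous_stop)
    return candidate


initial = stop_price_long(entry=100.0, current=100.5)
advanced = stop_price_long(entry=100.0, current=102.5, previous_stop=initial)
```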
| Breaker | Trigger | Scope | Action | Reset | Configurable Per Strategy |
|---|---|---|---|---|---|
| Daily loss | Equity drops `daily_loss_limit_pct` from day-open (default 2%) | Per instrument + portfolio | Close positions; halt entries | Automatic at 00:00 UTC | Yes (threshold) |
| Max drawdown | Equity drops `max_drawdown_pct` from ATH (default 20%) | Per instrument (configurable) + portfolio | Close all; halt all | Manual intervention | Yes (threshold) |
| Consecutive losses | `consecutive_loss_halt` consecutive losers (default 5) | Per instrument | Halt entries for `consecutive_loss_halt_hours` (default 24h) | Automatic after timeout | Yes (both values) |
| Model degradation | Rolling 30-trade Sharpe < 0 | Per instrument | Halt entries | When Sharpe ≥ 0 | No (safety) |
| Kill switch | Manual activation | System-wide | Close ALL positions on ALL brokers | Manual reset | No (safety) |
Kill Switch:
- Available via UI button AND `POST /api/v1/kill` endpoint.
- On activation: close all positions on all brokers (market orders), cancel all pending, set `kill_switch_active = true`.
- Reset: manual only via `DELETE /api/v1/kill` (requires confirmation).
- Activation and reset are logged to `config_changes` with timestamp and reason.
Primary: Walk-Forward Testing.
- Minimum train window: 2 years.
- Test window: 6 months.
- Step: 6 months.
- Embargo: 48 hours (no data from 48h before test window start used in training).
- Minimum 4 folds.
- Runs independently per instrument.
Secondary: Purged K-Fold.
- K = 5, with 48-hour purge zones between folds.
- Used as robustness check; not the primary validation method.
- Runs independently per instrument.
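The walk-forward parameters can be turned into fold boundaries as sketched below. This sketch assumes a rolling (not expanding) 2-year training window, as described for model training, and uses timezone-aware UTC datetimes. `Fold`, `add_months`, and `walk_forward_folds` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class Fold(NamedTuple):
    train_start: datetime
    train_end: datetime   # exclusive; already shortened by the embargo
    test_start: datetime
    test_end: datetime    # exclusive


def add_months(dt: datetime, months: int) -> datetime:
    """Shift by whole months; assumes day-of-month <= 28 so the result is valid."""
    index = dt.year * 12 + (dt.month - 1) + months
    return dt.replace(year=index // 12, month=index % 12 + 1)


def walk_forward_folds(
    data_start: datetime,
    data_end: datetime,
    train_months: int = 24,
    test_months: int = 6,
    step_months: int = 6,
    embargo: timedelta = timedelta(hours=48),
) -> List[Fold]:
    """Generate rolling train/test windows that fit entirely inside the data range."""
    folds: List[Fold] = []
    train_start = data_start
    while True:
        test_start = add_months(train_start, train_months)
        test_end = add_months(test_start, test_months)
        if test_end > data_end:
            break
        # Training data stops one embargo period before the test window opens.
        folds.append(Fold(train_start, test_start - embargo, test_start, test_end))
        train_start = add_months(train_start, step_months)
    return folds


folds = walk_forward_folds(
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

With these defaults, four years of history yields exactly the minimum of 4 folds, so any shorter history cannot satisfy the fold requirement.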
Per-instrument fill model using the locked realism assumptions from §6.2:
- EUR/USD: Fill at `open[T+1] ± (1.0 pip slippage + 0.75 pip half-spread)`. No separate commission.
- BTC/USDT: Fill at `open[T+1] ± (0.02% slippage + 0.025% half-spread)`. Plus 0.10% taker commission per trade.
- Pessimistic mode: 2× multiplier on slippage and spread. Commission unchanged.
No partial fills, no rejects simulated in v1. Market orders assumed to fill fully at the modeled price.
| Bias | Mitigation |
|---|---|
| Look-ahead bias | Walk-forward with 48h embargo; event definitions use only past data |
| Overfitting | Walk-forward + purged K-fold; minimum trade count per fold |
| Survivorship bias | Flag for BTC/USDT (crypto pairs can be delisted); EUR/USD not affected |
| Selection bias | Fixed event catalog and token selection methodology; no manual cherry-picking |
| Data snooping | Hyperparameter search within walk-forward training windows only |
- Per-regime performance breakdown (Sharpe, PF, win rate per regime per instrument).
- Regime transition timeline overlaid on equity curve.
- Regime-adjusted metrics that weight by time-in-regime.
- Low-sample regime flagging: if a regime has < 20 trades in any fold, its performance estimate is flagged as "low sample" and excluded from go/no-go decisions.
Performance metric thresholds and targets are configurable per strategy, defaulting to the global spec defaults defined below. This allows instrument-specific performance expectations (e.g., wider drawdown tolerance for BTC/USD).
Default Metric Thresholds (global):
| Metric | Default Minimum Bar | Hard Gate | Informational | Per-Instrument |
|---|---|---|---|---|
| Sharpe Ratio | > 0.8 | Yes (go/no-go for live) | No | Yes |
| Profit Factor | > 1.3 | Yes (go/no-go for live) | No | Yes |
| Max Drawdown | < 15% | Yes (hard stop if breached) | No | Yes |
| Win Rate | > 45% | No | Yes (informational) | Yes |
| Calmar Ratio | > 0.5 | No | Yes (informational) | Yes |
| Expectancy | > 0 | Yes (must be positive) | No | Yes |
| Trade Count | > 30 per fold | Yes (minimum sample) | No | Yes |
| Calibration Error | < 5 pp per decile | Yes (retrain trigger) | No | Yes |
Per-Strategy Override Capability:
Strategy configurations can override thresholds via performance_overrides:
{
"performance_overrides": {
"max_drawdown_pct": 0.25,
"sharpe_ratio_min": 0.6,
"win_rate_min": 0.40
}
}

Which metrics are hard gates vs. informational:
- Hard gates (must be met for go/no-go decisions): Sharpe Ratio, Profit Factor, Max Drawdown, Expectancy, Trade Count, Calibration Error.
- Informational (tracked and reported but do not block progression): Win Rate, Calmar Ratio.
Changing a metric from informational to hard gate (or vice versa) requires a spec deviation record (see §13.1).
Portfolio-Level Metrics (additional, not overridable per strategy):
- Portfolio Sharpe (correlation-adjusted).
- Maximum portfolio drawdown (default < 20%).
- Correlation between instrument returns (target: < 0.5 for diversification benefit).
| Metric | Formula |
|---|---|
| Sharpe Ratio | (mean_return - risk_free_rate) / std_return × √(252) (annualized; use √365 for crypto-only) |
| Profit Factor | sum(winning_trades_pnl) / abs(sum(losing_trades_pnl)) |
| Max Drawdown | max over time t of (running_peak_equity[t] - equity[t]) / running_peak_equity[t] |
| Win Rate | count(winning_trades) / count(all_trades) |
| Calmar Ratio | annualized_return / max_drawdown |
| Expectancy | (win_rate × avg_win) - (loss_rate × avg_loss) |
| Calibration Error | max(abs(predicted_probability - observed_frequency)) per decile bin |
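Three of the formulas above are sketched here on per-trade P&L and an equity series. This is an illustrative sketch with hypothetical function names; max drawdown is computed against the running peak, and the functions assume at least one winning and one losing trade.

```python
from typing import Sequence


def profit_factor(pnls: Sequence[float]) -> float:
    """Gross profit divided by the absolute value of gross loss."""
    gross_win = sum(x for x in pnls if x > 0)
    gross_loss = sum(x for x in pnls if x < 0)
    return gross_win / abs(gross_loss)


def expectancy(pnls: Sequence[float]) -> float:
    """(win_rate x avg_win) - (loss_rate x avg_loss), in P&L units per trade."""
    wins = [x for x in pnls if x > 0]
    losses = [-x for x in pnls if x < 0]
    win_rate = len(wins) / len(pnls)
    loss_rate = len(losses) / len(pnls)
    return win_rate * (sum(wins) / len(wins)) - loss_rate * (sum(losses) / len(losses))


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest fractional drop from a running peak to a later value."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


trade_pnls = [3.0, -1.0, 2.0, -2.0]
equity_curve = [100.0, 120.0, 90.0, 130.0, 117.0]
pf = profit_factor(trade_pnls)
exp = expectancy(trade_pnls)
mdd = max_drawdown(equity_curve)
```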
- Framework: React (or equivalent SPA framework) communicating exclusively via REST API and WebSocket.
- Authentication: HTTP basic auth (username/password from env vars). Accessible on localhost or via SSH tunnel only (DL-008: localhost/controlled access for v1).
- No direct database access. All data flows through server API.
- Progressive delivery (DL-004): Each implementation stage includes thin client milestones (see §15).
Displays:
- Portfolio equity curve (real-time via WebSocket).
- Per-instrument current status: active position, unrealized P&L, regime label, regime confidence, circuit breaker status.
- System health: API latency, data freshness per instrument, broker connectivity, reconciliation status.
- Recent alerts (last 24h).
- Kill switch button (prominent, requires confirmation dialog).
Acceptance Criteria:
- Dashboard loads in < 3 seconds.
- Equity curve updates within 5 seconds of position change.
- Kill switch activates within 2 seconds of confirmation.
Workflows:
- View strategies: List all strategy configs per instrument with version history.
- Create/edit strategy: Form-based editor for strategy parameters (events, thresholds, risk overrides, performance overrides). Validates against schema before save.
- Compare strategies: Side-by-side comparison of two strategy versions with highlighted differences.
- Activate strategy: Promote an approved strategy version to active. Requires confirmation.
- Version history: View all versions, diffs, approval status, evidence bundles, and performance notes.
Acceptance Criteria:
- User can create, save, and activate a strategy config without editing JSON directly.
- Invalid configurations (including unsafe risk overrides) are rejected with clear error messages.
- Strategy activation requires the config to be in "approved" state with evidence bundle.
Workflows:
- Run backtest: Select instrument, strategy config (or version), date range, pessimistic mode toggle. Submit to server. Progress indicator shows status.
- View results: Equity curve chart, trade list, performance metrics table (all metrics from §9), calibration plot, regime overlay.
- Trade overlay on charts: Candlestick chart with entry/exit markers, stop-loss lines, regime bands.
- Compare backtests: Side-by-side performance comparison of two backtest runs.
- Pessimistic mode toggle: Run backtest with 2× slippage/spread.
Acceptance Criteria:
- Backtest of 2 years of 1h data completes within 60 seconds per instrument.
- Charts are interactive (zoom, pan, tooltip with trade details).
- All metrics from §9 are displayed.
Displays:
- Open positions with real-time P&L.
- Signal log: recent signals with scores, regime, and decision.
- Trade history: filterable by instrument, date range, result.
- Reconciliation status per broker.
- Circuit breaker status per instrument.
Controls:
- Manual position close (with confirmation).
- Pause/resume trading per instrument.
- Kill switch (system-wide).
Configurable via UI:
- Risk parameters (per instrument and global) — with validation against bounds from §7.1.
- Performance metric thresholds (per strategy) — with governance tracking.
- Alert preferences (Telegram on/off, alert levels).
- Regime override (set manual regime per instrument with expiry).
- Trading mode (paper/live) — requires elevated confirmation for switch to live.
Not configurable via UI (require config file edit):
- Database connection strings.
- API keys.
- Core system architecture settings.
- Validation constraint bounds (min/max for risk parameters).
- Daily report: P&L, trades, regime, alerts.
- Weekly report: Rolling Sharpe, drawdown, regime distribution, comparison to backtest expectations.
- Monthly report: Full performance analysis, backtest-to-live deviation, model freshness, system availability.
- Reports are viewable in UI and exportable as PDF.
- Every major UI section has a help icon that opens contextual documentation.
- Help content covers: what the section does, key concepts, common workflows, troubleshooting.
- Help content is stored as markdown files served by the API (`GET /api/v1/docs/{section}`).
- Help content is versioned with the application.
- Structured JSON to stdout.
- Every log entry includes: `timestamp` (UTC), `level`, `module`, `instrument` (if applicable), `broker` (if applicable), `message`, `extra` (structured data).
- Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL.
- No secrets in log output. Enforced via code review.
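One way to produce log entries with the fields listed above is a custom formatter on the standard `logging` module. This is a sketch, not the prescribed implementation: the logger name is used as the `module` field, and `instrument`, `broker`, and `extra_data` are hypothetical record attributes supplied through the `extra=` argument of the logging call.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the spec's field names."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "module": record.name,
            "instrument": getattr(record, "instrument", None),
            "broker": getattr(record, "broker", None),
            "message": record.getMessage(),
            "extra": getattr(record, "extra_data", {}),
        }
        return json.dumps(entry)


def get_logger(name: str) -> logging.Logger:
    """Return a logger that writes structured JSON lines to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
        logger.propagate = False
    return logger


log = get_logger("trading")
log.info(
    "order filled",
    extra={"instrument": "EUR_USD", "broker": "oanda", "extra_data": {"units": 1000}},
)
```

The attribute names passed via `extra=` must not collide with built-in `LogRecord` attributes such as `module` or `message`, which is why the structured payload is carried under `extra_data` here.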
| Metric | Type | Labels |
|---|---|---|
| `newton_health` | Gauge | `module={data,analysis,trading}` |
| `newton_signal_latency_seconds` | Histogram | `instrument` |
| `newton_trades_total` | Counter | `instrument`, `direction`, `result={win,loss}` |
| `newton_pnl_current` | Gauge | `instrument` |
| `newton_equity` | Gauge | `scope={portfolio,EUR_USD,BTC_USD}` |
| `newton_drawdown_pct` | Gauge | `scope={portfolio,EUR_USD,BTC_USD}` |
| `newton_reconciliation_status` | Gauge | `broker`, `result` |
| `newton_model_rolling_sharpe` | Gauge | `instrument` |
| `newton_model_rolling_accuracy` | Gauge | `instrument` |
| `newton_data_staleness_seconds` | Gauge | `instrument`, `interval` |
| `newton_circuit_breaker_active` | Gauge | `instrument`, `type` |
| `newton_regime_current` | Gauge | `instrument`, `regime_label` |
| `newton_regime_confidence` | Gauge | `instrument` |
| `newton_api_request_duration_seconds` | Histogram | `endpoint`, `method` |
| Alert | Level | Trigger |
|---|---|---|
| Broker API failure (after retries) | CRITICAL | 3 failed retries |
| Reconciliation mismatch | CRITICAL | SYSTEM_EXTRA or BROKER_EXTRA |
| Kill switch activated | CRITICAL | Manual activation |
| Circuit breaker triggered | WARNING | Any breaker fires |
| Data staleness | WARNING | No new candle within 2× interval |
| Model degradation | WARNING | Rolling Sharpe < 0 |
| Regime transition | INFO | Regime label changed |
| Trade executed | INFO | Order filled |
| Daily summary | INFO | 00:00 UTC daily |
Each alert message includes: timestamp, instrument (if applicable), broker (if applicable), and actionable context.
- Endpoint: `GET /api/v1/health`
- Response: `200 OK` with JSON:
{
"status": "healthy",
"db": true,
"brokers": {
"oanda": {"connected": true, "last_response_ms": 45},
"binance": {"connected": true, "last_response_ms": 32}
},
"instruments": {
"EUR_USD": {"last_candle_age_seconds": 120, "reconciled": true, "regime": "LOW_VOL_TRENDING", "regime_confidence": 0.65},
"BTC_USD": {"last_candle_age_seconds": 85, "reconciled": true, "regime": "HIGH_VOL_RANGING", "regime_confidence": 0.42}
},
"kill_switch_active": false,
"uptime_seconds": 86400
}

- Oanda API key: Stored as env var `OANDA_API_KEY`. Generated with trade-only permissions.
- Binance API keys: Stored as env vars `BINANCE_API_KEY` and `BINANCE_API_SECRET`. Generated with spot-trade-only permissions (no withdrawal, no transfer, no futures).
- Binance IP whitelist: API keys restricted to server IP.
- UI auth: HTTP basic auth (username/password from env vars `NEWTON_UI_USER` and `NEWTON_UI_PASS`).
- Exposure policy (DL-008): v1 is localhost/SSH-tunnel only. No public internet exposure. If external access is needed in future, requires upgrade to stronger auth (OIDC/JWT or VPN/Tailscale).
- No secrets in code or config files. All secrets loaded from environment variables. Enforced via CI secret scan.
- Database: Accessible only from localhost. No remote connections.
Runbooks are maintained in docs/ops/runbooks.md and cover:
- Broker API outage — detection, impact assessment, manual intervention, recovery verification.
- Database failure — restart procedures, backup restoration, data integrity verification.
- Reconciliation mismatch — investigation steps, manual resolution, root cause documentation.
- Kill switch activation — when to use, post-activation checklist, restart procedure.
- Model retraining — schedule, validation criteria, rollback if metrics degrade.
- Binance-specific — rate limit exceeded, IP whitelist update, API key rotation.
- Data gap recovery — manual backfill procedures, verification.
| Type | Audience | Content | Location |
|---|---|---|---|
| Developer docs | Developer (self) | Architecture decisions, module APIs, data flows, contribution guide, setup instructions | docs/dev/ in repo |
| Operator docs | System operator | Deployment, configuration, monitoring, runbooks, troubleshooting | docs/ops/ in repo |
| User docs | UI user (may be same person) | UI workflows, strategy management, interpreting reports, FAQ | docs/user/ in repo, served via API for in-app help |
| API docs | Any client developer | REST API reference (auto-generated from OpenAPI) | Auto-generated, served at /api/v1/docs |
| Stage | Developer Docs | Operator Docs | User Docs |
|---|---|---|---|
| Stage 1 (Data) | Data pipeline architecture, schema docs, fetcher API | DB setup, data backfill procedures | — |
| Stage 2 (Analysis) | Event/token/Bayesian engine internals, feature store API | Model training procedures | — |
| Stage 3 (ML) | ML pipeline, feature engineering, model evaluation | Model retraining runbook | — |
| Stage 4 (Trading) | Risk engine, executor, reconciler internals | Risk config guide, circuit breaker reference | — |
| Stage 5 (Backtest) | Backtest engine architecture | Running backtests, interpreting results | Backtest UI guide |
| Stage 6 (Integration) | Client architecture, API contract | Deployment, auth setup | Full UI user guide, in-app help |
| Stage 7 (Paper) | Paper trading specifics | Paper account setup, monitoring | Paper trading guide |
| Stage 8 (Live) | — | Live deployment checklist, incident response | Live trading guide |
When implementation reveals a better solution than what the spec prescribes:
Process:
- Trigger: Developer identifies that deviating from the spec would produce a better outcome.
- Document: Create a spec deviation record (in `spec_deviations` table AND `spec/deviations/DEV-NNN.md`):
  - Spec section affected.
  - What the spec says vs. what is proposed.
  - Justification with evidence (benchmark, simplicity, safety, etc.).
  - Impact assessment (what else changes as a result).
  - Risk assessment (what could go wrong with the deviation).
- Review: Self-review against checklist:
  - Does this compromise safety? (If yes: do not deviate without external review.)
  - Does this change the risk profile? (If yes: update risk documentation.)
  - Does this affect other stages? (If yes: document cascading impacts.)
- Approve: Mark deviation as APPROVED (or REJECTED with reasoning).
- Implement: Proceed with implementation. Reference deviation ID in code comments and commit messages.
- Changelog: Update spec changelog with deviation summary.
Principle: Adherence to spec is important, but the spec serves the system, not the other way around. Deviations must be explicit, justified, and traceable.
Approval Authority:
- BJ (sole developer/operator) is the approval authority for all strategy configuration changes.
- No automated system may activate a strategy without explicit human approval.
Required Evidence Before Approval:
- Walk-forward backtest results showing all hard-gate metrics met (§9.1).
- Pessimistic mode backtest results (2× slippage/spread).
- Per-regime performance breakdown showing no regime with Sharpe < 0 (where sample size ≥ 20 trades).
- Comparison against currently active strategy (improvement or acceptable trade-off documented).
- Evidence bundle stored in `strategy_versions.approval_evidence` as JSON reference to backtest run IDs.
Rollback Triggers:
- Live Sharpe deviation > 50% below backtest Sharpe (measured over rolling 30-trade window).
- Any circuit breaker triggered that was not triggered in backtesting.
- Two consecutive weeks of negative P&L not seen in backtest.
- Manual operator decision at any time.
Emergency Rollback Process:
- Activate kill switch if positions are at risk.
- Revert to previously active strategy version via `PUT /api/v1/strategy/{instrument}/activate` with previous version ID.
- Log rollback with reason to `config_changes`.
- Post-mortem analysis within 24 hours.
Audit Trail:
- All strategy version changes logged in `strategy_versions` table (never deleted).
- All activations and rollbacks logged in `config_changes` table.
- Evidence bundles referenced and preserved.
- `main`: protected baseline. Always deployable. No direct commits.
- `stage/{N}-{name}`: development branch for each stage (e.g., `stage/1-data-pipeline`).
- Feature branches off stage branches for larger sub-tasks.

1. Create `stage/{N}-{name}` branch from `main`.
2. Develop and test on stage branch.
3. Stage exit criteria must be met before merge.
4. PR to `main` with checklist review.
5. Merge to `main`. Tag release (e.g., `v0.1.0` for Stage 1).
Every stage merge to main requires all of the following:
Automated checks (CI must pass):
| Check | Tool | Threshold |
|---|---|---|
| Unit tests | pytest | All pass |
| Integration tests | pytest | All pass |
| Type checking | mypy (strict mode) | Zero errors |
| Linting | ruff | Zero errors |
| Secret scan | gitleaks or truffleHog | Zero findings |
| Code coverage (global) | pytest-cov | ≥ 80% line coverage |
| Code coverage (critical paths) | pytest-cov | 100% branch coverage for: risk engine, order execution, reconciliation, circuit breakers |
| API schema validation | openapi-spec-validator | Pass |
| No timezone-naive datetimes | custom lint rule | Zero violations |
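The custom lint rule for timezone-naive datetimes in the table above is not specified further. One possible shape, sketched here with the standard `ast` module, flags the most common naive constructors. The function name and messages are hypothetical, and the check is a heuristic: it matches any `.now()`, `.utcnow()`, or `.utcfromtimestamp()` attribute call regardless of the receiver's type.

```python
import ast

# Calls that always return naive datetimes, whatever their arguments.
NAIVE_ALWAYS = {"utcnow", "utcfromtimestamp"}


def find_naive_datetime_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, message) pairs for likely timezone-naive datetime calls."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        name = node.func.attr
        if name in NAIVE_ALWAYS:
            findings.append((node.lineno, f"{name}() returns a naive datetime"))
        elif name == "now" and not node.args and not node.keywords:
            # datetime.now() without a tz argument yields a naive local time.
            findings.append((node.lineno, "now() called without a tz argument"))
    return sorted(findings)
```

A CI step could run this over every file under `src/` and fail the build on any finding. Naive values built with the `datetime(...)` constructor itself are not covered by this sketch.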
Manual checks (self-review checklist):
- All functional requirements for the stage are implemented.
- No CRITICAL or HIGH severity bugs open.
- Documentation deliverables for the stage are complete.
- Performance within NFR bounds (signal latency < 5s, API response < 200ms p95).
- Any spec deviations documented and approved.
- Client milestones for the stage are met.
- PR description includes summary of changes and test evidence.
Branch protection rules for main:
- No direct pushes.
- PR required with at least self-review.
- All CI checks must pass.
- Squash merge preferred for clean history.
No code has been written. This plan starts from zero.
Duration: 3-4 weeks.
Server Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Oanda data fetcher (EUR/USD) | Historical + real-time candle ingestion; row count matches expected ±1% for 3-year backfill |
| Binance spot data fetcher (BTC/USDT) | Historical + real-time candle ingestion; row count matches expected ±1% for 3-year backfill |
| TimescaleDB schema setup | All tables created per §4.2, hypertables configured, indexes verified |
| Feature store (technical indicators) | RSI, MACD, BB, OBV, ATR computed and stored for both instruments; output matches TA-Lib reference < 0.01% deviation on 100 random candles |
| Data quality checks | Gap detection, OHLC verification, staleness watchdog operational |
| Data backfill (3 years both instruments) | Verified data for 2023-01-01 to present for both instruments |
| Health endpoint (`/api/v1/health`) | Returns DB + broker connectivity status |
| API: data query endpoints | OHLCV + feature retrieval via REST; OpenAPI schema published |
| Feature store benchmark | < 500ms for 60-period lookback across 5 indicators |
Client Milestones (DL-004 — thin client each stage):
| Deliverable | Acceptance Criteria |
|---|---|
| Health/status page | Displays system health, DB status, broker connectivity, data freshness per instrument |
| Data viewer | View recent candles and indicator values per instrument/interval |
Exit Criteria: Verified 3-year data for both instruments. Feature store populated and benchmarked. Health endpoint operational. Client health page functional. All Stage 1 tests pass. All gate checks from §14.3 pass.
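The ±1% row-count criterion and the gap detection check can be illustrated with a few lines of standard-library Python. This is a sketch under stated assumptions: the function names are hypothetical, and the expected-count helper only covers a 24/7 market such as BTC/USDT. EUR/USD would additionally need the 24/5 trading calendar so that weekend closures are not reported as gaps.

```python
from datetime import datetime, timedelta, timezone


def expected_candles_24x7(start: datetime, end: datetime, interval: timedelta) -> int:
    """Number of candles opening in [start, end) for a market with no closures."""
    return int((end - start) / interval)


def within_tolerance(actual: int, expected: int, tolerance: float = 0.01) -> bool:
    """True if the stored row count is within the relative tolerance (default 1%)."""
    return abs(actual - expected) <= tolerance * expected


def find_gaps(
    open_times: list[datetime], interval: timedelta
) -> list[tuple[datetime, datetime]]:
    """Return (first_missing_open, next_present_open) for each hole in a sorted series."""
    gaps = []
    for prev, cur in zip(open_times, open_times[1:]):
        if cur - prev > interval:
            gaps.append((prev + interval, cur))
    return gaps


# Usage: three 1h candles with two missing in between.
hour = timedelta(hours=1)
t0 = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(find_gaps([t0, t0 + hour, t0 + 4 * hour], hour))
```

Each returned pair gives the range a backfill job would need to request from the broker.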
Duration: 3-4 weeks.
Server Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Event detection engine | Configurable per instrument; detects events matching hand-calculated expectations on known price series |
| Token generator + classifier | Per-instrument token vocabularies producing correct tokens for known indicator values |
| Token selection (MI ranking, redundancy filter) | On synthetic data: informative token selected, noise token excluded |
| Bayesian scorer + isotonic calibration | Calibrated probabilities with < 5pp deviation per decile on out-of-fold data |
| Signal endpoint (`/api/v1/signal/{instrument}`) | Returns current signal with score, confidence, regime, and metadata |
| Backfill events + tokens (3 years both instruments) | Events and tokens computed for all historical data |
Client Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Signal viewer | Displays current signal per instrument with score and metadata |
| Event/token explorer | Browse detected events and active tokens for recent candles |
Exit Criteria: Bayesian scores produced for both instruments. Calibration within bounds. Client signal viewer functional. All gate checks pass.
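The calibration criterion (< 5pp deviation per decile) can be checked with a short helper. The sketch below assumes "decile" means fixed-width probability bins (0.0-0.1, 0.1-0.2, and so on), in line with the `[0.5, 0.6]` example in FR-06. Equal-population deciles would be an alternative reading. The function names are hypothetical.

```python
def decile_calibration_gaps(probs: list[float], outcomes: list[int]) -> dict[int, float]:
    """Per-bin |mean predicted probability - observed frequency| in percentage points.

    Bin i covers [i/10, (i+1)/10); a probability of exactly 1.0 falls into bin 9.
    Empty bins are omitted from the result.
    """
    bins: list[list[tuple[float, int]]] = [[] for _ in range(10)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * 10), 9)].append((p, y))
    gaps: dict[int, float] = {}
    for i, items in enumerate(bins):
        if not items:
            continue
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(y for _, y in items) / len(items)
        gaps[i] = abs(mean_p - freq) * 100
    return gaps


def is_calibrated(probs: list[float], outcomes: list[int], max_gap_pp: float = 5.0) -> bool:
    """True if every populated bin deviates by less than max_gap_pp percentage points."""
    return all(g < max_gap_pp for g in decile_calibration_gaps(probs, outcomes).values())
```

In practice the inputs would be out-of-fold predictions and the realized event labels, and bins with very few samples would probably need separate flagging.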
Duration: 3-4 weeks.
Server Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Feature engineering pipeline | Rolling windows, returns, feature vectors built from feature store |
| XGBoost training with Optuna | Walk-forward evaluation per instrument; AUC > 0.55 (or fallback documented) |
| CNN-LSTM (conditional) | Available if XGBoost AUC < 0.55 for any instrument |
| Model artifact storage + versioning | Models saved with hash, metadata, version; integrity verified on load |
| Meta-learner (logistic regression) | Combined calibrated signal per instrument; calibration < 5pp |
Client Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Model status page | Shows model version, last training date, AUC, feature importance per instrument |
| Combined signal display | Updated signal viewer shows combined (meta-learner) score |
Exit Criteria: ML model AUC > 0.55 per instrument (or Bayesian-only fallback documented). Meta-learner producing calibrated output. Client model status page functional. All gate checks pass.
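The "integrity verified on load" criterion for model artifacts could be met with a content hash stored next to each artifact. The following is a minimal sketch using only the standard library. The sidecar manifest file, its field names, and the function names are assumptions for illustration and not part of the specified storage design.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_path(path: Path) -> Path:
    # Hypothetical convention: "<artifact>.manifest.json" next to the artifact.
    return Path(str(path) + ".manifest.json")


def save_artifact(model_bytes: bytes, path: Path, version: str, metadata: dict) -> None:
    """Write the artifact and a manifest holding its version, hash, and metadata."""
    path.write_bytes(model_bytes)
    manifest = {"version": version, "sha256": sha256_of(path), "metadata": metadata}
    manifest_path(path).write_text(json.dumps(manifest, indent=2))


def load_verified(path: Path) -> bytes:
    """Return the artifact bytes, refusing to load if the hash does not match."""
    manifest = json.loads(manifest_path(path).read_text())
    if sha256_of(path) != manifest["sha256"]:
        raise ValueError(f"integrity check failed for {path}")
    return path.read_bytes()
```

Deserializing the bytes into an XGBoost or CNN-LSTM model would happen only after `load_verified` succeeds.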
Duration: 3-4 weeks.
Server Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Signal interpretation + thresholds | Signal → action mapping per instrument per strategy config |
| Broker adapters (Oanda spot + Binance spot) | Both adapters passing integration tests with paper/testnet accounts |
| Order execution (market orders + stops) | Paper orders placed successfully on both brokers with stop-loss |
| Position lifecycle management | Open, modify stop, close working per broker |
| Risk engine (Kelly, circuit breakers, per-strategy config) | All pre-trade and in-trade checks functional; validation rejects unsafe overrides |
| Reconciliation loop | Per-broker reconciliation running every 60s; mismatch detection verified |
| Kill switch | Closes all positions on all brokers within 30 seconds |
| Regime detection | Regime labels computed per instrument using §5.7.3 formula |
| Trading API endpoints | All trading operations available via API |
Client Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Trading status panel | Shows open positions, regime, circuit breaker status per instrument |
| Kill switch button | Functional with confirmation dialog |
| Risk configuration UI | View/edit risk parameters per instrument with validation feedback |
Exit Criteria: Complete signal-to-trade pipeline running on paper/testnet. All circuit breakers tested. Reconciliation tested. Kill switch tested. Client trading panel functional. All gate checks pass.
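Mismatch detection in the reconciliation loop comes down to comparing the internally tracked position per instrument with what the broker reports. The sketch below shows one way to classify differences. The `Mismatch` type, the `kind` labels, and the tolerance value are hypothetical, and positions are simplified to a single net unit count per instrument.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    instrument: str
    kind: str  # "missing_at_broker", "unknown_at_broker", or "size_mismatch"
    internal_units: float
    broker_units: float


def reconcile(
    internal: dict[str, float], broker: dict[str, float], tolerance: float = 1e-8
) -> list[Mismatch]:
    """Compare net position sizes per instrument and report every difference."""
    mismatches: list[Mismatch] = []
    for instrument in sorted(set(internal) | set(broker)):
        ours = internal.get(instrument, 0.0)
        theirs = broker.get(instrument, 0.0)
        if abs(ours - theirs) <= tolerance:
            continue
        if theirs == 0.0:
            kind = "missing_at_broker"   # we think we hold it, the broker does not
        elif ours == 0.0:
            kind = "unknown_at_broker"   # the broker holds something we do not track
        else:
            kind = "size_mismatch"
        mismatches.append(Mismatch(instrument, kind, ours, theirs))
    return mismatches
```

Running this once per broker every 60 seconds and alerting on any non-empty result would cover the acceptance criterion. Stop-loss orders and pending orders would need the same treatment.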
Duration: 2-3 weeks.
Server Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Walk-forward framework | Per-instrument backtesting with configurable windows; minimum 4 folds |
| Purged K-fold (robustness check) | Secondary validation available; K=5 with 48h purge |
| Performance metrics calculation | All metrics from §9 computed correctly |
| Pessimistic mode | 2× slippage/spread testing available |
| Regime-aware reporting | Per-regime performance breakdown; low-sample flagging |
| Backtest API endpoints | Run backtest, get results via API |
Client Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Backtest runner | Select instrument, strategy, date range, pessimistic mode; submit and track progress |
| Results viewer | Equity curve, trade list, all metrics, calibration plot, regime overlay |
| Trade overlay charts | Candlestick chart with entry/exit markers, stop-loss lines, regime bands; interactive (zoom, pan, tooltip) |
| Backtest comparison | Side-by-side comparison of two runs |
Exit Criteria: Backtest results meet minimum metric bars per instrument. Client backtest UI fully functional. Pessimistic mode available. All gate checks pass.
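The walk-forward framework with configurable windows and a minimum of 4 folds can be sketched as a window generator. The rolling (fixed-length) training window and the example window lengths below are assumptions chosen for illustration. The authoritative walk-forward parameters are those in §8.1.

```python
from datetime import datetime, timedelta, timezone


def walk_forward_folds(
    start: datetime,
    end: datetime,
    train: timedelta,
    test: timedelta,
    min_folds: int = 4,
) -> list[tuple[datetime, datetime, datetime]]:
    """Return (train_start, train_end, test_end) per fold; the test window starts at train_end."""
    folds = []
    train_start = start
    while train_start + train + test <= end:
        train_end = train_start + train
        folds.append((train_start, train_end, train_end + test))
        train_start += test  # slide forward by one test window
    if len(folds) < min_folds:
        raise ValueError(f"only {len(folds)} folds fit; at least {min_folds} required")
    return folds


# Usage: 3 years of data, 2-year training window, 90-day test window.
folds = walk_forward_folds(
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2026, 1, 1, tzinfo=timezone.utc),
    train=timedelta(days=730),
    test=timedelta(days=90),
)
print(len(folds))
```

Because each fold slides by exactly one test window, the test windows are contiguous and never overlap, so every out-of-sample candle is evaluated at most once.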
Duration: 3-4 weeks.
Server Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| OpenAPI docs auto-generation | Published at /api/v1/docs; schema passes validation |
| In-app help content API | GET /api/v1/docs/{section} returns markdown help content |
| End-to-end integration tests | Full pipeline tested: data → analysis → signal → risk check → order |
Client Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Dashboard (full) | Portfolio equity, instrument status, health, alerts, kill switch — per §10.2 |
| Strategy management | Full CRUD, version history, comparison, approval workflow — per §10.3 |
| Live trading monitor | Positions, signals, trade history, reconciliation, circuit breakers — per §10.5 |
| System configuration | Risk params, alerts, regime overrides, trading mode — per §10.6 |
| Reports | Daily/weekly/monthly reports, PDF export — per §10.7 |
| In-app help | Help content per major section — per §10.8 |
Documentation Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Developer docs | Architecture, module APIs, setup guide in docs/dev/ |
| Operator docs | Deployment, configuration, monitoring, runbooks in docs/ops/ |
| User docs | UI workflows, strategy guide, FAQ in docs/user/ |
Exit Criteria: Full UI functional per §10. In-app help available. All documentation deliverables complete. All gate checks pass.
Duration: 3 months minimum.
Server Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Oanda practice account integration | Real-time paper trading for EUR/USD |
| Binance testnet integration | Real-time paper trading for BTC/USDT spot |
| WebSocket price monitoring | Real-time position management |
| Real-time signal generation | Signals on live candle closes |
| Telegram alerting | Trade alerts flowing |
Client Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Real-time dashboard | Live equity curve, positions, signals updating in real-time |
| Paper trading performance tracking | All metrics tracked and displayed live |
Go/No-Go Criteria (per instrument, all hard-gate metrics must be met):
- Sharpe > 0.8 (or per-strategy override).
- Profit Factor > 1.3 (or per-strategy override).
- Max drawdown < 15% (or per-strategy override).
- Backtest-to-paper Sharpe deviation < 20%.
- ≥ 50 trades executed.
- No CRITICAL reconciliation mismatches unresolved.
- No kill-switch activations due to bugs.
- Calibration error < 5pp.
- If criteria not met: Do not go live. Diagnose, fix, reset paper trading timer.
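The go/no-go list above is mechanical enough to encode directly. The following sketch evaluates it for one instrument. The `PaperResults` type, the gate key names, and the failure labels are hypothetical, and the backtest-to-paper Sharpe deviation is assumed to be the absolute difference relative to the backtest Sharpe.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PaperResults:
    sharpe: float
    profit_factor: float
    max_drawdown: float              # fraction, e.g. 0.12 for 12%
    backtest_sharpe: float
    trades: int
    unresolved_critical_mismatches: int
    bug_kill_switch_activations: int
    calibration_error_pp: float


# Defaults from the go/no-go list; per-strategy overrides may replace them.
DEFAULT_GATES = {"min_sharpe": 0.8, "min_profit_factor": 1.3, "max_drawdown": 0.15}


def go_no_go(r: PaperResults, overrides: dict[str, float] | None = None) -> list[str]:
    """Return the failed criteria; an empty list means the instrument may go live."""
    gates = {**DEFAULT_GATES, **(overrides or {})}
    failures = []
    if not r.sharpe > gates["min_sharpe"]:
        failures.append("sharpe")
    if not r.profit_factor > gates["min_profit_factor"]:
        failures.append("profit_factor")
    if not r.max_drawdown < gates["max_drawdown"]:
        failures.append("max_drawdown")
    if r.backtest_sharpe == 0 or abs(r.sharpe - r.backtest_sharpe) / abs(r.backtest_sharpe) >= 0.20:
        failures.append("sharpe_deviation")
    if r.trades < 50:
        failures.append("trades")
    if r.unresolved_critical_mismatches > 0:
        failures.append("reconciliation")
    if r.bug_kill_switch_activations > 0:
        failures.append("kill_switch")
    if not r.calibration_error_pp < 5.0:
        failures.append("calibration")
    return failures
```

Evaluating each instrument separately matches the rule that instruments may go live independently.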
Duration: Ongoing.
Milestones:
| Deliverable | Acceptance Criteria |
|---|---|
| Live broker account setup | Accounts funded, API keys configured with appropriate permissions |
| Micro-sizing phase (first 30 trades per instrument) | 0.5% equity per trade; all trades logged and reconciled |
| Kelly sizing enablement | After 30 trades with positive metrics per instrument |
| Monthly review process | Full analysis against paper baseline; documented |
Instruments may go live independently. If EUR/USD meets criteria but BTC/USD does not, EUR/USD can go live while BTC/USD continues paper trading.
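The transition from micro-sizing to Kelly sizing can be pictured as a simple per-instrument switch. This is an illustrative sketch only: the function and parameter names are hypothetical, "positive metrics" is reduced to a single boolean, and any Kelly caps from the risk model are assumed to have been applied to `kelly_fraction` beforehand.

```python
def position_fraction(
    completed_trades: int,
    metrics_positive: bool,
    kelly_fraction: float,
    micro_fraction: float = 0.005,  # 0.5% of equity per trade
    micro_trades: int = 30,
) -> float:
    """Fraction of equity to allocate to the next trade on one instrument."""
    if completed_trades < micro_trades or not metrics_positive:
        # Still in the micro-sizing phase, or metrics do not yet justify Kelly sizing.
        return micro_fraction
    return kelly_fraction


# Usage: EUR/USD has completed 45 live trades, BTC/USD only 12.
print(position_fraction(45, True, 0.02))
print(position_fraction(12, True, 0.02))
```

Keeping the trade counter per instrument lets one instrument graduate to Kelly sizing while the other remains in its micro-sizing phase.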
| # | Question | Impact | Owner | Resolution Path |
|---|---|---|---|---|
| OQ-1 | Optimal lookback for ML input sequences per instrument | ML performance | BJ | Hyperparameter search in Stage 3. Start with 24. |
| OQ-2 | Optimal token count per instrument | Signal quality | BJ | Evaluate MI curve in Stage 2. Start with 20. |
| OQ-3 | Should SELL signals open short positions? | Revenue potential | BJ | v1: close longs only. Evaluate in backtest. Decision for v1.1. |
| OQ-4 | Multi-timeframe confirmation weighting | Signal quality | BJ | Deferred to v1.1. |
| OQ-5 | Break-even trade frequency per instrument | Viability | BJ | Calculate in Stage 5. |
| OQ-6 | GPU needed for XGBoost inference? | Deployment | BJ | XGBoost inference is expected to run on CPU. CNN-LSTM may need GPU. |
| OQ-7 | BTC/USD success criteria adjustments | Go/no-go | BJ | Evaluate during Stage 5 backtesting. Decide before paper trading. Configure via performance_overrides. |
| OQ-8 | Correlation target between EUR/USD and BTC/USD returns | Portfolio construction | BJ | Measure in Stage 5 backtesting. < 0.5 desired. |
| # | Decision | Date | Rationale |
|---|---|---|---|
| D-1 | TimescaleDB as database | Pre-dev | Good fit for time-series data. |
| D-2 | EUR/USD via Oanda (spot) | Pre-dev | Liquidity, spread, data quality. |
| D-3 | BTC/USDT via Binance (spot) | FINAL | Multi-instrument from day one; crypto diversification. Spot for v1 simplicity. |
| D-4 | Token format: `{INSTRUMENT}_PREFIX_PARAM_DATAPOINT_TYPE_VALUE` | v3 spec | Instrument-aware, structured, parseable. |
| D-5 | Monolith architecture for v1 | v2 spec | Solo developer, single machine. |
| D-6 | XGBoost before CNN-LSTM | v2 spec | Faster iteration, easier debugging. |
| D-7 | Meta-learner over fixed weights | v2 spec | Calibrated, data-driven. |
| D-8 | Broker-side stops mandatory | v2 spec | Non-negotiable safety. |
| D-9 | Daily loss limit: 2% | v2 spec | Conservative for unproven system. |
| D-10 | Micro-sizing for first 30 trades | v2 spec | Protect capital during learning. |
| D-11 | Feature store (long-format) over fixed columns | v3 spec | Scalable, extensible, multi-instrument native. |
| D-12 | API-first client/server separation | v3 spec | Replaceable UI, testable contracts. |
| D-13 | React SPA over Streamlit | v3 spec | Better interactivity, proper separation. |
| D-14 | Zero-code baseline (corrected project status) | v3 spec | Accurate status. |
| D-15 | Spot-only for v1 (no futures) | FINAL (DL-002) | Simplicity; no funding rates, no margin complexity. Architecture supports future derivatives. |
| D-16 | Spec/docs in `spec/` subfolder | FINAL (DL-001) | Clean organization, versioned artifacts. |
| D-17 | Thin client each stage | FINAL (DL-004) | Progressive delivery; both server and client progress per stage. |
| D-18 | Localhost/SSH-tunnel only for v1 UI | FINAL (DL-008) | Adequate for solo/controlled use. Upgrade path defined. |
| D-19 | Risk params configurable per strategy | FINAL (R-01) | Instrument-specific tuning with safe defaults and validation. |
| D-20 | Performance metrics configurable per strategy | FINAL (R-02) | Different instruments have different performance profiles. |
| D-21 | Feature provider extensibility | FINAL (R-03) | Add indicators without schema changes or disruptive refactor. |
| Decision | Options | Choice | Rationale |
|---|---|---|---|
| Architecture | Microservices vs. Monolith | Monolith (v1) | Single developer, single machine. Extract when needed. |
| ML Model | CNN-LSTM vs. XGBoost vs. Bayesian-only | XGBoost first, CNN-LSTM conditional | Faster iteration, easier debugging, feature importance. |
| Signal Combination | Fixed weights vs. Meta-learner | Meta-learner (logistic regression) | Calibrated output, data-driven weighting. |
| Database | TimescaleDB vs. ClickHouse vs. Parquet | TimescaleDB | Good SQL + time-series. Single-server simplicity. |
| Instruments | EUR/USD only vs. multi-instrument | Multi-instrument from v1 | Forces robust architecture; diversification. |
| Indicator storage | Fixed columns vs. JSONB vs. Feature store | Feature store (long-format) | Scalable, no-migration additions, namespace isolation. |
| Order Type | Market vs. Limit | Market (v1) | Guaranteed fill. Track savings for v2. |
| Stop-Loss | System-side vs. Broker-side | Broker-side | System crash cannot lose the stop. Non-negotiable. |
| Validation | Walk-forward only vs. + Purged K-fold | Both | Walk-forward primary, K-fold for robustness. |
| Client/Server | Tight coupling vs. API-first | API-first with strict separation | Replaceable UI, testable contracts. |
| UI Framework | Streamlit vs. React SPA | React SPA (or equivalent) | Better interactivity, charts, proper separation. |
| BTC venue | Futures vs. Spot | Spot (v1) | Simpler execution, no funding rates, no leverage risk. |
| Client delivery | Deferred until late stages vs. Thin client each stage | Thin client each stage | Progressive delivery, early feedback. |
```json
{
  "instruments": ["EUR_USD", "BTC_USD"],
  "signal_interval": "1h",
  "db_url": "ENV:DATABASE_URL",
  "telegram_bot_token": "ENV:TELEGRAM_BOT_TOKEN",
  "telegram_chat_id": "ENV:TELEGRAM_CHAT_ID",
  "api_version": "v1",
  "api_port": 8000,
  "log_level": "INFO"
}
```

See §7.2 for full contents.
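The `ENV:`-prefixed values in the system configuration above suggest that secrets are referenced by environment variable name and resolved at load time, which is consistent with NFR-06. The resolution mechanism is not spelled out in this section, so the following is only a sketch of one way it might work. The function name and the fail-fast behavior on missing variables are assumptions.

```python
import json
import os
from typing import Any, Mapping

ENV_PREFIX = "ENV:"


def resolve_env_refs(value: Any, environ: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace "ENV:NAME" strings with the value of environment variable NAME."""
    if isinstance(value, dict):
        return {k: resolve_env_refs(v, environ) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v, environ) for v in value]
    if isinstance(value, str) and value.startswith(ENV_PREFIX):
        name = value[len(ENV_PREFIX):]
        if name not in environ:
            # Fail fast at startup instead of running with a missing secret.
            raise KeyError(f"required environment variable {name} is not set")
        return environ[name]
    return value


# Usage with an explicit mapping instead of the real process environment.
raw = json.loads('{"db_url": "ENV:DATABASE_URL", "api_port": 8000}')
print(resolve_env_refs(raw, {"DATABASE_URL": "postgresql://localhost/newton"}))
```

Passing the environment as a parameter keeps the function testable without touching real secrets.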
See §3.4 for full contents.
See §3.4 for full contents.
```json
{
  "providers": [
    {
      "name": "technical",
      "class": "newton.data.indicators.TechnicalIndicatorProvider",
      "namespace": "technical",
      "enabled": true,
      "config": {
        "indicators": [
          {"key": "rsi", "params": {"period": 14}},
          {"key": "macd", "params": {"fast": 12, "slow": 26, "signal": 9}},
          {"key": "bb", "params": {"period": 20, "std": 2.0}},
          {"key": "obv", "params": {}},
          {"key": "atr", "params": {"period": 14}}
        ]
      }
    }
  ]
}
```

| ID | Requirement | Acceptance Test | Spec Section |
|---|---|---|---|
| FR-01 | Fetch and store historical OHLCV data for EUR/USD (Oanda) and BTC/USDT (Binance spot) across 1m, 5m, 1h, 4h, 1d | Given a date range per instrument, verify row count matches expected candles ±1% | §4.1 |
| FR-02 | Detect and backfill data gaps automatically per instrument | Insert a known gap; verify system detects and fills it within one pipeline cycle | §4.4 |
| FR-03 | Calculate and store extensible technical indicators (initial: RSI(14), MACD(12,26,9), BB(20,2.0), OBV, ATR(14)) per instrument via feature provider interface | Compare output against a reference library (TA-Lib) for 100 random candles per instrument; max deviation < 0.01% | §4.3 |
| FR-04 | Define events in configuration per instrument and detect them in historical data | For a known price series, verify event labels match hand-calculated expectations | §4.6 |
| FR-05 | Generate tokens from indicator states using configured classification rules per instrument | Given indicator values, verify token output matches expected strings | §5.2 |
| FR-06 | Calculate Bayesian posterior P(Event \| Tokens) with calibration | Verify calibration: for predicted probabilities in [0.5, 0.6], observed frequency is 50-60% on held-out data | §5.4 |
| FR-07 | Train and serve ML model (XGBoost or CNN-LSTM) per instrument for event probability | Model achieves out-of-sample AUC-ROC > 0.55 per instrument | §5.5 |
| FR-08 | Combine Bayesian and ML scores via meta-learner to produce calibrated signal per instrument | Calibration plot deviation < 5 pp per decile on out-of-sample data | §5.6 |
| FR-09 | Execute market orders via Oanda (EUR/USD spot) and Binance (BTC/USDT spot) with broker-side stop-loss | Place a paper trade on each broker; verify order and stop appear in respective accounts | §6.1 |
| FR-10 | Manage position lifecycle (open, update stop, close) per instrument | Open position, update trailing stop, close; verify each state in broker account | §7.4 |
| FR-11 | Enforce pre-trade risk checks (Kelly sizing, max exposure) per instrument and portfolio with configurable parameters | Attempt to exceed limits; verify rejection. Verify per-strategy overrides take precedence. | §7.3 |
| FR-12 | Enforce circuit breakers (daily loss, max drawdown) per instrument and portfolio with configurable thresholds | Simulate 2% daily loss; verify system halts new entries | §7.5 |
| FR-13 | Reconcile internal state with broker state every 60 seconds per broker | Introduce a mismatch; verify alert fires within 2 minutes | §6.4 |
| FR-14 | Provide kill switch that closes all positions across all brokers and halts trading | Activate kill switch; verify all positions closed within 30 seconds | §7.5 |
| FR-15 | Detect market regime using deterministic formula and track regime transitions per instrument | On synthetic data transitioning regimes, verify detection, labeling, and confidence scores | §5.7 |
| FR-16 | Provide regime-aware reporting showing active regime and switch events | Verify reports show regime labels, confidence, and transition timestamps | §5.7.6 |
| FR-17 | Expose all server functionality through versioned REST API | API schema validation passes; no direct DB access from client | §3.2 |
| FR-18 | Web UI: strategy management (add, edit, view, compare configurations) with risk/performance override editing | User can perform full CRUD on strategy configs via UI; invalid overrides rejected | §10.3 |
| FR-19 | Web UI: backtest execution and result visualization with trade overlays on charts | Run backtest from UI; verify chart with entry/exit markers renders | §10.4 |
| FR-20 | Web UI: detailed backtest reports with all metrics from §9 | Verify report contains all specified metrics | §10.4 |
| FR-21 | Web UI: system configuration management including per-strategy risk/performance overrides | Modify risk parameters via UI; verify server applies changes and logs audit entry | §10.6 |
| FR-22 | In-app usage documentation accessible from UI | Verify help content loads for each major UI section | §10.8 |
| FR-23 | Feature provider extensibility: new indicators addable without schema migration | Add a test indicator via provider interface; verify features stored and retrievable | §3.6, §4.3 |
| FR-24 | Risk parameter audit logging for all changes | Change a risk parameter; verify entry in config_changes table | §7.1 |
| ID | Requirement | Measurement | Spec Section |
|---|---|---|---|
| NFR-01 | Signal generation latency < 5 seconds from candle close confirmation | Measure p99 latency in paper trading over 1000 candles per instrument | §1.6 |
| NFR-02 | System availability > 99.5% | Percentage of 1-minute intervals where `/health` returns 200 | §1.6 |
| NFR-03 | All timestamps in UTC | Code review; no timezone-naive datetime objects allowed; lint rule enforced | §4.5 |
| NFR-04 | Structured JSON logging to stdout | Log format validation in CI | §11.1 |
| NFR-05 | Prometheus-format `/metrics` endpoint | Scrape test in CI | §11.2 |
| NFR-06 | Secrets loaded from environment variables only | No secrets in code or config files; CI secret scan | §11.5 |
| NFR-07 | Recovery from crash: system resumes within 60 seconds and reconciles state | Kill process during paper trading; measure recovery time | §3.5 |
| NFR-08 | API response time < 200ms for read endpoints (p95) | Load test in CI | §3.2 |
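Two of the measurements in the table above are simple aggregates: NFR-02 (availability over 1-minute health probes) and the p95/p99 latency figures in NFR-01 and NFR-08. The sketch below shows both, under the assumption that a probe counts as available only when it returns HTTP 200 and that percentiles use the nearest-rank method. The function names are hypothetical.

```python
from typing import Optional, Sequence


def availability(status_codes: Sequence[Optional[int]]) -> float:
    """Fraction of 1-minute probes that returned HTTP 200 (None means no response)."""
    if not status_codes:
        raise ValueError("no probes recorded")
    return sum(1 for code in status_codes if code == 200) / len(status_codes)


def percentile_nearest_rank(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100."""
    if not samples or not 0 < q <= 100:
        raise ValueError("need at least one sample and 0 < q <= 100")
    ordered = sorted(samples)
    rank = (q * len(ordered) + 99) // 100  # ceil(q / 100 * n) in integer arithmetic
    return ordered[rank - 1]


# Usage: latencies in milliseconds from a load test of a read endpoint.
latencies_ms = [120.0, 95.0, 180.0, 210.0, 130.0, 110.0, 150.0, 90.0, 140.0, 160.0]
print(percentile_nearest_rank(latencies_ms, 95) < 200)
```

Note that NFR-02 requires strictly more than 99.5%, so a result of exactly 0.995 would not pass.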
```
newton/
├── spec/                                  # Canonical spec location (DL-001)
│   ├── docs/spec/SPEC_DRAFT.md            # This document
│   ├── SPEC.v3.md                         # Previous version (archived)
│   ├── SPEC.v2.md                         # Previous version (archived)
│   ├── SPEC_NOTES.md                      # Iteration notes (archived)
│   ├── SPEC_DECISIONS_LOCK.md             # Decision locks (archived)
│   ├── SPEC_REVISIONS.md                  # Revision instructions (archived)
│   ├── deviations/                        # Spec deviation records
│   │   └── DEV-001.md                     # Example deviation
│   └── decisions/                         # Architecture Decision Records
│       └── ADR-001-feature-store.md       # Example ADR
├── docs/
│   ├── dev/                               # Developer documentation
│   ├── ops/                               # Operator documentation
│   └── user/                              # User documentation (also served in-app)
├── config/
│   ├── system.json
│   ├── risk.json
│   ├── feature_providers.json
│   ├── instruments/
│   │   ├── EUR_USD.json
│   │   └── BTC_USD.json
│   ├── strategies/
│   │   ├── EUR_USD_strategy.json
│   │   └── BTC_USD_strategy.json
│   └── classifications/
│       ├── EUR_USD_classifications.json
│       └── BTC_USD_classifications.json
├── src/
│   ├── __init__.py
│   ├── app.py                             # FastAPI entry point
│   ├── api/
│   │   ├── __init__.py
│   │   ├── v1/
│   │   │   ├── __init__.py
│   │   │   ├── data.py
│   │   │   ├── signals.py
│   │   │   ├── trading.py
│   │   │   ├── backtest.py
│   │   │   ├── config.py
│   │   │   ├── regime.py
│   │   │   └── docs.py
│   │   └── schemas.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── database.py
│   │   ├── fetcher_oanda.py
│   │   ├── fetcher_binance.py
│   │   ├── fetcher_base.py
│   │   ├── indicators.py
│   │   ├── feature_store.py
│   │   ├── feature_provider.py
│   │   ├── pipeline.py
│   │   ├── schema.py
│   │   └── verifier.py
│   ├── analysis/
│   │   ├── __init__.py
│   │   ├── events.py
│   │   ├── tokenizer.py
│   │   ├── token_selection.py
│   │   ├── bayesian.py
│   │   ├── ml_model.py
│   │   └── meta_learner.py
│   ├── trading/
│   │   ├── __init__.py
│   │   ├── signal.py
│   │   ├── risk.py
│   │   ├── executor.py
│   │   ├── broker_base.py
│   │   ├── broker_oanda.py
│   │   ├── broker_binance.py
│   │   ├── reconciler.py
│   │   └── circuit_breaker.py
│   ├── backtest/
│   │   ├── __init__.py
│   │   ├── engine.py
│   │   ├── simulator.py
│   │   ├── metrics.py
│   │   └── report.py
│   └── regime/
│       ├── __init__.py
│       └── detector.py
├── client/                                # Web UI (separate build)
│   ├── src/
│   ├── public/
│   ├── package.json
│   └── README.md
├── tests/
│   ├── unit/
│   ├── integration/
│   ├── scenarios/
│   └── fixtures/
├── models/                                # Trained model artifacts
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md
```
Unit Tests:
- Overall: ≥ 80% line coverage.
- Critical modules (risk engine, order execution, reconciliation, circuit breakers): 100% branch coverage.
- Property-based testing (Hypothesis) for risk calculations.
Integration Tests:
- Test complete signal pipeline per instrument.
- Test order lifecycle per broker adapter (Oanda and Binance spot).
- Test reconciliation per broker.
- Test cross-instrument risk checks (portfolio exposure limits).
- Test risk parameter override precedence.
Scenario Tests:
- Multi-instrument: Simultaneous signals on both instruments; verify independent execution.
- Single broker outage: Binance down while Oanda healthy; verify EUR/USD continues, BTC/USD halts.
- Regime transition: Verify system detects regime change, adjusts behavior, logs transition.
- Kill switch multi-broker: Verify all positions on all brokers closed.
- Circuit breaker cascade: Trigger per-instrument and portfolio breakers; verify correct scope.
- Risk override validation: Attempt invalid overrides; verify rejection.
Definition of Done:
- All tests pass.
- Coverage thresholds met.
- API endpoint has OpenAPI schema and passes schema validation in CI.
- UI component has corresponding acceptance test.
- No timezone-naive datetimes.
- Secret scan clean.
To be completed in Stage 5 after initial backtest results.
Estimated monthly costs:
- Server/VM: ~$0 (existing infrastructure)
- Oanda data: $0 (included with account)
- Binance data: $0 (public API)
- Oanda spread cost: ~$X per trade
- Binance spot commission: 0.10% taker per trade
- Infrastructure (electricity, internet): ~$Z/month
Required monthly return to break even per instrument: TBD.
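While the spread and infrastructure figures above are still placeholders, the Binance commission is concrete enough for a worked example. The sketch below computes the gross price move a round-trip spot trade needs just to cover taker fees, assuming the fee is deducted from the proceeds on both the buy and the sell side. The function name is hypothetical, and spread and slippage are deliberately ignored.

```python
def breakeven_gross_return(fee_rate: float) -> float:
    """Gross price move needed for a round trip to net zero after fees on both legs.

    Buying with notional N leaves N * (1 - f) in the asset; selling after a gross
    move g leaves N * (1 - f) * (1 + g) * (1 - f). Setting this equal to N gives
    g = 1 / (1 - f)**2 - 1.
    """
    if not 0 <= fee_rate < 1:
        raise ValueError("fee_rate must be in [0, 1)")
    return 1.0 / (1.0 - fee_rate) ** 2 - 1.0


# Usage: 0.10% taker fee per side.
print(f"{breakeven_gross_return(0.001):.4%}")
```

At a 0.10% taker fee this is roughly 0.20% per round trip, which sets a floor on the average winning move the BTC/USDT strategy must capture before spread and slippage are considered.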
All CRITICAL, HIGH, MEDIUM, and LOW findings from SPEC.v2.md are incorporated in this specification:
| Finding | Status | Section |
|---|---|---|
| CRITICAL-1: Naïve Bayes independence | Addressed — calibration + correlation checks | §5.4 |
| CRITICAL-2: No regime-change detection | Addressed — full regime subsystem with deterministic formula | §5.7 |
| CRITICAL-3: Hybrid score weighting arbitrary | Addressed — meta-learner | §5.6 |
| CRITICAL-4: Stop-loss under-specified | Addressed — broker-side stops, per-broker specifics, spot-specific | §6.1, §7.4 |
| CRITICAL-5: No reconciliation loop | Addressed — per-broker reconciliation | §6.4 |
| CRITICAL-6: Backtest fill model optimistic | Addressed — per-instrument slippage/spread + pessimistic mode (spot) | §6.2 |
| HIGH-1: Kelly inputs circular | Addressed — rolling window + micro-sizing | §7.3 |
| HIGH-2: Signal thresholds arbitrary | Addressed — walk-forward optimization | §5.6 |
| HIGH-3: No data staleness detection | Addressed — watchdog per instrument | §4.4 |
| HIGH-4: NN overfitting guardrails | Addressed — regularization + acceptance criteria | §5.5 |
| HIGH-5: Indicator JSONB performance | Superseded — feature store model | §4.3 |
| HIGH-6: Event success ambiguity | Addressed — precise close-to-close definition | §4.6 |
| HIGH-7: Daily loss threshold inconsistency | Addressed — 2% standardized | §7.5 |
| MEDIUM-1: REST polling latency | Addressed — internal modules (monolith) | §3.1 |
| MEDIUM-2: Redis missing | Addressed — deferred, documented | §3.8 |
| MEDIUM-3: Walk-forward under-specified | Addressed — full specification | §8.1 |
| MEDIUM-4: Token selection unspecified | Addressed — MI ranking + redundancy filter | §5.3 |
| MEDIUM-5: 99.9% uptime unsubstantiated | Addressed — restated as 99.5% with measurement | §1.6, NFR-02 |
| MEDIUM-6: Candle-close sync | Addressed — complete candle confirmation | §4.1 |
| MEDIUM-7: Test coverage insufficient | Addressed — critical paths 100% branch | Appendix E |
| LOW-1: Hypertable typo | Fixed | §4.2 |
| LOW-2: Multi-timeframe deferred | Deferred to v1.1 | §2.2 |
| LOW-3: No cost analysis | Placeholder in Appendix F | Appendix F |
| LOW-4: Docker not specified | In project layout | Appendix D |
| LOW-5: UI no auth | HTTP basic auth | §11.5 |
All decision locks from SPEC_DECISIONS_LOCK.md resolved in this document:
| ID | Decision | Resolution | Section |
|---|---|---|---|
| DL-001 | Canonical spec/docs location | `projects/newton/spec/` is canonical | Appendix D |
| DL-002 | BTC venue/scope | Spot (BTCUSDT), no futures in v1 | §1.3, §6.1 |
| DL-003 | Event catalog | Explicit per-instrument events defined | §4.6 |
| DL-004 | Client progress by stage | Thin client each stage | §15 (all stages) |
| DL-005 | Regime confidence | Deterministic formula with numeric bands | §5.7.3 |
| DL-006 | Backtest realism model | Locked spot assumptions and formulas | §6.2 |
| DL-007 | Strategy approval/rollback | Explicit governance with evidence and triggers | §13.2 |
| DL-008 | Exposure policy | Localhost/SSH-tunnel only for v1 | §11.5 |
| DL-009 | Retention/compression | Policy defined per table with backup cadence | §4.7 |
| DL-010 | Stage exit gates | Explicit checklist with CI and manual checks | §14.3 |
All mandatory revisions from SPEC_REVISIONS.md addressed:
| Requirement | Status | Section |
|---|---|---|
| R-01: Risk management strategy-configurable | ✅ Full precedence model, validation, audit | §7.1 |
| R-02: Performance metrics strategy-configurable | ✅ Default + override + gate classification | §9.1 |
| R-03: Feature/indicator extensibility | ✅ FeatureProvider interface, add-without-refactor process | §3.6, §4.3 |
| R-04: No shorthand references to prior specs | ✅ All content fully inlined | Throughout |
| Output Rule 1: Self-contained SPEC_DRAFT.md | ✅ | This document |
| Output Rule 2: No "unchanged from v2/v3" | ✅ All sections restated completely | Throughout |
| Output Rule 3: Explicit acceptance criteria | ✅ Per requirement and per stage | §15, Appendix B |
| Content corrections: spot-only v1 | ✅ All futures references removed/replaced | Throughout |
| Content corrections: zero-code baseline | ✅ | §1.2 |
| Content corrections: server+client per stage | ✅ | §15 |
End of SPEC_DRAFT.md
This specification is self-contained and implementation-ready. It incorporates all content from SPEC.v3.md, resolves all decision locks from SPEC_DECISIONS_LOCK.md, and implements all mandatory revisions from SPEC_REVISIONS.md. No prior spec versions need to be consulted for implementation.