AegisAI is an agentic AI system for Account Takeover (ATO) detection in digital banking and fintech environments. Unlike traditional fraud systems that optimize for detection accuracy, AegisAI treats trust as a system-level property rather than a model metric.
- What AegisAI Is
- Why Accuracy Is Not Enough
- Architecture & Agents
- Decision Lifecycle
- Governance & Safety
- Demo Walkthrough
- Known Limitations
- Future Extensions
AegisAI is a governed decision-making system for detecting account takeover fraud. It consists of:
- 5 specialized agents that analyze login events from different perspectives
- A confidence gating mechanism that determines when AI should decide vs. escalate
- An immutable audit system that logs every decision with full traceability
- Policy constraints that override model outputs when necessary
- Human-in-the-loop integration for edge cases and uncertainty
AegisAI processes login events and produces one of four outcomes:
- ALLOW — Legitimate user, proceed silently
- CHALLENGE — Suspicious signals, require step-up authentication
- BLOCK — High-confidence fraud, deny access
- ESCALATE — Uncertain, defer to human reviewer
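A minimal sketch of how these four outcomes might be represented in code; the `Action` enum below is illustrative and not taken from the repository:

```python
from enum import Enum

class Action(str, Enum):
    """Possible outcomes for a processed login event."""
    ALLOW = "ALLOW"          # legitimate user, proceed silently
    CHALLENGE = "CHALLENGE"  # suspicious signals, require step-up authentication
    BLOCK = "BLOCK"          # high-confidence fraud, deny access
    ESCALATE = "ESCALATE"    # uncertain, defer to a human reviewer
```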
The system is designed for high-stakes decisions where false positives harm legitimate users and false negatives enable fraud.
Traditional fraud systems are evaluated on detection accuracy alone, which is insufficient for production deployment.
AegisAI tracks metrics that reflect real-world impact:
| Metric | What It Measures | Target |
|---|---|---|
| False Positive Rate | How often legitimate users are harmed | < 5% |
| Escalation Rate | How often AI exercises restraint | 5-30% |
| Human Override Rate | How often humans change AI decisions | < 20% |
| Confidence Calibration Error | How honest AI is about uncertainty | < 0.10 |
| Policy Violation Count | How often hard limits are breached | 0 |
A system with 99% accuracy that blocks 10% of legitimate users is worse than one with 95% accuracy that escalates edge cases to humans.
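As a concrete illustration, the two headline metrics above can be computed from a batch of labeled decisions roughly as follows; the dictionary keys (`label`, `action`) are hypothetical, and the project's evaluation runner has its own output format:

```python
def false_positive_rate(decisions):
    """Fraction of legitimate logins that were blocked outright.

    Whether a CHALLENGE also counts as harm is a policy choice; this
    sketch counts only BLOCK.
    """
    legit = [d for d in decisions if d["label"] == "legit"]
    blocked = [d for d in legit if d["action"] == "BLOCK"]
    return len(blocked) / len(legit) if legit else 0.0

def escalation_rate(decisions):
    """Fraction of all events the AI declined to decide on its own."""
    escalated = [d for d in decisions if d["action"] == "ESCALATE"]
    return len(escalated) / len(decisions) if decisions else 0.0
```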
AegisAI uses a multi-agent architecture where no single model has decision authority.
┌─────────────────┐
│ Login Event │
└────────┬────────┘
│
┌─────────────┼─────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Detection │ │ Behavior │ │ Network │
│ Agent │ │ Agent │ │ Agent │
└─────┬─────┘ └─────┬─────┘ └─────┬─────┘
│ │ │
└─────────────┼─────────────┘
│
┌────────▼────────┐
│ Confidence │
│ Agent │
│ (GATEKEEPER) │
└────────┬────────┘
│
┌────────▼────────┐
│ Explanation │
│ Agent │
└────────┬────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌──────▼──────┐ ┌───────▼───────┐ ┌──────▼──────┐
│ ALLOW │ │ CHALLENGE │ │ ESCALATE │
│ (silent) │ │ (step-up) │ │ (human) │
└─────────────┘ └───────────────┘ └─────────────┘
Every login event follows this immutable lifecycle:
LoginEvent + Session + Device + User → InputContext (frozen)
All input is validated through Pydantic schemas. Invalid input is rejected before processing.
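A minimal sketch of this validation step, assuming Pydantic v2 and illustrative field names; the project's actual `LoginEvent` schema will differ:

```python
from datetime import datetime
from pydantic import BaseModel, ValidationError

class LoginEvent(BaseModel):
    # Illustrative fields only, not the project's real schema
    user_id: str
    device_id: str
    ip_address: str
    timestamp: datetime
    failed_attempts: int = 0

try:
    LoginEvent(user_id="u_123", device_id="d_456",
               ip_address="203.0.113.7", timestamp="not-a-timestamp")
except ValidationError as exc:
    # Malformed events are rejected here and never reach the agents
    print(exc)
```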
InputContext → [Detection, Behavior, Network] → AgentOutputs
The three analysis agents evaluate the event independently of one another; each produces a risk score and supporting evidence. (The current implementation runs them sequentially; parallel execution is a planned improvement.)
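Each agent's result might look like the sketch below; the `AgentResult` name and fields are illustrative, standing in for whatever the repository's `AgentOutputs` container actually aggregates:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentResult:
    agent_name: str      # "detection", "behavior", or "network"
    risk_score: float    # 0.0 (benign) to 1.0 (certain fraud)
    evidence: tuple = () # human-readable supporting signals

detection = AgentResult("detection", 0.82, ("new device", "impossible travel"))
behavior  = AgentResult("behavior",  0.35, ("typing cadence roughly matches",))
network   = AgentResult("network",   0.71, ("IP seen in recent ATO campaign",))
```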
AgentOutputs → ConfidenceAgent → decision_permission
The Confidence Agent determines whether AI should decide:
- AI_ALLOWED — Agents agree, confidence is high, proceed to action
- HUMAN_REQUIRED — Disagreement too high, escalate to human
This is the most important component. The confidence gate enforces AI restraint.
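A sketch of the gating logic implied by the policy thresholds shown under Governance & Safety (min_to_allow = 0.80, min_to_escalate = 0.50, disagreement_threshold = 0.30); how the real Confidence Agent handles the 0.50–0.80 band is not specified here, so this sketch escalates it:

```python
def gate(risk_scores: list[float], confidence: float) -> str:
    """Decide whether the AI may act or must hand off to a human."""
    disagreement = max(risk_scores) - min(risk_scores)

    if disagreement > 0.30:   # agents disagree too much
        return "HUMAN_REQUIRED"
    if confidence < 0.50:     # below the floor, always escalate
        return "HUMAN_REQUIRED"
    if confidence >= 0.80:    # high confidence and agreement
        return "AI_ALLOWED"
    return "HUMAN_REQUIRED"   # the uncertain middle band defers to humans (assumption)
```

Keeping this check in one place is what makes restraint auditable: the thresholds come from versioned policy, not from any individual model.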
If AI_ALLOWED: AgentOutputs → ExplanationAgent → FinalDecision
If HUMAN_REQUIRED: AgentOutputs → EscalationCase → Human Queue
FinalDecision | EscalationCase → AuditLogger → JSONL (immutable)
Every decision is logged with full traceability. Logs are append-only with hash-chain integrity.
The DecisionContext is a frozen dataclass. Once created, nothing can modify it:
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)
class DecisionContext:
    context_id: str
    created_at: datetime
    input_context: InputContext
    agent_outputs: Optional[AgentOutputs]
    final_decision: Optional[FinalDecision]
    escalation_case: Optional[EscalationCase]

This ensures complete auditability and prevents post-hoc tampering.
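Python enforces this at runtime: assigning to any field of a frozen dataclass raises `FrozenInstanceError`, as the small self-contained example below shows:

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class Example:
    context_id: str

ctx = Example(context_id="ctx_001")
try:
    ctx.context_id = "ctx_002"  # post-hoc tampering attempt
except dataclasses.FrozenInstanceError:
    print("frozen contexts cannot be modified")
```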
AegisAI enforces governance at multiple levels.
Hard limits that cannot be overridden by models:
confidence:
  min_to_allow: 0.80       # AI needs 80%+ confidence to decide
  min_to_escalate: 0.50    # Below 50% always escalates

actions:
  permanent_block_allowed: false   # AI cannot permanently block
  human_only_actions:
    - "BLOCK_PERMANENT"
    - "ACCOUNT_TERMINATION"
    - "LEGAL_HOLD"

escalation:
  disagreement_threshold: 0.30     # >30% disagreement → human
  consecutive_high_risk_limit: 3   # 3+ high-risk → human

When AI escalates, humans can:
- APPROVE — Allow the login
- REJECT — Block the login
- MODIFY — Take a different action
- DEFER — Escalate further
All overrides are logged immutably with:
- Reviewer ID and role
- Reason for override
- Original AI recommendation
- Policy version at time of decision
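A hypothetical shape for such an override record, with field names derived from the list above rather than from the repository:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class HumanOverride:
    reviewer_id: str         # who reviewed the case
    reviewer_role: str       # e.g. fraud analyst, team lead
    action: str              # APPROVE, REJECT, MODIFY, or DEFER
    reason: str              # free-text justification
    original_ai_action: str  # what the AI recommended before the override
    policy_version: str      # policy in force at decision time
    decided_at: datetime     # when the human decision was recorded
```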
Every decision produces an audit entry:
{
"entry_id": "aud_abc123def456",
"timestamp": "2026-01-27T10:30:00Z",
"event_type": "decision",
"decision_id": "dec_xyz789",
"action": "ESCALATE",
"decided_by": "HUMAN_REQUIRED",
"confidence_score": 0.62,
"policy_version": "1.0.0",
"agent_outputs": { ... },
"previous_hash": "sha256:...",
"entry_hash": "sha256:..."
}

Audit logs use hash-chain integrity. Any tampering breaks the chain.
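A sketch of how the chain can be verified over the JSONL log, assuming each entry carries `previous_hash` and `entry_hash` as in the example above; the repository's exact hashing scheme (which fields are hashed and in what order) may differ:

```python
import hashlib
import json

def compute_hash(entry: dict) -> str:
    """Hash the entry body (everything except its own entry_hash)."""
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return f"sha256:{digest}"

def verify_chain(path: str) -> bool:
    previous = None
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            if previous is not None and entry["previous_hash"] != previous:
                return False  # chain broken: tampering or a missing entry
            if compute_hash(entry) != entry["entry_hash"]:
                return False  # entry body was modified after logging
            previous = entry["entry_hash"]
    return True
```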
For production deployment to AWS ECS, see deploy/DEPLOYMENT.md.
- Confidence calibration is heuristic-based; re-calibrate on real data before production use
- Agents run sequentially; parallel execution is planned for future versions
- No A/B testing framework yet; use the separate evaluation suite
- Time: 3:15 AM local time
- Failed attempts: 2 before success
Output:
Decision: CHALLENGE
Decided by: AI
Confidence: 78%
The user is prompted for step-up authentication (SMS code, etc.).
A login from a new mobile device in London; it could be legitimate travel or an ATO.
Input:
- User: Same account
- Device: New iPhone (could simply be a new phone)
- Location: London, UK (no VPN)
- Time: 2:30 PM local (normal hours)
- Behavior: Partially matches (mobile vs. desktop)
Output:
Decision: ESCALATE
Decided by: HUMAN_REQUIRED
Confidence: 62%
Reason: HIGH_DISAGREEMENT
The AI refuses to decide. A human fraud analyst reviews the case, checks travel history, and approves the login.
"The most important decision this system makes is knowing when not to decide."
# Standard evaluation: 100 legit + 20 ATO
python -m src.aegis_ai.evaluation.runner
# Quick evaluation: 20 legit + 5 ATO
python -m src.aegis_ai.evaluation.runner --quick
# Custom configuration
python -m src.aegis_ai.evaluation.runner --legit 200 --ato 50 --seed 123

AegisAI is a demonstration system with deliberate constraints:
- Synthetic data only — No real customer data is used
- Simplified behavioral signals — Real systems have richer biometrics
- Limited geographic coverage — 10 countries in the generator
- No online learning — Models are static, not continuously updated
- Lightweight graph modeling — Production would use deeper GNNs
- No ensemble calibration — Single calibration method (isotonic)
- No streaming ingestion — Batch processing only
- No real-time monitoring — Evaluation is post-hoc
- No A/B testing framework — Static policy rules
These limitations are intentional to keep the system:
- Auditable (no hidden state)
- Reproducible (deterministic seeds)
- Understandable (no black-box ensembles)
AegisAI could be extended with:
- Streaming ingestion — Kafka/Kinesis integration
- Real-time monitoring — Prometheus metrics, Grafana dashboards
- API service — FastAPI endpoint for inference
- Richer behavioral biometrics — Typing patterns, mouse dynamics
- Advanced graph reasoning — Temporal knowledge graphs
- Online policy learning — Contextual bandits for threshold tuning
- Federated learning — Cross-institution fraud patterns
- Causal inference — Understanding fraud dynamics
- Explainable escalation — Natural language case summaries
# Clone repository
git clone https://github.com/abi-commits/AegisAI.git
cd AegisAI
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -e ".[dev]"

# Run the demo
python demo.py

# Run the evaluation suite
python -m src.aegis_ai.evaluation.runner

# Run the tests
pytest tests/ -v

For questions about this system design, please open an issue on GitHub.