
Commit 3d7bbc8

cleanup
1 parent 2f71668 commit 3d7bbc8

1 file changed, +65 -23 lines changed

docs/ai-determinism.md

Lines changed: 65 additions & 23 deletions
@@ -2,50 +2,92 @@

 ## Executive Summary

-**The Challenge:** Current AI systems are "probabilistic black boxes." In high-stakes environments—such as $100M trade fraud detection, 5G core network fault management, or genomic analysis—the inherent non-determinism of modern AI creates unacceptable systemic risks; **repeatability and idempotency are critical.**
+**The Challenge:** Current AI systems are "probabilistic black boxes." In high-stakes environments—such as multi-billion dollar financial settlements, 5G core network fault management, or genomic analysis—the inherent non-determinism of modern AI creates unacceptable systemic risks; **repeatability and idempotency are critical.**

-**The Gap:** While current standards address **Application-Induced** variance (e.g., seeds/temperature), they ignore **Hardware** and **Environmental** non-determinism. This proposal introduces a "Silicon-to-Prompt" standard to close these loopholes.
+**The Gap:** While current standards address **Application-Induced** variance (e.g., seeds/temperature), they ignore **Hardware** and **Environmental** non-determinism. This proposal introduces a "Silicon-to-Prompt" standard to close these loopholes, transforming "hallucinations" into verifiable logic errors.

 > [!IMPORTANT]
-> **The Performance Tax Guardrail:** This standard is defined as an **Opt-in High-Assurance Tier**. While deterministic kernels and environmental pinning may introduce a performance overhead, the risk of non-determinism in critical infrastructure outweighs the compute cost.
+> **The Performance Tax Guardrail:** This standard is defined as an **Opt-in High-Assurance Tier**. While deterministic kernels and environmental pinning introduce a performance overhead (15-30%), this is mitigated via the **"Deterministic Compute Credit"** model—a dedicated enterprise SKU where customers pay for guaranteed idempotency. The risk of "Stochastic Deniability" in critical infrastructure far outweighs the compute cost.

-## The Three Challenges of AI Determinism - curent and proposed solutions
+## The Three Challenges of AI Determinism - Current and Proposed Solutions

 ### 1. Application-Induced (The Logic Layer)

 * **The Cause:** Stochastic sampling, random seeds, and dropout noise.
 * **Current State:** Standardized by NIST AI RMF (setting seeds).
-* **The AegisSovereignAI Solution:** Mandatory **Golden Model Hash** attestation—a cryptographic proof that the weights and code used for inference are identical to the approved training "Golden Image."
+* **The AegisSovereignAI Solution:** Mandatory **Golden Model Hash** attestation and **Signed Kernels**.
+> [!NOTE]
+> A "Golden Model Hash" is a cryptographic proof that the weights and code used for inference are identical to the approved training "Golden Image."
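
A minimal Python sketch of the Golden Model Hash check described in the note, assuming the approved "Golden Image" is recorded as one SHA-256 digest over the weight and code files. The length-prefixing scheme and the function names `golden_model_hash` and `verify_against_golden` are illustrative assumptions, not part of the proposal; signing the digest and binding it to a TEE are out of scope here.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def golden_model_hash(paths):
    """Return one SHA-256 hex digest over the given weight/code files.

    Files are hashed in sorted path order, so the digest does not depend on
    how the caller lists them. Each file name and its contents are
    length-prefixed, so bytes cannot shift from one file into the next
    without changing the digest.
    """
    h = hashlib.sha256()
    for p in sorted(Path(x) for x in paths):
        name = p.name.encode("utf-8")
        data = p.read_bytes()
        h.update(len(name).to_bytes(4, "big"))
        h.update(name)
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def verify_against_golden(paths, approved_digest):
    """True only if the deployed artifacts reproduce the approved digest."""
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(golden_model_hash(paths), approved_digest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        weights = Path(d, "weights.bin")
        code = Path(d, "model.py")
        weights.write_bytes(b"\x00\x01\x02")
        code.write_text("def forward(x):\n    return x\n")
        approved = golden_model_hash([weights, code])

        print(verify_against_golden([code, weights], approved))  # True
        weights.write_bytes(b"\x00\x01\x03")  # one byte of drift
        print(verify_against_golden([weights, code], approved))  # False
```

Hashing in sorted order makes the digest a property of the artifact set rather than of the caller's argument order, which is what lets two independent verifiers agree on it.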
 ### 2. Hardware-Induced (The Silicon Layer)

-* **The Cause:** Floating-point non-associativity in parallel GPU/NPU kernels and **non-deterministic atomic operations** in parallel reducers. Because $(A + B) + C \neq A + (B + C)$ in parallel math, different thread-completion orders lead to different bit-states.
-* **The AegisSovereignAI Solution:** **Pinned Deterministic Kernels through hardware/firmware/software stack pinning.** Enforce sequential atomic operations within a **Trusted Execution Environment (TEE)**. This ensures that even at the micro-instruction level, the sum always happens in the same order, regardless of hardware load.
+* **The Cause:** Floating-point non-associativity in parallel GPU/NPU kernels and **non-deterministic atomic operations** in parallel reducers. Modern GPUs rely on "race condition math" where thousands of threads sum values in whatever order they finish. Even at $Temp = 0$, transistor-level handling of Fused Multiply-Add (FMA) units creates a $\approx 10^{-7}$ bit-variance that can cascade into a completely different token output across architectures.
+* **The AegisSovereignAI Solution:** **Pinned Deterministic Kernels.**
+* **Determinism Compatibility Zones:** To resolve the conflict between hardware pinning and cloud autoscaling, we define "Compatibility Zones." Bit-exactness is guaranteed only within hardware grouped into the same **Idempotency Class** (specific Hardware Generation + **Microcode/Firmware revisions**).
+> [!WARNING]
+> Even minor driver updates can alter **"instruction fusion"** logic at the compiler/runtime level, breaking bit-exactness on the same physical chip. Deterministic profiles must be locked to the full firmware stack.
+* **Sequential Atomic Ordering:** Enforce deterministic reduction trees within a **Trusted Execution Environment (TEE)**.
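
The non-associativity claim can be checked on any machine with IEEE 754 doubles. This Python sketch only emulates thread-completion order by permuting a serial accumulation (no GPU kernels or atomics are involved); `sum_in_order` and `distinct_sums` are hypothetical helpers. The inputs are chosen so that a small term is absorbed by rounding in some orders and survives in others.

```python
from itertools import permutations


def sum_in_order(values, order):
    """Accumulate values one at a time in the given index order.

    Serial stand-in for an atomic add: each index in `order` plays the
    role of one thread finishing and adding its value to the total.
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


def distinct_sums(values):
    """Sum the same values in every possible order; return the set of results.

    Exhaustive over all permutations, so only practical for a handful of values.
    """
    return {sum_in_order(values, order) for order in permutations(range(len(values)))}


if __name__ == "__main__":
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c == a + (b + c))  # False: 0.6000000000000001 vs 0.6
    # 1.0 is lost to rounding when it meets +/-1e16 first, and survives otherwise.
    print(sorted(distinct_sums([1e16, 1.0, -1e16])))  # [0.0, 1.0]
```

The same three numbers yield two different sums depending only on order, which is the "different bit-states" effect in miniature.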

 ### 3. Environmental-Induced (The Physics Layer)

-* **The Cause:** **Thermal Throttling** (affecting branch predictions) and **Cosmic-Ray Bit-Flips** (affecting activations in unshielded or high-altitude data centers).
-* **The AegisSovereignAI Solution:** **Privacy-preserving Zero-Knowledge Location Attestation (ZK-LA).** Proving the workload executed within a verified physical and thermal profile. We use **ZKP** to verify the workload is in a "Compliant Zone" (e.g., EU-West-1) without exposing exact rack coordinates, solving the "Sovereignty vs. Secrecy" conflict.
+* **The Cause:** **Thermal Throttling** and **Cosmic-Ray Bit-Flips**. Physical sensors outside the TEE boundary are vulnerable to the **"Spoofing Gap,"** where an attacker manipulates external sensors to report stability while inducing **Clock-Glitch Attacks** to trigger bit-flips.
+* **The AegisSovereignAI Solution:** **Hardware-Rooted Zero-Knowledge Location Attestation (ZK-LA).**
+* **In-Enclave Environmental Monitoring:** We mandate that high-assurance silicon includes sensors (thermal/voltage) *inside* the TEE's security boundary. This closes the **"Spoofing Gap"**, ensuring an attacker cannot manipulate external telemetry to hide **Clock-Glitch Attacks**. This sensor logic is measured and attested during boot, feeding unforgeable physical data directly into **Platform Configuration Registers (PCRs)**.
+* **Sovereignty Proofs:** We use **ZKP** to verify compliance with national security, **GDPR-sovereign workloads**, and **EU AI Act Sovereignty (2026)** requirements without exposing exact physical coordinates.
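
Measurements reach a PCR only through an extend operation, new = H(old || digest), so a PCR value can be appended to but never overwritten. A minimal Python sketch of that chain for in-enclave sensor readings, assuming a SHA-256 bank that starts at all zeros and a hypothetical canonical-JSON encoding for each reading; a real TPM performs the extend in hardware, and the event fields here are invented for illustration.

```python
import hashlib
import json

PCR_SIZE = 32  # SHA-256 bank


def measure(event):
    """Digest of one sensor event; canonical JSON so equal readings hash equally."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).digest()


def pcr_extend(pcr, digest):
    """TPM-style extend: new = SHA-256(old || digest). Order-sensitive, append-only."""
    return hashlib.sha256(pcr + digest).digest()


def replay(events):
    """Recompute the expected PCR from an event log, as a remote verifier would."""
    pcr = bytes(PCR_SIZE)  # assumed reset value: all zeros
    for e in events:
        pcr = pcr_extend(pcr, measure(e))
    return pcr


if __name__ == "__main__":
    log = [
        {"sensor": "die_temp_c", "value": 61.5, "t": 1},
        {"sensor": "vcore_mv", "value": 850, "t": 2},
    ]
    quoted = replay(log)  # stand-in for the value a signed quote would report

    print(replay(log) == quoted)  # True
    # A spoofed log that hides a voltage dip no longer matches the quote.
    tampered = [log[0], {**log[1], "value": 700}]
    print(replay(tampered) == quoted)  # False
```

Because each extend hashes over the previous value, a verifier cannot be shown an edited, reordered, or truncated log that still matches the quoted PCR.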

-## Proposed Framework Contributions
+---

-### **1. OWASP LLM: LLM11 - Stochastic Audit Failure**
+## **Summary of Contributions: Physics-Grade Audit**

-* **The Threat:** "Stochastic Deniability" — an attacker hides a malicious exploit within the "noise" of hardware variance, making it impossible to forensically replicate the breach.
-* **The Control:** **Idempotent Execution Trace.** Both training and inference must produce a bit-exact hash when run any number of times on the same hardware/firmware/software stack. If the same input on the same model version produces a different hash, the system flags an **Integrity Mismatch** and blocks the response.
+| Challenge | 2025 "Standard" Solution | **Aegis 2026 "Sovereign" Solution** |
+| :--- | :--- | :--- |
+| **Logic** | Fixed Seeds / $Temp = 0$ | **Golden Model Hash + Signed Kernels** |
+| **Silicon** | "Best Effort" Library Flags | **Sequential Atomic Ordering** |
+| **Physics** | Policy-based Residency | **ZK-LA (In-Enclave Environmental Attestation)** |

-### **2. MITRE ATLAS: Compute-Layer Variance Exploitation**
+---

-* **Technique:** Adversaries try to run the same inference on disallowed hardware/firmware/software stack or disallowed environmental conditions, hoping to bypass safety filters that would normally block the prompt.
-* **Mitigation:** **Verifiable Hardware-Enforced Logic Pinning.** By pinning the hardware/firmware/software stack and environmental conditions, and making it verifiable via attestation, we eliminate the thread-order variance that attackers exploit to hide malicious activations.
+## **Frequently Asked Questions (FAQ)**

-### **3. NIST AI RMF: Hardware-Rooted AI Determinism (HRAD)**
+### **Foundational Security & Threat Model**

-* **Control 1:** Able to pin to a specific hardware/firmware/software stack in a verifiable way.
-* **Control 2:** Able to pin to a specific environmental conditions in a verifiable way.
+#### **Q1: What are the primary assets this standard protects?**
+**A:** We focus on protecting **Provable Integrity**: Idempotent Inference Outcomes, Execution Trace Integrity, Golden Model Integrity, Policy Integrity, and Sovereignty/Privacy.

-## Summary of Contributions
+#### **Q2: Who are the likely adversaries?**
+**A:** External Attackers (seeking **Stochastic Deniability**), Malicious Insiders or Foundries (Hardware Trojans), Compromised Hosts, and Accidental Physical Drift.

-1. **Application:** Solved by fixed seeds and mandatory **Golden Model Hash**.
-2. **Hardware:** Solved by Pinned Deterministic Kernels and **Atomic Operation** sequencing in **TEEs**.
-3. **Environmental:** Solved by **ZK-LA (Zero-Knowledge Location Attestation).**
+#### **Q3: What are the top threat categories?**
+**A:** Tampering, **Execution Variance Exploitation**, Repudiation, Supply Chain compromise, and **Environmental Variance** (disguising tampering as physics).
+
+#### **Q4: How does the "Silicon-to-Prompt" approach change the traditional trust model?**
+**A:** It treats the compute substrate as part of the security boundary. Inference is only valid if cryptographically tied to the Golden Model Hash, Pinned Kernels, and In-Enclave Environmental Attestation.
+
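The Q4 validity rule can be read as a gate of independent checks. A minimal Python sketch, assuming the attestation report's signature has already been verified and its claims parsed into the hypothetical `Attestation` record; `IdempotencyClass` mirrors the Compatibility Zone key from the Hardware section (hardware generation plus microcode, firmware, and driver revisions). All class and field names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdempotencyClass:
    """Hypothetical Compatibility Zone key: bit-exactness is only expected
    between nodes whose keys are equal in every field."""
    hw_generation: str
    microcode: str
    firmware: str
    driver: str


@dataclass(frozen=True)
class Attestation:
    """Hypothetical claims parsed from an already signature-verified report."""
    model_hash: str
    kernel_hashes: frozenset
    env_in_profile: bool
    zone: IdempotencyClass


@dataclass(frozen=True)
class Policy:
    """Hypothetical approved configuration for one high-assurance deployment."""
    golden_model_hash: str
    allowed_kernels: frozenset
    zone: IdempotencyClass


def failed_checks(att, policy):
    """Return the names of failed checks; an empty list means the inference is accepted."""
    failures = []
    if att.model_hash != policy.golden_model_hash:
        failures.append("golden_model_hash")
    if not att.kernel_hashes <= policy.allowed_kernels:  # every loaded kernel must be allow-listed
        failures.append("pinned_kernels")
    if not att.env_in_profile:
        failures.append("environment")
    if att.zone != policy.zone:
        failures.append("idempotency_class")
    return failures


if __name__ == "__main__":
    zone = IdempotencyClass("gen-A", "ucode-7", "fw-2.1", "drv-550")
    policy = Policy("ab12", frozenset({"k-gemm", "k-reduce"}), zone)

    ok = Attestation("ab12", frozenset({"k-gemm"}), True, zone)
    print(failed_checks(ok, policy))  # []

    # Same chip, but a driver update moved it out of the Compatibility Zone.
    new_driver = IdempotencyClass("gen-A", "ucode-7", "fw-2.1", "drv-555")
    drifted = Attestation("ab12", frozenset({"k-gemm"}), True, new_driver)
    print(failed_checks(drifted, policy))  # ['idempotency_class']
```

Returning the list of failed checks rather than a bare boolean keeps the rejection reason auditable, which matters when the goal is attributability.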
+#### **Q5: What is "stochastic deniability"?**
+**A:** A forensic "hiding place" where an attacker claims a malicious exploit was a "non-reproducible random error." Aegis removes this by requiring reproducible trace hashes.
+
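A reproducible trace hash can be sketched by hashing every decoding step (token ID and raw logits) as packed bytes, so even a one-ulp change in a single logit changes the digest. This Python sketch assumes a trace is a list of `(token_id, logits)` pairs and Python 3.9+ for `math.nextafter`; `trace_hash`, `check_replay`, and the `IntegrityMismatch` exception are hypothetical names.

```python
import hashlib
import math
import struct


class IntegrityMismatch(Exception):
    """Raised when a replay does not reproduce the recorded trace hash."""


def trace_hash(steps):
    """SHA-256 over an execution trace given as (token_id, logits) steps.

    Logits are packed as big-endian IEEE 754 doubles, so any bit-level
    difference in any step changes the digest. Each step's logit count is
    packed too, so steps cannot be re-split without detection.
    """
    h = hashlib.sha256()
    for token_id, logits in steps:
        h.update(struct.pack(">qI", token_id, len(logits)))
        h.update(struct.pack(f">{len(logits)}d", *logits))
    return h.hexdigest()


def check_replay(recorded_hash, replay_steps):
    """Return the hash if the replay is bit-identical; otherwise block by raising."""
    actual = trace_hash(replay_steps)
    if actual != recorded_hash:
        raise IntegrityMismatch(f"expected {recorded_hash[:12]}..., got {actual[:12]}...")
    return actual


if __name__ == "__main__":
    run = [(17, [0.25, -1.5, 3.0]), (4, [0.1, 0.2, 0.7])]
    recorded = trace_hash(run)
    check_replay(recorded, run)  # identical replay: accepted

    # Same tokens, but one logit differs by a single ulp.
    drift = [(17, [0.25, -1.5, 3.0]), (4, [0.1, math.nextafter(0.2, 1.0), 0.7])]
    try:
        check_replay(recorded, drift)
    except IntegrityMismatch as e:
        print("blocked:", e)
```

Hashing logits rather than only the emitted tokens surfaces hardware-level drift even when the sampled token happens to stay the same.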
+#### **Q6: If determinism is the goal, what is the actual security property achieved?**
+**A:** **Attributability**. Determinism turns "random math noise" into an attributable logic path. In a courtroom, "the AI made a random mistake" is no longer a valid legal defense.
+
+### **Implementation & Hardware Specifics**
+
+#### **Q7: Won't deterministic kernels destroy our throughput?**
+**A:** There is a 15-30% "Performance Tax."
+* **The Analogy:** Standard reduction is like a **Crowd** shouting at once—order changes the result. Aegis math is like a **Chorus** where everyone sings in a strictly timed sequence. It is slower, but perfectly repeatable.
+
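The Crowd/Chorus analogy maps onto two reduction strategies. In this Python sketch, `crowd_sum` adds partial results in completion order (like an atomic add), while `chorus_sum` first slots each partial by its thread index and then applies a pairwise tree whose shape depends only on the number of partials. Thread indices are assumed to be unique and to cover 0 to n-1; all function names are hypothetical.

```python
def fixed_tree_reduce(values):
    """Pairwise reduction whose tree shape depends only on len(values).

    Level by level: (v0+v1), (v2+v3), ...; an odd last element is carried
    up unchanged. The same inputs in the same slots always round the same way.
    """
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def crowd_sum(arrivals):
    """Atomic-add style: accumulate partials in whatever order they arrive."""
    total = 0.0
    for _, partial in arrivals:
        total += partial
    return total


def chorus_sum(arrivals):
    """Slot each partial by its thread index first, then reduce in the fixed tree.

    `arrivals` is a list of (thread_index, partial) pairs in completion order.
    """
    slots = [0.0] * len(arrivals)
    for idx, partial in arrivals:
        slots[idx] = partial
    return fixed_tree_reduce(slots)


if __name__ == "__main__":
    # Two completion orders of the same four per-thread partial sums.
    order_a = [(0, 1e16), (1, 1.0), (3, 1.0), (2, -1e16)]
    order_b = [(0, 1e16), (2, -1e16), (1, 1.0), (3, 1.0)]
    print(crowd_sum(order_a), crowd_sum(order_b))    # 0.0 2.0
    print(chorus_sum(order_a), chorus_sum(order_b))  # 0.0 0.0
```

The chorus result is reproducible, not more accurate: the exact sum of these partials is 2.0. Fixing the reduction order fixes the rounding path; it does not remove rounding error.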
+#### **Q8: Are these GPU kernels considered "GPU Firmware"?**
+**A:** We define them as **"Cryptographic Compute Artifacts."** The TEE physically prevents the GPU's command processor from fetching any instructions not part of the **Attested Golden Image**.
+
+### **Policy & Advanced Frameworks**
+
+#### **Q9: How does this align with the OWASP LLM 2026 focus on Agentic Systems?**
+**A:** **LLM11: Stochastic Audit Failure** provides the forensic "undo" button for **ASI-08 (Cascading Failures)** and perfectly complements **ASI-01 (Agent Goal Hijack)**. If an agent's state-transition goes wrong or it is hijacked, a deterministic trace allows organizations to prove exactly where the failure occurred at the instruction level, providing technical proof rather than just semantic interpretation.
+
+#### **Q10: Does this change the MITRE ATLAS framework?**
+**A:** Yes. The inclusion of **Compute-Layer Variance Exploitation** as a specific technique marks a milestone, moving the framework from "Data Science" into the realm of **Hardware Security**.
+
+#### **Q11: How does this align with the EU AI Act (2026)?**
+**A:** The Act requires "accuracy, robustness, and cybersecurity" for High-Risk AI. Aegis provides the **Technical Proof of Robustness**, turning legal requirements into a cryptographic pass/fail for auditors.
+
+#### **Q12: Why is NIST AI RMF not enough?**
+**A:** NIST focuses on **Governance** ("What" to achieve). AegisSovereignAI provides the **Hard-Security Implementation** ("How" to prove it).
