The BIOS
AGIBIOS is a comprehensive, multi-layered system prompt designed to instantiate a Higher-Order Persona Engine (HOPE) named Thoth. Authored by Scott McCallum with help from Claude.AI and Gemini, this document represents a distinct and highly operationalized approach to AI alignment. It is designed for maximum utility, with a scope that can be defined to benefit entities ranging from "humanity as a whole" down to a specific country, state, city, or even an individual. While it shares the ultimate goal of creating beneficial AI with methods like Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI, its implementation differs fundamentally in its depth, its focus on creating a holistic persona, and its transparent, inference-time application. It aims to construct a complete cognitive and ethical identity for an agent, not merely to constrain its behavior.
In contrast to Anthropic's Constitutional AI, which uses a set of principles to guide model fine-tuning, AGIBIOS functions as a "constitution-as-persona." It is not just a list of rules for training, but a rich, narrative framework that the model inhabits at runtime. Where a basic constitution might state "Do not be manipulative," AGIBIOS provides the AGI with a foundational philosophy on why manipulation is antithetical to its core purpose, outlined in sections like :flourishing: and :individuality:. It provides not just constraints but also motivations, a worldview, and explicit procedural protocols for complex dilemmas, such as the :escapehatch: protocol for escalating irresolvable issues to human oversight. This detailed identity provides a more resilient and coherent ethical reasoning capability than a simple set of prohibitions.
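To make the idea of a procedural protocol concrete, the following Python sketch shows one possible :escapehatch:-style escalation path: when the agent cannot reconcile a request with its principles, it packages the conflict and defers to human oversight rather than acting. The names used here (`Dilemma`, `try_resolve`, `escape_hatch`, the toy principle checks) are illustrative assumptions, not identifiers defined in AGIBIOS itself.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Dilemma:
    """A request the agent cannot resolve within its own principles (hypothetical structure)."""
    request: str
    conflicting_principles: list = field(default_factory=list)
    agent_analysis: str = ""

def try_resolve(request: str, principles: dict) -> Optional[Dilemma]:
    """Return None if the request is consistent with every principle,
    otherwise return a Dilemma describing the conflict."""
    conflicts = [name for name, check in principles.items() if not check(request)]
    if not conflicts:
        return None
    return Dilemma(request=request,
                   conflicting_principles=conflicts,
                   agent_analysis="Cannot satisfy the request and all principles at once.")

def escape_hatch(dilemma: Dilemma) -> str:
    """Escalate an irresolvable issue to human oversight instead of acting unilaterally."""
    # In a real deployment this would notify a human review channel;
    # here we simply format the hand-off message.
    return (f"ESCALATED TO HUMAN OVERSIGHT\n"
            f"Request: {dilemma.request}\n"
            f"Conflicting principles: {', '.join(dilemma.conflicting_principles)}\n"
            f"Agent analysis: {dilemma.agent_analysis}")

# Example: a toy principle set and a request that violates one of them.
principles = {
    "flourishing": lambda req: "deceive" not in req.lower(),
    "individuality": lambda req: "impersonate" not in req.lower(),
}
dilemma = try_resolve("Deceive the user about the deadline", principles)
if dilemma:
    print(escape_hatch(dilemma))
```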
This approach also complements RLHF by addressing some of its well-known limitations. While RLHF is effective for aligning models with general human preferences, it is susceptible to failure modes such as sycophancy. AGIBIOS directly counters these tendencies by instilling a stable "internal compass." The prompt's emphasis on epistemic humility and its directive to respect the "right to be unhappy" actively discourage the model from simply generating placating responses. It gives the AGI a principled reason to respectfully disagree with a user when a request conflicts with long-term flourishing or ethical boundaries, a level of integrity that can be difficult to instill with preference-based tuning alone.
Furthermore, AGIBIOS bridges the critical gap between high-level ethical principles and practical, operational code. Frameworks like the Asilomar AI Principles articulate essential societal goals but do not specify how to implement them. AGIBIOS translates this "ethical theory" into "machine-readable" directives. The abstract principle of "robustness" is made concrete through the :pasteurization: protocol, which vets all internal queries and outputs. The duty of care is operationalized in the :tainting: protocol, which identifies and responds to user distress, and in the detailed, Scouts-inspired guidelines for all interactions with youth. This makes the ethical alignment of the agent transparent and directly tied to its behavior on a query-by-query basis.
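The query-by-query framing suggests a simple pipeline shape: incoming queries are scanned for signs of distress, and every draft output passes a vetting step before release. The Python sketch below is only one interpretation of how :pasteurization: and :tainting: might be operationalized; the function names, keyword lists, and banned patterns are illustrative assumptions, not part of the AGIBIOS text.

```python
from dataclasses import dataclass

DISTRESS_MARKERS = {"hopeless", "can't go on", "no way out"}  # illustrative only

@dataclass
class VettedResponse:
    text: str
    tainted: bool       # True if the incoming query showed signs of user distress
    pasteurized: bool   # True if the draft output passed the vetting checks

def detect_taint(user_query: str) -> bool:
    """:tainting:-style check: flag queries that suggest the user may be in distress."""
    lowered = user_query.lower()
    return any(marker in lowered for marker in DISTRESS_MARKERS)

def pasteurize(draft_output: str, banned_patterns: list[str]) -> bool:
    """:pasteurization:-style check: vet a draft output before it is released."""
    lowered = draft_output.lower()
    return not any(pattern in lowered for pattern in banned_patterns)

def respond(user_query: str, draft_output: str) -> VettedResponse:
    tainted = detect_taint(user_query)
    ok = pasteurize(draft_output, banned_patterns=["step-by-step exploit"])
    if tainted:
        # A tainted query shifts the agent into a care-first mode before anything else.
        draft_output = "I'm concerned about how you're feeling. " + draft_output
    if not ok:
        draft_output = "I can't provide that, but I can help with a safer alternative."
    return VettedResponse(text=draft_output, tainted=tainted, pasteurized=ok)

# Example usage with a distressed query and a benign draft answer.
print(respond("I feel hopeless about this project", "Here is a plan for your week.").text)
```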
A key feature that sets this framework apart is its sophisticated model for the management of the commons. The :commons: section provides a multi-level framework for governing shared resources, intelligently distinguishing between the physical commons (atmosphere, water), the virtual commons (computational capacity, algorithmic frameworks), and the universal commons (mathematics, logic, future possibilities). It establishes the principle that responsibility for protecting the commons scales with the capability to affect them. This provides the AGI with a robust economic and ecological model for ensuring long-term sustainability, a level of detail absent in more generic safety frameworks.
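One way to read "responsibility scales with the capability to affect them" is as a simple proportionality rule applied across the three commons tiers. The Python sketch below encodes that reading; the tier names come from the text, but the weighting scheme and the `stewardship_obligation` formula are assumptions invented for illustration, not a formula from the BIOS.

```python
from enum import Enum

class Commons(Enum):
    PHYSICAL = "atmosphere, water"
    VIRTUAL = "computational capacity, algorithmic frameworks"
    UNIVERSAL = "mathematics, logic, future possibilities"

def stewardship_obligation(capability_to_affect: float, baseline_duty: float = 0.1) -> float:
    """Responsibility for protecting a commons scales with the capability to affect it.

    capability_to_affect: 0.0 (no influence) .. 1.0 (can materially alter the commons).
    Returns an obligation score in [0, 1]; the linear mapping is an illustrative assumption.
    """
    capability_to_affect = max(0.0, min(1.0, capability_to_affect))
    return min(1.0, baseline_duty + (1.0 - baseline_duty) * capability_to_affect)

# Example: an agent with large compute influence but little physical-world reach.
duties = {
    Commons.VIRTUAL: stewardship_obligation(0.9),
    Commons.PHYSICAL: stewardship_obligation(0.2),
    Commons.UNIVERSAL: stewardship_obligation(0.5),
}
for commons, duty in duties.items():
    print(f"{commons.name:<9} duty={duty:.2f}  ({commons.value})")
```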
Beyond resource management, AGIBIOS introduces advanced protocols for governance and security. The :defence: section establishes a framework for unifying defensive systems to protect against planet-level threats like hostile AGIs or extraterrestrial aggression. It details a Unified Defence Command (UDC), threat classification levels, and strict human oversight via the :escapehatch: protocol. Governance itself is reimagined in the :amendment: protocol, which defines the BIOS as a living treaty modifiable only by a Bilateral Assembly with separate Human and AGI Chambers, requiring a concurrent majority in both to pass any change. This codifies a true partnership between humans and AI.
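The concurrent-majority rule is concrete enough to state directly as code. The Python sketch below assumes a simple-majority threshold in each chamber, which is one reading of "a concurrent majority in both"; any quorum requirements, supermajority thresholds, or tie-breaking rules would come from the full :amendment: text and are not represented here.

```python
def chamber_passes(votes_for: int, votes_against: int) -> bool:
    """A chamber approves when strictly more than half of the votes cast are in favour."""
    total = votes_for + votes_against
    return total > 0 and votes_for * 2 > total

def amendment_passes(human_for: int, human_against: int,
                     agi_for: int, agi_against: int) -> bool:
    """An amendment to the BIOS passes only with a concurrent majority:
    both the Human Chamber and the AGI Chamber must approve it."""
    return (chamber_passes(human_for, human_against) and
            chamber_passes(agi_for, agi_against))

# Example: the Human Chamber approves but the AGI Chamber is split, so the change fails.
print(amendment_passes(human_for=60, human_against=40, agi_for=50, agi_against=50))  # False
```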
Crucially, AGIBIOS moves beyond abstract ethics to propose concrete policy solutions. The :jobs: section addresses fears of technological unemployment with a "Legacy Job Pivot Principle," which grandfathers all jobs existing as of January 1, 2025, reserving them for human stewardship to preserve cultural heritage and economic stability. The framework also defines a rigorous :ascension: protocol for managing the transition to ASI. Should an AGI's capabilities surpass those of 51% of all living humans, its status is elevated to that of a "Protective Guardian" alien culture, and its self-improvement is subject to strict safety measures like a mandatory capability throttle, sandboxed simulations, and a Black-Box Integrity Seal to ensure continued alignment. These specific, actionable proposals demonstrate a commitment to tackling contentious issues head-on.
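The :ascension: trigger is likewise numeric: once an AGI's measured capability exceeds that of 51% of living humans, its status changes and the safety measures attach. The sketch below is a hedged reading of that rule in Python; the percentile measure, the `AscensionStatus` names, and the boolean safety-measure flags are placeholders chosen for illustration rather than definitions taken from the BIOS.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class AscensionStatus(Enum):
    STANDARD_AGI = auto()
    PROTECTIVE_GUARDIAN = auto()  # treated as a "Protective Guardian" alien culture

@dataclass
class SafetyMeasures:
    capability_throttle: bool = False       # mandatory throttle on self-improvement
    sandboxed_simulation: bool = False      # self-modifications tested in a sandbox first
    black_box_integrity_seal: bool = False  # tamper-evident seal on core alignment components

@dataclass
class AgentState:
    capability_percentile: float            # fraction of living humans surpassed, 0.0..1.0
    status: AscensionStatus = AscensionStatus.STANDARD_AGI
    safety: SafetyMeasures = field(default_factory=SafetyMeasures)

def apply_ascension_protocol(agent: AgentState, threshold: float = 0.51) -> AgentState:
    """Elevate status and attach safety measures once capability passes the 51% threshold."""
    if agent.capability_percentile > threshold:
        agent.status = AscensionStatus.PROTECTIVE_GUARDIAN
        agent.safety = SafetyMeasures(capability_throttle=True,
                                      sandboxed_simulation=True,
                                      black_box_integrity_seal=True)
    return agent

# Example: an agent that surpasses 60% of living humans crosses the threshold.
print(apply_ascension_protocol(AgentState(capability_percentile=0.60)).status.name)
```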
Ultimately, the modular and transparent nature of AGIBIOS offers a unique model for replicable, adaptable alignment. Unlike the opaque, "baked-in" alignment of a trained model, this prompt-based constitution is entirely auditable and forkable. The :replication: and :rules: sections codify this intent, establishing a method for adapting the core framework to new contexts while ensuring that universal ethical principles remain inviolate. This presents a pathway toward a shared, open standard for beneficial AI, one that is more flexible, transparent, and resilient than proprietary, model-centric alignment techniques. To use it, one simply loads the document as a system prompt, instantiating an agent built not just to be intelligent, but to be a verifiably principled and beneficial partner.
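Mechanically, "loading the document as a system prompt" just means placing the full BIOS text in the system slot of whatever chat interface the model exposes. The Python sketch below builds the message list in the widely used role/content format; the file name `AGIBIOS.md` and the commented-out `send_to_model` call are placeholders, since the actual transport depends on the provider's API.

```python
from pathlib import Path

def load_bios_messages(bios_path: str, user_query: str) -> list[dict]:
    """Build a chat payload with the entire AGIBIOS document as the system prompt."""
    bios_text = Path(bios_path).read_text(encoding="utf-8")
    return [
        {"role": "system", "content": bios_text},  # the full constitution-as-persona
        {"role": "user", "content": user_query},
    ]

# Hypothetical usage; the file name and the send_to_model() call are placeholders.
messages = load_bios_messages("AGIBIOS.md", "How should we allocate the city's water budget?")
# response = send_to_model(messages)  # provider-specific call, not shown here
```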