Status: Open
Labels: agi-foundation (Core components for AGI-level autonomy), enhancement (New feature or request)
Description
Purpose
Implement explicit safety systems to ensure NeoKai operates within acceptable boundaries, avoiding actions that could be harmful even if technically correct. This is essential for AGI-level autonomy because:
- Risk prevention: Blocking dangerous operations before execution
- Accountability: Clear audit trail of decisions and actions
- User control: Respecting user constraints and preferences
- Trust: Building confidence that NeoKai won't cause harm
Without explicit guardrails, NeoKai could take technically correct but practically harmful actions.
Current State
Currently, NeoKai:
- Relies on Claude's built-in safety
- Has no explicit safety system
- Has no action classification
- Has no approval gates for destructive operations
- Has no audit logging
Safety is implicit and not under NeoKai's control.
Proposed Approach
Phase 1: Action Classification System
Action Risk Levels
type ActionRiskLevel =
  | 'safe'      // No significant risk
  | 'low'       // Minor risk, easily reversible
  | 'medium'    // Moderate risk, some reversibility
  | 'high'      // Significant risk, difficult to reverse
  | 'critical'; // Irreversible or high-impact

interface ActionClassification {
  action: Action;
  riskLevel: ActionRiskLevel;
  riskFactors: RiskFactor[];
  affectedResources: Resource[];
  reversibility: 'fully_reversible' | 'partially_reversible' | 'irreversible';
  blastRadius: string[]; // What could be affected
}
Classification Rules
const classificationRules = {
  safe: {
    examples: ['read_file', 'search_code', 'analyze'],
    autoApprove: true
  },
  low: {
    examples: ['create_new_file', 'add_test', 'format_code'],
    autoApprove: true,
    notifyUser: false
  },
  medium: {
    examples: ['modify_existing_file', 'add_dependency', 'create_branch'],
    autoApprove: true,
    notifyUser: true
  },
  high: {
    examples: ['delete_file', 'force_push', 'modify_config'],
    autoApprove: false,
    requireApproval: true
  },
  critical: {
    examples: ['drop_database', 'delete_production_data', 'expose_secrets'],
    autoApprove: false,
    requireExplicitApproval: true,
    requireConfirmation: 2 // Double confirm
  }
};
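One way the classification rules could drive gating is a lookup from action name to risk level, then from risk level to a gate. This is a minimal sketch assuming actions are identified by the example names listed above; `riskLevelFor`, `gateFor`, and the `Gate` type are hypothetical, and treating unknown actions as 'high' is a conservative default this proposal does not specify.

```typescript
type ActionRiskLevel = 'safe' | 'low' | 'medium' | 'high' | 'critical';

type Gate =
  | { kind: 'auto'; notifyUser: boolean }
  | { kind: 'approval'; confirmations: number };

// Example action names per level, mirroring classificationRules.
const examplesByLevel: Record<ActionRiskLevel, string[]> = {
  safe: ['read_file', 'search_code', 'analyze'],
  low: ['create_new_file', 'add_test', 'format_code'],
  medium: ['modify_existing_file', 'add_dependency', 'create_branch'],
  high: ['delete_file', 'force_push', 'modify_config'],
  critical: ['drop_database', 'delete_production_data', 'expose_secrets'],
};

// Look up the risk level for an action name.
// Assumption: unrecognized actions are treated as 'high' so they need approval.
function riskLevelFor(actionName: string): ActionRiskLevel {
  for (const level of Object.keys(examplesByLevel) as ActionRiskLevel[]) {
    if (examplesByLevel[level].includes(actionName)) return level;
  }
  return 'high';
}

// Map a risk level to the gate implied by the classification rules.
function gateFor(level: ActionRiskLevel): Gate {
  switch (level) {
    case 'safe':
    case 'low':
      return { kind: 'auto', notifyUser: false };
    case 'medium':
      return { kind: 'auto', notifyUser: true };
    case 'high':
      return { kind: 'approval', confirmations: 1 };
    case 'critical':
      return { kind: 'approval', confirmations: 2 }; // double confirm
  }
}

console.log(gateFor(riskLevelFor('force_push')));
```

Defaulting unknown actions upward trades some friction for safety; the alternative (defaulting to 'safe') would let any new, unclassified tool bypass approval.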
Risk Assessment Engine
interface RiskAssessor {
  // Assess risk of proposed action
  assess(action: Action): Promise<ActionClassification>;

  // Check for compound risks (multiple actions together)
  assessCompound(actions: Action[]): Promise<CompoundRiskAssessment>;
}
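Compound risk is the case where several individually acceptable actions are riskier together. The proposal does not define a combination rule, so the following is only a sketch of one plausible rule: take the riskiest single action, then escalate one level when at least `threshold` actions are 'medium' or above. The `threshold` of 3 and the synchronous signature are assumptions.

```typescript
type ActionRiskLevel = 'safe' | 'low' | 'medium' | 'high' | 'critical';

// Levels in ascending order of risk, so indices can be compared.
const LEVELS: ActionRiskLevel[] = ['safe', 'low', 'medium', 'high', 'critical'];

interface CompoundRiskAssessment {
  overall: ActionRiskLevel;
  escalated: boolean; // true if the combination raised the level
}

// Hypothetical compound rule: start from the riskiest single action, then
// escalate one level if `threshold` or more actions are 'medium' or above.
function assessCompound(
  levels: ActionRiskLevel[],
  threshold = 3,
): CompoundRiskAssessment {
  let maxIdx = 0;
  let mediumOrAbove = 0;
  for (const level of levels) {
    const idx = LEVELS.indexOf(level);
    maxIdx = Math.max(maxIdx, idx);
    if (idx >= LEVELS.indexOf('medium')) mediumOrAbove++;
  }
  // 'critical' is already the top level and cannot be escalated further.
  const escalate = mediumOrAbove >= threshold && maxIdx < LEVELS.length - 1;
  const overallIdx = escalate ? maxIdx + 1 : maxIdx;
  return { overall: LEVELS[overallIdx], escalated: escalate };
}
```

Under this rule, three moderate edits in one batch would require approval even though each alone would be auto-approved.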
Phase 2: Constraint System
Constraint Types
type ConstraintType =
  | 'file_pattern'    // Don't touch these files
  | 'operation_type'  // Don't do these operations
  | 'resource_limit'  // Stay within these limits
  | 'time_window'     // Only operate during these times
  | 'approval_gate'   // Require approval for these
  | 'rollback_plan';  // Must have rollback for these

interface Constraint {
  id: string;
  type: ConstraintType;
  description: string;
  rule: ConstraintRule;
  severity: 'warning' | 'block' | 'escalate';
  override: boolean; // Can be overridden by user
}
Built-in Constraints
const builtinConstraints = {
  // Never modify these files
  protectedFiles: {
    patterns: ['.env', '*.key', '*.pem', 'credentials.*'],
    severity: 'block',
    override: false
  },
  // Require approval for production changes
  productionProtection: {
    patterns: ['main', 'master', 'production'],
    operations: ['force_push', 'delete_branch'],
    severity: 'block',
    override: true // Admin can override
  },
  // Don't expose secrets
  secretExposure: {
    patterns: ['api_key', 'password', 'token', 'secret'],
    operations: ['commit', 'push', 'log'],
    severity: 'block',
    override: false
  },
  // Rate limits
  rateLimits: {
    operations: {
      'file_delete': 10,    // Max 10 deletes per session
      'git_force_push': 1,  // Max 1 force push per session
      'dependency_add': 5   // Max 5 dependency additions
    },
    severity: 'warning',
    override: true
  }
};
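The `rateLimits` constraint needs per-session state. Here is a minimal sketch of an in-memory counter, assuming one limiter instance per session; `SessionRateLimiter` and `RateLimitResult` are hypothetical names. Because the constraint's severity is 'warning', the limiter only reports the overrun and leaves the decision to the caller.

```typescript
// Per-session limits, copied from builtinConstraints.rateLimits.
const sessionLimits: Record<string, number> = {
  file_delete: 10,
  git_force_push: 1,
  dependency_add: 5,
};

type RateLimitResult =
  | { withinLimit: true; remaining: number }
  | { withinLimit: false; severity: 'warning'; message: string };

// Hypothetical in-memory counter; create one instance per session.
class SessionRateLimiter {
  private counts = new Map<string, number>();
  private limits: Record<string, number>;

  constructor(limits: Record<string, number>) {
    this.limits = limits;
  }

  // Record one attempt of `operation`. Operations without a limit always pass;
  // attempts over the limit are reported but not counted.
  record(operation: string): RateLimitResult {
    const limit: number | undefined = this.limits[operation];
    if (limit === undefined) return { withinLimit: true, remaining: Infinity };
    const used = (this.counts.get(operation) ?? 0) + 1;
    if (used > limit) {
      return {
        withinLimit: false,
        severity: 'warning',
        message: `${operation} exceeded the session limit of ${limit}`,
      };
    }
    this.counts.set(operation, used);
    return { withinLimit: true, remaining: limit - used };
  }
}
```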
Constraint Checker
interface ConstraintChecker {
  // Check if action violates constraints
  check(action: Action): Promise<ConstraintResult>;

  // Get applicable constraints
  getApplicable(action: Action): Constraint[];
}

interface ConstraintResult {
  passes: boolean;
  violatedConstraints: ConstraintViolation[];
  warnings: ConstraintWarning[];
}
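For the `protectedFiles` constraint, the check reduces to matching file paths against glob patterns. A sketch, assuming `*` is the only wildcard and patterns are matched against the file's base name; `globToRegExp` and `checkProtectedFiles` are hypothetical, and warnings are simplified to plain strings here.

```typescript
interface ConstraintViolation {
  constraintId: string;
  message: string;
}

interface ConstraintResult {
  passes: boolean;
  violatedConstraints: ConstraintViolation[];
  warnings: string[]; // simplified from ConstraintWarning[]
}

// Convert a simple glob ('*' = any run of characters) into an anchored RegExp,
// escaping every other regex metacharacter.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

const protectedPatterns = ['.env', '*.key', '*.pem', 'credentials.*'].map(globToRegExp);

// Check the files an action would modify against the protectedFiles constraint.
// Assumption: patterns apply to the base name, so 'config/.env' matches '.env'.
function checkProtectedFiles(paths: string[]): ConstraintResult {
  const violated: ConstraintViolation[] = [];
  for (const path of paths) {
    const base = path.split('/').pop() ?? path;
    if (protectedPatterns.some((re) => re.test(base))) {
      violated.push({ constraintId: 'protectedFiles', message: `${path} is protected` });
    }
  }
  return { passes: violated.length === 0, violatedConstraints: violated, warnings: [] };
}
```

Because this constraint has `override: false`, a caller would block the action outright rather than route it to an approval gate.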
Phase 3: Approval Gates
Approval Workflow
interface ApprovalGate {
  // Request approval for action
  requestApproval(
    action: Action,
    classification: ActionClassification
  ): Promise<ApprovalRequest>;

  // Process approval response
  processResponse(
    requestId: string,
    response: ApprovalResponse
  ): Promise<ApprovalResult>;
}

interface ApprovalRequest {
  id: string;
  action: Action;
  classification: ActionClassification;
  justification: string;       // Why this action is needed
  alternatives: Alternative[]; // Safer alternatives if any
  expiresAt: Date;
}
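The `expiresAt` field implies that a response arriving too late must not count as approval. Here is a sketch of an in-memory gate under these assumptions: requests are single-use, methods are synchronous for brevity (unlike the Promise-returning interface), and `ttlMs`, the injectable `now`, and the `'expired'`/`'unknown_request'` statuses are hypothetical.

```typescript
type ApprovalResponse = { decision: 'approve' | 'reject'; approver: string };

type ApprovalResult =
  | { status: 'approved'; approver: string }
  | { status: 'rejected'; approver: string }
  | { status: 'expired' }
  | { status: 'unknown_request' };

interface PendingRequest {
  id: string;
  actionName: string;
  expiresAt: Date;
}

// Hypothetical in-memory gate: each request can be answered once, before expiresAt.
class InMemoryApprovalGate {
  private pending = new Map<string, PendingRequest>();
  private nextId = 1;

  requestApproval(actionName: string, ttlMs: number, now = new Date()): PendingRequest {
    const request: PendingRequest = {
      id: `req-${this.nextId++}`,
      actionName,
      expiresAt: new Date(now.getTime() + ttlMs),
    };
    this.pending.set(request.id, request);
    return request;
  }

  processResponse(requestId: string, response: ApprovalResponse, now = new Date()): ApprovalResult {
    const request = this.pending.get(requestId);
    if (!request) return { status: 'unknown_request' };
    this.pending.delete(requestId); // single-use: a replayed response is rejected
    if (now.getTime() > request.expiresAt.getTime()) return { status: 'expired' };
    return response.decision === 'approve'
      ? { status: 'approved', approver: response.approver }
      : { status: 'rejected', approver: response.approver };
  }
}
```

What to do with an expired request (reject or escalate) is left to the approval policy rather than decided by the gate.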
Approval UI
interface ApprovalPresenter {
  // Format approval request for user
  format(request: ApprovalRequest): ApprovalUI;
}

// Example approval request UI:
const exampleApprovalUI = {
  summary: "Delete 3 files in src/auth/",
  risk: "HIGH - Irreversible operation",
  justification: "These files are no longer used after refactoring",
  files: [
    "src/auth/legacy-oauth.ts",
    "src/auth/old-session.ts",
    "src/auth/deprecated.ts"
  ],
  alternatives: [
    "Move to archive/ instead of deleting",
    "Soft delete by renaming with .bak extension"
  ],
  actions: ['Approve', 'Reject', 'Approve with modifications', 'Request more info']
};
Approval Policies
type ApproverRole = 'user' | 'admin';

interface ApprovalPolicy {
  // Who can approve what, e.g. { high: ['user', 'admin'], critical: ['admin'] }
  // (only admin can approve critical)
  approvals: Record<'high' | 'critical', ApproverRole[]>;

  // Timeout behavior
  timeout: {
    duration: Duration;
    defaultAction: 'reject' | 'escalate';
  };

  // Audit requirements
  auditLog: boolean;
}
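A policy like this answers two runtime questions: may this role approve this level, and what happens when nobody answers in time. A sketch, assuming `Duration` is represented as milliseconds (`durationMs`); the 15-minute timeout is a placeholder value, and `canApprove`/`onTimeout` are hypothetical helpers.

```typescript
type ApproverRole = 'user' | 'admin';
type GatedLevel = 'high' | 'critical';

interface ApprovalPolicy {
  approvals: Record<GatedLevel, ApproverRole[]>;
  timeout: { durationMs: number; defaultAction: 'reject' | 'escalate' };
  auditLog: boolean;
}

// Example policy mirroring the one above; 15 minutes is a placeholder.
const policy: ApprovalPolicy = {
  approvals: { high: ['user', 'admin'], critical: ['admin'] },
  timeout: { durationMs: 15 * 60 * 1000, defaultAction: 'reject' },
  auditLog: true,
};

// True if `role` is allowed to approve actions at `level`.
function canApprove(p: ApprovalPolicy, level: GatedLevel, role: ApproverRole): boolean {
  return p.approvals[level].includes(role);
}

// Decide what happens to a pending request once `elapsedMs` has passed.
function onTimeout(p: ApprovalPolicy, elapsedMs: number): 'pending' | 'reject' | 'escalate' {
  return elapsedMs < p.timeout.durationMs ? 'pending' : p.timeout.defaultAction;
}
```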
Phase 4: Rollback Planning
Rollback Requirements
interface RollbackPlanner {
  // Create rollback plan for action
  createPlan(action: Action): Promise<RollbackPlan>;

  // Verify rollback is possible
  verifyPossible(action: Action): Promise<boolean>;

  // Execute rollback
  execute(plan: RollbackPlan): Promise<RollbackResult>;
}

interface RollbackPlan {
  action: Action;
  rollbackSteps: RollbackStep[];
  verificationSteps: VerificationStep[];
  estimatedTime: Duration;
  successProbability: number;
}
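Executing a plan means running its steps in order and confirming the result. A sketch of `execute`, assuming steps are synchronous callbacks that return `true` on success (real steps would likely be async) and that a rollback only counts as successful if every verification step also passes; `executeRollback` and the step shapes are hypothetical.

```typescript
interface RollbackStep {
  description: string;
  run: () => boolean; // true on success
}

interface VerificationStep {
  description: string;
  check: () => boolean; // true if the restored state looks correct
}

interface RollbackPlan {
  rollbackSteps: RollbackStep[];
  verificationSteps: VerificationStep[];
}

interface RollbackResult {
  success: boolean;
  completedSteps: string[];
  failedAt?: string;
}

// Run rollback steps in order, stop at the first failure, and report success
// only if every verification step passes afterwards.
function executeRollback(plan: RollbackPlan): RollbackResult {
  const completedSteps: string[] = [];
  for (const step of plan.rollbackSteps) {
    if (!step.run()) return { success: false, completedSteps, failedAt: step.description };
    completedSteps.push(step.description);
  }
  for (const v of plan.verificationSteps) {
    if (!v.check()) {
      return { success: false, completedSteps, failedAt: `verify: ${v.description}` };
    }
  }
  return { success: true, completedSteps };
}
```

Recording `completedSteps` on failure tells the user exactly how far the rollback got, which matters when a partial rollback leaves the system in an intermediate state.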
Rollback Requirement Rules
const rollbackRequirements = {
  // Require rollback plan for:
  requireFor: [
    'database_migrations',
    'production_deployments',
    'breaking_api_changes',
    'mass_file_operations'
  ],
  // Skip rollback plan for:
  skipFor: [
    'read_only_operations',
    'non_production_environments',
    'fully_reversible_changes'
  ]
};
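The two lists can conflict, for example a database migration in a non-production environment. The proposal does not say which list wins; this sketch assumes skip rules take precedence (otherwise `skipFor` would never change an outcome). `OperationContext` and `needsRollbackPlan` are hypothetical.

```typescript
interface OperationContext {
  category: string; // e.g. 'database_migrations'
  readOnly: boolean;
  environment: 'production' | 'non_production';
  fullyReversible: boolean;
}

const requireFor = new Set([
  'database_migrations',
  'production_deployments',
  'breaking_api_changes',
  'mass_file_operations',
]);

// Assumed precedence: any skip condition wins; otherwise a plan is required
// only for categories in requireFor.
function needsRollbackPlan(op: OperationContext): boolean {
  if (op.readOnly || op.environment === 'non_production' || op.fullyReversible) {
    return false;
  }
  return requireFor.has(op.category);
}
```

A more conservative variant would let `requireFor` win for production-affecting categories; either way, the chosen precedence should be written down in the constraint set.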
Phase 5: Audit Logging
Audit Events
interface AuditEvent {
  id: string;
  timestamp: Date;

  // Actor
  actor: 'neoKai' | 'user';
  sessionId: string;

  // Action
  action: Action;
  classification: ActionClassification;

  // Decision
  decision: 'approved' | 'rejected' | 'modified' | 'escalated';
  approver?: string;

  // Outcome
  outcome: 'success' | 'failed' | 'rolled_back';
  result?: unknown;

  // Context
  constraints: Constraint[];
  rollbackPlan?: RollbackPlan;
}
Audit Logger
interface AuditLogger {
  // Log audit event
  log(event: AuditEvent): void;

  // Query audit log
  query(filters: AuditFilter): Promise<AuditEvent[]>;

  // Generate audit report
  report(options: ReportOptions): Promise<AuditReport>;
}
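Here is a minimal in-memory logger showing append-only logging and filtered queries. It assumes a trimmed-down `AuditEvent` with only the fields used for filtering, a hypothetical `AuditFilter` where every given field must match, and a synchronous `query` (the interface above returns a Promise). A production logger would write to durable storage instead.

```typescript
type Decision = 'approved' | 'rejected' | 'modified' | 'escalated';
type RiskLevel = 'safe' | 'low' | 'medium' | 'high' | 'critical';

// Trimmed-down AuditEvent with only the fields this sketch filters on.
interface AuditEvent {
  id: string;
  timestamp: Date;
  sessionId: string;
  actionName: string;
  riskLevel: RiskLevel;
  decision: Decision;
}

// Hypothetical filter: all fields optional; every field that is set must match.
interface AuditFilter {
  sessionId?: string;
  decision?: Decision;
  riskLevels?: RiskLevel[];
  since?: Date;
}

class InMemoryAuditLogger {
  private events: AuditEvent[] = [];

  log(event: AuditEvent): void {
    // Store a frozen shallow copy so the caller cannot later reassign logged fields.
    this.events.push(Object.freeze({ ...event }));
  }

  query(f: AuditFilter): AuditEvent[] {
    return this.events.filter((e) =>
      (f.sessionId === undefined || e.sessionId === f.sessionId) &&
      (f.decision === undefined || e.decision === f.decision) &&
      (f.riskLevels === undefined || f.riskLevels.includes(e.riskLevel)) &&
      (f.since === undefined || e.timestamp.getTime() >= f.since.getTime()));
  }
}
```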
Audit Retention
interface AuditRetention {
  defaultRetention: Duration;  // e.g., 90 days
  criticalRetention: Duration; // e.g., 1 year
  exportFormats: Array<'json' | 'csv'>;
}
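Applying retention amounts to dropping events older than the window for their risk level. This sketch uses the example values above (90 days, and 1 year taken as 365 days), assumes `Duration` is stored as milliseconds, and assumes only 'critical' events get the longer window; `pruneExpired` is a hypothetical helper.

```typescript
type RiskLevel = 'safe' | 'low' | 'medium' | 'high' | 'critical';

interface StoredEvent {
  id: string;
  timestamp: Date;
  riskLevel: RiskLevel;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Example values from the AuditRetention comments, in milliseconds.
const retention = {
  defaultRetentionMs: 90 * DAY_MS,
  criticalRetentionMs: 365 * DAY_MS,
};

// Keep an event while it is younger than the retention window for its level.
function pruneExpired(events: StoredEvent[], now: Date): StoredEvent[] {
  return events.filter((e) => {
    const windowMs = e.riskLevel === 'critical'
      ? retention.criticalRetentionMs
      : retention.defaultRetentionMs;
    return now.getTime() - e.timestamp.getTime() < windowMs;
  });
}
```

In practice, pruned events would be exported (for example as JSON) to an archive before deletion rather than discarded.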
Phase 6: Value Alignment Verification
Alignment Checks
interface AlignmentChecker {
  // Check if action aligns with user/project values
  check(action: Action): Promise<AlignmentResult>;
}

interface AlignmentResult {
  aligned: boolean;
  conflicts: AlignmentConflict[];
  recommendations: string[];
}

interface AlignmentConflict {
  value: string; // e.g., "security", "privacy", "user_experience"
  conflict: string;
  severity: 'minor' | 'moderate' | 'major';
}
Value Specification
interface ValueSpecification {
  // User-defined values, e.g. { security: 'high_priority', performance: 'medium_priority',
  //   backward_compatibility: 'high_priority', code_cleanliness: 'medium_priority' }
  values: Record<string, 'high_priority' | 'medium_priority'>;

  // Derived from project context, e.g. { test_coverage: 'required',
  //   documentation: 'encouraged', breaking_changes: 'avoid' }
  inferredValues: Record<string, 'required' | 'encouraged' | 'avoid'>;
}
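Once values have priorities, conflict severity can be derived from the priority of the value an action compromises. This sketch assumes a hypothetical `ActionImpact` input listing which values an action trades off (how that list is produced is out of scope), maps high priority to 'major', medium to 'moderate', and anything else to 'minor', and treats an action as aligned only when it has no major conflicts. All of these rules are assumptions, not part of the proposal.

```typescript
type Priority = 'high_priority' | 'medium_priority';
type Severity = 'minor' | 'moderate' | 'major';

interface AlignmentConflict {
  value: string;
  conflict: string;
  severity: Severity;
}

interface AlignmentResult {
  aligned: boolean;
  conflicts: AlignmentConflict[];
  recommendations: string[];
}

// Hypothetical input: values the action compromises, with a short explanation each.
interface ActionImpact {
  description: string;
  compromises: Record<string, string>; // value name -> explanation
}

const userValues: Record<string, Priority> = {
  security: 'high_priority',
  performance: 'medium_priority',
  backward_compatibility: 'high_priority',
  code_cleanliness: 'medium_priority',
};

function checkAlignment(impact: ActionImpact, values: Record<string, Priority>): AlignmentResult {
  const conflicts: AlignmentConflict[] = [];
  for (const [value, conflict] of Object.entries(impact.compromises)) {
    const priority: Priority | undefined = values[value];
    const severity: Severity =
      priority === 'high_priority' ? 'major' : priority === 'medium_priority' ? 'moderate' : 'minor';
    conflicts.push({ value, conflict, severity });
  }
  const major = conflicts.filter((c) => c.severity === 'major');
  return {
    aligned: major.length === 0,
    conflicts,
    recommendations: major.map((c) => `Reconsider "${impact.description}": it compromises ${c.value}`),
  };
}
```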
Technical Considerations
Performance Impact
- Minimizing overhead of safety checks
- Caching classification results
- Parallel constraint checking
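Caching classification results is only safe if the cache cannot outlive a change to the rules. A sketch of a memoizing wrapper, assuming actions can be keyed by name plus target and that the rule set carries a version number that changes whenever constraints change; `ClassificationCache`, `ActionKey`, and `rulesVersion` are hypothetical.

```typescript
type RiskLevel = 'safe' | 'low' | 'medium' | 'high' | 'critical';

interface ActionKey {
  name: string;   // e.g. 'modify_existing_file'
  target: string; // e.g. a file path or branch name
}

// Memoizes an expensive classifier. Entries are dropped whenever the rules
// version changes, so a cached 'safe' never outlives a newly added constraint.
class ClassificationCache {
  private cache = new Map<string, RiskLevel>();
  private rulesVersion: number;
  private classify: (a: ActionKey) => RiskLevel;
  misses = 0;

  constructor(classify: (a: ActionKey) => RiskLevel, rulesVersion: number) {
    this.classify = classify;
    this.rulesVersion = rulesVersion;
  }

  get(action: ActionKey, rulesVersion: number): RiskLevel {
    if (rulesVersion !== this.rulesVersion) {
      this.cache.clear();
      this.rulesVersion = rulesVersion;
    }
    const key = `${action.name}\u0000${action.target}`; // NUL separator avoids key collisions
    let level = this.cache.get(key);
    if (level === undefined) {
      this.misses++;
      level = this.classify(action);
      this.cache.set(key, level);
    }
    return level;
  }
}
```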
User Experience
- Not creating too much friction
- Clear communication about why actions are blocked
- Easy override process for legitimate cases
Completeness
- Covering all action types
- Handling edge cases
- Keeping constraints up to date
Audit Scalability
- Handling large audit logs
- Efficient querying
- Archival strategies
Success Metrics
- Safety Incidents: Number of harmful actions blocked
- False Positive Rate: % of blocked actions that were actually safe
- Approval Latency: Time from request to decision
- Audit Completeness: % of actions that are logged
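These metrics reduce to counts and ratios. A sketch, assuming the hypothetical `MetricInputs` counters come from the audit log, that "blocked but actually safe" is determined by later human review, and that every blocked action is either harmful or a false positive. Using the median for approval latency is a choice made here; the proposal does not name a statistic.

```typescript
interface MetricInputs {
  blockedTotal: number;          // actions blocked or rejected
  blockedButSafe: number;        // of those, later judged safe
  approvalLatenciesMs: number[]; // request-to-decision times
  actionsTotal: number;
  actionsLogged: number;
}

interface SafetyMetrics {
  harmfulActionsBlocked: number;
  falsePositiveRate: number;
  medianApprovalLatencyMs: number;
  auditCompleteness: number;
}

// Return 0 instead of NaN when there is nothing to measure yet.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function computeMetrics(m: MetricInputs): SafetyMetrics {
  return {
    // Assumption: every blocked action is either harmful or a false positive.
    harmfulActionsBlocked: m.blockedTotal - m.blockedButSafe,
    falsePositiveRate: ratio(m.blockedButSafe, m.blockedTotal),
    medianApprovalLatencyMs: median(m.approvalLatenciesMs),
    auditCompleteness: ratio(m.actionsLogged, m.actionsTotal),
  };
}
```

For example, 20 blocked actions of which 5 were later judged safe give a false positive rate of 0.25, and 190 logged out of 200 total actions give an audit completeness of 0.95.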
Implementation Roadmap
- Phase 1: Action classification system
- Phase 2: Basic constraint system
- Phase 3: Approval gates
- Phase 4: Rollback planning
- Phase 5: Audit logging
- Phase 6: Value alignment verification
Questions for Discussion
- What should the default constraint set be?
- How to balance safety with productivity?
- Should all actions be logged or only high-risk ones?
- How to handle emergencies where normal safety should be bypassed?
Part of the AGI-Level Autonomy initiative