Add CLI agent support (Codex, Gemini) via unified provider system #20

@avivsinai

Summary

Add support for CLI-based coding agents (OpenAI Codex now, Gemini CLI later) while maintaining full backwards compatibility. Agents are treated as additional model providers alongside the existing API providers.

Design Overview

Core Principle: Agents are just another provider type that happens to execute via CLI instead of HTTP. Users interact with them using the same --model flag.

Example Usage (Backwards Compatible)

# Existing commands work unchanged
promptcode expert "Question" --model gpt-5        # OpenAI API (unchanged)
promptcode expert "Question"                      # Default (still gpt-5)

# New capability (only if agent configured)
promptcode expert "Question" --model codex        # Codex CLI
promptcode expert "Question" --model codex-full-auto  # Codex with full autonomy

Implementation Plan

Phase 1: Provider Registry Foundation (No User-Visible Changes)

Goal: Refactor existing providers to a unified interface without breaking anything

Tasks

  • Create packages/cli/src/providers/provider-registry.ts
    • Define Provider interface with type: 'api' | 'cli'
    • Add check(), execute(), estimateCost() methods
    • Implement ProviderRegistry class with model-to-provider mapping
  • Refactor existing API providers to implement unified interface
    • Update OpenAIProvider, AnthropicProvider, GoogleProvider, XAIProvider
    • Keep existing AIProvider intact for backwards compatibility
  • Update expert.ts to use registry pattern
    • Replace direct provider instantiation with registry lookup
    • Ensure all existing tests pass unchanged
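
The registry described in the tasks above could look roughly like the sketch below. Only type, check(), execute(), and estimateCost() come from this plan; the remaining fields (name, models), the method signatures, and the ProviderStatus shape are assumptions made for illustration.

```typescript
// Sketch only: signatures and extra fields are assumptions, not the final API.
type ProviderType = 'api' | 'cli';

interface ProviderStatus {
  available: boolean;
  detail?: string; // e.g. "no API key" or a detected version string
}

interface Provider {
  readonly name: string;
  readonly type: ProviderType;
  readonly models: readonly string[];
  check(): Promise<ProviderStatus>;
  execute(model: string, prompt: string): Promise<string>;
  // Returns null when no estimate is possible (assumed to be the case for CLI agents).
  estimateCost(model: string, inputTokens: number): number | null;
}

class ProviderRegistry {
  private readonly byModel = new Map<string, Provider>();

  register(provider: Provider): void {
    // Validate first so a conflicting provider is never partially registered.
    for (const model of provider.models) {
      if (this.byModel.has(model)) {
        throw new Error(`Model "${model}" is already registered`);
      }
    }
    for (const model of provider.models) {
      this.byModel.set(model, provider);
    }
  }

  resolve(model: string): Provider {
    const provider = this.byModel.get(model);
    if (!provider) {
      throw new Error(
        `Model "${model}" is not available. Run "promptcode status" to see configured providers.`
      );
    }
    return provider;
  }

  listModels(): string[] {
    return Array.from(this.byModel.keys());
  }
}
```

With this shape, expert.ts would call registry.resolve(modelName).execute(...) and would not need to know whether the provider talks HTTP or spawns a process.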

Phase 2: CLI Provider Implementation

Goal: Add CLI agent support that only activates when configured

Tasks

  • Create packages/cli/src/providers/cli-provider.ts
    • Base CLIProvider class with process management
    • Environment variable scrubbing (allowlist only PATH, HOME, USER, TMPDIR, LANG)
    • Process spawning with shell: false for security
    • Timeout handling (default 5 minutes)
    • Output size limits (default 10MB)
    • Proper cleanup on exit/error
  • Create packages/cli/src/providers/agents/codex-provider.ts
    • Extend CLIProvider for Codex-specific logic
    • Handle codex and codex-full-auto model variants
    • Map config args to CLI flags
    • Context delivery via stdin or temp files
  • Add configuration loading
    • Extend TOML config schema for CLI providers
    • Add Zod validation for configuration safety
    • Support provider-specific environment variables
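
One way to enforce the output size limit from the CLIProvider tasks is a small accumulator that refuses data past the cap. This is a sketch under assumptions: the class name BoundedOutput and its methods are hypothetical, and the caller is assumed to kill the child process when push() throws.

```typescript
// Sketch only: BoundedOutput is a hypothetical helper, not part of the plan's API.
class BoundedOutput {
  private readonly chunks: Uint8Array[] = [];
  private readonly maxBytes: number;
  private size = 0;

  constructor(maxBytes: number = 10 * 1024 * 1024) {
    this.maxBytes = maxBytes;
  }

  // Accepts a chunk, or throws without storing it if the cap would be exceeded.
  push(chunk: Uint8Array): void {
    if (this.size + chunk.byteLength > this.maxBytes) {
      throw new Error(`Agent output exceeded ${this.maxBytes} bytes`);
    }
    this.chunks.push(chunk);
    this.size += chunk.byteLength;
  }

  // Concatenates everything accepted so far and decodes it as UTF-8.
  text(): string {
    const all = new Uint8Array(this.size);
    let offset = 0;
    for (const chunk of this.chunks) {
      all.set(chunk, offset);
      offset += chunk.byteLength;
    }
    return new TextDecoder().decode(all);
  }
}
```

Node's Buffer is a Uint8Array subclass, so chunks from a child's stdout 'data' events can be passed to push() directly.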

Configuration Schema

[providers.codex]
type = "cli"
binary = "codex"  # or full path
auth = "chatgpt"  # or "apikey"
args = ["--sandbox", "workspace-write"]
models = ["codex", "codex-full-auto"]
timeout = 300000  # 5 minutes in ms
maxOutputBytes = 10485760  # 10MB

[providers.codex.env]
# Explicitly allowed environment variables
CODEX_AUTH = "chatgpt"
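
The plan calls for Zod to validate this table. To keep the example free of dependencies, the sketch below hand-rolls the same checks in plain TypeScript; it is a stand-in for the Zod schema, not a replacement for it. The function name parseCliProviderConfig and the choice of which fields are optional (args, timeout, maxOutputBytes, env) are assumptions.

```typescript
// Sketch only: a dependency-free stand-in for the planned Zod schema.
type Auth = 'chatgpt' | 'apikey';

interface CliProviderConfig {
  type: 'cli';
  binary: string;
  auth: Auth;
  args: string[];
  models: string[];
  timeout: number;
  maxOutputBytes: number;
  env: Record<string, string>;
}

function fail(message: string): never {
  throw new Error(`Invalid CLI provider config: ${message}`);
}

function stringArray(value: unknown, field: string): string[] {
  if (!Array.isArray(value) || !value.every((item) => typeof item === 'string')) {
    return fail(`${field} must be an array of strings`);
  }
  return value as string[];
}

function positiveInt(value: unknown, field: string): number {
  if (typeof value !== 'number' || !Number.isInteger(value) || value <= 0) {
    return fail(`${field} must be a positive integer`);
  }
  return value;
}

function parseCliProviderConfig(raw: unknown): CliProviderConfig {
  if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
    return fail('expected a table');
  }
  const r = raw as Record<string, unknown>;
  if (r.type !== 'cli') fail('type must be "cli"');
  const binary = r.binary;
  if (typeof binary !== 'string' || binary.length === 0) {
    return fail('binary must be a non-empty string');
  }
  const auth = r.auth;
  if (auth !== 'chatgpt' && auth !== 'apikey') {
    return fail('auth must be "chatgpt" or "apikey"');
  }
  const models = stringArray(r.models, 'models');
  if (models.length === 0) fail('models must list at least one model');

  const rawEnv: unknown = r.env === undefined ? {} : r.env;
  if (typeof rawEnv !== 'object' || rawEnv === null || Array.isArray(rawEnv)) {
    return fail('env must be a table');
  }
  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(rawEnv as Record<string, unknown>)) {
    if (typeof value !== 'string') return fail(`env.${key} must be a string`);
    env[key] = value;
  }

  return {
    type: 'cli',
    binary,
    auth: auth as Auth,
    args: stringArray(r.args === undefined ? [] : r.args, 'args'),
    models,
    // Defaults mirror the values shown in the schema above.
    timeout: positiveInt(r.timeout === undefined ? 300000 : r.timeout, 'timeout'),
    maxOutputBytes: positiveInt(
      r.maxOutputBytes === undefined ? 10485760 : r.maxOutputBytes,
      'maxOutputBytes'
    ),
    env,
  };
}
```

Unlike a strict Zod object, this sketch silently ignores unknown keys; rejecting them would be a reasonable tightening.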

Phase 3: Status Command & Discovery

Goal: Add unified status command showing all providers

Tasks

  • Create promptcode status command
    • Show all configured API providers and their status
    • Show detected CLI agents and versions
    • List all available models with their providers
    • Display actionable setup instructions for missing providers
  • Add auto-detection logic
    • Check PATH for known agent binaries
    • Cache detection results for 5 minutes
    • Show one-time prompt when new agent detected
  • Improve error messages
    • Clear message when model not available
    • Suggest promptcode status for troubleshooting
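
The five-minute detection cache could be a small TTL map keyed by binary name. This is a minimal sketch: DetectionCache, getOrDetect, and the injectable clock are hypothetical names chosen for illustration, and the actual PATH lookup is left to the detect callback.

```typescript
// Sketch only: DetectionCache is a hypothetical helper for the auto-detection step.
class DetectionCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so tests can advance time without sleeping.
  constructor(ttlMs: number = 5 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached result for a binary, re-running detect() once the entry expires.
  getOrDetect(binary: string, detect: () => T): T {
    const currentTime = this.now();
    const hit = this.entries.get(binary);
    if (hit && hit.expiresAt > currentTime) {
      return hit.value;
    }
    const value = detect();
    this.entries.set(binary, { value, expiresAt: currentTime + this.ttlMs });
    return value;
  }
}
```

Negative results ("not found") are cached too, since the lookup tests for the entry rather than its value; that keeps repeated commands from rescanning PATH for an agent that is not installed.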

Example Output

PromptCode Status
═════════════════

API Providers:
  ✓ OpenAI       (configured, 12 models)
  ✓ Anthropic    (configured, 3 models)
  ✗ Google       (no API key - set GOOGLE_API_KEY)

CLI Agents:
  ✓ Codex        (v0.20.0, auth: chatgpt)
  ✗ Gemini CLI   (not found - install with: npm i -g @google/gemini-cli)

Available Models:
  gpt-5 (default), gpt-5-mini, o3, o3-pro
  opus-4, sonnet-4
  codex, codex-full-auto
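
A section of this output could be produced by a small formatter like the one below. It is a sketch only: StatusEntry and renderSection are hypothetical names, and the column spacing is approximate and not guaranteed to match the mock-up above character for character.

```typescript
// Sketch only: hypothetical helper for rendering one section of the status output.
interface StatusEntry {
  name: string;
  ok: boolean;
  detail: string; // e.g. "configured, 12 models" or "no API key - set GOOGLE_API_KEY"
}

function renderSection(title: string, entries: StatusEntry[]): string {
  // Pad names to the longest one so the parenthesised details line up.
  const width = Math.max(0, ...entries.map((entry) => entry.name.length));
  const lines = entries.map(
    (entry) => `  ${entry.ok ? '✓' : '✗'} ${entry.name.padEnd(width)}  (${entry.detail})`
  );
  return [`${title}:`, ...lines].join('\n');
}
```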

Phase 4: Testing & Documentation

Goal: Comprehensive testing without requiring actual agents

Tasks

  • Create mock agent for testing
    • test/fixtures/mock-agent.js that echoes inputs
    • Support deterministic failures for error testing
    • Environment variable PROMPTCODE_MOCK_AGENT=1 for CI
  • Add integration tests
    • Provider registry with mixed API/CLI providers
    • CLI provider process management
    • Configuration validation
    • Security tests (env scrubbing, injection prevention)
  • Update documentation
    • Add "Using CLI Agents" section to README
    • Document configuration options
    • Add troubleshooting guide
    • Update CLAUDE.md with agent context
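
The core of the mock agent can be a pure function, with test/fixtures/mock-agent.js acting as a thin wrapper that feeds it process.argv, stdin, and process.env. The sketch below assumes a hypothetical MOCK_AGENT_EXIT_CODE variable as the trigger for deterministic failures; the plan does not name that mechanism.

```typescript
// Sketch only: MOCK_AGENT_EXIT_CODE is a hypothetical failure trigger.
interface MockResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

function runMockAgent(
  argv: string[],
  stdin: string,
  env: Record<string, string | undefined>
): MockResult {
  const forcedExit = env.MOCK_AGENT_EXIT_CODE;
  if (forcedExit !== undefined) {
    return {
      stdout: '',
      stderr: `mock-agent: forced failure (${forcedExit})`,
      exitCode: Number(forcedExit),
    };
  }
  // Echo everything the agent received so tests can assert on flags,
  // delivered context, and which environment variables got through.
  const envKeys = Object.keys(env)
    .filter((key) => env[key] !== undefined)
    .sort();
  return {
    stdout: JSON.stringify({ argv, stdin, envKeys }),
    stderr: '',
    exitCode: 0,
  };
}
```

Echoing the environment keys, but not their values, lets the security tests check that scrubbing worked without writing secrets into test logs.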

Security Requirements

Critical Security Measures

  • Environment Scrubbing: Only pass allowlisted environment variables to CLI processes
  • No Shell Execution: Always use spawn() with shell: false and argument arrays
  • Input Validation: Validate all configuration with Zod so malformed or unexpected values are rejected before they reach spawn()
  • Resource Limits: Enforce timeouts and output size caps
  • Process Cleanup: Ensure child processes are terminated on parent exit

Implementation

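// Security-first process spawning
import { spawn } from 'node:child_process';

// Build the child's environment from an allowlist instead of copying process.env
const safeEnv = {
  PATH: process.env.PATH,
  HOME: process.env.HOME,
  USER: process.env.USER,
  TMPDIR: process.env.TMPDIR,
  LANG: process.env.LANG,
  // Only explicitly configured variables
  ...configuredAgentEnv
};

const child = spawn(binary, args, {
  env: safeEnv,
  shell: false,        // arguments are passed as an array, never through a shell
  windowsHide: true,
  stdio: 'pipe',
  timeout: config.timeout || 300000  // ms; the child is killed when this elapses
});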
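
The allowlist logic can also be factored into a pure function so the env-scrubbing security test does not need to spawn anything. This is a sketch under assumptions: buildSafeEnv and ALLOWED_ENV_KEYS are hypothetical names, and the function additionally drops variables that are unset in the parent.

```typescript
// Sketch only: buildSafeEnv is a hypothetical, testable form of the allowlist above.
const ALLOWED_ENV_KEYS = ['PATH', 'HOME', 'USER', 'TMPDIR', 'LANG'];

function buildSafeEnv(
  parentEnv: Record<string, string | undefined>,
  configuredAgentEnv: Record<string, string> = {}
): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const key of ALLOWED_ENV_KEYS) {
    const value = parentEnv[key];
    if (value !== undefined) {
      safe[key] = value;
    }
  }
  // Explicitly configured variables are applied last and may override the allowlist.
  return { ...safe, ...configuredAgentEnv };
}
```

A call such as buildSafeEnv(process.env, config.env) would then replace the inline object, and a unit test can assert that a key like OPENAI_API_KEY never appears in the result.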

Success Criteria

  • Zero breaking changes - all existing commands work unchanged
  • CLI agents only activate when explicitly configured
  • Clear error messages with actionable fixes
  • Comprehensive test coverage without requiring real agents
  • Security measures prevent command injection and resource abuse
  • Single unified mental model: agents are just models from different providers

Technical Decisions

  • Why unified provider system? Maintains single mental model for users
  • Why not plugin system? Over-engineering for current needs
  • Why explicit configuration? Security and predictability over magic
  • Why model variants? Allows agent-specific features without flag proliferation

Future Considerations

  • Easy to add new CLI agents (Gemini, Cursor, etc.)
  • Possible future: Plugin system if we exceed 5-6 providers
  • VS Code extension will inherit CLI agent support automatically
  • CC commands work unchanged with new model selection

References

  • Design validated by GPT-5 and Gemini-2.5-pro consensus analysis
  • Security recommendations from both models emphasize environment scrubbing
