A sophisticated AI-powered tool designed to systematically identify and analyze non-deterministic behaviors in programs and APIs, particularly in modified Linux kernel environments. This tool leverages advanced Large Language Models (LLMs) to automatically generate, execute, and analyze test code across multiple programming languages.
Non-deterministic behavior in software systems can lead to unpredictable results, making debugging difficult and potentially causing security vulnerabilities. This tool helps developers:
- Identify potential sources of non-determinism in their systems
- Generate comprehensive test cases to expose these behaviors
- Execute tests across multiple environments to detect variations
- Analyze results to understand the root causes of non-determinism
The tool employs a multi-stage pipeline:
- Planning Stage: AI-driven analysis to identify potential non-deterministic behaviors
- Code Generation: Automatic test code creation using LLMs
- Execution Stage: Multi-server test execution using QAN (Quick Analysis Network)
- Analysis & Review: Categorization and review of detected behaviors
- Multi-Language Support: Test generation for 14+ programming languages including C, C++, Java, Python, Go, Rust, and more
- AI-Powered Analysis: Uses state-of-the-art LLMs for intelligent test planning and code generation
- Automated Execution: Fully automated test execution with error recovery and retry mechanisms
- Non-Determinism Detection: Specialized algorithms to detect variations across multiple test runs
- Comprehensive Reporting: Detailed metrics and categorization of detected behaviors
- Containerized Testing: Secure execution in isolated Linux Alpine containers with KVM virtualization
- Python 3.12+
- Poetry (for dependency management)
- API keys for supported LLM providers (OpenAI, Anthropic, Perplexity, DeepSeek, etc.)
- Access to QAN servers for test execution (configurable via environment)
git clone <repository-url>
cd nondeterministic-agent
poetry install
Create a .env file in the project root with your configuration:
# API Keys
DEEPSEEK_API_KEY="your_deepseek_api_key"
OPENAI_API_KEY="your_openai_api_key"
PERPLEXITYAI_API_KEY="your_perplexity_api_key"
# Model names
PLANNING_MODEL="perplexity/r1-1776"
CODE_WRITER_MODEL="perplexity/r1-1776"
CODE_FIXER_MODEL="perplexity/r1-1776"
FORCE_NON_DETERMINISM_MODEL="perplexity/r1-1776"
REVIEW_MODEL="perplexity/r1-1776"
API_SERVER_URLS="s1.server.com:8000,s2.server.com:8000,s3.server.com:8000"
This tool leverages LiteLLM for model interactions, meaning you can use any model provider supported by LiteLLM, including:
- OpenAI
- Anthropic
- Perplexity AI
- DeepSeek
- Mistral AI
- Gemini (Google)
- Azure OpenAI
- And many more
Specify your preferred model in the .env file using the format LiteLLM expects (e.g., "perplexity/r1-1776" for Perplexity R1, or "anthropic/claude-3-7-sonnet-20250219" for Claude 3.7 Sonnet).
The tool operates in three main stages:
Run the planning script to analyze your target system:
python nondeterministic_agent/planning.py
--scope SCOPE_NAME   Specify a scope name for the analysis
-m, --model MODEL Override the default planning model
--non-interactive Run without interactive prompts (requires --scope)
- Describe Your Target: When prompted, provide a description of the system or API you want to analyze. The AI will identify potential sources of non-determinism such as:
- Race conditions and concurrency issues
- Memory management variations
- CPU cache and TLB effects
- Timing-dependent operations
- System call variations
- Floating-point precision issues
- Review Generated Subjects: The tool generates a comprehensive list of test subjects categorized by type (e.g., CPU microarchitecture, concurrency, filesystem, etc.)
- Customize Test Plans: Review and optionally modify the generated test plans in:
results/scope/<scope_name>/
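As a concrete illustration of one source listed above, floating-point accumulation order alone can change a result. This is a minimal standalone sketch, not code the tool itself generates:

```python
# Floating-point addition is not associative: summing the same values
# in a different order yields a different result. In a parallel reduction,
# the order depends on scheduling -- a classic source of non-determinism.
values = [1e16, 1.0, -1e16, 1.0]

left_to_right = sum(values)          # 1e16 absorbs the first +1.0
reordered = sum(sorted(values))      # same terms, different grouping

print(left_to_right)  # 1.0
print(reordered)      # 0.0
```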
Execute the generated test plans for a specific programming language:
python nondeterministic_agent/execution.py -l <language>
- Compiled Languages: c, cxx, csharp, golang, java, kotlin, rust, scala
- Interpreted Languages: javascript, perl, php, python, ruby, typescript
-l, --language LANG Required. Target programming language
--scope SCOPE_NAME Specify which scope to execute (interactive if omitted)
-f, --force-restart Force restart, ignoring saved execution state
-m, --model MODEL Override the code generation model
--limit N Limit to first N test plans
--languages List all supported languages
The tool automatically:
- Generates Test Code: Creates language-specific implementations for each test plan
- Compiles Code: For compiled languages, builds the executable
- Executes Tests: Runs tests multiple times across different QAN servers
- Analyzes Results: Compares outputs to detect non-deterministic behavior
- Auto-Fixes Errors: Attempts to fix compilation or runtime errors
- Forces Non-Determinism: If initial tests are deterministic, tries to introduce variations
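The core detection check can be sketched as: run the same program several times, hash each run's output, and flag the test when more than one distinct output appears. This is an illustrative approximation, not the tool's actual implementation:

```python
import hashlib
import subprocess
import sys

def detect_nondeterminism(argv, runs=5):
    """Run a command several times and report whether its stdout varies."""
    digests = set()
    for _ in range(runs):
        out = subprocess.run(argv, capture_output=True, timeout=5).stdout
        digests.add(hashlib.sha256(out).hexdigest())
    return len(digests) > 1  # more than one distinct output => non-deterministic

# A pure computation is deterministic; printing the process ID is not.
print(detect_nondeterminism([sys.executable, "-c", "print(2 + 2)"]))                   # False
print(detect_nondeterminism([sys.executable, "-c", "import os; print(os.getpid())"]))  # True
```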
# Run tests for Python
python nondeterministic_agent/execution.py -l python
# Run tests for C++ with custom model
python nondeterministic_agent/execution.py -l cxx -m "anthropic/claude-3-opus-20240229"
# Resume interrupted execution
python nondeterministic_agent/execution.py -l java
# Force restart for Rust tests
python nondeterministic_agent/execution.py -l rust -f
After execution, use the review tool to categorize and analyze detected behaviors:
python nondeterministic_agent/review.py [error_type]
Where error_type is either:
- non_deterministic (default) - Review non-deterministic behaviors
- system_error - Review system-level errors
The review process:
- Scans execution results
- Uses AI to categorize errors by type
- Groups similar behaviors together
- Copies relevant test files for easy analysis
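The grouping step can be approximated by normalizing away run-specific details (memory addresses, PIDs, counters) before bucketing messages. This is a sketch of the idea, not the tool's actual AI-based categorizer:

```python
import re
from collections import defaultdict

def signature(message):
    """Collapse run-specific hex addresses and decimal IDs so that
    structurally identical errors share one signature."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "ADDR", message)  # hex addresses first
    msg = re.sub(r"\d+", "N", msg)                    # then remaining numbers
    return msg

def group_errors(messages):
    groups = defaultdict(list)
    for m in messages:
        groups[signature(m)].append(m)
    return groups

errors = [
    "segfault at 0x7f3a12 in thread 4012",
    "segfault at 0x5c0091 in thread 7233",
    "timeout after 5 seconds",
]
groups = group_errors(errors)
print(len(groups))  # 2: both segfaults collapse into one group
```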
The tool generates comprehensive outputs organized as follows:
results/
├── scope/                            # Planning stage outputs
│   └── <scope_name>/                 # Test subjects for each scope
├── plan/                             # Generated test plans
│   └── <scope_name>/
│       └── test_plans.json
├── generated_code/                   # Generated test programs
│   └── <language>/
│       └── <subject>/
│           └── test_*.ext
├── api_results/                      # Raw execution results
│   └── <test_id>_<language>.json
├── metrics_<scope>_<language>.json   # Execution metrics
├── execution_state_<language>.json   # Execution progress tracking
└── reviewed/                         # Categorized results
    ├── non_deterministic/
    └── system_error/
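Individual result files can be inspected programmatically. The field names below are illustrative assumptions for the sake of the example, not the tool's actual schema:

```python
import json
from pathlib import Path

# Hypothetical result record -- the real files under results/api_results/
# may use different field names.
record = {"test_id": "t001", "language": "python",
          "runs": [{"server": "s1", "stdout": "42"},
                   {"server": "s2", "stdout": "41"}]}

path = Path("results/api_results") / "t001_python.json"
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(record))

data = json.loads(path.read_text())
outputs = {run["stdout"] for run in data["runs"]}
print("non-deterministic" if len(outputs) > 1 else "deterministic")
```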
Run tests for all scopes or plans using the provided scripts:
# Execute all scopes
./scripts/run_all_scopes.sh
# Execute all plans for a specific scope
./scripts/run_all_plans.sh
The tool executes tests in a specialized environment:
- Container: Alpine Linux (kernel 5.10) virtualized with Hermit on x86 using KVM
- Resources: Single CPU, 1024 MB RAM
- Runtime Limit: 5 seconds per test
- Special Features:
- Consistent time function returns
- Fixed randomness state
- Disabled automatic preemption
- No internet access
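The 5-second runtime budget can be reproduced when pre-screening tests locally, e.g. with a subprocess timeout. This is a local approximation of the container limit, not the harness itself:

```python
import subprocess
import sys

def run_with_limit(argv, limit=5):
    """Run a test program under the same 5-second budget the container enforces."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=limit)
        return "ok", proc.stdout
    except subprocess.TimeoutExpired:
        return "timeout", b""

status, out = run_with_limit([sys.executable, "-c", "print('done')"])
print(status)  # ok
```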
Generate comprehensive reports of test results:
python scripts/generate_report.py
This creates a structured report directory with:
- Categorized non-deterministic behaviors
- System errors grouped by type
- Associated test code for each finding
Figure: Visual representation of the non-deterministic agent workflow and test execution pipeline
- Planning Agent (agents/planning_agent.py): Analyzes subjects and generates test plans
- Execution Agent (agents/execution_agent.py): Manages code generation and execution
- QAN Service (services/qan_service.py): Handles multi-server test execution
- LLM Service (services/llm_service.py): Interfaces with various AI models
- State Manager (managers/state_manager.py): Tracks execution progress
- Metrics Manager (managers/metrics_manager.py): Collects and analyzes results
- Code Generation: AI creates test code based on plan specifications
- Precompilation: Validates and preprocesses code
- Compilation: Builds executables for compiled languages
- Multi-Server Execution: Runs tests across multiple QAN servers
- Result Analysis: Compares outputs to detect variations
- Auto-Recovery: Attempts to fix errors and retry
- Non-Determinism Forcing: Modifies code to expose hidden variations
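The auto-recovery step amounts to a bounded retry loop: run the test, and on failure feed the error back to the fixer model and try again. `ask_model_to_fix` below is a hypothetical stand-in for the LLM call, with a toy fix so the sketch is self-contained:

```python
MAX_ATTEMPTS = 3

def ask_model_to_fix(code, error):
    """Hypothetical stand-in for the LLM-based code fixer; in the real
    pipeline the error message would be sent to the model as context."""
    return code.replace("1/0", "1")  # toy fix for the demo below

def try_run(code):
    """Execute the test code; return None on success, else the error text."""
    try:
        exec(compile(code, "<test>", "exec"), {})
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"

def run_with_recovery(code):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        error = try_run(code)
        if error is None:
            return attempt          # succeeded on this attempt
        code = ask_model_to_fix(code, error)
    raise RuntimeError("gave up after %d attempts" % MAX_ATTEMPTS)

print(run_with_recovery("x = 1/0"))  # 2: fails once, gets fixed, then succeeds
```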
This tool leverages:
- LiteLLM for unified LLM access
- Aider for AI-assisted code generation
- Various open-source libraries listed in pyproject.toml
For questions, issues, or contributions:
- Author: Yuriy Babyak yuriy.babyak@outlook.com
- GitHub: https://github.com/yuriyward/