A sophisticated AI-powered tool designed to systematically identify and analyze non-deterministic behaviors in programs and APIs, particularly in modified Linux kernel environments. This tool leverages advanced Large Language Models (LLMs) to automatically generate, execute, and analyze test code across multiple programming languages.
Non-deterministic behavior in software systems can lead to unpredictable results, making debugging difficult and potentially causing security vulnerabilities. This tool helps developers:
- Identify potential sources of non-determinism in their systems
- Generate comprehensive test cases to expose these behaviors
- Execute tests across multiple environments to detect variations
- Analyze results to understand the root causes of non-determinism
The tool employs a multi-stage pipeline:
- Planning Stage: AI-driven analysis to identify potential non-deterministic behaviors
- Code Generation: Automatic test code creation using LLMs
- Execution Stage: Multi-server test execution using QAN (Quick Analysis Network)
- Analysis & Review: Categorization and review of detected behaviors
- Multi-Language Support: Test generation for 14+ programming languages including C, C++, Java, Python, Go, Rust, and more
- AI-Powered Analysis: Uses state-of-the-art LLMs for intelligent test planning and code generation
- Automated Execution: Fully automated test execution with error recovery and retry mechanisms
- Non-Determinism Detection: Specialized algorithms to detect variations across multiple test runs
- Comprehensive Reporting: Detailed metrics and categorization of detected behaviors
- Containerized Testing: Secure execution in isolated Linux Alpine containers with KVM virtualization
- Python 3.12+
- Poetry (for dependency management)
- API keys for supported LLM providers (OpenAI, Anthropic, Perplexity, DeepSeek, etc.)
- Access to QAN servers for test execution (configurable via environment)
git clone <repository-url>
cd nondeterministic-agent
poetry install
Create a .env file in the project root with your configuration:
# API Keys
DEEPSEEK_API_KEY="your_deepseek_api_key"
OPENAI_API_KEY="your_openai_api_key"
PERPLEXITYAI_API_KEY="your_perplexity_api_key"
# Model names
PLANNING_MODEL="perplexity/r1-1776"
CODE_WRITER_MODEL="perplexity/r1-1776"
CODE_FIXER_MODEL="perplexity/r1-1776"
FORCE_NON_DETERMINISM_MODEL="perplexity/r1-1776"
REVIEW_MODEL="perplexity/r1-1776"
API_SERVER_URLS="s1.server.com:8000,s2.server.com:8000,s3.server.com:8000"
This tool leverages LiteLLM for model interactions, meaning you can use any model provider supported by LiteLLM, including:
- OpenAI
- Anthropic
- Perplexity AI
- DeepSeek
- Mistral AI
- Gemini (Google)
- Azure OpenAI
- And many more
Specify your preferred model in the .env file using the format LiteLLM expects (e.g., "perplexity/r1-1776" for Perplexity R1, or "anthropic/claude-3-7-sonnet-20250219" for Claude 3.7 Sonnet).
The tool operates in three main stages:
Run the planning script to analyze your target system:
python nondeterministic_agent/planning.py
--scope SCOPE_NAME   Specify a scope name for the analysis
-m, --model MODEL Override the default planning model
--non-interactive Run without interactive prompts (requires --scope)
- Describe Your Target: When prompted, provide a description of the system or API you want to analyze. The AI will identify potential sources of non-determinism such as:
- Race conditions and concurrency issues
- Memory management variations
- CPU cache and TLB effects
- Timing-dependent operations
- System call variations
- Floating-point precision issues
- Review Generated Subjects: The tool generates a comprehensive list of test subjects categorized by type (e.g., CPU microarchitecture, concurrency, filesystem, etc.)
- Customize Test Plans: Review and optionally modify the generated test plans in:
results/scope/<scope_name>/
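As a concrete illustration of one source listed above, floating-point accumulation order alone can change a result. This is a minimal standalone sketch, not code the tool itself generates:

```python
# Floating-point addition is not associative: summing the same values
# in a different order yields a different result. In a parallel reduction,
# the order depends on scheduling -- a classic source of non-determinism.
values = [1e16, 1.0, -1e16, 1.0]

left_to_right = sum(values)          # 1e16 absorbs the first +1.0
reordered = sum(sorted(values))      # same terms, different grouping

print(left_to_right)  # 1.0
print(reordered)      # 0.0
```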
Execute the generated test plans for a specific programming language:
python nondeterministic_agent/execution.py -l <language>
- Compiled Languages: c, cxx, csharp, golang, java, kotlin, rust, scala
- Interpreted Languages: javascript, perl, php, python, ruby, typescript
-l, --language LANG Required. Target programming language
--scope SCOPE_NAME Specify which scope to execute (interactive if omitted)
-f, --force-restart Force restart, ignoring saved execution state
-m, --model MODEL Override the code generation model
--limit N Limit to first N test plans
--languages List all supported languages
The tool automatically:
- Generates Test Code: Creates language-specific implementations for each test plan
- Compiles Code: For compiled languages, builds the executable
- Executes Tests: Runs tests multiple times across different QAN servers
- Analyzes Results: Compares outputs to detect non-deterministic behavior
- Auto-Fixes Errors: Attempts to fix compilation or runtime errors
- Forces Non-Determinism: If initial tests are deterministic, tries to introduce variations
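The core detection check can be sketched as: run the same program several times, hash each run's output, and flag the test when more than one distinct output appears. This is an illustrative approximation, not the tool's actual implementation:

```python
import hashlib
import subprocess
import sys

def detect_nondeterminism(argv, runs=5):
    """Run a command several times and report whether its stdout varies."""
    digests = set()
    for _ in range(runs):
        out = subprocess.run(argv, capture_output=True, timeout=5).stdout
        digests.add(hashlib.sha256(out).hexdigest())
    return len(digests) > 1  # more than one distinct output => non-deterministic

# A pure computation is deterministic; printing the process ID is not.
print(detect_nondeterminism([sys.executable, "-c", "print(2 + 2)"]))                   # False
print(detect_nondeterminism([sys.executable, "-c", "import os; print(os.getpid())"]))  # True
```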
# Run tests for Python
python nondeterministic_agent/execution.py -l python
# Run tests for C++ with custom model
python nondeterministic_agent/execution.py -l cxx -m "anthropic/claude-3-opus-20240229"
# Resume interrupted execution
python nondeterministic_agent/execution.py -l java
# Force restart for Rust tests
python nondeterministic_agent/execution.py -l rust -f
After execution, use the review tool to categorize and analyze detected behaviors:
python nondeterministic_agent/review.py [error_type]
Where error_type is either:
- non_deterministic (default) - Review non-deterministic behaviors
- system_error - Review system-level errors
The review process:
- Scans execution results
- Uses AI to categorize errors by type
- Groups similar behaviors together
- Copies relevant test files for easy analysis
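The grouping step can be approximated by normalizing away run-specific details (memory addresses, PIDs, counters) before bucketing messages. This is a sketch of the idea, not the tool's actual AI-based categorizer:

```python
import re
from collections import defaultdict

def signature(message):
    """Collapse run-specific hex addresses and decimal IDs so that
    structurally identical errors share one signature."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "ADDR", message)  # hex addresses first
    msg = re.sub(r"\d+", "N", msg)                    # then remaining numbers
    return msg

def group_errors(messages):
    groups = defaultdict(list)
    for m in messages:
        groups[signature(m)].append(m)
    return groups

errors = [
    "segfault at 0x7f3a12 in thread 4012",
    "segfault at 0x5c0091 in thread 7233",
    "timeout after 5 seconds",
]
groups = group_errors(errors)
print(len(groups))  # 2: both segfaults collapse into one group
```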
The tool generates comprehensive outputs organized as follows:
results/
├── scope/                            # Planning stage outputs
│   └── <scope_name>/                 # Test subjects for each scope
├── plan/                             # Generated test plans
│   └── <scope_name>/
│       └── test_plans.json
├── generated_code/                   # Generated test programs
│   └── <language>/
│       └── <subject>/
│           └── test_*.ext
├── api_results/                      # Raw execution results
│   └── <test_id>_<language>.json
├── metrics_<scope>_<language>.json   # Execution metrics
├── execution_state_<language>.json   # Execution progress tracking
└── reviewed/                         # Categorized results
    ├── non_deterministic/
    └── system_error/
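Individual result files can be inspected programmatically. The field names below are illustrative assumptions for the sake of the example, not the tool's actual schema:

```python
import json
from pathlib import Path

# Hypothetical result record -- the real files under results/api_results/
# may use different field names.
record = {"test_id": "t001", "language": "python",
          "runs": [{"server": "s1", "stdout": "42"},
                   {"server": "s2", "stdout": "41"}]}

path = Path("results/api_results") / "t001_python.json"
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(record))

data = json.loads(path.read_text())
outputs = {run["stdout"] for run in data["runs"]}
print("non-deterministic" if len(outputs) > 1 else "deterministic")
```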
Run tests for all scopes or plans using the provided scripts:
# Execute all scopes
./scripts/run_all_scopes.sh
# Execute all plans for a specific scope
./scripts/run_all_plans.sh
The tool executes tests in a specialized environment:
- Container: Alpine Linux (kernel 5.10) virtualized with Hermit on x86 using KVM
- Resources: Single CPU, 1024 MB RAM
- Runtime Limit: 5 seconds per test
- Special Features:
- Consistent time function returns
- Fixed randomness state
- Disabled automatic preemption
- No internet access
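The 5-second runtime budget can be reproduced when pre-screening tests locally, e.g. with a subprocess timeout. This is a local approximation of the container limit, not the harness itself:

```python
import subprocess
import sys

def run_with_limit(argv, limit=5):
    """Run a test program under the same 5-second budget the container enforces."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=limit)
        return "ok", proc.stdout
    except subprocess.TimeoutExpired:
        return "timeout", b""

status, out = run_with_limit([sys.executable, "-c", "print('done')"])
print(status)  # ok
```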
Generate comprehensive reports of test results:
python scripts/generate_report.py
This creates a structured report directory with:
- Categorized non-deterministic behaviors
- System errors grouped by type
- Associated test code for each finding
Figure: Visual representation of the non-deterministic agent workflow and test execution pipeline
- Planning Agent (agents/planning_agent.py): Analyzes subjects and generates test plans
- Execution Agent (agents/execution_agent.py): Manages code generation and execution
- QAN Service (services/qan_service.py): Handles multi-server test execution
- LLM Service (services/llm_service.py): Interfaces with various AI models
- State Manager (managers/state_manager.py): Tracks execution progress
- Metrics Manager (managers/metrics_manager.py): Collects and analyzes results
- Code Generation: AI creates test code based on plan specifications
- Precompilation: Validates and preprocesses code
- Compilation: Builds executables for compiled languages
- Multi-Server Execution: Runs tests across multiple QAN servers
- Result Analysis: Compares outputs to detect variations
- Auto-Recovery: Attempts to fix errors and retry
- Non-Determinism Forcing: Modifies code to expose hidden variations
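The auto-recovery step amounts to a bounded retry loop: run the test, and on failure feed the error back to the fixer model and try again. `ask_model_to_fix` below is a hypothetical stand-in for the LLM call, with a toy fix so the sketch is self-contained:

```python
MAX_ATTEMPTS = 3

def ask_model_to_fix(code, error):
    """Hypothetical stand-in for the LLM-based code fixer; in the real
    pipeline the error message would be sent to the model as context."""
    return code.replace("1/0", "1")  # toy fix for the demo below

def try_run(code):
    """Execute the test code; return None on success, else the error text."""
    try:
        exec(compile(code, "<test>", "exec"), {})
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"

def run_with_recovery(code):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        error = try_run(code)
        if error is None:
            return attempt          # succeeded on this attempt
        code = ask_model_to_fix(code, error)
    raise RuntimeError("gave up after %d attempts" % MAX_ATTEMPTS)

print(run_with_recovery("x = 1/0"))  # 2: fails once, gets fixed, then succeeds
```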
This tool leverages:
- LiteLLM for unified LLM access
- Aider for AI-assisted code generation
- Various open-source libraries listed in pyproject.toml
For questions, issues, or contributions:
- Author: Yuriy Babyak yuriy.babyak@outlook.com
- GitHub: https://github.com/yuriyward/