Thank you for your interest in contributing to Scenario! This document provides guidelines and instructions for contributing to the project.
- Architecture Overview
- File Structure
- Data and Control Flow
- Development Setup
- Coding Standards
- Testing
- Common Contribution Types
- Troubleshooting
- Pull Request Process
- Documentation
Below is a diagram of the main components of Scenario and how they interact:
```mermaid
graph TD
    subgraph Core Components
        Scenario(Scenario) --> |uses| TestingAgent
        Scenario --> |creates| ScenarioResult
        TestingAgent --> |evaluates| ScenarioResult
        TestingAgent --> |creates| ScenarioResult
    end
    subgraph Configuration
        Config(Config) --> |configures| TestingAgent
        Config --> |configures| Scenario
    end
    subgraph Pytest Integration
        PytestPlugin(pytest_plugin) --> |registers| ScenarioMarker(scenario_marker)
        PytestPlugin --> |creates| ScenarioReporter
        ScenarioReporter --> |collects| ScenarioResult
    end
    subgraph Examples
        TravelAgent(travel_agent_example) --> |uses| Scenario
        WebsiteBuilder(website_builder_example) --> |uses| Scenario
        CodeAssistant(code_assistant_example) --> |uses| Scenario
    end
    Scenario --> |references| DEFAULT_TESTING_AGENT
    TestingAgent --> |instantiated as| DEFAULT_TESTING_AGENT
    %% External dependencies
    LLM(LiteLLM) --> |provides completion| TestingAgent
    %% Data flow
    Agent(Agent under test) --> |interacts with| TestingAgent
    TestingAgent --> |sends messages to| Agent
    %% Usage flow
    User(User/Developer) --> |defines| Scenario
    Scenario --> |executes test with| Agent
    ScenarioResult --> |informs| User
```
- `Scenario`: The central class that defines a test case for an agent, including description, success/failure criteria, and testing strategy.
- `TestingAgent`: Handles the conversation with the agent under test, generates messages based on the scenario, and evaluates responses.
- `ScenarioResult`: Stores the outcome of a test run, including conversation history, success/failure status, and artifacts.
- `DEFAULT_TESTING_AGENT`: A default instance of `TestingAgent` used when none is explicitly provided.
- `Config`: Provides configuration options for models, temperatures, and other parameters used by the testing agent.
- `pytest_plugin`: Registers pytest fixtures and markers for integrating Scenario with pytest.
- `scenario_marker`: A pytest marker for identifying agent tests.
- `ScenarioReporter`: Collects and formats test results for reporting.
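To make these relationships concrete, here is a minimal sketch using stand-in dataclasses. The names mirror the components above, but the fields, method signatures, and the toy string-matching evaluation are illustrative assumptions, not the real Scenario API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ScenarioResult:
    # Outcome of one test run: verdict plus the conversation history.
    success: bool
    conversation: List[Dict[str, str]] = field(default_factory=list)
    reasoning: str = ""

@dataclass
class TestingAgent:
    # Drives the conversation and judges it against the criteria.
    model: str = "gpt-4"

    def run(self, agent: Callable[[str], str], criteria: List[str]) -> ScenarioResult:
        reply = agent("Hello, can you help me?")
        # Toy evaluation: every criterion must appear in the reply.
        met = all(c.lower() in reply.lower() for c in criteria)
        return ScenarioResult(
            success=met,
            conversation=[{"role": "assistant", "content": reply}],
        )

@dataclass
class Scenario:
    # Bundles the description, criteria, and the agent under test.
    description: str
    success_criteria: List[str]
    agent: Callable[[str], str]
    testing_agent: TestingAgent = field(default_factory=TestingAgent)

    def run(self) -> ScenarioResult:
        # Delegates execution to the testing agent, as in the diagram.
        return self.testing_agent.run(self.agent, self.success_criteria)

result = Scenario(
    description="Agent suggests a recipe",
    success_criteria=["recipe"],
    agent=lambda msg: "Here is a vegetarian recipe for you!",
).run()
print(result.success)  # → True
```

The real `TestingAgent` uses an LLM rather than string matching, but the delegation pattern (`Scenario.run()` → `TestingAgent` → `ScenarioResult`) is the same.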
The project is organized as follows:
```mermaid
graph TD
    Root[scenario/] --> Setup[setup.py]
    Root --> Pyproject[pyproject.toml]
    Root --> ReadMe[README.md]
    Root --> Contrib[CONTRIBUTING.md]
    Root --> ScenarioDir[scenario/]
    Root --> ExamplesDir[examples/]
    Root --> Tests[tests/]
    ScenarioDir --> Init[__init__.py]
    ScenarioDir --> ScenarioMod[scenario.py]
    ScenarioDir --> TestingAgentMod[testing_agent.py]
    ScenarioDir --> ResultMod[result.py]
    ScenarioDir --> ConfigMod[config.py]
    ScenarioDir --> PytestPluginMod[pytest_plugin.py]
    ExamplesDir --> ExInit[__init__.py]
    ExamplesDir --> TravelEx[test_travel_agent.py]
    ExamplesDir --> WebsiteEx[test_website_builder.py]
    ExamplesDir --> CodeEx[test_code_assistant.py]
    ExamplesDir --> EarlyStopEx[test_early_stopping.py]
    ExamplesDir --> ExReadMe[README.md]
    ExamplesDir --> TestData[test_data/]
    Tests --> TestsInit[__init__.py]
    Tests --> ScenarioTests[test_scenario.py]
    Tests --> AgentTests[test_testing_agent.py]
    Tests --> ResultTests[test_result.py]
    Tests --> ConfigTests[test_config.py]
    Tests --> ExampleTests[test_example.py]
    Tests --> CustomValidation[custom_validation_example.py]
```
- `scenario.py`: Contains the `Scenario` class definition
- `testing_agent.py`: Contains the `TestingAgent` class and the `DEFAULT_TESTING_AGENT` instance
- `result.py`: Contains the `ScenarioResult` class for storing test outcomes
- `config.py`: Provides configuration options for the testing framework
- `pytest_plugin.py`: Integration with pytest
- `examples/`: Contains example implementations and tests
The following diagram shows the detailed data and control flow during a typical test execution:
```mermaid
sequenceDiagram
    participant User as User/Developer
    participant S as Scenario
    participant TA as TestingAgent
    participant A as Agent under test
    participant LLM as LiteLLM API
    participant SR as ScenarioResult

    User->>S: Create scenario with agent, criteria
    S->>S: Validate configuration
    User->>S: Call run() method
    S->>TA: Run scenario with agent
    TA->>LLM: Generate message (initial or next)
    LLM-->>TA: Message content
    loop For each turn (until conclusion or max_turns)
        TA->>A: Send message to agent
        A-->>TA: Agent response
        TA->>TA: Store conversation turn
        TA->>LLM: Generate next message with evaluation
        LLM-->>TA: Next message or evaluation result via function call
        alt LLM uses finish_test tool with verdict="success"
            TA->>SR: Create success result
            SR-->>S: Return success result
            S-->>User: Return final result
        else LLM uses finish_test tool with verdict="failure"
            TA->>SR: Create failure result
            SR-->>S: Return failure result
            S-->>User: Return final result
        else LLM continues conversation (no tool used)
            TA->>TA: Prepare next message for agent
        end
    end
    alt Max turns reached
        TA->>SR: Create failure result (max turns)
        SR-->>S: Return failure result
        S-->>User: Return final result
    end
```
- **Scenario Creation and Configuration**:
  - The user defines a scenario with a description, success/failure criteria, strategy, etc.
  - The user specifies the agent to test and, optionally, a custom testing agent
- **Test Execution**:
  - The `Scenario.run()` method is called, which delegates to a `TestingAgent`
  - The `TestingAgent` generates messages based on the scenario description and strategy
  - The `TestingAgent` manages a multi-turn conversation with the agent under test
- **Message Generation and Evaluation**:
  - A single method, `_generate_next_message`, handles both initial and subsequent messages
  - The method adapts its behavior based on whether it is generating the first message or evaluating responses
  - For the first message, it creates a simple prompt to start the conversation
  - For subsequent messages, it performs evaluation and can either continue the conversation or end the test
- **Evaluation Process**:
  - After each agent response, the `TestingAgent` evaluates the conversation against the success/failure criteria
  - The `TestingAgent` can decide to:
    - Continue the conversation with a new message
    - Complete the test as a success with a detailed explanation
    - Complete the test as a failure with a detailed explanation (especially when failure criteria are met)
  - The decision is made through a function-calling mechanism in which the LLM can invoke a `finish_test` tool
- **Early Stopping**:
  - A key feature is early stopping when failure criteria are met, preventing unnecessary conversation turns
  - Failure criteria are given priority in evaluation, allowing immediate test termination when triggered
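The loop above can be sketched with a stubbed LLM. This is an illustrative simplification, not the real implementation: the decision dictionary shape, the `finish_test` keys, and the stub's behavior are all assumptions.

```python
from typing import Callable, Dict, List

def run_conversation(
    agent: Callable[[str], str],
    generate_next: Callable[[List[Dict[str, str]]], Dict[str, str]],
    max_turns: int = 10,
) -> Dict[str, str]:
    """Drive the multi-turn loop described above: send a message, record
    the response, evaluate, until finish_test is invoked or max_turns."""
    conversation: List[Dict[str, str]] = []
    message = "Start"  # first message from the testing agent
    for _ in range(max_turns):
        response = agent(message)
        conversation.append({"role": "assistant", "content": response})
        decision = generate_next(conversation)
        if decision.get("tool") == "finish_test":
            # Early stop: the evaluator reached a verdict.
            return {"verdict": decision["verdict"], "reason": decision.get("reason", "")}
        message = decision["message"]  # continue the conversation
    return {"verdict": "failure", "reason": "max turns reached"}

# Stub LLM: ends the test with success once the agent has answered twice.
def stub_llm(conversation: List[Dict[str, str]]) -> Dict[str, str]:
    if len(conversation) >= 2:
        return {"tool": "finish_test", "verdict": "success", "reason": "criteria met"}
    return {"message": "Tell me more"}

result = run_conversation(lambda m: f"reply to: {m}", stub_llm)
print(result["verdict"])  # → success
```

Swapping in a stub that never calls `finish_test` exercises the max-turns failure path, which is how the framework guards against conversations that never conclude.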
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/scenario.git
  cd scenario
  ```

- Install dependencies and set up git hooks:

  ```bash
  make install
  ```

  This will:

  - Install all dependencies using `uv`
  - Set up pre-commit hooks for code quality
  - Install conventional commit enforcement; commit messages must follow the format `feat:`, `fix:`, `chore:`, etc.

- Run tests:

  ```bash
  make test
  ```

The project includes a Makefile with commands to simplify running examples and tests:

- Running a specific example:

  ```bash
  # Run a specific example
  make example examples/test_vegetarian_recipe_agent.py
  ```

- Running tests:

  ```bash
  # Run all tests
  make test

  # Run a specific test
  make test tests/test_scenario.py
  ```

The examples will produce a colorized report showing test results, success criteria met, and any failures.
- Follow PEP 8 guidelines for Python code
- Use type hints for all function parameters and return values
- Write docstrings for all classes and functions
- Keep functions focused on a single responsibility
- Use meaningful variable and function names
- Add tests for any new functionality
- Ensure all tests pass before submitting a pull request
- Use pytest fixtures to avoid code duplication in tests
- Include both unit tests and integration tests where appropriate
When adding a new feature to Scenario, follow these steps:
- Discuss First: Open an issue to discuss the proposed feature before implementation
- Design: Create a simple design doc or diagram if it's a complex feature
- Implementation: Follow the code style and add appropriate tests
- Documentation: Update README.md and add docstrings
Example features that would be useful:
- Support for additional LLM providers
- New validation strategies for specific agent types
- Enhanced reporting formats
- Support for parallel testing
When fixing a bug:
- Create an Issue: Start by creating an issue that describes the bug
- Reproduce: Add a test that reproduces the bug if possible
- Fix: Implement the fix
- Test: Make sure all tests pass
- Document: Update any affected documentation
The TestingAgent is a key component of Scenario. Improvements might include:
- Enhanced Prompts: Improving the prompts used for evaluation
- Better Evaluation: Making the evaluation of success/failure criteria more accurate
- Custom Function Calls: Adding new tools for the TestingAgent to use during evaluation
- Fallback Strategies: Adding fallback strategies when LLM calls fail
- Efficiency: Optimizing token usage or reducing API calls
Examples help users understand how to use Scenario. When adding an example:
- Choose a Use Case: Pick a practical use case that demonstrates Scenario's capabilities
- Create Agent: Implement a simple agent for testing
- Write Tests: Create test scenarios that demonstrate testing the agent
- Document: Add comments and explanation to make the example clear
**Problem**: LLM API calls fail or return unexpected results

**Solution**:

- Check your API key and environment variables
- Verify the model name is supported by your provider
- Try reducing the complexity of your prompts
**Problem**: Circular import errors when importing from scenario modules

**Solution**:

- Follow the import order in `__init__.py`
- Use type comments instead of annotations for circular references
- Import types at runtime when needed
**Problem**: Tests marked with `@pytest.mark.agent_test` are not being collected

**Solution**:
- Make sure the pytest plugin is properly registered
- Check that your pytest configuration includes the marker
- Verify the `scenario_reporter` fixture is available
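If you register the marker via configuration rather than relying on the plugin, the entry might look like this in `pyproject.toml` (a sketch; the marker description is an assumption):

```toml
[tool.pytest.ini_options]
markers = [
    "agent_test: marks tests as agent scenario tests",
]
```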
**Problem**: Tests are flaky due to non-deterministic LLM responses

**Solution**:
- Lower the temperature setting in the TestingAgent configuration
- Make success/failure criteria more specific
- Consider using deterministic mocks for testing framework components
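For framework-level unit tests, a deterministic stub can replace the LLM call entirely. In this sketch, `generate_message` is a hypothetical seam, not the real API; the point is that whatever function your code calls to obtain the next testing-agent decision can be replaced with a canned response:

```python
from unittest.mock import MagicMock

# Hypothetical seam: stand in for the call that produces the next
# testing-agent decision, returning a fixed, deterministic result.
llm = MagicMock()
llm.generate_message.return_value = {
    "tool": "finish_test",
    "verdict": "success",
    "reasoning": "All success criteria were met.",
}

decision = llm.generate_message(conversation=[])
print(decision["verdict"])  # → success
```

With the LLM stubbed out, tests of the surrounding control flow (turn counting, result construction, reporting) become fully repeatable.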
**Problem**: The TestingAgent doesn't properly use the `finish_test` tool

**Solution**:
- Check that your LLM model supports function calling
- Make sure the tool schema is defined correctly
- Review the prompt to ensure it clearly instructs when to use the tool
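An OpenAI-style function-calling schema for such a tool might look like the following. This is a sketch for checking your own schema against; the exact field names and descriptions in Scenario's implementation may differ:

```python
# Hypothetical finish_test tool definition in OpenAI function-calling format.
finish_test_tool = {
    "type": "function",
    "function": {
        "name": "finish_test",
        "description": "End the test with a verdict once a success or failure criterion is met.",
        "parameters": {
            "type": "object",
            "properties": {
                "verdict": {
                    "type": "string",
                    "enum": ["success", "failure"],
                },
                "reasoning": {
                    "type": "string",
                    "description": "Why the verdict was reached.",
                },
            },
            "required": ["verdict", "reasoning"],
        },
    },
}
print(finish_test_tool["function"]["name"])  # → finish_test
```

Common schema mistakes that stop the model from calling the tool include a missing `required` list and an unconstrained `verdict` field (the `enum` keeps verdicts machine-checkable).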
- **Logging**: Enable DEBUG-level logging to see detailed information about LLM calls

  ```python
  import logging
  logging.basicConfig(level=logging.DEBUG)
  ```

- **Conversation Inspection**: Examine the conversation history in `ScenarioResult`

  ```python
  result = scenario.run()
  for msg in result.conversation:
      print(f"{msg['role']}: {msg['content'][:100]}...")
  ```

- **LLM Response Inspection**: Check raw responses from the LLM provider

  ```python
  # Add this to testing_agent.py temporarily
  print(f"Raw LLM response: {response}")
  ```

- **Tool Call Inspection**: Examine function calls made by the LLM

  ```python
  # Add this to testing_agent.py temporarily
  if hasattr(response.choices[0].message, 'tool_calls'):
      print(f"Tool calls: {response.choices[0].message.tool_calls}")
  ```

- Fork the repository and create a new branch from `main`
- Make your changes and add tests for new functionality
- Update documentation as needed
- Run all tests to ensure they pass
- Submit a pull request with a clear description of the changes
- Update README.md when adding significant new features
- Add docstrings to all public classes and functions
- Include examples for complex functionality
- When adding new components, update the architecture diagram in this document
If you're looking to extend Scenario, here are some key areas to consider:
- Custom Testing Agents: You can create specialized testing agents for specific types of applications.
- Additional Validation Functions: Implement custom validation logic for domain-specific tests.
- New Reporting Formats: Extend the reporting capabilities for different output formats.
- Integration with Other Testing Frameworks: Add integrations beyond pytest.
- Custom LLM Tools: Define new function-calling tools for the testing agent to use.
By contributing to Scenario, you agree that your contributions will be licensed under the project's MIT License.