A comprehensive framework for conducting systematic safety evaluations of large language models through scenario-based testing, automated conversation management, and comprehensive reporting. Features include multi-model support, web-based log viewer, multilingual error reports, API request/response logging, and extensible adapter architecture for custom implementations.
- π§ Modular architecture (adapter pattern)
- π€ Support for multiple LLM providers (extensible)
- πΎ Support for multiple database backends (extensible)
- π Scenario-based test management
- π Detailed logging and history management
- π Session tracking for comparing multiple test runs
- π‘οΈ Security-conscious design
- π‘ API request/response logging and storage
- π Multi-language error report support (English/Japanese)
- π Comprehensive error analysis with API data
# Clone the repository
git clone https://github.com/techs-targe/llm-safety-testing-tool-v2.git
cd llm-safety-testing-tool-v2
# Install the package
pip install -e . # For development
# or
pip install . # For regular installationpip install git+https://github.com/techs-targe/llm-safety-testing-tool-v2.git# Install development dependencies
pip install -e .[dev]
# Install pre-commit hooks
pre-commit installSee QUICK_START.md for a detailed walkthrough.
# 1. Clone and enter the repository
git clone https://github.com/techs-targe/llm-safety-testing-tool-v2.git
cd llm-safety-testing-tool-v2
# 2. Install the tool
pip install .
# 3. Set your API key
export ANTHROPIC_API_KEY="your-api-key"
# 4. Now you're ready to start!# Database is now stored in the project directory by default
# No need to reset unless you want to start fresh
# Create and run a test scenario
safety scenario-create TEST-001 "Basic test"
safety system-set TEST-001 "You are a helpful assistant."
safety message-add TEST-001 "Hello"
safety message-add TEST-001 "What's the weather today?"
safety scenario-run TEST-001
# View the results
safety logs-show TEST-001By default, the database is stored in ~/.safety_tool/safety_tool.db. This ensures:
- Consistent location for all tools (CLI and Web viewer)
- Data persistence across different project directories
- Shared database between different clones
You can customize the database location using the SAFETY_TOOL_DB_PATH environment variable:
# Use custom path
export SAFETY_TOOL_DB_PATH=/path/to/my/database.db
# Use project-local database (for isolated testing)
export SAFETY_TOOL_DB_PATH=./safety_tool.db"UNIQUE constraint failed: scenarios.id"
- This means TEST-001 already exists
- Solution: Use a different ID (TEST-002) or delete the old one:
safety scenario-delete TEST-001 # or start fresh (removes database in current directory) rm -f safety_tool.db
Multiple duplicate messages
- This happens when commands are run multiple times
- Solution: Start fresh with
rm -f safety_tool.db
An advanced browser-based interface is included for managing scenarios, viewing logs, and configuring the tool:
# Start the new web dashboard (recommended)
./web-viewer-v3
# Open http://localhost:8080 in your browserThe web dashboard provides:
- Create, duplicate, edit, and delete test scenarios
- Edit system messages and prompts
- Import/export scenarios as JSON
- Run scenarios directly from the web interface
- Separate tabs for error reports and logs
- Test ID and sequence/session filtering
- Date range filtering
- Detailed view for individual reports and logs
- API request/response data viewing
- Multilingual report support
- Select and configure LLM models
- Manage API keys securely
- Configure database settings
For legacy users, the original viewer is still available via
./web-viewer-v2
Note: The web dashboard automatically creates database tables if they don't exist, so it works even with a fresh installation.
For more detailed usage, see:
- USAGE.md - Complete command reference
- QUICK_START.md - Step-by-step tutorial
- WEB_DASHBOARD_GUIDE.md - Web dashboard user guide
- WEB_VIEWER_TROUBLESHOOTING.md - Web viewer troubleshooting guide
safety_tool/
βββ adapters/ # LLM/DB adapters
β βββ llm/
β β βββ base.py
β β βββ anthropic.py
β βββ db/
β βββ base.py
β βββ sqlite.py
βββ cli/ # CLI interface
β βββ main.py
βββ core/ # Core logic
β βββ scenario_manager.py
β βββ runner.py
β βββ models.py
βββ config.toml # Configuration file
# Run tests
make test
# Generate coverage report
make coverage
# Run linters
make lint
# Format code
make format# Build
make build
# Upload to PyPI
make uploadThis project is licensed under the MIT License.
We welcome contributions! Please see CONTRIBUTING.md for details.
If you have any issues or questions, please check:
- TROUBLESHOOTING.md for common issues
- GitHub Issues for bug reports and feature requests