LintAI - AI Output Testing & Validation Framework


A production-ready framework for validating AI/LLM outputs with user-defined assertions, confidence scoring, and edge-case testing.

πŸ“¦ Installation

From PyPI (Recommended)

pip install llm-validator

From Source

git clone https://github.com/SoulSniper-V2/lintai.git
cd lintai
pip install -e .

🎯 Features

  • βœ… Assertion-Based Validation - Define expected behavior with simple rules
  • πŸ“Š Confidence Scoring - Get quantified trust metrics for outputs
  • πŸ§ͺ Edge Case Testing - Systematically test boundary conditions
  • πŸ€– Multi-Model Support - Works with OpenAI, Anthropic, Gemini, local LLMs
  • πŸ“ˆ Regression Tracking - Track validation scores over time
  • πŸ”„ CI/CD Integration - Run validations in GitHub Actions pipelines
  • πŸš€ Auto-Release to PyPI - Tags automatically publish to PyPI

πŸš€ GitHub Action

LintAI is also available as a GitHub Marketplace Action for CI/CD pipelines:

name: Validate AI Output
on: [push]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      
      - name: Validate LLM Output
        uses: SoulSniper-V2/lintai@v0.1
        with:
          prompt: "Summarize this document"
          output: "${{ steps.generate.outputs.result }}"
          assertions-config: "./assertions.json"
          pass-threshold: 80

Example Assertions Config (assertions.json)

{
  "assertions": [
    {
      "name": "max_length",
      "type": "MAX_LENGTH",
      "params": { "max_chars": 1000 },
      "weight": 0.3
    },
    {
      "name": "contains_steps",
      "type": "CONTAINS_TEXT",
      "params": { "text": "step 1" },
      "weight": 0.5
    },
    {
      "name": "no_profanity",
      "type": "NO_PATTERN",
      "params": { "pattern": "badword|offensive" },
      "weight": 0.2
    }
  ]
}
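A config like this can be sanity-checked before use. The sketch below is illustrative and independent of LintAI's own loader; the `load_assertions` helper and the weight-sum check are assumptions, not part of the library:

```python
import json

def load_assertions(path):
    """Load an assertions config and perform basic structural checks."""
    with open(path) as f:
        config = json.load(f)
    assertions = config.get("assertions", [])
    for a in assertions:
        # Every assertion needs a name, a type, params, and a weight.
        for key in ("name", "type", "params", "weight"):
            if key not in a:
                raise ValueError(f"assertion missing '{key}': {a}")
    total = sum(a["weight"] for a in assertions)
    # The examples use weights that sum to 1.0; treat other totals as a
    # warning rather than an error, since normalization is a convention.
    if abs(total - 1.0) > 1e-9:
        print(f"note: weights sum to {total}, not 1.0")
    return assertions
```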

Action Inputs

Input              Required  Default  Description
prompt             Yes       -        Original prompt sent to LLM
output             Yes       -        LLM output to validate
assertions-config  Yes       -        Path to JSON assertions config
pass-threshold     No        70       Minimum score to pass (0-100)
fail-on-warning    No        false    Fail if any warnings

Action Outputs

Output             Description
passed             Whether validation passed (true/false)
score              Confidence score (0-100)
failed-assertions  Number of failed assertions
warnings-count     Number of warnings

πŸš€ Quick Start

CLI Usage

# Initialize a validation config
lintai init-config

# Validate with a config file
lintai validate --config validators/my_config.yaml

# Batch validation from JSONL
lintai batch --input test_cases.jsonl --output results.jsonl

Python API

from llm_validator import LLMValidator, Assertion, AssertionType

# Initialize validator
validator = LLMValidator(
    model="gpt-4",
    api_key="your-key"
)

# Define assertions
assertions = [
    Assertion(
        name="max_length",
        type=AssertionType.MAX_LENGTH,
        params={"max_tokens": 500},
        weight=0.3
    ),
    Assertion(
        name="no_profanity",
        type=AssertionType.NO_PATTERN,
        params={"pattern": r"(?i)badword|offensive"},
        weight=0.5
    ),
    Assertion(
        name="contains_action_plan",
        type=AssertionType.CONTAINS_TEXT,
        params={"text": "step 1", "count": 1},
        weight=0.2
    )
]

# Validate output
result = validator.validate(
    prompt="Create a plan to increase sales",
    output="Here is a step by step plan...",
    assertions=assertions
)

print(f"Confidence Score: {result.score}/100")
print(f"Passed: {result.passed}")
print(f"Failed: {result.failed_assertions}")

More CLI Examples

# Run validation from config
llm-validate --config validators/sales_plan.yaml

# Quick test
llm-validate --prompt "Summarize this" --output "The text says..." --rules "max_tokens:100"

# Batch validation
llm-validate --input test_cases.jsonl --output results.jsonl
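The JSONL input format for batch mode is not documented above; the sketch below assumes the common one-JSON-object-per-line layout with `prompt` and `output` fields, which may differ from what the batch command actually expects:

```python
import json

# Hypothetical JSONL layout for batch validation: one test case per line.
# The field names ("prompt", "output") are assumptions, not a documented schema.
cases = [
    {"prompt": "Summarize this document", "output": "Step 1: read. Step 2: condense."},
    {"prompt": "Create a plan", "output": "step 1: audit current sales"},
]

with open("test_cases.jsonl", "w") as f:
    for case in cases:
        f.write(json.dumps(case) + "\n")

# Results files follow the same one-JSON-object-per-line rule.
with open("test_cases.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```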

Web Dashboard

cd frontend
npm install
npm run dev

Access at http://localhost:5173

πŸ“ Project Structure

llm-validator/
β”œβ”€β”€ llm_validator/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ core.py           # Main validation logic
β”‚   β”œβ”€β”€ assertions.py     # Assertion types
β”‚   β”œβ”€β”€ models.py         # Data models
β”‚   └── providers.py      # LLM provider integration
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ App.jsx
β”‚   β”‚   └── components/
β”‚   β”œβ”€β”€ package.json
β”‚   └── vite.config.js
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ test_core.py
β”‚   └── test_assertions.py
β”œβ”€β”€ validators/           # Example validation configs
β”œβ”€β”€ README.md
└── requirements.txt

πŸ› οΈ Assertion Types

Type           Description                      Example
MAX_LENGTH     Output within token/char limit   max_tokens: 1000
MIN_LENGTH     Output meets minimum length      min_words: 50
CONTAINS_TEXT  Output has required text         text: "step 1"
NO_PATTERN     Output doesn't match pattern     pattern: "error|fail"
REGEX_MATCH    Output matches regex             pattern: r"^\d+\."
SENTIMENT      Output sentiment check           min_positive: 0.6
JSON_VALID     Output is valid JSON             schema: ./schema.json
KEYWORD_COUNT  Keywords present                 keywords: ["AI", "ML"]
CUSTOM         Python function validation       function: my_validator.py
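For illustration, a few of these checks can be written as plain predicates. These are sketches of the general technique, not the library's actual implementations:

```python
import re

def check_max_length(output: str, max_chars: int) -> bool:
    """MAX_LENGTH: output stays within a character limit."""
    return len(output) <= max_chars

def check_contains_text(output: str, text: str) -> bool:
    """CONTAINS_TEXT: required text appears (case-insensitive here)."""
    return text.lower() in output.lower()

def check_no_pattern(output: str, pattern: str) -> bool:
    """NO_PATTERN: output must NOT match the forbidden regex."""
    return re.search(pattern, output) is None
```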

πŸ“Š Confidence Scoring

The validator calculates a weighted confidence score:

Confidence Score = Ξ£(passed_weight) / Ξ£(total_weight) Γ— 100

Individual assertion results:

  • βœ… PASS: Assertion met
  • ❌ FAIL: Assertion not met
  • ⚠️ WARN: Assertion partially met (with penalty)
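The weighted-score formula can be sketched directly in Python. This is an illustrative computation (WARN penalties are omitted), not LintAI's internal code:

```python
def confidence_score(results):
    """Weighted confidence score: passed weight over total weight, scaled to 100.

    `results` maps assertion name -> (passed: bool, weight: float).
    """
    total = sum(w for _, w in results.values())
    passed = sum(w for ok, w in results.values() if ok)
    return 100.0 * passed / total if total else 0.0

# Example: two of three assertions pass.
score = confidence_score({
    "max_length": (True, 0.3),
    "contains_steps": (True, 0.5),
    "no_profanity": (False, 0.2),
})
print(round(score))  # 80
```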

🎨 Example Validators

Code Review Validator

name: code_review
model: gpt-4
assertions:
  - name: has_tests
    type: CONTAINS_TEXT
    params: { text: "test" }
    weight: 0.3
  
  - name: no_hardcoded_secrets
    type: NO_PATTERN
    params: { pattern: "api_key|password|secret" }
    weight: 0.4
  
  - name: reasonable_length
    type: MAX_LENGTH
    params: { max_tokens: 2000 }
    weight: 0.2
  
  - name: has_error_handling
    type: REGEX_MATCH
    params: { pattern: "except|try|catch" }
    weight: 0.1

Customer Email Validator

name: customer_email
model: claude-3-opus
assertions:
  - name: professional_tone
    type: SENTIMENT
    params: { min_positive: 0.3, max_negative: 0.2 }
    weight: 0.3
  
  - name: has_greeting
    type: CONTAINS_TEXT
    params: { text: "Dear|Hello|Hi" }
    weight: 0.1
  
  - name: has_signature
    type: CONTAINS_TEXT
    params: { text: "Sincerely|Best|Thanks" }
    weight: 0.1
  
  - name: no_pii
    type: NO_PATTERN
    params: { pattern: "\\d{3}-\\d{2}-\\d{4}" }  # SSN pattern
    weight: 0.5

πŸ”§ Configuration

Environment Variables

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
GOOGLE_API_KEY=...

Provider Selection

from llm_validator.providers import OpenAIProvider, AnthropicProvider, LocalProvider

# OpenAI
validator = LLMValidator(provider=OpenAIProvider(model="gpt-4"))

# Anthropic
validator = LLMValidator(provider=AnthropicProvider(model="claude-3-opus"))

# Local/Ollama
validator = LLMValidator(provider=LocalProvider(model="llama2"))

πŸ§ͺ Testing

# Run all tests
pytest tests/

# Run with coverage
pytest --cov=llm_validator tests/

# Run specific test
pytest tests/test_core.py -v

πŸ“ˆ CI/CD Integration

GitHub Actions

name: Validate AI Outputs
on: [push]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - name: Install
        run: pip install llm-validator
      - name: Run Validation
        run: |
          llm-validate \
            --config validators/code_review.yaml \
            --output validation_results.json
      - name: Check Score
        run: |
          # jq -e exits non-zero when the expression is false or null,
          # which also handles fractional scores (unlike [ ... -lt 80 ]).
          jq -e '.score >= 80' validation_results.json > /dev/null \
            || { echo "Score below threshold!"; exit 1; }

🎯 Use Cases

  1. Production AI Safety: Validate outputs before showing to users
  2. Code Review Automation: Check AI-generated code for quality
  3. Content Moderation: Ensure outputs meet guidelines
  4. Customer Support: Validate response quality
  5. RAG Evaluation: Test retrieval-augmented generation accuracy
  6. Model Comparison: Compare output quality across models

🀝 Contributing

  1. Fork the repo
  2. Create a feature branch
  3. Add your assertion type
  4. Submit a PR

πŸ”„ Automated Releases

This project uses GitHub Actions for CI/CD:

Workflow  Description
Test      Runs pytest on every push/PR
Build     Builds the PyPI package on every push
Publish   Auto-publishes to PyPI when a git tag is pushed

How to Release

# Make changes, commit
git add -A
git commit -m "Description of changes"

# Create a version tag (follows semver)
git tag v0.1.1

# Push tag to trigger PyPI release
git push origin main
git push origin v0.1.1

The CI workflow will:

  1. Run tests
  2. Build the package
  3. Publish to PyPI automatically

Note: Requires PYPI_API_TOKEN secret in GitHub repo settings.

πŸ“„ License

MIT License - Build, validate, ship with confidence!


Never deploy AI without validation. πŸ›‘οΈ
