CaseCraft is an Agentic QA Engine that turns feature requirements into structured test suites using Native LLMs, RAG, and MCP. This document outlines the technical architecture, layer details, module communication, and setup instructions.
```mermaid
sequenceDiagram
    actor User as User / AI Agent
    participant Interface as Interface Layer (CLI/MCP)
    participant Core as Core Orchestration (generator.py)
    participant RAG as AI & Knowledge (retriever.py)
    participant LLM as LLM Client (llm_client.py)

    User->>Interface: Request test generation (PDF/TXT)
    Interface->>Core: Pass file path & parameters
    Core->>Core: Parse and chunk document
    Core->>RAG: Query Product Context
    RAG-->>Core: Return historical context chunks
    Core->>LLM: Send Prompt + Context
    Note over LLM: Dispatches to Ollama, OpenAI, or Google
    LLM-->>Core: Return generated JSON
    Core->>Core: Sanitize, Validate, Deduplicate
    Core->>Interface: Save to outputs/ & return status
    Interface-->>User: Success response
```
The CaseCraft system follows a clean, modular, three-tier architecture separating the user interface from the business logic and external integrations.
This layer handles all incoming requests and routes them to the core orchestration engine. It ensures that regardless of how the user runs CaseCraft, the underlying execution remains identical.
- CLI (Command Line Interface): Allows direct terminal execution for generating tests and ingesting documents into the RAG database.
- MCP Server (Model Context Protocol): Exposes `generate_tests` and `query_knowledge` as standard tools to AI clients like AnythingLLM, Claude Desktop, and Cursor. It acts as a bridge between your local AI assistant and the CaseCraft engine.
The "brain" of CaseCraft. It manages the end-to-end pipeline of reading a document, breaking it down, consulting the knowledge base, calling the LLM, and saving the results.
- Pipeline Execution: Manages chunking of large files, processing each chunk, and merging results.
- Data Validation: Enforces strict JSON schemas using Pydantic, ensuring the LLM output is structurally sound.
- Quality Control: Performs exact and semantic deduplication to remove redundant test cases generated across different chunks.
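The exact-deduplication pass can be sketched in a few lines. This is a minimal illustration, assuming test cases are dicts with a `title` field (a hypothetical shape; the real schema is defined with Pydantic) and comparing normalized titles; the semantic pass in the actual engine additionally compares embeddings.

```python
def deduplicate_exact(cases):
    """Drop test cases whose normalized titles have already been seen.

    Sketch of the exact-deduplication step only; the engine also runs a
    semantic pass (embedding similarity) that this example omits.
    """
    seen = set()
    unique = []
    for case in cases:
        # Normalize case and collapse whitespace before comparing titles.
        key = " ".join(case["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(case)
    return unique

cases = [
    {"title": "Login with valid credentials"},
    {"title": "login  with valid credentials"},  # duplicate after normalization
    {"title": "Login with expired password"},
]
print(len(deduplicate_exact(cases)))  # → 2
```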
This layer abstracts remote or local AI dependencies.
- LLM Client: A unified driver that communicates with Ollama (Local), Google Gemini, or OpenAI-compatible endpoints.
- RAG System: A local vector database (ChromaDB) that stores historical project context. It ensures newly generated tests don't contradict established system behaviors.
- Role: Application entry points.
- Details:
`main.py` parses arguments like `--app-type` and `--custom` to override `casecraft.yaml` settings dynamically, triggering test generation. `ingest.py` handles parsing URLs/PDFs and pushing them into the local vector database.
- Role: FastMCP application wrapper.
- Details: Defines the `@mcp.tool()` endpoints. Critical for security: it sandboxes file paths (`_validate_file_path`) and lazily loads heavy modules (such as PyTorch) so the MCP handshake completes instantly without timing out.
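The path-sandboxing idea behind `_validate_file_path` can be sketched with `pathlib` alone. The `features/` root and the function name below are illustrative assumptions, not the server's actual configuration:

```python
from pathlib import Path

# Assumed sandbox root; the real server configures its own allowed directories.
ALLOWED_ROOT = Path("features").resolve()

def validate_file_path(user_path: str) -> Path:
    """Sketch of the sandboxing check: reject any path that escapes the root.

    Resolving the candidate first defeats "../" traversal and absolute paths.
    """
    candidate = (ALLOWED_ROOT / user_path).resolve()
    if not candidate.is_relative_to(ALLOWED_ROOT):
        raise ValueError(f"Path escapes sandbox: {user_path}")
    return candidate

print(validate_file_path("Activity_management.pdf").name)  # → Activity_management.pdf
# validate_file_path("../secrets.txt")  # would raise ValueError
```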
- Role: The heavy lifter for test generation.
- Details: Contains `generate_test_suite()`. It breaks documents into chunks, retrieves product context via RAG, builds Jinja2 prompts, makes resilient calls to the LLM (with retries for bad JSON), sanitizes outputs, and performs deduplication (`_deduplicate_semantically`).
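Two of those steps — chunking and retrying on malformed JSON — can be sketched as below. `chunk_text` and `call_with_json_retries` are hypothetical helper names for illustration; the real generator uses its own splitting logic and prompt templates:

```python
import json

def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Naive fixed-size chunking; the real generator splits on document structure."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def call_with_json_retries(generate, prompt: str, retries: int = 3) -> dict:
    """Call the LLM until it returns parseable JSON, or give up after N tries."""
    last_err = None
    for _ in range(retries):
        raw = generate(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_err = err
            prompt += "\nReturn valid JSON only."  # nudge the model on retry
    raise ValueError(f"No valid JSON after {retries} attempts: {last_err}")
```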
- Role: Universal LLM adapter.
- Details: The `LLMClient` class routes generation requests to `_generate_ollama`, `_generate_openai_compatible`, or `_generate_google` based on the `casecraft.yaml` config. It handles network timeouts and REST payloads.
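The routing pattern can be sketched as a dispatch table keyed by provider name. The provider method names come from the description above, but their bodies here are stubs — the real ones issue HTTP requests:

```python
class LLMClient:
    """Sketch of provider routing; method bodies are stand-ins for real HTTP calls."""

    def __init__(self, provider: str):
        # Map each configured provider name to its backend method.
        self._dispatch = {
            "ollama": self._generate_ollama,
            "openai": self._generate_openai_compatible,
            "google": self._generate_google,
        }
        if provider not in self._dispatch:
            raise ValueError(f"Unknown provider: {provider}")
        self.provider = provider

    def generate(self, prompt: str) -> str:
        return self._dispatch[self.provider](prompt)

    def _generate_ollama(self, prompt: str) -> str:
        return f"[ollama] {prompt}"        # stub

    def _generate_openai_compatible(self, prompt: str) -> str:
        return f"[openai] {prompt}"        # stub

    def _generate_google(self, prompt: str) -> str:
        return f"[google] {prompt}"        # stub

print(LLMClient("ollama").generate("hi"))  # → [ollama] hi
```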
- Role: Context provider.
- Details: Instantiates a ChromaDB client. Given a document chunk, it searches the database for the top-K most similar historical documents to inject into the LLM prompt.
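The top-K lookup can be illustrated without ChromaDB using plain cosine similarity over in-memory vectors — a stand-in for the vector-database query, with a made-up document shape (`id` plus `embedding`):

```python
import math

def top_k(query_vec, docs, k=3):
    """Return the k documents most similar to the query by cosine similarity.

    Stand-in for the ChromaDB query; real embeddings come from a model,
    and the real store persists them on disk.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return ranked[:k]

docs = [
    {"id": 1, "embedding": [1.0, 0.0]},
    {"id": 2, "embedding": [0.0, 1.0]},
    {"id": 3, "embedding": [0.9, 0.1]},
]
print([d["id"] for d in top_k([1.0, 0.0], docs, k=2)])  # → [1, 3]
```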
- Request Initiation: The user runs a CLI command or an AI Agent triggers an MCP tool call. The Interface Layer validates the requested file path.
- Orchestration Hand-off: The Interface passes the file path and any configuration overrides (e.g., app type) to `core.generator.generate_test_suite()`.
- Context Retrieval: `generator.py` chunks the document and asks `retriever.py` (AI & Knowledge Layer) for relevant background info from the Vector DB.
- Prompt Resolution: `generator.py` combines the chunk, the retrieved context, and the system prompts.
- LLM Execution: `generator.py` calls `llm_client.py.generate()`. The `LLMClient` reads `core.config` to determine which provider to use and handles the network request.
- Response & Validation: The LLM returns a string. `generator.py` parses it, coerces it into `core.schema.TestSuite`, sanitizes fields, and deduplicates.
- Data Export: Final validated data is sent to `core.exporter.py` to be written to disk as `.xlsx` and `.json`. The path of the generated file is returned to the Interface Layer.
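The flow above can be condensed into one sketch function. Everything here is illustrative: `retrieve` and `llm` are injected callables standing in for the real retriever and `LLMClient`, and the chunking/dedup logic is deliberately naive:

```python
def generate_test_suite_sketch(text: str, retrieve, llm) -> list[str]:
    """End-to-end sketch of the data flow; retrieve/llm are injected stubs."""
    # Chunk the parsed document (fixed-size here; the real splitter is smarter).
    chunks = [text[i:i + 200] for i in range(0, len(text), 200)]
    cases: list[str] = []
    for chunk in chunks:
        context = retrieve(chunk)             # RAG context from the vector DB
        prompt = f"{context}\n---\n{chunk}"   # combined prompt
        cases.extend(llm(prompt))             # provider call + parsed cases
    return sorted(set(cases))                 # exact deduplication

# Usage with trivial stubs in place of the real layers:
out = generate_test_suite_sketch("x" * 450, lambda c: "CTX", lambda p: [f"case:{len(p)}"])
```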
- Python 3.10+
- Ollama (if running locally)
```bash
# Clone the repository
git clone https://github.com/T-Tests/casecraft.git
cd casecraft

# Install dependencies
pip install -r requirements-runtime.txt

# Download default local model
ollama pull llama3.2:3b
```

The repository includes a `casecraft.yaml.example` file with default settings. You must copy this to `casecraft.yaml` to configure the application:

```bash
cp casecraft.yaml.example casecraft.yaml
```

Once copied, edit `casecraft.yaml` to set your preferred LLM provider, models, and generation parameters:
```yaml
general:
  llm_provider: "ollama"               # Options: ollama, openai, google
  model: "llama3.2:3b"                 # Or gemini-1.5-flash
  base_url: "http://localhost:11434"   # Or https://generativelanguage.googleapis.com/v1beta

generation:
  app_type: "web"                      # Options: web, mobile, desktop, api
```

Tip: Set API keys via environment variables (e.g., `CASECRAFT_GENERAL_API_KEY`).
You can use GitHub Copilot's underlying LLMs (like GPT-4o) as the brain for CaseCraft. Configure your `casecraft.yaml` to use the GitHub Models API (which is OpenAI-compatible):

```yaml
general:
  llm_provider: "openai"
  model: "gpt-4o"
  base_url: "https://models.inference.ai.azure.com"
```

Note: You must set the environment variable `CASECRAFT_GENERAL_API_KEY` to your GitHub Personal Access Token (PAT) with Copilot access.
Run CaseCraft directly from your terminal:
```bash
# Generate tests for a feature document
python cli/main.py generate features/Activity_management.pdf

# Ingest documentation into the RAG knowledge base
python cli/ingest.py docs ./docs/
```

CaseCraft can be used as a smart tool by AI clients.
- Add Server: Configure your AI client (AnythingLLM or `claude_desktop_config.json`) to run the server script. Command: `python` | Args: `c:\path\to\casecraft\casecraft_mcp.py`
- Restart Client: Ensure your AI client recognizes the new tools (`generate_tests`, `query_knowledge`).
- Prompt the Agent: "Use the generate_tests tool to create test cases for features/Activity_management.pdf"
(Note: Ensure AnythingLLM is in 'Agent' mode, not 'Chat' mode, and that file generation targets the features/ directory.)
This folder contains feature documents that you want to generate test cases from (PDF, MD, TXT, etc.).
- Best Practice: One feature per file works best. Include acceptance criteria for comprehensive coverage.
- Usage: Pass these paths to the CLI/MCP (e.g., `python cli/main.py generate features/your_feature.pdf`).
This folder contains the product knowledge used for RAG (Retrieval-Augmented Generation). During test generation, CaseCraft automatically searches the index here to include relevant context in the LLM prompt.
- `index.json` & `chroma/`: Vector embeddings. Do not edit manually. Delete them to rebuild the index from scratch.
- `raw/`: Put your source documents here for ingestion (`features/`, `product_docs/`, `system_rules/`).
How to Add Knowledge:
```bash
# From Local Documents
python cli/ingest.py docs knowledge_base/raw/

# From Websites (Sitemap or Single URL)
python cli/ingest.py sitemap https://docs.example.com/sitemap.xml
python cli/ingest.py url https://docs.example.com/page

# From URL List File
python cli/ingest.py urls my_urls.txt
```

How to Remove Knowledge (Clear Index): There is no single-document delete command. To clear the knowledge base or rebuild it from scratch, delete the index files:
```bash
# Windows
del knowledge_base\index.json
rmdir /s /q knowledge_base\chroma

# Mac/Linux
rm knowledge_base/index.json
rm -rf knowledge_base/chroma
```