- RAFT Toolkit
- Table of Contents
- Overview
- Installation
- Usage
- RAFT Training Guide
- Template System
- Advanced Configuration
- Architecture & Development
- Testing
- Command Line Tools
- Fine-tuning & Evaluation
- Deployment
- Documentation
RAFT (Retrieval Augmented Fine-Tuning) is a technique that trains language models to better utilize retrieved documents when answering questions. Unlike traditional RAG systems that rely on frozen pre-trained models, RAFT fine-tunes models specifically for document-based reasoning tasks.
The RAFT Toolkit automates the creation of training datasets by generating {question, answer, documents} triplets from your documents, enabling you to fine-tune models that excel at retrieval-augmented generation tasks.
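For orientation, a single training example pairs a question and answer with a context that mixes the source document and distractors. The sketch below is illustrative only (the field names are assumptions for demonstration, not the toolkit's exact output schema):

```python
import json

# Illustrative {question, answer, documents} record; field names are
# assumptions for demonstration, not the toolkit's exact schema.
record = {
    "question": "Which chunking strategies does the toolkit support?",
    "documents": [
        "The toolkit supports semantic, fixed, and sentence chunking strategies.",  # oracle chunk
        "Unrelated distractor text about container deployment options.",            # distractor
    ],
    "answer": "Semantic, fixed, and sentence chunking.",
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```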
graph TD
A[Input Sources<br/>Local, S3, SharePoint] --> B{RAFT Toolkit<br/>CLI or Web UI}
B --> C[Document Chunking<br/>Semantic/Fixed/Sentence]
C --> D[Question Generation<br/>LLM-powered Q&A creation]
D --> E[Answer Generation<br/>Context-based responses]
E --> F[Distractor Addition<br/>Irrelevant docs for robustness]
F --> G[Training Dataset<br/>JSONL/Parquet format]
G --> H[Model Fine-tuning<br/>OpenAI/HuggingFace/Azure]
H --> I[Fine-tuned Model<br/>Domain-optimized LLM]
G --> J{Analysis Tools}
J --> K[Dataset Evaluation<br/>eval.py]
J --> L[Answer Generation<br/>answer.py]
J --> M[PromptFlow Analysis<br/>pfeval_*.py]
K --> N[Performance Metrics]
L --> O[Model Comparison]
M --> P[Quality Assessment]
N --> Q[Production Model<br/>Optimized for RAG tasks]
O --> Q
P --> Q
style B fill:#e1f5fe,color:#000000
style J fill:#f3e5f5,color:#000000
style Q fill:#e8f5e8,color:#000000
Toolkit Components:
- Core Engine: Document processing and dataset generation
- Analysis Tools: Six evaluation and comparison utilities
- Web Interface: Visual workflow management and monitoring
- CLI Tools: Scriptable automation and batch processing
Features:
- Dual Interface: Command-line tool and modern web interface
- Analysis Tools Suite: Evaluation, answer generation, and PromptFlow analysis
- 12-Factor Architecture: Cloud-native, scalable design
- Multi-Format Support: PDF, TXT, JSON, PPTX, and API documentation
- Multiple Input Sources: Local files, Amazon S3, SharePoint Online
- Enterprise Authentication: AWS credentials, Azure AD, SharePoint integration
- Flexible Output: HuggingFace, OpenAI completion/chat, and evaluation formats
- Parallel Processing: Configurable workers for optimal performance
- Enhanced Logging: Production-ready logging with progress tracking, external service integration (Sentry, DataDog), and structured output
- Observability: Optional LangWatch integration for LLM call tracing and performance monitoring
- Comprehensive Testing: Unit, integration, API, and CLI test suites
- Container Ready: Docker support for easy deployment
- Kubernetes Ready: Complete Kubernetes deployment configurations
| Aspect | Traditional RAG | RAFT Fine-Tuning |
|---|---|---|
| Model Training | Uses frozen pre-trained models | Fine-tunes models on domain-specific data |
| Document Utilization | May ignore or misuse retrieved documents | Learns to effectively use retrieved information |
| Performance | Depends on base model's retrieval reasoning | Optimized for specific document types/domains |
| Latency | Requires runtime retrieval + inference | Faster inference with better document integration |
| Setup Complexity | Lower initial setup | Higher setup (requires training data generation) |
| Customization | Limited to prompt engineering | Deep customization through fine-tuning |
When to Use RAFT vs Traditional RAG:
Use RAFT Fine-Tuning When:
- You have consistent document types/formats
- Performance on document reasoning is critical
- You can invest time in data generation and training
- You need predictable, high-quality outputs
- Latency optimization is important
Use Traditional RAG When:
- Working with diverse, changing document types
- Quick prototyping or proof-of-concept needed
- Limited resources for training data generation
- Documents change frequently
- General-purpose question answering is sufficient
Complete Installation Guide: For detailed installation instructions, prerequisites, Docker setup, and advanced configuration options, see docs/INSTALLATION_GUIDE.md.
# Clone the repository
git clone https://github.com/your-repo/raft-toolkit.git
cd raft-toolkit
# Set up environment
cp .env.example .env
# Edit .env with your OpenAI API key
# Fast installation (core functionality only)
pip install .
# Or standard installation (recommended)
pip install .[standard]
# Test installation
python -m cli.main --datapath sample_data/sample.pdf --output ./output --preview
Choose the installation that best fits your needs:
- `pip install .`
  Includes: Basic CLI, document processing, OpenAI integration
  Use cases: Quick testing, lightweight deployments, basic CI
- `pip install .[standard]`
  Includes: Full AI/ML functionality, embeddings, LangChain ecosystem
  Use cases: Production deployments, full RAFT functionality
- `pip install .[complete]`
  Includes: Standard + cloud services + observability
  Use cases: Enterprise deployments, cloud integration
- `pip install .[all]`
  Includes: Everything + development tools
  Use cases: Contributing, local development, full testing
# Web interface with AI
pip install .[standard,web]
# Cloud deployment with tracing
pip install .[ai,langchain,cloud,tracing]
# Development with specific features
pip install .[standard,dev]
Or run everything with Docker Compose:
docker compose up -d
Performance Note: The optimized dependency structure provides 70-80% faster CI builds compared to previous versions. See the CI Optimization Guide for details.
Installation Resources:
- Complete Installation Guide - Detailed setup instructions
- Requirements Management - Dependency structure and installation patterns
CLI Documentation:
- CLI Reference Guide - Comprehensive CLI parameter documentation
- CLI Quick Reference - Quick reference card for CLI parameters
See also: Web Interface Guide for detailed documentation on all web UI features, analysis tools, and job management.
# Start the web server
python run_web.py
# Or with custom configuration
python run_web.py --host 0.0.0.0 --port 8080 --debug
# Open http://localhost:8000 in your browser
Web UI Features:
- Dataset Generation: Drag & drop file upload with visual configuration
- Analysis Tools: Six powerful evaluation and analysis tools
- Visual Configuration: Interactive forms for all settings
- Live Preview: See processing estimates before running
- Job Management: Track multiple processing jobs with real-time updates
- Download Results: Direct download of generated datasets and analysis results
- Results Visualization: Comprehensive display of metrics and statistics
Analysis Tools Available:
- Dataset Evaluation: Evaluate model performance with configurable metrics
- Answer Generation: Generate high-quality answers using various LLMs
- PromptFlow Analysis: Multi-dimensional evaluation (relevance, groundedness, fluency, coherence)
- Dataset Analysis: Statistical analysis and quality metrics
- Model Comparison: Side-by-side performance comparison
- Batch Processing: Automated workflows for multiple datasets
Complete CLI Documentation:
- CLI Reference Guide - Comprehensive documentation of all CLI parameters and options
- CLI Quick Reference - Quick reference card for common commands and use cases
The tools/ directory contains powerful standalone evaluation utilities:
# Navigate to tools directory
cd tools/
# Install tool dependencies
pip install -r requirements.txt
# Run dataset evaluation
python eval.py --question-file dataset.jsonl --answer-file answers.jsonl
# Generate answers for evaluation
python answer.py --input questions.jsonl --output answers.jsonl --workers 8
# Run PromptFlow evaluation
python pfeval_chat.py --input dataset.jsonl --output evaluation.json
See the tools/README.md for comprehensive documentation on all available tools.
Basic Workflow (a minimal code sketch follows this list):
- Chunk Generation: Document is split into chunks
- QA Generation: LLM generates N questions and answers per chunk
- Distractor Appending: Random chunks are added as distractors for each QA pair
- Dataset Export: Data is saved in the specified format for fine-tuning
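A rough sketch of these four steps in Python (illustrative only, not the toolkit's internal API; `generate_qa` stands in for the LLM-backed generator you plug in):

```python
import json
import random

def build_examples(chunks, generate_qa, questions_per_chunk=5, num_distractors=4):
    """Illustrative walk through the four workflow steps for one document."""
    examples = []
    for i, chunk in enumerate(chunks):                        # 1. chunks come from the chunker
        for q, a in generate_qa(chunk, questions_per_chunk):  # 2. LLM generates Q&A per chunk
            others = [c for j, c in enumerate(chunks) if j != i]
            distractors = random.sample(others, k=min(num_distractors, len(others)))  # 3. distractors
            examples.append({"question": q, "context": [chunk] + distractors, "answer": a})
    with open("dataset.jsonl", "w", encoding="utf-8") as f:   # 4. export for fine-tuning
        f.writelines(json.dumps(e) + "\n" for e in examples)
    return examples
```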
Tips:
- Use a `.env` file for OpenAI/Azure keys
- For Azure, set deployment names with `--completion-model` and `--embedding-model`
- Use `--chunking-strategy` and `--chunking-params` for best results on your data
You can use Ollama as a local OpenAI-compatible API for running models like Llama 3, Mistral, and others. This allows you to run RAFT without cloud API keys.
1. Start Ollama with your desired model:
ollama run llama3
2. Set the OpenAI-compatible endpoint in your environment:
export OPENAI_API_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama-anything"  # Any non-empty string
Or add these to your .env file:
OPENAI_API_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama-anything
3. Run RAFT as usual:
python3 raft.py \
--datapath sample_data/United_States_PDF.pdf \
--output ./sample_ds4 \
--distractors 4 \
--doctype pdf \
--chunk_size 512 \
--questions 5 \
--openai_key $OPENAI_API_KEY
Note:
- Ollama's API is compatible with the OpenAI API, but some advanced features may not be supported.
- You can specify different models by running `ollama run <model_name>` and setting the appropriate model in your RAFT command if needed.
See also: Complete Configuration Guide for advanced RAFT configuration options and best practices.
- Quality Over Quantity: Use high-quality, authoritative documents
- Consistent Format: Maintain consistent document structure and formatting
- Domain Relevance: Focus on documents representative of target use cases
- Optimal Length: Use documents of 1,000-10,000 tokens for best chunking results
- Diverse Question Types: Include factual, analytical, and inferential questions
- Appropriate Difficulty: Match question complexity to intended use case
- Natural Language: Generate questions that users would realistically ask
- Coverage: Ensure questions cover all important document sections
- Distractor Ratio: Use 3-5 distractor documents per training example
- Oracle Probability: Include the source document 80-100% of the time (see the sketch after this list)
- Balanced Difficulty: Mix easy, medium, and hard questions
- Size Guidelines: Aim for 1,000-10,000 training examples minimum
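As a concrete reading of the distractor-ratio and oracle-probability guidelines above, context assembly for one example might look like the following hedged sketch (parameter names are illustrative, not the toolkit's API):

```python
import random

def assemble_context(oracle_chunk, other_chunks, num_distractors=4, oracle_probability=0.9):
    """Pick 3-5 distractors and include the source (oracle) chunk ~80-100% of the time."""
    distractors = random.sample(other_chunks, k=min(num_distractors, len(other_chunks)))
    context = list(distractors)
    if random.random() < oracle_probability:
        # Insert the oracle document at a random position so its location is not a giveaway.
        context.insert(random.randrange(len(context) + 1), oracle_chunk)
    return context
```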
- Manual Review: Sample and manually verify question-answer pairs
- Consistency Checks: Ensure answers are actually derivable from context
- Bias Detection: Check for dataset biases and systematic errors
- Evaluation Split: Reserve 10-20% of data for evaluation
Effective chunking is critical for RAFT success. Choose your strategy based on document type and use case:
| Document Type | Recommended Chunk Size | Reasoning |
|---|---|---|
| Technical Documentation | 300-512 tokens | Preserves complete concepts and code examples |
| Legal Documents | 512-768 tokens | Maintains clause/section coherence |
| Medical Literature | 256-512 tokens | Balances detail with focused topics |
| Research Papers | 512-1024 tokens | Captures complete paragraphs and findings |
| FAQ/Knowledge Base | 128-256 tokens | Each chunk = one question/topic |
| News Articles | 256-512 tokens | Preserves story coherence |
| Overlap % | Use Case | Trade-offs |
|---|---|---|
| 0% | Distinct topics, FAQ | Clean separation, no redundancy |
| 10-20% | Technical docs | Minimal context preservation |
| 20-40% | Narrative content | Good context flow, some redundancy |
| 40-60% | Complex topics | Maximum context, high redundancy |
# Low overlap for distinct topics
--chunking-params '{"overlap": 0}'
# Medium overlap for connected content
--chunking-params '{"overlap": 100}' # ~20% of 512 tokens
# High overlap for complex documents
--chunking-params '{"overlap": 200}' # ~40% of 512 tokens
| Questions/Chunk | Use Case | Quality vs Quantity |
|---|---|---|
| 1-2 | High-quality, focused datasets | Maximum quality, minimal redundancy |
| 3-5 | Balanced approach (recommended) | Good quality, reasonable coverage |
| 6-10 | Comprehensive coverage | Risk of lower quality questions |
# Focused, high-quality
--questions 2 --chunk_size 512
# Balanced approach (recommended)
--questions 5 --chunk_size 384
# Comprehensive coverage
--questions 8 --chunk_size 256
| Distractors | Training Benefit | Dataset Size Impact |
|---|---|---|
| 2-3 | Basic robustness | Moderate increase |
| 4-6 | Strong robustness (recommended) | 5-7x dataset size |
| 7-10 | Maximum robustness | 8-11x dataset size |
# Recommended configuration
--distractors 4 --questions 5 --chunk_size 512
# Resource-constrained
--distractors 2 --questions 3 --chunk_size 384
# Maximum robustness
--distractors 6 --questions 3 --chunk_size 256
Semantic Chunking (Recommended)
--chunking-strategy semantic --chunk_size 512 \
--chunking-params '{"overlap": 50, "min_chunk_size": 200}'
- Best for: Most document types, preserves meaning
- Overlap: 50-100 tokens for context preservation
- Min size: 200 tokens to ensure meaningful chunks
Fixed Chunking
--chunking-strategy fixed --chunk_size 384 \
--chunking-params '{"overlap": 75}'
- Best for: Consistent processing, structured documents
- Overlap: 15-25% of chunk size
- Use when: Semantic understanding less critical
Sentence Chunking
--chunking-strategy sentence --chunk_size 256 \
--chunking-params '{"overlap": 0}'
- Best for: Natural language, narrative content
- Overlap: Usually 0 (sentence boundaries are natural breaks)
- Chunk size: Maximum tokens per chunk (actual size varies)
# Generate RAFT training dataset
python raft.py --datapath documents/ --output training_data/
- Document Chunking: Split documents into semantic chunks
- Question Generation: Create relevant questions for each chunk
- Answer Generation: Generate accurate answers using the source chunk
- Distractor Addition: Include irrelevant documents to improve robustness
- Format Conversion: Export in format suitable for fine-tuning platforms
# Example with OpenAI fine-tuning
openai api fine_tunes.create \
-t training_data.jsonl \
-m gpt-3.5-turbo \
--suffix "raft-medical-docs"
- Platform Selection: Choose fine-tuning platform (OpenAI, HuggingFace, etc.)
- Model Selection: Start with instruction-tuned base models
- Training Configuration: Set learning rate, epochs, batch size
- Validation: Monitor training metrics and validation performance
# Evaluate fine-tuned model
python tools/eval.py --model ft:gpt-3.5-turbo:suffix --question-file eval.jsonl
- Performance Testing: Compare against baseline models
- Error Analysis: Identify common failure patterns
- Data Augmentation: Generate additional training examples for weak areas
- Iterative Improvement: Refine dataset and retrain
RAFT Toolkit includes a comprehensive template system for customizing prompts used in embedding generation and question-answer pair creation. Templates can be customized to improve quality and relevance for specific domains.
No Configuration Required: RAFT Toolkit works out of the box with intelligent defaults:
- Automatically selects appropriate templates based on model type (GPT, Llama, etc.)
- Provides robust fallback mechanisms if custom templates are not found
- Includes multiple layers of default templates for different complexity levels
- Gracefully handles missing template directories or malformed template files
# Works immediately with defaults - no template configuration needed
python raft.py --datapath docs/ --output training_data/
- embedding_prompt_template.txt: Default template for embedding generation
  - Provides context and instructions for generating document embeddings
  - Supports variables: {content}, {document_type}, {metadata}
  - Customizable for domain-specific embedding optimization
- gpt_template.txt: GPT-style question-answering template with reasoning and citations
- gpt_qa_template.txt: GPT question generation template with content filtering
- llama_template.txt: Llama-style question-answering template optimized for Llama models
- llama_qa_template.txt: Llama question generation template with complexity guidelines
Environment Variables:
# Custom prompt templates
export RAFT_EMBEDDING_PROMPT_TEMPLATE="/path/to/templates/my_embedding_template.txt"
export RAFT_QA_PROMPT_TEMPLATE="/path/to/templates/my_qa_template.txt"
export RAFT_ANSWER_PROMPT_TEMPLATE="/path/to/templates/my_answer_template.txt"
# Templates directory
export RAFT_TEMPLATES="/path/to/templates/"
CLI Arguments:
# Use custom templates
python raft.py --datapath docs/ --output training_data/ \
--embedding-prompt-template "/path/to/custom_embedding.txt" \
--qa-prompt-template "/path/to/custom_qa.txt" \
--answer-prompt-template "/path/to/custom_answer.txt"
# Use custom templates directory
python raft.py --datapath docs/ --output training_data/ \
--templates "/path/to/custom/templates/"
Programmatic Configuration:
# Import path is an assumption based on the project layout (core/config.py); adjust to your install.
from raft_toolkit.core.config import RAFTConfig

config = RAFTConfig(
templates="./templates",
embedding_prompt_template="templates/my_custom_embedding.txt",
qa_prompt_template="templates/gpt_qa_template.txt",
answer_prompt_template="templates/gpt_template.txt"
)
Embedding template variables:
- {content}: The document content to be embedded
- {document_type}: File type (pdf, txt, json, pptx, etc.)
- {metadata}: Additional document metadata
- {chunk_index}: Index of the current chunk within the document
- {chunking_strategy}: The chunking method used
QA and answer template variables:
- {question}: The question to be answered (for answer templates)
- {context}: The context/chunk for question generation
- %s: Placeholder for number of questions to generate
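These placeholders behave like ordinary string-formatting fields. A minimal illustration of filling a custom embedding template (the template text below is hypothetical, not one of the shipped files):

```python
# Hypothetical template text built from the documented placeholders.
template = (
    "Generate embeddings for {document_type} content (chunk {chunk_index}, "
    "strategy: {chunking_strategy}).\n"
    "Metadata: {metadata}\n\n"
    "Content:\n{content}"
)

prompt = template.format(
    content="RAFT fine-tunes models for document-based reasoning...",
    document_type="pdf",
    metadata={"source": "overview.pdf"},
    chunk_index=3,
    chunking_strategy="semantic",
)
print(prompt)
```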
Generate embeddings for medical literature that capture:
- Clinical terminology and procedures
- Drug names and dosages
- Symptoms and diagnoses
- Treatment protocols and outcomes
Content: {content}
Generate embeddings for legal documents focusing on:
- Legal terminology and concepts
- Case citations and precedents
- Statutory references
- Contractual terms and conditions
Document Type: {document_type}
Content: {content}
Generate embeddings for technical documentation emphasizing:
- API endpoints and parameters
- Code examples and syntax
- Configuration options
- Error messages and troubleshooting
Content: {content}
Metadata: {metadata}
See the templates/README.md for comprehensive template documentation and customization examples.
The RAFT Toolkit includes comprehensive rate limiting to handle the constraints imposed by cloud-based AI services. Rate limiting is disabled by default to maintain backward compatibility, but is highly recommended for production use to avoid hitting API limits and reduce costs.
Common Issues Without Rate Limiting:
- API rate limit errors (HTTP 429) causing processing failures
- Unexpected costs from burst API usage
- Inconsistent processing times due to throttling
- Failed batches requiring expensive reprocessing
Benefits of Rate Limiting:
- Predictable Costs: Control API spending with token and request limits
- Reliable Processing: Avoid rate limit errors through intelligent throttling
- Optimized Performance: Adaptive strategies adjust to service response times
- Better Monitoring: Detailed statistics on API usage and throttling
Using Preset Configurations:
# OpenAI GPT-4 with recommended limits
python raft.py --datapath docs/ --output training_data/ \
--rate-limit --rate-limit-preset openai_gpt4
# Azure OpenAI with conservative limits
python raft.py --datapath docs/ --output training_data/ \
--rate-limit --rate-limit-preset azure_openai_standard
# Anthropic Claude with aggressive processing
python raft.py --datapath docs/ --output training_data/ \
--rate-limit --rate-limit-preset anthropic_claude
Custom Rate Limiting:
# Custom limits for your specific API tier
python raft.py --datapath docs/ --output training_data/ \
--rate-limit \
--rate-limit-strategy sliding_window \
--rate-limit-requests-per-minute 100 \
--rate-limit-tokens-per-minute 5000 \
--rate-limit-max-burst 20
# Adaptive rate limiting (adjusts based on response times)
python raft.py --datapath docs/ --output training_data/ \
--rate-limit --rate-limit-strategy adaptive \
--rate-limit-requests-per-minute 200
- Sliding Window (Recommended; sketched in code after this list)
- Best for: Most production use cases
- How it works: Tracks requests over a rolling time window
- Advantages: Smooth rate distribution, handles bursts well
- Fixed Window
- Best for: Simple rate limiting scenarios
- How it works: Resets limits at fixed intervals (every minute)
- Advantages: Simple to understand, predictable behavior
- Token Bucket
- Best for: Bursty workloads with occasional high throughput needs
- How it works: Accumulates "tokens" over time, consumes them for requests
- Advantages: Allows controlled bursts above average rate
- Adaptive
- Best for: Unknown or variable API performance
- How it works: Automatically adjusts rate based on response times
- Advantages: Self-tuning, optimizes for service performance
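To make the sliding-window idea concrete, here is a minimal self-contained sketch (illustrative only; the toolkit's built-in limiter is driven by the `--rate-limit-*` flags shown above):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most max_requests within any rolling window_seconds span."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def acquire(self):
        now = time.monotonic()
        # Drop calls that have slid out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # Wait until the oldest call leaves the window before proceeding.
            time.sleep(self.window_seconds - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())

# Usage: call acquire() before every API request.
limiter = SlidingWindowLimiter(max_requests=100, window_seconds=60.0)
```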
| Preset | Service | Requests/min | Tokens/min | Burst | Use Case |
|---|---|---|---|---|---|
| `openai_gpt4` | OpenAI GPT-4 | 500 | 10,000 | 50 | Production GPT-4 |
| `openai_gpt35_turbo` | OpenAI GPT-3.5 Turbo | 3,500 | 90,000 | 100 | High-throughput GPT-3.5 |
| `azure_openai_standard` | Azure OpenAI | 120 | 6,000 | 20 | Standard Azure tier |
| `anthropic_claude` | Anthropic Claude | 1,000 | 100,000 | 50 | Claude API |
| `conservative` | Any service | 60 | 2,000 | 10 | Safe/cautious processing |
| `aggressive` | Any service | 1,000 | 50,000 | 100 | Fast processing |
The RAFT Toolkit features a comprehensive logging system designed for production use, debugging, and integration with external monitoring tools.
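As an illustration of what the `RAFT_LOG_LEVEL` and `RAFT_LOG_FORMAT` variables control, a generic standard-library sketch (not the toolkit's internal logger) might look like this:

```python
import json
import logging
import os

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record, similar in spirit to structured output."""
    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
if os.getenv("RAFT_LOG_FORMAT", "text") == "json":
    handler.setFormatter(JsonFormatter())
logging.basicConfig(level=os.getenv("RAFT_LOG_LEVEL", "INFO"), handlers=[handler])
logging.getLogger("raft").info("structured logging configured")
```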
Docker with Enhanced Logging:
# docker-compose.yml
version: '3.8'
services:
raft-toolkit:
environment:
RAFT_LOG_LEVEL: INFO
RAFT_LOG_FORMAT: json
RAFT_LOG_OUTPUT: both
RAFT_SENTRY_DSN: ${SENTRY_DSN}
volumes:
- ./logs:/app/logs
Kubernetes ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: raft-logging-config
data:
RAFT_LOG_LEVEL: "INFO"
RAFT_LOG_FORMAT: "json"
RAFT_LOG_OUTPUT: "both"
RAFT_LOG_STRUCTURED: "true"
- Split large JSONL files:
from raft_toolkit.core.utils.file_utils import split_jsonl_file
split_jsonl_file('yourfile.jsonl', max_size=50_000_000)
- Extract random rows:
from raft_toolkit.core.utils.file_utils import extract_random_jsonl_rows
extract_random_jsonl_rows('yourfile.jsonl', 100, 'sampled_output.jsonl')
raft-toolkit/
├── raft_toolkit/                  # Main package
│   ├── core/                      # Core business logic
│   │   ├── clients/               # External API clients
│   │   ├── config.py              # Configuration management
│   │   ├── formatters/            # Dataset format converters
│   │   ├── models.py              # Data models and schemas
│   │   ├── raft_engine.py         # Main orchestration engine
│   │   ├── security.py            # Security utilities
│   │   └── services/              # Business services
│   │       ├── dataset_service.py   # Dataset operations
│   │       ├── document_service.py  # Document processing
│   │       └── llm_service.py       # LLM interactions
│   ├── cli/                       # Command-line interface
│   │   └── main.py                # CLI entry point
│   ├── web/                       # Web interface
│   │   ├── app.py                 # FastAPI application
│   │   └── static/                # Frontend assets
│   ├── tools/                     # Standalone evaluation tools
│   │   ├── eval.py                # Dataset evaluation
│   │   ├── answer.py              # Answer generation
│   │   └── pfeval_*.py            # PromptFlow evaluations
│   └── templates/                 # Prompt templates
├── tests/                         # Comprehensive test suite
│   ├── unit/                      # Unit tests
│   ├── integration/               # Integration tests
│   ├── api/                       # API tests
│   └── cli/                       # CLI tests
├── docs/                          # Documentation
│   ├── WEB_INTERFACE.md           # Web UI guide
│   ├── DEPLOYMENT.md              # Deployment instructions
│   ├── CONFIGURATION.md           # Configuration reference
│   └── TEST_DIRECTORIES.md        # Test configuration guide
├── .github/                       # CI/CD workflows
│   └── workflows/
│       ├── build.yml              # Build & code quality
│       ├── test.yml               # Comprehensive testing
│       ├── release.yml            # Release automation
│       └── security.yml           # Security scanning
├── docker-compose.yml             # Multi-service orchestration
├── docker-compose.test.yml        # Testing environment
├── Dockerfile                     # Multi-stage container builds
├── requirements*.txt              # Python dependencies
├── .env.example                   # Environment template
├── .env.test.example              # Test configuration template
├── run_tests.py                   # Test runner with configurable directories
├── run_web.py                     # Web server launcher
├── raft.py                        # Legacy CLI entry point
└── README.md                      # This documentation
This toolkit follows 12-factor app principles with a modular architecture:
raft-toolkit/
├── raft_toolkit/          # Main package
│   ├── core/              # Shared business logic
│   │   ├── config.py      # Configuration management
│   │   ├── models.py      # Data models
│   │   ├── raft_engine.py # Main orchestration
│   │   └── services/      # Business services
│   ├── cli/               # Command-line interface
│   ├── web/               # Web interface & API
│   └── tools/             # Evaluation tools
├── raft.py                # CLI entry point
├── run_web.py             # Web entry point
└── docker-compose.yml     # Container orchestration
Benefits:
- Separation of Concerns: UI and business logic decoupled
- Environment Parity: Same code for dev/prod
- Configuration via Environment: 12-factor compliance
- Horizontal Scaling: Stateless design
- Container Ready: Docker & Kubernetes support
See ARCHITECTURE.md for detailed technical documentation.
The toolkit includes a comprehensive test suite covering unit tests, integration tests, API tests, and CLI tests.
# Install test dependencies
pip install -r requirements-test.txt
# Run all tests
python run_tests.py
# Run specific test categories
python run_tests.py --unit # Unit tests only
python run_tests.py --integration # Integration tests only
python run_tests.py --api # API tests only
python run_tests.py --cli # CLI tests only
# Run with coverage
python run_tests.py --coverage
# Run with verbose output
python run_tests.py --verbose
- Unit Tests: Core functionality and business logic
- Integration Tests: Service interactions and data flow
- API Tests: Web interface endpoints and responses
- CLI Tests: Command-line interface validation
Configurable Test Directories:
Configure test directories via CLI arguments or environment variables:
# Custom directories via CLI
python run_tests.py --integration \
--output-dir ./ci-results \
--temp-dir /tmp/fast-ssd \
--coverage-dir ./coverage
# Via environment variables
export TEST_OUTPUT_DIR=./my-results
export TEST_TEMP_DIR=/tmp/my-temp
export TEST_COVERAGE_DIR=./coverage
python run_tests.py --coverage
# Docker testing with custom directories
export HOST_TEST_RESULTS_DIR=/shared/test-results
docker compose -f docker-compose.test.yml up
See the Test Directories Configuration Guide for complete configuration details.
If you encounter dependency conflicts during installation:
# Run dependency checker
python scripts/check_dependencies.py
# Check for conflicts
pip check
# Clean installation
pip install -r requirements.txt --force-reinstall
See the Dependency Troubleshooting Guide for comprehensive troubleshooting steps.
# Run tests in Docker environment
docker compose -f docker-compose.test.yml up --abort-on-container-exit
# Specific test suites
docker compose -f docker-compose.test.yml run raft-test-unit
docker compose -f docker-compose.test.yml run raft-test-integration
# Install code quality tools
pip install -r requirements-test.txt
# Run linting
flake8 .
black --check .
isort --check-only .
mypy .
# Auto-format code
black .
isort .
# Install security tools
pip install bandit safety
# Run security scans
bandit -r . -f json -o security-report.json
safety scan -r requirements.txt
See TESTING.md for detailed testing documentation.
The RAFT Toolkit includes powerful command-line tools for evaluating and analyzing datasets. These tools are automatically installed as console commands when you install the package.
After installation, the following tools are available from anywhere in your terminal:
- `raft-eval` - Dataset evaluation with parallel processing
- `raft-answer` - Answer generation for evaluation datasets
- `raft-pfeval-chat` - PromptFlow chat format evaluation
- `raft-pfeval-completion` - PromptFlow completion evaluation
- `raft-pfeval-local` - Local evaluation without API calls
# Evaluate model performance on a dataset
raft-eval --question-file questions.jsonl --workers 8
# Generate answers using different models
raft-answer --input questions.jsonl --output answers.jsonl --model gpt-4
# Advanced PromptFlow evaluation
raft-pfeval-chat --input dataset.jsonl --output detailed_results.json
# 1. Generate dataset with main RAFT toolkit
raft --datapath document.pdf --output evaluation_data
# 2. Generate answers using the tools
raft-answer --input evaluation_data/questions.jsonl --output generated_answers.jsonl --workers 8
# 3. Evaluate performance
raft-eval --question-file evaluation_data/questions.jsonl --answer-file generated_answers.jsonl
# 4. Advanced PromptFlow evaluation
raft-pfeval-chat --input generated_answers.jsonl --output detailed_evaluation.json
Complete Tools Documentation: For detailed usage instructions, configuration options, and advanced workflows, see docs/TOOLS.md.
- See Deployment Guide for Azure AI Studio fine-tuning guidance
- Use generated datasets with popular fine-tuning frameworks:
- HuggingFace Transformers
- OpenAI Fine-tuning API
- Azure AI Studio
- Local training with LoRA/QLoRA
The original Python scripts are still available in the tools/ directory:
# Navigate to tools directory
cd tools/
# Basic evaluation
python eval.py --question-file YOUR_EVAL_FILE.jsonl --answer-file YOUR_ANSWER_FILE
# PromptFlow evaluations
python pfeval_chat.py --input dataset.jsonl --output results.json
python pfeval_completion.py --input dataset.jsonl --output results.json
python pfeval_local.py --input dataset.jsonl --output results.json --mode local
# Answer generation
python answer.py --input questions.jsonl --output answers.jsonl --model gpt-4
Evaluation Metrics:
- Relevance: How relevant is the answer to the question?
- Groundedness: Is the answer grounded in the provided context?
- Fluency: How fluent and natural is the language?
- Coherence: How coherent and logical is the response?
- Similarity: How similar is the answer to reference answers?
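The first four dimensions are typically scored by an LLM judge (as in the PromptFlow evaluators); similarity can also be approximated lexically. A toy approximation, not the toolkit's implementation:

```python
def token_f1(prediction: str, reference: str) -> float:
    """Rough lexical similarity between a generated answer and a reference answer."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = sum(min(pred.count(t), ref.count(t)) for t in set(pred) & set(ref))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Paris is the capital of France", "The capital of France is Paris"))
```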
Complete Deployment Guide: For detailed deployment instructions including Docker, Kubernetes, cloud platforms, CI/CD integration, and production configurations, see docs/DEPLOYMENT.md.
Quick Deployment Options:
- Docker: `docker compose up -d` for containerized deployment
- Kubernetes: Multi-cloud support for production scaling
- Cloud Platforms: AWS ECS, Azure Container Apps, Google Cloud Run
- CI/CD: GitHub Actions, GitLab CI, Jenkins integration
- Security: Container scanning, network policies, secret management
Local Development:
# Development mode with auto-reload
python run_web.py --debug
# Production mode
python run_web.py --host 0.0.0.0 --port 8000
See the Deployment Guide for comprehensive deployment instructions.