The Claude Task Fine-Tuning Dataset is designed for training RuvLTRA models to route tasks to the appropriate Claude Flow agent and to select the most suitable Claude model (Haiku, Sonnet, or Opus) for each task's complexity.
Agent: coder
Focus: Code generation, debugging, refactoring
Model Routing:
- Simple: Haiku (quick fixes, simple functions)
- Moderate: Sonnet (component development, API integration)
- Complex: Opus (complex algorithms, system-level code)
Example Tasks:
- Implement authentication middleware
- Debug race condition in concurrent code
- Refactor monolithic service into microservices
- Write unit tests with 90% coverage
Agent: researcher
Focus: Analysis, exploration, documentation
Model Routing:
- Simple: Haiku (basic documentation)
- Moderate: Sonnet (most research tasks)
- Complex: Sonnet (deep analysis)
Example Tasks:
- Analyze performance bottlenecks
- Research best practices for GraphQL
- Document API endpoints
- Compare database solutions
Agent: security
Focus: Audit, vulnerability analysis, threat detection
Model Routing:
- All: Opus (security requires highest quality)
Example Tasks:
- Audit authentication flow for vulnerabilities
- Review cryptographic implementation
- Identify SQL injection vectors
- Ensure GDPR compliance
Agent: architecture
Focus: System design, planning, architecture
Model Routing:
- Simple: Sonnet (basic schemas)
- Moderate: Opus (microservices, APIs)
- Complex: Opus (distributed systems)
Example Tasks:
- Design microservices architecture
- Plan database schema for e-commerce
- Architect caching strategy
- Design disaster recovery system
Agent: reviewer
Focus: Code review, quality assessment
Model Routing:
- Simple: Haiku (standards compliance)
- Moderate: Sonnet (quality review, performance)
- Complex: Sonnet (architecture review)
Example Tasks:
- Review pull request for best practices
- Assess code quality and maintainability
- Review error handling patterns
- Analyze scalability of design
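Taken together, these tables define a small routing policy. The sketch below transcribes it into a lookup table; the Python names are illustrative and not part of the RuvLTRA API:

# Routing policy transcribed from the agent tables above.
# Function and variable names are illustrative, not RuvLTRA's API.
MODEL_ROUTING = {
    "coder":        {"Simple": "haiku",  "Moderate": "sonnet", "Complex": "opus"},
    "researcher":   {"Simple": "haiku",  "Moderate": "sonnet", "Complex": "sonnet"},
    "security":     {"Simple": "opus",   "Moderate": "opus",   "Complex": "opus"},
    "architecture": {"Simple": "sonnet", "Moderate": "opus",   "Complex": "opus"},
    "reviewer":     {"Simple": "haiku",  "Moderate": "sonnet", "Complex": "sonnet"},
}

def route(agent: str, complexity: str) -> str:
    """Return the recommended Claude model for an (agent, complexity) pair."""
    return MODEL_ROUTING[agent][complexity]

assert route("security", "Simple") == "opus"  # security always routes to Opus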
Each line in the JSONL file represents a single training example:
{
  "input": "Implement async authentication middleware in TypeScript for JWT validation",
  "context": "The middleware should verify JWT tokens from Bearer header, check expiration, and validate signature using RS256",
  "output_agent": "coder",
  "metadata": {
    "category": "Coder",
    "complexity": "Moderate",
    "domain": "Web",
    "expected_model": "sonnet",
    "quality_score": 0.87,
    "tags": ["authentication", "middleware", "jwt", "security"]
  }
}

Field: input
Type: String
Description: The task description or request from the user. This is what the model receives as input.
Field: context
Type: String
Description: Additional context, requirements, constraints, or details about the task. Provides necessary background information.
Field: output_agent
Type: String
Enum: "coder", "researcher", "security", "architecture", "reviewer"
Description: The expected agent that should handle this task.
Field: metadata.category
Type: TaskCategory enum
Values: Coder, Researcher, Security, Architecture, Reviewer
Description: Primary task category
Field: metadata.complexity
Type: ComplexityLevel enum
Values: Simple, Moderate, Complex
Description: Task complexity level determining model selection
Field: metadata.domain
Type: DomainType enum
Values: Web, Systems, DataScience, Mobile, DevOps, Security, Database, Api
Description: Technical domain context
Field: metadata.expected_model
Type: String
Values: "haiku", "sonnet", "opus"
Description: Recommended Claude model for this task based on complexity and category
Cost Optimization:
- Haiku: ~75% cheaper than Opus, 2-3x faster
- Sonnet: Balanced cost/quality, handles most tasks
- Opus: Highest quality, use for complex/critical tasks
Field: metadata.quality_score
Type: Float (0.0-1.0)
Description: Quality rating of this training example. Higher scores indicate more reliable examples for training.
Field: metadata.tags
Type: Array of strings
Description: Descriptive tags for filtering and analysis.
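To make the schema concrete, here is a minimal Python sketch that streams the JSONL export and checks each record against the fields above (the file name matches the JSONL export listed later; the helper itself is illustrative):

import json

AGENTS = {"coder", "researcher", "security", "architecture", "reviewer"}
MODELS = {"haiku", "sonnet", "opus"}

def load_examples(path="claude_training_full.jsonl"):
    """Stream training examples, validating the documented fields."""
    with open(path) as f:
        for line in f:
            ex = json.loads(line)
            meta = ex["metadata"]
            assert ex["output_agent"] in AGENTS
            assert meta["expected_model"] in MODELS
            assert 0.0 <= meta["quality_score"] <= 1.0
            yield ex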
The dataset generator applies three augmentation techniques:
Technique 1: Paraphrasing
Purpose: Increase linguistic diversity
Method: Synonym replacement, phrase restructuring
Example:
- Original: "Implement a function to validate user input"
- Paraphrased: "Create a function to validate user input"
Technique 2: Complexity Variation
Purpose: Create training examples at different complexity levels
Method: Vary complexity while keeping the core task the same
Example:
- Simple: "Add error handling to API endpoint"
- Moderate: "Implement comprehensive error handling with retry logic"
- Complex: "Design fault-tolerant error handling with circuit breakers"
Technique 3: Domain Transfer
Purpose: Generalize across technical domains
Method: Apply the same task pattern to different domains
Example:
- Web: "Optimize React component rendering"
- Mobile: "Optimize Flutter widget rendering"
- Systems: "Optimize kernel thread scheduling"
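A minimal sketch of what such augmentation can look like as simple string substitution (the substitution tables and function names are illustrative; the actual generator's paraphrase and domain-transfer logic may be more sophisticated):

# Illustrative augmentation by string substitution; the actual
# generator's paraphrase/domain-transfer logic may differ.
SYNONYMS = {"Implement": "Create", "Build": "Develop"}
DOMAIN_SWAPS = {"Mobile": {"React component": "Flutter widget"}}

def paraphrase(task: str) -> str:
    for word, synonym in SYNONYMS.items():
        task = task.replace(word, synonym)
    return task

def domain_transfer(task: str, target_domain: str) -> str:
    for phrase, replacement in DOMAIN_SWAPS.get(target_domain, {}).items():
        task = task.replace(phrase, replacement)
    return task

print(paraphrase("Implement a function to validate user input"))
# -> "Create a function to validate user input"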
Typical generated dataset (100 base examples per category + augmentation):
Total Examples: ~1,500 (500 base + 1,000 augmented)
By Category:
- Coder: ~300 (20%)
- Researcher: ~300 (20%)
- Security: ~300 (20%)
- Architecture: ~300 (20%)
- Reviewer: ~300 (20%)
By Complexity:
- Simple: ~500 (33%)
- Moderate: ~600 (40%)
- Complex: ~400 (27%)
By Model:
- Haiku: ~400 (27%) - Cost-effective for simple tasks
- Sonnet: ~700 (47%) - Balanced for most tasks
- Opus: ~400 (27%) - High-quality for complex/security
Recommended split ratios:
- Training: 70% (~1,050 examples)
- Validation: 15% (~225 examples)
- Test: 15% (~225 examples)
Stratified sampling ensures balanced representation across categories and complexity levels.
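A hand-rolled Python sketch of such a stratified split, grouping on (category, complexity) strata (the Rust dataset.split() shown later is the supported path; this is only to illustrate the idea):

import random
from collections import defaultdict

def stratified_split(examples, ratios=(0.7, 0.15, 0.15), seed=42):
    """Split per (category, complexity) stratum so each split stays balanced."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for ex in examples:
        meta = ex["metadata"]
        strata[(meta["category"], meta["complexity"])].append(ex)
    train, val, test = [], [], []
    for group in strata.values():
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test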
Each training example includes a quality score (0.0-1.0) based on:
1. Template Quality (0.8-0.96)
- Seed templates: Hand-crafted, highest quality
- Paraphrased: Slightly lower due to automated generation
2. Category Appropriateness
- Security tasks: Higher scores (0.90-0.96)
- Code generation: Good scores (0.83-0.90)
3. Complexity Alignment
- Well-defined complexity: Higher scores
- Ambiguous complexity: Lower scores
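In practice these scores are most useful as a filter before training. A minimal sketch (the 0.85 threshold is an arbitrary example, not a recommendation from the dataset):

def filter_by_quality(examples, min_score=0.85):
    """Keep only examples at or above a quality threshold."""
    return [ex for ex in examples if ex["metadata"]["quality_score"] >= min_score]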
Train a model to predict output_agent given input and context.
# Pseudo-code
def train_task_router(dataset):
    for example in dataset:
        x = embed(example.input + example.context)
        y = encode_agent(example.output_agent)
        model.train(x, y)

Train a model to predict expected_model given task characteristics.
# Pseudo-code
def train_model_selector(dataset):
    for example in dataset:
        features = extract_features(example.input, example.context)
        complexity = encode_complexity(example.metadata.complexity)
        category = encode_category(example.metadata.category)
        x = [features, complexity, category]
        y = encode_model(example.metadata.expected_model)
        model.train(x, y)

JSONL format:
- One example per line
- Memory-efficient streaming
- Standard for LLM fine-tuning
- File: claude_training_full.jsonl
JSON format:
- Full array of examples
- Human-readable
- Good for inspection
- File: claude_training_full.json
Parquet format:
- Columnar format
- Highly compressed
- Fast for analytics
- Integration with Arrow/Polars
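For analytics, the Parquet export pairs naturally with Polars. A minimal sketch (the Parquet file name is assumed by analogy with the other exports and is not confirmed by this document):

import polars as pl

# File name is an assumption; adjust to the actual export.
df = pl.read_parquet("claude_training_full.parquet")
print(df["output_agent"].value_counts())  # examples per agent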
use ruvllm::training::{DatasetGenerator, DatasetConfig};

// Configure dataset
let config = DatasetConfig {
    examples_per_category: 100,
    enable_augmentation: true,
    ..Default::default()
};

// Generate dataset
let mut generator = DatasetGenerator::new(config);
let dataset = generator.generate();

// Export to JSONL
dataset.export_jsonl("training.jsonl")?;

// Split for training
let (train, val, test) = dataset.split(0.7, 0.15, 0.15, 42);

The dataset is designed for fine-tuning RuvLTRA models with:
1. Task Embedding Layer
- Input: Task description + context
- Output: 768-dim semantic embedding
2. Agent Classification Head
- Input: Task embedding
- Output: 5-way classification (5 agent types)
3. Model Selection Head
- Input: Task embedding + complexity features
- Output: 3-way classification (Haiku/Sonnet/Opus)
4. Quality Prediction Head
- Input: Task embedding
- Output: Quality score (0-1)
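A minimal PyTorch sketch of this head layout (the 768-dim embedding and head widths follow the list above; everything else, including module and parameter names, is illustrative rather than RuvLTRA's actual implementation):

import torch
import torch.nn as nn

class TaskRouterHeads(nn.Module):
    """Illustrative multi-head layout; not RuvLTRA's actual implementation."""
    def __init__(self, embed_dim: int = 768, complexity_dim: int = 3):
        super().__init__()
        self.agent_head = nn.Linear(embed_dim, 5)                   # 5 agent types
        self.model_head = nn.Linear(embed_dim + complexity_dim, 3)  # haiku/sonnet/opus
        self.quality_head = nn.Sequential(nn.Linear(embed_dim, 1), nn.Sigmoid())

    def forward(self, embedding, complexity_features):
        agent_logits = self.agent_head(embedding)
        model_logits = self.model_head(
            torch.cat([embedding, complexity_features], dim=-1))
        quality = self.quality_head(embedding)  # score in (0, 1)
        return agent_logits, model_logits, quality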
Current Version: 1.0.0
Format Version: 1.0
Last Updated: 2024-01
Training data follows the same license as RuvLTRA (MIT/Apache-2.0).
- Claude Flow Documentation: https://github.com/ruvnet/claude-flow
- RuvLTRA Architecture: ../crates/ruvllm/README.md
- SONA Learning: ../crates/sona/README.md