🧬 ScientificLLM-Forge

A Complete MLOps Platform for Scientific Language Model Development & Deployment

🚀 Overview

ScientificLLM-Forge is a production-ready MLOps platform specifically designed for fine-tuning and deploying Large Language Models on scientific datasets. It provides a complete "papers → training → serving" workflow with memory-efficient QLoRA fine-tuning, genomics-specific processing, and production-grade inference serving.

🎯 Key Achievements

✅ Memory-Efficient Training: QLoRA fine-tuning of 7B models on <16GB GPU memory
✅ Scientific Data Processing: Automated genomics paper processing and quality scoring
✅ Production Inference Server: FastAPI-based REST API with real-time monitoring
✅ Comprehensive Testing: 82% test coverage with async test support
✅ Complete MLOps Pipeline: End-to-end workflow from data to deployment

🌟 Features

🔬 Scientific Data Processing

PubMed Integration: Automated collection of high-quality genomics papers
Quality Scoring: AI-powered paper quality assessment and filtering
Scientific Text Processing: Citation removal, notation normalization, terminology preservation
Benchmark Detection: Automatic identification of datasets and benchmarks used

🧠 Advanced Model Training

QLoRA Fine-tuning: Memory-efficient parameter-efficient fine-tuning
Enhanced Trainer: Extended training pipeline with gradient checkpointing
Multi-GPU Support: Distributed training with DeepSpeed integration
MLflow Tracking: Comprehensive experiment tracking and model versioning
Dynamic Batch Sizing: Automatic memory optimization during training

🚀 Production Inference Server

FastAPI REST API: Production-ready async inference server
Genomics Endpoints: Specialized endpoints for gene queries and pathway analysis
Paper Analysis: Scientific paper summarization and key findings extraction
Performance Monitoring: Real-time metrics, health checks, and auto-scaling
Memory Optimization: Efficient model loading with quantization support

🛠️ MLOps Infrastructure

Configuration Management: YAML-based configuration system
Comprehensive Logging: Structured logging with performance metrics
Testing Framework: Unit, integration, and async test suites
CI/CD Ready: Pre-commit hooks and automated quality checks
Docker Support: Containerized deployment configurations

📁 Project Structure

scientific-llm-forge/
├── 📂 src/                           # Core source code
│   ├── 📊 data/                     # Scientific data processing
│   │   ├── scientific_dataset.py    # High-quality paper dataset loader
│   │   ├── text_processor.py        # Genomics-specific text preprocessing
│   │   ├── pubmed_client.py         # PubMed API integration
│   │   └── quality_scorer.py        # AI-powered paper quality assessment
│   ├── 🧠 models/                   # Advanced model training
│   │   ├── enhanced_trainer.py      # QLoRA-enhanced training pipeline
│   │   ├── model_loader.py          # Memory-efficient model loading
│   │   ├── qlora_config.py          # QLoRA configuration management
│   │   └── checkpoint_manager.py    # Model checkpoint handling
│   ├── 🚀 serving/                  # Production inference server
│   │   ├── inference_server.py      # FastAPI inference server
│   │   ├── api.py                   # API endpoint definitions
│   │   └── deployment.py           # Deployment configurations
│   └── 🛠️ utils/                    # Utility functions
│       ├── logger.py                # Structured logging
│       ├── config.py                # Configuration management
│       └── metrics.py               # Performance metrics
├── 📋 configs/                      # Configuration files
│   ├── training.yaml                # Training configuration
│   ├── serving.yaml                 # Serving configuration
│   └── deepspeed_config.json        # Distributed training config
├── 🧪 tests/                        # Comprehensive test suite
│   ├── test_inference_server.py     # FastAPI server tests (21 tests)
│   ├── test_enhanced_trainer.py     # Training pipeline tests
│   └── test_*.py                    # Component-specific tests
├── 📚 examples/                     # Usage examples and demos
│   ├── inference_server_example.py  # FastAPI server demo
│   ├── enhanced_trainer_example.py  # Training pipeline demo
│   └── *.py                         # Component examples
├── 📜 scripts/                      # Automation scripts
├── 📄 .kiro/specs/                  # Feature specifications
└── 📦 requirements.txt              # Python dependencies

🚀 Quick Start

Prerequisites

Python 3.8+ (3.12 recommended)
CUDA-capable GPU (optional, for training)
16GB+ RAM (8GB minimum)
Git

⚡ One-Command Setup

# Clone and setup everything
git clone https://github.com/scientificllmforge/scientific-llm-forge.git
cd scientific-llm-forge
python scripts/setup_dev.py && source activate_dev.sh

This automatically:

✅ Creates Python virtual environment
✅ Installs all dependencies (PyTorch, FastAPI, Transformers, etc.)
✅ Sets up pre-commit hooks for code quality
✅ Creates example configurations
✅ Initializes logging and monitoring

🔥 Complete Workflow Demo

1. Start the Inference Server

# Start FastAPI server with auto-reload
python -m src.serving.inference_server
# Server starts at: http://localhost:8000
# API docs at: http://localhost:8000/docs

2. Load a Fine-tuned Model

# Load model from checkpoint
curl -X POST "http://localhost:8000/api/v1/load-model" \
  -H "Content-Type: application/json" \
  -d '{"model_path": "/path/to/checkpoint", "use_quantization": true}'

3. Query Genomics Information

# Ask genomics questions
curl -X POST "http://localhost:8000/api/v1/genomics/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the function of BRCA1?", "query_type": "gene_function"}'

4. Analyze Scientific Papers

# Analyze research papers
curl -X POST "http://localhost:8000/api/v1/papers/analyze" \
  -H "Content-Type: application/json" \
  -d '{"title": "CRISPR gene editing study", "abstract": "...", "analysis_type": "summary"}'

🧠 Training Your Own Model

Memory-Efficient QLoRA Fine-tuning

# Train 7B model on <16GB GPU
python examples/enhanced_trainer_example.py

# Or use the training script
python scripts/train.py --config configs/training.yaml

Process Scientific Data

# Load and process genomics papers
python examples/scientific_dataset_example.py

# Collect data from PubMed
python scripts/collect_pubmed_data.py --query "CRISPR genomics" --max_papers 1000

🧪 Run Tests

# Run all tests (82% coverage)
pytest tests/ -v

# Run specific component tests
pytest tests/test_inference_server.py -v  # 21 FastAPI tests
pytest tests/test_enhanced_trainer.py -v  # Training pipeline tests
pytest tests/test_scientific_dataset.py -v  # Data processing tests

🔧 Configuration

Training Configuration (`configs/training.yaml`)

training:
  model:
    name: "meta-llama/Llama-2-7b-hf"  # Support for LLaMA-2
    model_type: "llama"
    max_length: 2048
    
  # QLoRA configuration for memory efficiency
  qlora:
    enabled: true
    r: 16                    # LoRA rank
    lora_alpha: 32          # LoRA scaling
    lora_dropout: 0.1       # LoRA dropout
    target_modules: ["q_proj", "v_proj", "k_proj", "o_proj"]
    quantization:
      load_in_4bit: true
      bnb_4bit_compute_dtype: "float16"
      bnb_4bit_use_double_quant: true
      bnb_4bit_quant_type: "nf4"
      
  # Scientific data processing
  scientific_data:
    data_file: "data/high_quality_papers_demo.json"
    text_fields: ["title", "abstract", "full_text"]
    preprocessing:
      remove_citations: true
      normalize_scientific_notation: true
      handle_special_tokens: true
      
  # MLflow experiment tracking
  mlflow:
    enabled: true
    experiment_name: "genomics-llm-finetuning"
    tracking_uri: "file:./mlruns"
    
  hyperparameters:
    learning_rate: 2e-4
    batch_size: 4           # Optimized for memory
    gradient_accumulation_steps: 4
    num_epochs: 3
    warmup_steps: 100
    weight_decay: 0.01

Serving Configuration (`configs/serving.yaml`)

serving:
  server:
    host: "0.0.0.0"
    port: 8000
    workers: 1
    reload: false
    
  model:
    checkpoint_path: "/path/to/fine-tuned/model"
    use_quantization: true
    device: "auto"          # Auto-detect GPU/CPU
    
  inference:
    max_length: 512
    temperature: 0.7
    top_p: 0.9
    batch_size: 8
    
  monitoring:
    enable_metrics: true
    log_requests: true
    health_check_interval: 30

DeepSpeed Configuration (`configs/deepspeed_config.json`)

{
  "train_batch_size": 16,
  "gradient_accumulation_steps": 4,
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 2e-4,
      "weight_decay": 0.01
    }
  },
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  }
}

🏗️ Architecture & Components

FastAPI Inference Server (`src/serving/inference_server.py`)

# Production-ready async inference server
from src.serving.inference_server import app, run_server

# Start server
run_server(host="0.0.0.0", port=8000, workers=4)

Key Endpoints:

POST /api/v1/generate - General text generation
POST /api/v1/genomics/query - Genomics-specific queries
POST /api/v1/papers/analyze - Scientific paper analysis
GET /api/v1/health - Health checks and monitoring
GET /api/v1/metrics - Performance metrics
GET /docs - Interactive API documentation

Enhanced Training Pipeline (`src/models/enhanced_trainer.py`)

# Memory-efficient QLoRA training
from src.models.enhanced_trainer import EnhancedModelTrainer
from src.models.qlora_config import QLoRAConfig

# Configure QLoRA for 7B model on <16GB GPU
config = QLoRAConfig(
    r=16, lora_alpha=32, 
    load_in_4bit=True,
    target_modules=["q_proj", "v_proj"]
)

trainer = EnhancedModelTrainer(config)
trainer.train()  # Memory-efficient training

Scientific Data Processing (`src/data/`)

# Load and process genomics papers
from src.data.scientific_dataset import ScientificDataModule
from src.data.text_processor import ScientificTextProcessor

# Load high-quality papers
data_module = ScientificDataModule()
papers = data_module.load_high_quality_papers("data/papers.json")

# Process scientific text
processor = ScientificTextProcessor()
processed_text = processor.preprocess_scientific_text(paper_text)

🧪 Testing & Quality Assurance

Comprehensive Test Suite

✅ 21 FastAPI server tests (82% coverage)
✅ Async test support with pytest-asyncio
✅ Mock-based testing for model interactions
✅ Integration tests for end-to-end workflows
✅ Performance benchmarking tests

Code Quality Tools

Black: Automatic code formatting
isort: Import organization
flake8: Code linting and style checks
mypy: Static type checking
pre-commit: Automated quality checks

Running Tests

# Full test suite with coverage
pytest tests/ --cov=src --cov-report=html

# Specific component tests
pytest tests/test_inference_server.py -v    # FastAPI server (21 tests)
pytest tests/test_enhanced_trainer.py -v    # Training pipeline
pytest tests/test_scientific_dataset.py -v  # Data processing
pytest tests/test_qlora_config.py -v        # QLoRA configuration

# Performance and integration tests
pytest tests/ -m "not slow"  # Skip slow tests
pytest tests/ -m "integration"  # Run integration tests only

📊 Performance & Benchmarks

Memory Efficiency

✅ 7B Parameter Models: Fine-tune on <16GB GPU memory
✅ QLoRA Optimization: 4-bit quantization with LoRA adapters
✅ Dynamic Batch Sizing: Automatic memory optimization
✅ Gradient Checkpointing: Reduced memory footprint during training

Inference Performance

⚡ Async Processing: FastAPI async request handling
⚡ Batch Support: Multiple queries processed simultaneously
⚡ Memory Monitoring: Real-time GPU memory tracking
⚡ Auto-scaling: Dynamic resource allocation based on load

Scientific Accuracy

🎯 Genomics Specialization: Domain-specific query processing
🎯 Citation Handling: Proper scientific text preprocessing
🎯 Benchmark Detection: Automatic dataset identification
🎯 Quality Scoring: AI-powered paper quality assessment

🚀 Production Deployment

Docker Deployment

# Dockerfile example
FROM python:3.12-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY src/ ./src/
COPY configs/ ./configs/

EXPOSE 8000
CMD ["python", "-m", "src.serving.inference_server"]

Kubernetes Deployment

# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scientific-llm-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scientific-llm
  template:
    metadata:
      labels:
        app: scientific-llm
    spec:
      containers:
      - name: inference-server
        image: scientific-llm-forge:latest
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "8Gi"
            nvidia.com/gpu: 1
          limits:
            memory: "16Gi"
            nvidia.com/gpu: 1

Monitoring & Observability

# Health monitoring
curl http://localhost:8000/api/v1/health

# Performance metrics
curl http://localhost:8000/api/v1/metrics

# MLflow experiment tracking
mlflow ui --backend-store-uri ./mlruns

🤝 Contributing

We welcome contributions! Here's how to get started:

Development Workflow

Fork the repository
Create feature branch: git checkout -b feature/amazing-feature
Implement your changes with tests
Run quality checks: pytest tests/ && pre-commit run --all-files
Submit pull request with detailed description

Contribution Areas

🔬 Scientific Domain Expertise: Add new scientific domains beyond genomics
🧠 Model Architectures: Support for new LLM architectures
📊 Data Processing: Enhanced scientific text processing techniques
🚀 Deployment: Kubernetes, cloud deployment configurations
🧪 Testing: Additional test coverage and benchmarks

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support & Community

📧 Issues: GitHub Issues
💬 Discussions: GitHub Discussions
📚 Documentation: Full Documentation
🐦 Updates: Follow @ScientificLLM

🗺️ Roadmap

Completed ✅

✅ FastAPI Inference Server - Production-ready REST API
✅ QLoRA Fine-tuning - Memory-efficient training pipeline
✅ Scientific Data Processing - Genomics paper processing
✅ Comprehensive Testing - 82% test coverage
✅ MLflow Integration - Experiment tracking
✅ Performance Monitoring - Real-time metrics

In Progress 🚧

🚧 Distributed Training - Multi-GPU DeepSpeed integration
🚧 Model Versioning - Advanced checkpoint management
🚧 Web Dashboard - Training and monitoring UI

Planned 📋

📋 Multi-Domain Support - Chemistry, biology, physics datasets
📋 Advanced Augmentation - Scientific text augmentation techniques
📋 Cloud Deployment - AWS, GCP, Azure deployment guides
📋 Kubernetes Operators - Native K8s integration
📋 Model Hub Integration - HuggingFace Hub publishing
📋 Real-time Streaming - WebSocket inference endpoints

🧬 ScientificLLM-Forge: Advancing Scientific Discovery Through AI 🚀

Built with ❤️ for the scientific research community

⭐ Star us on GitHub | 📖 Read the Docs | 🤝 Contribute

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.kiro/specs/llm-fine-tuning-pipeline		.kiro/specs/llm-fine-tuning-pipeline
configs		configs
data		data
examples		examples
logs		logs
outputs		outputs
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
INFERENCE_SERVER_SUMMARY.md		INFERENCE_SERVER_SUMMARY.md
README.md		README.md
activate_dev.sh		activate_dev.sh
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
test_pubmed_basic.py		test_pubmed_basic.py

srikarjy/llm-forge

Folders and files

Latest commit

History

Repository files navigation