# ScientificLLM-Forge

A Complete MLOps Platform for Scientific Language Model Development & Deployment
ScientificLLM-Forge is a production-ready MLOps platform for fine-tuning and deploying Large Language Models on scientific datasets. It provides a complete "papers → training → serving" workflow with memory-efficient QLoRA fine-tuning, genomics-specific processing, and production-grade inference serving.
## Key Features

- ✅ Memory-Efficient Training: QLoRA fine-tuning of 7B models on <16GB GPU memory
- ✅ Scientific Data Processing: Automated genomics paper processing and quality scoring
- ✅ Production Inference Server: FastAPI-based REST API with real-time monitoring
- ✅ Comprehensive Testing: 82% test coverage with async test support
- ✅ Complete MLOps Pipeline: End-to-end workflow from data to deployment
### Scientific Data Processing

- PubMed Integration: Automated collection of high-quality genomics papers
- Quality Scoring: AI-powered paper quality assessment and filtering
- Scientific Text Processing: Citation removal, notation normalization, terminology preservation
- Benchmark Detection: Automatic identification of datasets and benchmarks used
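As a sketch of the kind of preprocessing the text processor performs, citation removal usually comes down to a few regular expressions. The patterns and function name below are illustrative, not the project's actual implementation:

```python
import re

def strip_citations(text: str) -> str:
    """Remove common inline citation patterns from scientific text.

    Illustrative only: the real text_processor.py may use different rules.
    """
    # Bracketed numeric citations: [1], [2,3], [4-7]
    text = re.sub(r"\[\d+(?:\s*[,-]\s*\d+)*\]", "", text)
    # Parenthetical author-year citations: (Smith et al., 2021)
    text = re.sub(r"\([A-Z][A-Za-z'-]+(?: et al\.)?,\s*\d{4}[a-z]?\)", "", text)
    # Tidy the whitespace left behind by the removals
    text = re.sub(r"\s{2,}", " ", text)
    text = re.sub(r"\s+([.,;])", r"\1", text)
    return text.strip()

print(strip_citations(
    "BRCA1 is a tumor suppressor [1,2] first cloned in 1994 (Miki et al., 1994)."
))  # BRCA1 is a tumor suppressor first cloned in 1994.
```

The word-level cleanup at the end matters: naive deletion leaves double spaces and stray punctuation that would otherwise leak into training data.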
### Model Training

- QLoRA Fine-tuning: Memory-efficient parameter-efficient fine-tuning
- Enhanced Trainer: Extended training pipeline with gradient checkpointing
- Multi-GPU Support: Distributed training with DeepSpeed integration
- MLflow Tracking: Comprehensive experiment tracking and model versioning
- Dynamic Batch Sizing: Automatic memory optimization during training
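Dynamic batch sizing is typically implemented as an out-of-memory back-off loop: try a step, halve the batch on OOM, retry. A framework-free sketch of the idea (`run_step` and `TrainingOOM` are hypothetical stand-ins, not the trainer's real API):

```python
class TrainingOOM(RuntimeError):
    """Stand-in for torch.cuda.OutOfMemoryError in this framework-free sketch."""

def find_max_batch_size(run_step, start: int = 32, minimum: int = 1) -> int:
    """Halve the batch size until a training step fits in memory."""
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)  # one forward/backward pass at this size
            return batch_size
        except TrainingOOM:
            batch_size //= 2  # back off and retry with half the batch
    raise TrainingOOM("even the minimum batch size does not fit")

# Simulate a GPU that can only fit batches of 8 or smaller
def fake_step(bs):
    if bs > 8:
        raise TrainingOOM

print(find_max_batch_size(fake_step))  # 8 on this simulated GPU
```

Pairing this with `gradient_accumulation_steps` keeps the effective batch size constant even when the per-step micro-batch shrinks.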
### Inference Serving

- FastAPI REST API: Production-ready async inference server
- Genomics Endpoints: Specialized endpoints for gene queries and pathway analysis
- Paper Analysis: Scientific paper summarization and key findings extraction
- Performance Monitoring: Real-time metrics, health checks, and auto-scaling
- Memory Optimization: Efficient model loading with quantization support
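Serving multiple queries simultaneously in an async server usually follows a collect-then-flush micro-batching pattern. A pure-asyncio sketch of that idea (names here are illustrative, not the server's actual code):

```python
import asyncio

async def micro_batcher(queue: asyncio.Queue, model_fn, max_batch: int = 8):
    """Collect queued (prompt, future) pairs and answer them in one model call."""
    while True:
        prompt, fut = await queue.get()
        batch = [(prompt, fut)]
        # Grab whatever else is already waiting, up to max_batch
        while len(batch) < max_batch and not queue.empty():
            batch.append(queue.get_nowait())
        outputs = model_fn([p for p, _ in batch])  # one batched inference call
        for (_, f), out in zip(batch, outputs):
            f.set_result(out)

async def ask(queue, prompt):
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut

async def main():
    queue = asyncio.Queue()
    task = asyncio.create_task(micro_batcher(queue, lambda ps: [p.upper() for p in ps]))
    answers = await asyncio.gather(*(ask(queue, p) for p in ["brca1", "tp53"]))
    task.cancel()
    return answers

print(asyncio.run(main()))  # ['BRCA1', 'TP53']
```

The real server adds timeouts, padding, and GPU dispatch, but the queue-plus-futures shape is the same.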
### Developer Experience

- Configuration Management: YAML-based configuration system
- Comprehensive Logging: Structured logging with performance metrics
- Testing Framework: Unit, integration, and async test suites
- CI/CD Ready: Pre-commit hooks and automated quality checks
- Docker Support: Containerized deployment configurations
## Project Structure

```
scientific-llm-forge/
├── src/                              # Core source code
│   ├── data/                         # Scientific data processing
│   │   ├── scientific_dataset.py     # High-quality paper dataset loader
│   │   ├── text_processor.py         # Genomics-specific text preprocessing
│   │   ├── pubmed_client.py          # PubMed API integration
│   │   └── quality_scorer.py         # AI-powered paper quality assessment
│   ├── models/                       # Advanced model training
│   │   ├── enhanced_trainer.py       # QLoRA-enhanced training pipeline
│   │   ├── model_loader.py           # Memory-efficient model loading
│   │   ├── qlora_config.py           # QLoRA configuration management
│   │   └── checkpoint_manager.py     # Model checkpoint handling
│   ├── serving/                      # Production inference server
│   │   ├── inference_server.py       # FastAPI inference server
│   │   ├── api.py                    # API endpoint definitions
│   │   └── deployment.py             # Deployment configurations
│   └── utils/                        # Utility functions
│       ├── logger.py                 # Structured logging
│       ├── config.py                 # Configuration management
│       └── metrics.py                # Performance metrics
├── configs/                          # Configuration files
│   ├── training.yaml                 # Training configuration
│   ├── serving.yaml                  # Serving configuration
│   └── deepspeed_config.json         # Distributed training config
├── tests/                            # Comprehensive test suite
│   ├── test_inference_server.py      # FastAPI server tests (21 tests)
│   ├── test_enhanced_trainer.py      # Training pipeline tests
│   └── test_*.py                     # Component-specific tests
├── examples/                         # Usage examples and demos
│   ├── inference_server_example.py   # FastAPI server demo
│   ├── enhanced_trainer_example.py   # Training pipeline demo
│   └── *.py                          # Component examples
├── scripts/                          # Automation scripts
├── .kiro/specs/                      # Feature specifications
└── requirements.txt                  # Python dependencies
```
## Prerequisites

- Python 3.8+ (3.12 recommended)
- CUDA-capable GPU (optional, for training)
- 16GB+ RAM (8GB minimum)
- Git
## Quick Start

```bash
# Clone and set up everything
git clone https://github.com/scientificllmforge/scientific-llm-forge.git
cd scientific-llm-forge
python scripts/setup_dev.py && source activate_dev.sh
```

This automatically:

- ✅ Creates Python virtual environment
- ✅ Installs all dependencies (PyTorch, FastAPI, Transformers, etc.)
- ✅ Sets up pre-commit hooks for code quality
- ✅ Creates example configurations
- ✅ Initializes logging and monitoring
### Start the Inference Server

```bash
# Start FastAPI server with auto-reload
python -m src.serving.inference_server
# Server starts at: http://localhost:8000
# API docs at:      http://localhost:8000/docs
```

### Query the API

```bash
# Load model from checkpoint
curl -X POST "http://localhost:8000/api/v1/load-model" \
  -H "Content-Type: application/json" \
  -d '{"model_path": "/path/to/checkpoint", "use_quantization": true}'

# Ask genomics questions
curl -X POST "http://localhost:8000/api/v1/genomics/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the function of BRCA1?", "query_type": "gene_function"}'

# Analyze research papers
curl -X POST "http://localhost:8000/api/v1/papers/analyze" \
  -H "Content-Type: application/json" \
  -d '{"title": "CRISPR gene editing study", "abstract": "...", "analysis_type": "summary"}'
```
### Train a Model

```bash
# Train 7B model on <16GB GPU
python examples/enhanced_trainer_example.py
# Or use the training script
python scripts/train.py --config configs/training.yaml
```

### Process Scientific Data

```bash
# Load and process genomics papers
python examples/scientific_dataset_example.py
# Collect data from PubMed
python scripts/collect_pubmed_data.py --query "CRISPR genomics" --max_papers 1000
```
### Run the Tests

```bash
# Run all tests (82% coverage)
pytest tests/ -v
# Run specific component tests
pytest tests/test_inference_server.py -v   # 21 FastAPI tests
pytest tests/test_enhanced_trainer.py -v   # Training pipeline tests
pytest tests/test_scientific_dataset.py -v # Data processing tests
```
## Configuration

Training configuration (`configs/training.yaml`):

```yaml
training:
  model:
    name: "meta-llama/Llama-2-7b-hf"  # Support for LLaMA-2
    model_type: "llama"
    max_length: 2048

  # QLoRA configuration for memory efficiency
  qlora:
    enabled: true
    r: 16              # LoRA rank
    lora_alpha: 32     # LoRA scaling
    lora_dropout: 0.1  # LoRA dropout
    target_modules: ["q_proj", "v_proj", "k_proj", "o_proj"]
    quantization:
      load_in_4bit: true
      bnb_4bit_compute_dtype: "float16"
      bnb_4bit_use_double_quant: true
      bnb_4bit_quant_type: "nf4"

  # Scientific data processing
  scientific_data:
    data_file: "data/high_quality_papers_demo.json"
    text_fields: ["title", "abstract", "full_text"]
    preprocessing:
      remove_citations: true
      normalize_scientific_notation: true
      handle_special_tokens: true

  # MLflow experiment tracking
  mlflow:
    enabled: true
    experiment_name: "genomics-llm-finetuning"
    tracking_uri: "file:./mlruns"

  hyperparameters:
    learning_rate: 2e-4
    batch_size: 4  # Optimized for memory
    gradient_accumulation_steps: 4
    num_epochs: 3
    warmup_steps: 100
    weight_decay: 0.01
```
Serving configuration (`configs/serving.yaml`):

```yaml
serving:
  server:
    host: "0.0.0.0"
    port: 8000
    workers: 1
    reload: false

  model:
    checkpoint_path: "/path/to/fine-tuned/model"
    use_quantization: true
    device: "auto"  # Auto-detect GPU/CPU

  inference:
    max_length: 512
    temperature: 0.7
    top_p: 0.9
    batch_size: 8

  monitoring:
    enable_metrics: true
    log_requests: true
    health_check_interval: 30
```
DeepSpeed configuration (`configs/deepspeed_config.json`):

```json
{
  "train_batch_size": 16,
  "gradient_accumulation_steps": 4,
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 2e-4,
      "weight_decay": 0.01
    }
  },
  "fp16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  }
}
```
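In DeepSpeed, `train_batch_size` must equal the per-GPU micro-batch times `gradient_accumulation_steps` times the number of GPUs. A quick stdlib-only sanity check of the numbers in this config (the world size of 4 is an assumption for illustration):

```python
# Stdlib-only sanity check of DeepSpeed batch-size accounting.
train_batch_size = 16            # from deepspeed_config.json
gradient_accumulation_steps = 4  # from deepspeed_config.json
world_size = 4                   # assumed number of GPUs for this example

micro_batch_per_gpu = train_batch_size // (gradient_accumulation_steps * world_size)
print(micro_batch_per_gpu)  # 1 sample per GPU per step under these assumptions

# The three numbers must multiply back to train_batch_size
assert micro_batch_per_gpu * gradient_accumulation_steps * world_size == train_batch_size
```

If the product does not match, DeepSpeed refuses to start, so it is worth checking before launching a long distributed run.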
## Usage Examples

### FastAPI Inference Server

```python
# Production-ready async inference server
from src.serving.inference_server import app, run_server

# Start server
run_server(host="0.0.0.0", port=8000, workers=4)
```

Key endpoints:

- `POST /api/v1/generate`: General text generation
- `POST /api/v1/genomics/query`: Genomics-specific queries
- `POST /api/v1/papers/analyze`: Scientific paper analysis
- `GET /api/v1/health`: Health checks and monitoring
- `GET /api/v1/metrics`: Performance metrics
- `GET /docs`: Interactive API documentation
### Memory-Efficient QLoRA Training

```python
# Memory-efficient QLoRA training
from src.models.enhanced_trainer import EnhancedModelTrainer
from src.models.qlora_config import QLoRAConfig

# Configure QLoRA for 7B model on <16GB GPU
config = QLoRAConfig(
    r=16,
    lora_alpha=32,
    load_in_4bit=True,
    target_modules=["q_proj", "v_proj"],
)

trainer = EnhancedModelTrainer(config)
trainer.train()  # Memory-efficient training
```
### Scientific Data Processing

```python
# Load and process genomics papers
from src.data.scientific_dataset import ScientificDataModule
from src.data.text_processor import ScientificTextProcessor

# Load high-quality papers
data_module = ScientificDataModule()
papers = data_module.load_high_quality_papers("data/papers.json")

# Process scientific text
processor = ScientificTextProcessor()
processed_text = processor.preprocess_scientific_text(paper_text)
```
## Testing

- ✅ 21 FastAPI server tests (82% coverage)
- ✅ Async test support with pytest-asyncio
- ✅ Mock-based testing for model interactions
- ✅ Integration tests for end-to-end workflows
- ✅ Performance benchmarking tests
### Code Quality Tools

- Black: Automatic code formatting
- isort: Import organization
- flake8: Code linting and style checks
- mypy: Static type checking
- pre-commit: Automated quality checks
### Running Tests

```bash
# Full test suite with coverage
pytest tests/ --cov=src --cov-report=html

# Specific component tests
pytest tests/test_inference_server.py -v   # FastAPI server (21 tests)
pytest tests/test_enhanced_trainer.py -v   # Training pipeline
pytest tests/test_scientific_dataset.py -v # Data processing
pytest tests/test_qlora_config.py -v       # QLoRA configuration

# Performance and integration tests
pytest tests/ -m "not slow"     # Skip slow tests
pytest tests/ -m "integration"  # Run integration tests only
```
## Performance

### Memory Efficiency

- ✅ 7B Parameter Models: Fine-tune on <16GB GPU memory
- ✅ QLoRA Optimization: 4-bit quantization with LoRA adapters
- ✅ Dynamic Batch Sizing: Automatic memory optimization
- ✅ Gradient Checkpointing: Reduced memory footprint during training
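A rough back-of-the-envelope for why a 7B model fits in under 16 GB with QLoRA (numbers are approximate and ignore activations and framework overhead; the adapter size is illustrative):

```python
# Rough memory estimate for 4-bit QLoRA on a 7B-parameter model (GB = 1e9 bytes).
params = 7e9

weights_4bit_gb = params * 0.5 / 1e9  # 4 bits = 0.5 bytes per weight
fp16_baseline_gb = params * 2 / 1e9   # 2 bytes per weight in plain fp16
print(round(weights_4bit_gb, 1), "vs", round(fp16_baseline_gb, 1))  # 3.5 vs 14.0

# LoRA trains only a tiny fraction of parameters, so the trainable footprint
# (fp32 weights + gradients + Adam moments) stays small as well.
lora_params = 40e6  # illustrative adapter size
lora_train_gb = lora_params * (4 + 4 + 8) / 1e9
print(round(lora_train_gb, 2))  # ~0.64 GB of adapter training state

assert weights_4bit_gb + lora_train_gb < 16  # comfortably under a 16 GB budget
```

The 4x shrink of the frozen base weights is what frees enough headroom for activations and gradient checkpointing on a single consumer GPU.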
### Inference Performance

- ⚡ Async Processing: FastAPI async request handling
- ⚡ Batch Support: Multiple queries processed simultaneously
- ⚡ Memory Monitoring: Real-time GPU memory tracking
- ⚡ Auto-scaling: Dynamic resource allocation based on load
### Scientific Accuracy

- 🎯 Genomics Specialization: Domain-specific query processing
- 🎯 Citation Handling: Proper scientific text preprocessing
- 🎯 Benchmark Detection: Automatic dataset identification
- 🎯 Quality Scoring: AI-powered paper quality assessment
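Benchmark detection can be as simple as matching a curated vocabulary of dataset names against a paper's text. A toy sketch (the name list and function are illustrative, not the project's detector):

```python
import re

# Illustrative vocabulary; a real detector would use a much larger curated list.
KNOWN_BENCHMARKS = ["GenBank", "ClinVar", "1000 Genomes", "ENCODE", "gnomAD"]

def detect_benchmarks(text: str) -> list[str]:
    """Return the known benchmark/dataset names mentioned in a paper's text."""
    found = []
    for name in KNOWN_BENCHMARKS:
        # Word-boundary match so "ENCODE" does not fire on "encoded"
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE):
            found.append(name)
    return found

text = "Variants were drawn from ClinVar and gnomAD, then encoded as one-hot vectors."
print(detect_benchmarks(text))  # ['ClinVar', 'gnomAD']
```

The word-boundary anchors are the important detail: short dataset names are common substrings of ordinary English words.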
## Deployment

### Docker

```dockerfile
# Dockerfile example
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ ./src/
COPY configs/ ./configs/
EXPOSE 8000
CMD ["python", "-m", "src.serving.inference_server"]
```
### Kubernetes

```yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scientific-llm-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scientific-llm
  template:
    metadata:
      labels:
        app: scientific-llm
    spec:
      containers:
        - name: inference-server
          image: scientific-llm-forge:latest
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "8Gi"
              nvidia.com/gpu: 1
            limits:
              memory: "16Gi"
              nvidia.com/gpu: 1
```
### Monitoring

```bash
# Health monitoring
curl http://localhost:8000/api/v1/health

# Performance metrics
curl http://localhost:8000/api/v1/metrics

# MLflow experiment tracking
mlflow ui --backend-store-uri ./mlruns
```

## Contributing

We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Implement your changes with tests
- Run the quality checks: `pytest tests/ && pre-commit run --all-files`
- Submit a pull request with a detailed description
- 🔬 Scientific Domain Expertise: Add new scientific domains beyond genomics
- 🧠 Model Architectures: Support for new LLM architectures
- 📊 Data Processing: Enhanced scientific text processing techniques
- 🚀 Deployment: Kubernetes, cloud deployment configurations
- 🧪 Testing: Additional test coverage and benchmarks
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

- 📧 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📖 Documentation: Full Documentation
- 🐦 Updates: Follow @ScientificLLM
## Roadmap

- ✅ FastAPI Inference Server - Production-ready REST API
- ✅ QLoRA Fine-tuning - Memory-efficient training pipeline
- ✅ Scientific Data Processing - Genomics paper processing
- ✅ Comprehensive Testing - 82% test coverage
- ✅ MLflow Integration - Experiment tracking
- ✅ Performance Monitoring - Real-time metrics
- 🚧 Distributed Training - Multi-GPU DeepSpeed integration
- 🚧 Model Versioning - Advanced checkpoint management
- 🚧 Web Dashboard - Training and monitoring UI
- 📋 Multi-Domain Support - Chemistry, biology, physics datasets
- 📋 Advanced Augmentation - Scientific text augmentation techniques
- 📋 Cloud Deployment - AWS, GCP, Azure deployment guides
- 📋 Kubernetes Operators - Native K8s integration
- 📋 Model Hub Integration - HuggingFace Hub publishing
- 📋 Real-time Streaming - WebSocket inference endpoints
🧬 ScientificLLM-Forge: Advancing Scientific Discovery Through AI 🚀

Built with ❤️ for the scientific research community

⭐ Star us on GitHub | 📖 Read the Docs | 🤝 Contribute