
LLMFlux: LLM Batch Processing Pipeline for HPC Systems

A streamlined solution for running Large Language Models (LLMs) in batch mode on HPC systems powered by Slurm. LLMFlux uses the OpenAI-compatible API format with a JSONL-first architecture, enabling your prompts to flow efficiently through LLM engines at scale.


Architecture

      JSONL Input                    Batch Processing                    Results
   (OpenAI Format)                 (Ollama/vLLM + Model)               (JSON Output)
         │                                 │                                 │
         │                                 │                                 │
         ▼                                 ▼                                 ▼
    ┌──────────┐                   ┌──────────────┐                   ┌──────────┐
    │  Batch   │                   │              │                   │  Output  │
    │ Requests │─────────────────▶ │   Model on   │─────────────────▶ │  Results │
    │  (JSONL) │                   │    GPU(s)    │                   │  (JSON)  │
    └──────────┘                   │              │                   └──────────┘
                                   └──────────────┘                    

LLMFlux processes JSONL files in a standardized OpenAI-compatible batch API format, enabling efficient processing of thousands of prompts on HPC systems with minimal overhead.

Documentation

Installation

pip install llmflux

Or for development:

  1. Create and Activate Conda Environment:

    conda create -n llmflux python=3.11 -y
    conda activate llmflux
  2. Install Package:

    pip install -e .
  3. Environment Setup:

    cp .env.example .env
    # Edit .env with your SLURM account and model details

Quick Start

Core Batch Processing on SLURM

The primary workflow for LLMFlux is submitting JSONL files for batch processing on SLURM:

from llmflux.slurm import SlurmRunner
from llmflux.core.config import Config

# Setup SLURM configuration
config = Config()
slurm_config = config.get_slurm_config()
slurm_config.account = "myaccount"

# Initialize runner
runner = SlurmRunner(config=slurm_config)

# Submit JSONL file directly for processing
job_id = runner.run(
    input_path="prompts.jsonl",
    output_path="results.json",
    model="llama3.2:3b",
    batch_size=4
)
print(f"Job submitted with ID: {job_id}")

JSONL Input Format

The JSONL input format follows the OpenAI Batch API specification:

{"custom_id":"request1","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"Explain quantum computing"}],"temperature":0.7,"max_tokens":500}}
{"custom_id":"request2","method":"POST","url":"/v1/chat/completions","body":{"model":"llama3.2:3b","messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is machine learning?"}],"temperature":0.7,"max_tokens":500}}

For advanced options like custom batch sizes, processing settings, or SLURM configuration, see the Configuration Guide.

For advanced model configuration, see the Models Guide.

Command-Line Interface

LLMFlux includes a command-line interface for submitting batch processing jobs:

# Process JSONL file directly (core functionality)
llmflux run --model llama3.2:3b --input data/prompts.jsonl --output results/output.json

For detailed command options:

llmflux --help

Output Format

Results are written to the specified output path as a JSON array; each entry pairs the original request with the model's response and processing metadata:

[
  {
    "input": {
      "custom_id": "request1",
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": {
        "model": "llama3.2:3b",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant"},
          {"role": "user", "content": "Original prompt text"}
        ],
        "temperature": 0.7,
        "max_tokens": 1024
      },
      "metadata": {
        "source_file": "example.txt"
      }
    },
    "output": {
      "id": "chat-cmpl-123",
      "object": "chat.completion",
      "created": 1699123456,
      "model": "llama3.2:3b",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Generated response text"
          },
          "finish_reason": "stop"
        }
      ]
    },
    "metadata": {
      "model": "llama3.2:3b",
      "timestamp": "2023-11-04T12:34:56.789Z",
      "processing_time": 1.23
    }
  }
]
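
Because the output is a plain JSON array, downstream processing needs nothing beyond the standard library. A minimal sketch that maps each custom_id to its generated text (using the results.json path from the Quick Start example):

import json

with open("results.json", encoding="utf-8") as f:
    results = json.load(f)

# Collect the assistant message for each request, keyed by custom_id.
answers = {
    entry["input"]["custom_id"]: entry["output"]["choices"][0]["message"]["content"]
    for entry in results
}

print(answers["request1"])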

Utility Converters

LLMFlux provides utility converters to help prepare JSONL files from various input formats:

# Convert CSV to JSONL
llmflux convert csv --input data/papers.csv --output data/papers.jsonl --template "Summarize: {text}"

# Convert directory to JSONL
llmflux convert dir --input data/documents/ --output data/docs.jsonl --recursive

For code examples of converters, see the examples directory.
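
If you prefer to prepare inputs in Python rather than through the CLI, the CSV conversion can be approximated with the standard library. The following is an illustrative sketch only (the text column name, template, and model are assumptions), not the converter's actual implementation:

import csv
import json

template = "Summarize: {text}"  # same template as the CLI example above

with open("data/papers.csv", newline="", encoding="utf-8") as src, \
     open("data/papers.jsonl", "w", encoding="utf-8") as dst:
    for i, row in enumerate(csv.DictReader(src), start=1):
        request = {
            "custom_id": f"row{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "llama3.2:3b",
                "messages": [{"role": "user", "content": template.format(**row)}],
            },
        }
        dst.write(json.dumps(request) + "\n")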

Benchmarking

LLMFlux ships with a benchmarking workflow that can source prompts, submit the SLURM job, and collect results/metrics for you.

llmflux benchmark --model llama3.2:3b --name nightly --num-prompts 60 \
  --account ACCOUNT_NAME --partition PARTITION_NAME --nodes 1

  • Prompt sources: omit --input to automatically download and cache LiveBench categories (benchmark_data/). Provide --input path/to/prompts.jsonl to reuse an existing JSONL file instead. Use --num-prompts, --temperature, and --max-tokens to control synthetic dataset generation.
  • Outputs: results default to results/benchmarks/<name>_results.json and a metrics summary (<name>_metrics.txt) containing elapsed SLURM runtime and number of prompts processed.
  • Batch tuning: adjust --batch-size for throughput. Pass model arguments such as --temperature and --max-tokens to forward them to the runner.
  • SLURM overrides: forward scheduler settings with --account, --partition, --nodes, --gpus-per-node, --time, --mem, and --cpus-per-task.
  • Job controls: add --rebuild to force an Apptainer image rebuild or --debug to keep the generated job script for inspection.

For the complete option reference:

llmflux benchmark --help

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License
