HTTP serving for text-to-image diffusion models with cache-dit acceleration.
Adapted from SGLang.
pip install -e ".[serving]"
cache-dit-serve --model-path black-forest-labs/FLUX.1-dev --cache
curl http://localhost:8000/health

Available endpoints:
- GET /health - Health check
- GET /get_model_info - Model information
- POST /generate - Generate images
- POST /flush_cache - Flush cache
- GET /docs - API documentation
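These endpoints can be wrapped in a small Python client. The sketch below is hypothetical (CacheDitClient is not part of cache-dit) and uses only the standard library. It assumes JSON request and response bodies, that GET /health returns HTTP 200 once the model is loaded, and that /get_model_info and /flush_cache need no request body. Only the "images" field of the /generate response (a list of base64-encoded images) comes from the examples in this README.

```python
import base64
import json
import time
import urllib.request
from typing import Any, Dict, List, Optional


class CacheDitClient:
    """Hypothetical stdlib-only client for a running cache-dit-serve instance."""

    def __init__(self, base_url: str = "http://localhost:8000", timeout: float = 600.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout  # generation with many steps can take a while

    def _request(self, method: str, path: str,
                 payload: Optional[Dict[str, Any]] = None) -> Any:
        data = None if payload is None else json.dumps(payload).encode("utf-8")
        req = urllib.request.Request(
            self.base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        # urlopen raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            body = resp.read()
        return json.loads(body) if body else None

    def is_healthy(self) -> bool:
        # Assumption: GET /health answers HTTP 200 once the model is ready.
        try:
            with urllib.request.urlopen(self.base_url + "/health", timeout=5) as resp:
                return resp.status == 200
        except OSError:  # URLError, HTTPError and socket errors subclass OSError
            return False

    def wait_until_ready(self, max_wait: float = 600.0, interval: float = 5.0) -> bool:
        """Poll /health until it succeeds or max_wait seconds elapse."""
        deadline = time.monotonic() + max_wait
        while True:
            if self.is_healthy():
                return True
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)

    def model_info(self) -> Any:
        return self._request("GET", "/get_model_info")

    def flush_cache(self) -> Any:
        return self._request("POST", "/flush_cache")

    def generate(self, prompt: str, width: int = 1024, height: int = 1024,
                 num_inference_steps: int = 50) -> List[bytes]:
        result = self._request("POST", "/generate", {
            "prompt": prompt,
            "width": width,
            "height": height,
            "num_inference_steps": num_inference_steps,
        })
        return self.decode_images(result)

    @staticmethod
    def decode_images(result: Dict[str, Any]) -> List[bytes]:
        # /generate returns base64-encoded images under "images".
        return [base64.b64decode(s) for s in result["images"]]
```

Typical use: call wait_until_ready() right after launching the server (model loading can take minutes), then write each bytes object returned by generate() to an image file.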
python -m cache_dit.serve.client \
  --prompt "A beautiful sunset over the ocean" \
  --width 1024 \
  --height 1024 \
  --steps 50 \
  --output output.png

import requests
import base64
from PIL import Image
from io import BytesIO
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "A beautiful sunset over the ocean",
        "width": 1024,
        "height": 1024,
        "num_inference_steps": 50,
    },
)
response.raise_for_status()  # surface HTTP errors before parsing the body
result = response.json()

# Generated images come back as base64-encoded strings in "images".
img_data = base64.b64decode(result["images"][0])
img = Image.open(BytesIO(img_data))
img.save("output.png")

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A beautiful sunset over the ocean",
    "width": 1024,
    "height": 1024,
    "num_inference_steps": 50
  }' | jq -r '.images[0]' | base64 -d > output.png

Command-line options:
- --model-path: Model path (required)
- --host: Server host (default: 0.0.0.0)
- --port: Server port (default: 8000)
- --device: Device (default: cuda)
- --dtype: Model dtype (default: bfloat16)
Cache options (DBCache):
- --cache: Enable DBCache
- --rdt: Residual diff threshold (default: 0.08)
- --Fn: First N compute blocks (default: 8)
- --Bn: Last N compute blocks (default: 0)
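The --rdt, --Fn and --Bn options control a per-step caching decision. The toy below sketches that decision under simplifying assumptions: hidden states are plain Python lists, transformer blocks are plain callables, and the cache test is a relative L1 difference between the current and previous residual of the first Fn blocks, in the style of First Block Cache. ToyDBCache and relative_l1_diff are hypothetical names; this is not cache-dit's implementation and omits everything except the core skip-or-compute logic.

```python
from typing import Callable, List, Optional, Sequence

Hidden = List[float]
Block = Callable[[Hidden], Hidden]


def relative_l1_diff(curr: Sequence[float], prev: Sequence[float]) -> float:
    """sum|curr - prev| / sum|prev| (same as the ratio of the means)."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev)
    return float("inf") if den == 0 else num / den


class ToyDBCache:
    """Hypothetical toy: Fn leading and Bn trailing blocks always run;
    the middle blocks are skipped when the Fn-block residual barely changed."""

    def __init__(self, blocks: List[Block], Fn: int = 8, Bn: int = 0, rdt: float = 0.08):
        if Fn + Bn > len(blocks):
            raise ValueError("Fn + Bn cannot exceed the number of blocks")
        self.blocks, self.Fn, self.Bn, self.rdt = blocks, Fn, Bn, rdt
        self.prev_front_residual: Optional[Hidden] = None
        self.cached_middle_residual: Optional[Hidden] = None
        self.cache_hits = 0

    def forward(self, x: Hidden) -> Hidden:
        n = len(self.blocks)
        front = self.blocks[: self.Fn]
        middle = self.blocks[self.Fn : n - self.Bn]
        back = self.blocks[n - self.Bn :]

        h = x
        for blk in front:  # the first Fn blocks always run
            h = blk(h)
        front_residual = [a - b for a, b in zip(h, x)]

        hit = (
            self.prev_front_residual is not None
            and self.cached_middle_residual is not None
            and relative_l1_diff(front_residual, self.prev_front_residual) < self.rdt
        )
        if hit:
            # Skip the middle blocks and reuse the residual they produced last time.
            h = [a + r for a, r in zip(h, self.cached_middle_residual)]
            self.cache_hits += 1
        else:
            start = h
            for blk in middle:
                h = blk(h)
            self.cached_middle_residual = [a - b for a, b in zip(h, start)]
            # Refresh the reference only on a miss, so gradual changes are measured
            # against the last fully computed step and eventually exceed rdt.
            self.prev_front_residual = front_residual
        for blk in back:  # the last Bn blocks always run
            h = blk(h)
        return h
```

A larger rdt skips the middle blocks more often (faster, less exact); larger Fn and Bn recompute more blocks every step (slower, more accurate).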
Parallelism options:
- --parallel-type: Parallelism type (tp/ulysses/ring)
  - Tensor Parallelism (tp): supported via broadcast-based synchronization
  - Context Parallelism (ulysses/ring): supported
Other options:
- --compile: Enable torch.compile (runs an automatic warmup for each input shape)
- --enable-cpu-offload: Enable CPU offload
- --device-map: Device map strategy
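When launching the server from scripts, the flags above can be assembled programmatically. The helper below is a hypothetical sketch: build_serve_command is not part of cache-dit, it covers only a subset of the documented flags, and the rule that --parallel-type requires at least two torchrun processes is an assumption drawn from the two-GPU examples that follow.

```python
from typing import List, Optional

VALID_PARALLEL_TYPES = ("tp", "ulysses", "ring")


def build_serve_command(
    model_path: str,
    cache: bool = False,
    rdt: Optional[float] = None,
    Fn: Optional[int] = None,
    Bn: Optional[int] = None,
    parallel_type: Optional[str] = None,
    nproc_per_node: int = 1,
    use_compile: bool = False,
    port: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for the server (hypothetical helper).

    Options left as None are omitted so the server's own defaults apply.
    """
    if parallel_type is not None:
        if parallel_type not in VALID_PARALLEL_TYPES:
            raise ValueError(f"parallel_type must be one of {VALID_PARALLEL_TYPES}")
        if nproc_per_node < 2:
            raise ValueError("parallelism needs nproc_per_node >= 2")
    if nproc_per_node > 1:
        # Multi-GPU runs go through torchrun, as in the examples below.
        cmd = ["torchrun", f"--nproc_per_node={nproc_per_node}",
               "-m", "cache_dit.serve.serve"]
    else:
        cmd = ["cache-dit-serve"]
    cmd += ["--model-path", model_path]
    if port is not None:
        cmd += ["--port", str(port)]
    if cache:
        cmd.append("--cache")
        # DBCache tuning flags only matter when caching is enabled.
        for flag, value in (("--rdt", rdt), ("--Fn", Fn), ("--Bn", Bn)):
            if value is not None:
                cmd += [flag, str(value)]
    if parallel_type is not None:
        cmd += ["--parallel-type", parallel_type]
    if use_compile:
        cmd.append("--compile")
    return cmd
```

The returned list can be passed directly to subprocess.run(cmd, check=True) or subprocess.Popen(cmd); returning a list rather than a string avoids shell-quoting issues with model paths.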
With DBCache:
cache-dit-serve --model-path black-forest-labs/FLUX.1-dev --cache

With DBCache and torch.compile:
cache-dit-serve --model-path black-forest-labs/FLUX.1-dev --cache --compile

Context parallelism (Ulysses) on 2 GPUs:
torchrun --nproc_per_node=2 -m cache_dit.serve.serve \
  --model-path black-forest-labs/FLUX.1-dev \
  --cache \
  --parallel-type ulysses

Tensor parallelism on 2 GPUs:
torchrun --nproc_per_node=2 -m cache_dit.serve.serve \
  --model-path black-forest-labs/FLUX.1-dev \
  --cache \
  --parallel-type tp

Supported models: Flux, Qwen-Image, Wan, CogView3+/4, HunyuanDiT/Video, Mochi, LTX-Video, etc.