Skip to content
Merged
280 changes: 280 additions & 0 deletions flux.1-dev-trt-b200/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,280 @@
# Flux 1.0 Dev - TensorRT

This model provides high-quality text-to-image generation using [Flux.1-dev model](https://huggingface.co/black-forest-labs/FLUX.1-dev) optimized with TensorRT for the B200 GPU.

## Model Information

- **Model**: Flux 1.0 Dev (black-forest-labs/FLUX.1-dev)
- **Optimization**: TensorRT 8.6.1
- **Hardware**: NVIDIA B200 GPU
- **Framework**: PyTorch with TensorRT acceleration

## Features

- High-quality image generation from text prompts
- TensorRT optimization for fast inference
- Support for custom image dimensions (must be multiples of 8)
- Configurable denoising steps and guidance scale
- CUDA graph optimization support

## Usage

### Basic Usage

```bash
curl -X POST https://app.baseten.co/models/{MODEL_ID}/predict \
-H "Authorization: Api-Key API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A beautiful landscape with mountains and a lake, photorealistic, high quality",
"negative_prompt": "blurry, low quality, distorted",
"height": 1024,
"width": 1024,
"num_inference_steps": 30,
"guidance_scale": 3.5,
"seed": 42,
"batch_size": 1,
"batch_count": 1
}' | python show.py
```

### Batch Processing

The model supports efficient batch processing for generating multiple images in a single request. You can generate up to 4 images simultaneously with either the same prompt or different prompts for each image.

#### Same Prompt for All Images

```bash
curl -X POST https://app.baseten.co/models/{MODEL_ID}/predict \
-H "Authorization: Api-Key API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": "A beautiful landscape with mountains and a lake, photorealistic, high quality",
"negative_prompt": "blurry, low quality, distorted",
"height": 1024,
"width": 1024,
"num_inference_steps": 30,
"guidance_scale": 3.5,
"seed": 42,
"batch_size": 4,
"batch_count": 1
}' | python show_batch.py
```

#### Different Prompts for Each Image

```bash
curl -X POST https://app.baseten.co/models/{MODEL_ID}/predict \
-H "Authorization: Api-Key API_KEY" \
-H "Content-Type: application/json" \
-d '{
"prompt": [
"A beautiful landscape with mountains and a lake, photorealistic, high quality",
"A futuristic city skyline at sunset, neon lights, cyberpunk style, high quality",
"A cute cat sitting in a garden, soft lighting, detailed, high quality",
"Abstract geometric patterns in vibrant colors, modern art style, high quality"
],
"negative_prompt": [
"blurry, low quality, distorted",
"blurry, low quality, distorted",
"blurry, low quality, distorted",
"blurry, low quality, distorted"
],
"height": 1024,
"width": 1024,
"num_inference_steps": 30,
"guidance_scale": 3.5,
"seed": 42,
"batch_size": 4,
"batch_count": 1
}' | python show_batch.py
```

#### Batch Processing Benefits

- **Parallel Processing**: All images in a batch are generated simultaneously, not sequentially
- **Better GPU Utilization**: More efficient use of GPU resources compared to separate requests
- **Faster Total Time**: Generating 4 images in a batch is significantly faster than 4 separate API calls
- **Consistent Parameters**: All images in a batch use the same dimensions, inference steps, and guidance scale

#### Batch Processing Limitations

- **Maximum Batch Size**: Limited to 4 images per batch (MAX_BATCH_SIZE = 4)
- **Prompt Array Length**: When using different prompts, the prompt array length must match the batch_size
- **Memory Requirements**: Larger batches require more GPU memory

#### Displaying Batch Results

Use the included `show_batch.py` script to handle multiple images in the response:

```bash
# The script automatically detects single vs multiple images
curl ... | python show_batch.py
```

The script will:
- Save each image with a unique filename
- Automatically open all generated images
- Print status information about the batch

### Load Test
Before running the load test, update `load_test.py` with your actual endpoint URL and API key. Replace the placeholder values for `api_url` and `api_key` with your deployment's information (lines 43 and 44).

```bash
python load_test.py --save-all-images --use-varied-prompts --concurrent --num-requests 30

🚀 Starting Flux Truss API test...
==================================================
🎨 Using 30 varied prompts for load testing
📝 First 3 prompts to be tested:
1. a beautiful photograph of Mt. Fuji during cherry blossom, photorealistic, high quality
2. a majestic dragon soaring through a mystical forest, digital art, detailed
3. a cozy coffee shop interior with warm lighting, people working on laptops, photorealistic
... and 27 more
🚀 Starting concurrent load test with 30 requests, max 5 workers
============================================================
📤 Sending request 1/30: 'a beautiful photograph of Mt. Fuji during cherry b...'
Testing Truss API endpoint with prompt: 'a beautiful photograph of Mt. Fuji during cherry blossom, photorealistic, high quality'

...

============================================================
📊 LOAD TEST SUMMARY
============================================================
Total requests: 30
Successful: 30
Failed: 0
Success rate: 100.0%
Total time: 74.72 seconds
Average request time: 11.63 seconds
Min request time: 2.91 seconds
Max request time: 12.86 seconds
Throughput: 0.40 requests/second

==================================================

💾 Saving 30 successful images...
✅ Successfully saved 30/30 images to './output'
📁 Opened output directory

==================================================
```

### Advanced Usage

```python
{
"prompt": "A beautiful landscape with mountains and a lake, photorealistic, high quality",
"prompt2": "Additional prompt for T5 tokenizer", # Optional, uses prompt if not provided
"negative_prompt": "blurry, low quality, distorted", # Optional
"height": 1024, # Must be multiple of 8
"width": 1024, # Must be multiple of 8
"denoising_steps": 50, # Number of denoising steps
"guidance_scale": 3.5, # Classifier-free guidance scale (must be > 1)
"seed": 42, # Random seed for reproducible results
"batch_size": 1, # Number of images to generate
"batch_count": 1, # Number of batches
"num_warmup_runs": 0, # Number of warmup runs
"max_sequence_length": 512, # Max sequence length (up to 512 for flux.1-dev)
"t5_ws_percentage": None, # T5 weight streaming percentage
"transformer_ws_percentage": None # Transformer weight streaming percentage
}
```

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | string/array | required | Text prompt(s) for image generation. Can be a single string or array of strings for different prompts per batch item |
| `prompt2` | string/array | same as prompt | Additional prompt(s) for T5 tokenizer. Can be a single string or array of strings |
| `negative_prompt` | string/array | "" | Negative prompt(s) to avoid certain elements. Can be a single string or array of strings |
| `height` | int | 1024 | Image height (must be multiple of 8) |
| `width` | int | 1024 | Image width (must be multiple of 8) |
| `denoising_steps` | int | 50 | Number of denoising steps |
| `guidance_scale` | float | 3.5 | Classifier-free guidance scale (> 1) |
| `seed` | int | None | Random seed for reproducibility |
| `batch_size` | int | 1 | Number of images per batch |
| `batch_count` | int | 1 | Number of batches |
| `num_warmup_runs` | int | 0 | Number of warmup runs |
| `max_sequence_length` | int | 512 | Maximum sequence length (≤ 512) |
| `t5_ws_percentage` | int | None | T5 weight streaming percentage |
| `transformer_ws_percentage` | int | None | Transformer weight streaming percentage |

## Response Format

The model returns a JSON response with the following structure:

### Single Image Response

```json
{
"status": "success",
"data": "base64_encoded_image",
"time": 2.34,
"prompt": "A beautiful landscape with mountains and a lake, photorealistic, high quality",
"negative_prompt": "blurry, low quality, distorted",
"height": 1024,
"width": 1024,
"num_inference_steps": 30,
"guidance_scale": 3.5,
"seed": 42
}
```

### Batch Response (Multiple Images)

```json
{
"status": "success",
"data": [
"base64_encoded_image_1",
"base64_encoded_image_2",
"base64_encoded_image_3",
"base64_encoded_image_4"
],
"time": 7.23,
"prompt": [
"A beautiful landscape with mountains and a lake, photorealistic, high quality",
"A futuristic city skyline at sunset, neon lights, cyberpunk style, high quality",
"A cute cat sitting in a garden, soft lighting, detailed, high quality",
"Abstract geometric patterns in vibrant colors, modern art style, high quality"
],
"negative_prompt": [
"blurry, low quality, distorted",
"blurry, low quality, distorted",
"blurry, low quality, distorted",
"blurry, low quality, distorted"
],
"height": 1024,
"width": 1024,
"num_inference_steps": 30,
"guidance_scale": 3.5,
"seed": 42
}
```

## Performance Notes

The model is optimized with a pre-compiled TensorRT engine for the NVIDIA B200 GPU.
Performance characteristic is described below for the [basic usage](#basic-usage)


```text
|------------------|--------------|
| Module | Latency |
|------------------|--------------|
| CLIP | 2.02 ms |
| T5 | 6.43 ms |
| Transformer x 50 | 2361.44 ms |
| VAE-Dec | 11.67 ms |
|------------------|--------------|
| Pipeline | 2382.45 ms |
|------------------|--------------|
```

## Model Variants

This implementation supports the `flux.1-dev` variant with:
- Maximum sequence length: 512 tokens
- Default image dimensions: 1024x1024
- Optimized for high-quality image generation
17 changes: 17 additions & 0 deletions flux.1-dev-trt-b200/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
base_image:
image: nvcr.io/nvidia/pytorch:25.06-py3
description: Generate high-quality images from text prompts using Black Forest Labs's Flux model with TensorRT optimization.
external_package_dirs: []
model_name: Flux 1.0 Dev - TensorRT
requirements_file: ./requirements.txt
resources:
accelerator: B200
use_gpu: true
runtime:
predict_concurrency: 1
secrets:
hf_access_token: null
system_packages:
- ffmpeg
- libsm6
- libxext6
Loading
Loading