@@ -10,7 +10,7 @@ model_name: BEI-baai-bge-m3-embedding-dense-truss-example
 python_version: py39
 requirements: []
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '1'
   memory: 10Gi
   use_gpu: true

Contributor: Rejected, these are autogenerated. Will PR it for you.

Contributor: Accelerator.A100,  # BERT has long-context issues (>8K tokens on 24 GB machines; using the 80 GB variant therefore)

Contributor: Should be outdated for this file; replaced it with H100.
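The reviewer's note about long-context memory can be sanity-checked with rough arithmetic. The sketch below is illustrative only: it assumes naive (fully materialized) attention, batch size 1, fp16, and XLM-RoBERTa-large-like dimensions (24 layers, 16 heads) for the BGE-m3 backbone; none of these numbers appear in the PR.

```python
def attn_matrices_gib(seq_len: int, num_heads: int, num_layers: int,
                      bytes_per_elem: int = 2) -> float:
    """Worst-case GiB to hold every seq_len x seq_len attention matrix
    at once (naive attention, batch size 1, fp16 by default)."""
    return num_heads * num_layers * seq_len ** 2 * bytes_per_elem / 1024 ** 3

# Assumed BGE-m3-like shape (24 layers, 16 heads) at an 8K context:
print(attn_matrices_gib(8192, 16, 24))  # 48.0 GiB -- past 24 GB, fits in 80 GB
```

Even if only one layer's matrices are live at a time (2 GiB here), model weights, activations, and any batching push a 24 GB card over the edge at 8K tokens well before an 80 GB one, which is consistent with the reviewer's reasoning.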
@@ -15,7 +15,7 @@ model_name: BEI-baai-bge-reranker-v2-m3-multilingual-truss-example
 python_version: py39
 requirements: []
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '1'
   memory: 10Gi
   use_gpu: true
@@ -10,7 +10,7 @@ model_name: BEI-snowflake-snowflake-arctic-embed-l-v2.0-truss-example
 python_version: py39
 requirements: []
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '1'
   memory: 10Gi
   use_gpu: true
@@ -16,7 +16,7 @@ model_name: Briton-qwen-qwen2-57b-a14b-moe-int4-truss-example
 python_version: py39
 requirements: []
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '1'
   memory: 10Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion cogvlm/config.yaml
@@ -13,7 +13,7 @@ requirements:
 - xformers==0.0.22
 - accelerate==0.25.0
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '3'
   memory: 15Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion comfyui-truss/examples/anime-style-transfer/config.yaml
@@ -20,7 +20,7 @@ requirements:
 - accelerate==0.23.0
 - opencv-python
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 secrets: {}
 system_packages:
2 changes: 1 addition & 1 deletion deepspeed-mii/config.yaml
@@ -18,7 +18,7 @@ python_version: py311
 requirements:
 - deepspeed-mii==0.1.1
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '3'
   memory: 14Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion gemma/gemma-2-27b-it-vllm/config.yaml
@@ -9,7 +9,7 @@ requirements:
 - vllm==0.5.1
 - https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.8/flashinfer-0.0.8+cu121torch2.3-cp311-cp311-linux_x86_64.whl
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 runtime:
   predict_concurrency: 128
2 changes: 1 addition & 1 deletion gemma/gemma-2-9b-it-vllm/config.yaml
@@ -8,7 +8,7 @@ requirements:
 - vllm==0.5.1
 - https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.8/flashinfer-0.0.8+cu121torch2.3-cp311-cp311-linux_x86_64.whl
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 runtime:
   predict_concurrency: 128
2 changes: 1 addition & 1 deletion llama/llama-2-13b-chat/config.yaml
@@ -34,7 +34,7 @@ requirements:
 - torch==2.0.1
 - transformers==4.32.1
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '3'
   memory: 14Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion llama/llama-2-13b/config.yaml
@@ -22,7 +22,7 @@ requirements:
 - torch==2.0.1
 - transformers==4.32.1
 resources:
-  accelerator: A100:1
+  accelerator: H100:1
   cpu: '3'
   memory: 14Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion llama/llama-2-70b-chat/config.yaml
@@ -33,7 +33,7 @@ requirements:
 - torch==2.0.1
 - transformers==4.32.1
 resources:
-  accelerator: A100:2
+  accelerator: H100:2
   cpu: '3'
   memory: 14Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion llama/llama-2-70b/config.yaml
@@ -22,7 +22,7 @@ requirements:
 - torch==2.0.1
 - transformers==4.32.1
 resources:
-  accelerator: A100:2
+  accelerator: H100:2
   cpu: '3'
   memory: 14Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion llama/llama-3-8b-instruct/config.yaml
@@ -16,7 +16,7 @@ requirements:
 - transformers
 - torch
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 secrets:
   hf_access_token: "your-hf-access-token"
2 changes: 1 addition & 1 deletion llama/llama-3_1_70b-instruct/config.yaml
@@ -7,7 +7,7 @@ requirements:
 - vllm==0.5.3post1
 - accelerate
 resources:
-  accelerator: A100:4
+  accelerator: H100:4
   use_gpu: true
 runtime:
   predict_concurrency: 128
2 changes: 1 addition & 1 deletion llama/llama-3_2-11b-vision-instruct/config.yaml
@@ -31,7 +31,7 @@ docker_server:
   predict_endpoint: /v1/chat/completions
   server_port: 8000
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 model_name: Llama 3.2 11B Vision Instruct
 secrets:
2 changes: 1 addition & 1 deletion llama/llama-7b-exllama-streaming/config.yaml
@@ -8,7 +8,7 @@ python_version: py311
 requirements:
 - exllamav2==0.0.5
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '1'
   memory: 2Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion llama/llama-7b-exllama/config.yaml
@@ -8,7 +8,7 @@ python_version: py311
 requirements:
 - exllamav2==0.0.5
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '1'
   memory: 2Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion llava/llava-1.6-sgl/config.yaml
@@ -5,7 +5,7 @@ python_version: py310
 requirements: []
 requirements_file: ./requirements.txt
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 runtime:
   predict_concurrency: 128
2 changes: 1 addition & 1 deletion llava/llava-v1.6-34b/config.yaml
@@ -5,7 +5,7 @@ python_version: py311
 requirements:
 - git+https://github.com/haotian-liu/LLaVA.git
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 secrets: {}
 system_packages: []
4 changes: 2 additions & 2 deletions mistral/mixtral-8x22b-trt-int8-weights-only/config.yaml
@@ -8,7 +8,7 @@ external_package_dirs: []
 model_metadata:
   avatar_url: https://cdn.baseten.co/production/static/explore/mistral_logo.png
   cover_image_url: https://cdn.baseten.co/production/static/explore/mistral.png
-  engine_repository: baseten/mixtral-8x22B_i60000_o4000_bs2_tp4_int8_weights_only_A100-tllm_0.9.0.dev2024022000
+  engine_repository: baseten/mixtral-8x22B_i60000_o4000_bs2_tp4_int8_weights_only_H100-tllm_0.9.0.dev2024022000
   example_model_input:
     max_tokens: 512
     messages:
@@ -31,7 +31,7 @@ requirements:
 - tritonclient[all]
 - transformers==4.42.3
 resources:
-  accelerator: A100:4
+  accelerator: H100:4
   use_gpu: true
 runtime:
   num_workers: 1
2 changes: 1 addition & 1 deletion mistral/mixtral-8x22b/config.yaml
@@ -16,7 +16,7 @@ requirements:
 - transformers==4.42.3
 - torch==2.2.0
 resources:
-  accelerator: A100:4
+  accelerator: H100:4
   use_gpu: true
 secrets:
   hf_access_token: "ENTER HF ACCESS TOKEN HERE"
@@ -32,7 +32,7 @@ requirements:
 - tritonclient[all]
 - transformers==4.42.3
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 runtime:
   num_workers: 1
2 changes: 1 addition & 1 deletion mistral/mixtral-8x7b-instruct-trt-llm/config.yaml
@@ -31,7 +31,7 @@ requirements:
 - tritonclient[all]
 - transformers==4.42.3
 resources:
-  accelerator: A100:2
+  accelerator: H100:2
   use_gpu: true
 runtime:
   num_workers: 1
4 changes: 2 additions & 2 deletions mistral/mixtral-8x7b-instruct-vllm-a100-t-tp2/config.yaml
@@ -1,11 +1,11 @@
 environment_variables: {}
 external_package_dirs: []
-model_name: Mixtral 8x7B — VLLM TP2 — A100:2
+model_name: Mixtral 8x7B — VLLM TP2 — H100:2
 python_version: py310
 requirements:
 - vllm
 resources:
-  accelerator: A100:2
+  accelerator: H100:2
   use_gpu: true
 runtime:
   predict_concurrency: 128
2 changes: 1 addition & 1 deletion mistral/mixtral-8x7b-instruct-vllm/config.yaml
@@ -5,7 +5,7 @@ python_version: py310
 requirements:
 - vllm==0.2.5
 resources:
-  accelerator: A100:2
+  accelerator: H100:2
   use_gpu: true
 runtime:
   predict_concurrency: 128
2 changes: 1 addition & 1 deletion mistral/pixtral-12b/config.yaml
@@ -39,5 +39,5 @@ secrets:
 requirements:
 - vllm==0.6.1
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
2 changes: 1 addition & 1 deletion nous-capybara/nous-capybara-34b-openai/config.yaml
@@ -15,7 +15,7 @@ requirements:
 - scipy==1.11.4
 - sentencepiece==0.1.99
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '3'
   memory: 20Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion nous-capybara/nous-capybara-34b/config.yaml
@@ -14,7 +14,7 @@ requirements:
 - scipy==1.11.4
 - sentencepiece==0.1.99
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '3'
   memory: 20Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion orpheus-tts/orpheus-tts-streaming/config.yaml
@@ -10,7 +10,7 @@ requirements:
 - huggingface_hub[hf_transfer]
 - hf_transfer==0.1.9
 resources:
-  accelerator: A100
+  accelerator: H100
   # accelerator: H100_40GB
   use_gpu: true
 runtime:
4 changes: 2 additions & 2 deletions stable-diffusion/playground-v2-trt/config.yaml
@@ -6,7 +6,7 @@ environment_variables:
   HF_HUB_ENABLE_HF_TRANSFER: 1
 external_package_dirs: []
 model_cache:
-- repo_id: baseten/playground-v2-trt-8.6.1.post1-engine-A100
+- repo_id: baseten/playground-v2-trt-8.6.1.post1-engine-H100
 - allow_patterns:
   - config.json
   - diffusion_pytorch_model.safetensors
@@ -42,7 +42,7 @@ requirements:
 - --extra-index-url https://pypi.nvidia.com
 - tensorrt==8.6.1.post1
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 runtime:
   predict_concurrency: 1
2 changes: 1 addition & 1 deletion stable-diffusion/sdxl-lightning/config.yaml
@@ -18,7 +18,7 @@ requirements:
 - xformers==0.0.22
 - accelerate==0.24.1
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 secrets: {}
 system_packages: []
2 changes: 1 addition & 1 deletion stable-diffusion/sdxl-lora-swapping/config.yaml
@@ -15,7 +15,7 @@ requirements:
 - opencv-python==4.8.0.76
 - diffusers==0.21.2
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: 3500m
   memory: 20Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion stable-diffusion/stable-diffusion-3-medium/config.yaml
@@ -13,7 +13,7 @@ requirements:
 - sentencepiece
 - protobuf
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 secrets:
   hf_access_token: ""
2 changes: 1 addition & 1 deletion stable-diffusion/stable-diffusion-xl-1.0-trt/config.yaml
@@ -49,7 +49,7 @@ requirements:
 - --extra-index-url https://pypi.nvidia.com
 - tensorrt==8.6.1.post1
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 runtime:
   predict_concurrency: 1
2 changes: 1 addition & 1 deletion stable-diffusion/stable-video-diffusion/config.yaml
@@ -29,7 +29,7 @@ requirements:
 - hf_transfer==0.1.4
 - git+https://github.com/Stability-AI/generative-models.git@059d8e9cd9c55aea1ef2ece39abf605efb8b7cc9
 resources:
-  accelerator: A100
+  accelerator: H100
   cpu: '4'
   memory: 16Gi
   use_gpu: true
2 changes: 1 addition & 1 deletion templates/trt-llm/config.yaml
@@ -12,7 +12,7 @@ python_version: py311
 requirements:
 - tritonclient[all]
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 runtime:
   predict_concurrency: 256
2 changes: 1 addition & 1 deletion text-embeddings-inference/config.yaml
@@ -2,7 +2,7 @@ base_image:
   # select an image: L4
   # CPU baseten/text-embeddings-inference-mirror:cpu-1.6
   # Turing (T4, ...) baseten/text-embeddings-inference-mirror:turing-1.6
-  # Ampere 80 (A100, A30) baseten/text-embeddings-inference-mirror:1.6
+  # Ampere 80 (H100, A30) baseten/text-embeddings-inference-mirror:1.6
   # Ampere 86 (A10, A10G, A40, ...) baseten/text-embeddings-inference-mirror:86-1.6
   # Ada Lovelace (L4, ...) baseten/text-embeddings-inference-mirror:89-1.6
   # Hopper (H100/H100 40GB) baseten/text-embeddings-inference-mirror:hopper-1.6
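The comment table above maps each GPU generation to a mirror image tag. A minimal sketch of pinning the Hopper image for an H100 deployment follows; the `image` key and surrounding layout assume the usual Truss config shape, so treat this as an illustration rather than a verbatim file from this repo:

```yaml
base_image:
  # Hopper-optimized TEI image from the comment table above
  image: baseten/text-embeddings-inference-mirror:hopper-1.6
resources:
  accelerator: H100
  use_gpu: true
```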
2 changes: 1 addition & 1 deletion ultravox/config.yaml
@@ -14,7 +14,7 @@ runtime:
 requirements:
 - httpx
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 secrets: {}
 system_packages:
2 changes: 1 addition & 1 deletion vllm/config.yaml
@@ -11,7 +11,7 @@ model_metadata:
 requirements:
 - vllm==0.5.4
 resources:
-  accelerator: A100
+  accelerator: H100
   use_gpu: true
 runtime:
   predict_concurrency: 128