2 changes: 1 addition & 1 deletion .github/workflows/pr.yml
@@ -9,7 +9,7 @@ concurrency:
 
 jobs:
   lint:
-    runs-on: ubuntu-20.04
+    runs-on: ubuntu-latest
     steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/setup-python/
2 changes: 1 addition & 1 deletion llama-cpp-server/README.md
@@ -23,4 +23,4 @@ cd llama.cpp
 docker build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
 ```
 
-You can then push this image to a container registry of your choice and then replace the base_image in the config.yaml
+You can then push this image to a container registry of your choice and then replace the base_image in the config.yaml
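For context, the pushed image is what `base_image` in config.yaml should point at. A minimal sketch, where `registry.example.com/llama.cpp:server-cuda` is a hypothetical placeholder for your own registry and tag:

```
# Sketch only: the image reference below is a placeholder, not a published image.
base_image:
  image: registry.example.com/llama.cpp:server-cuda
```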
2 changes: 1 addition & 1 deletion llama-cpp-server/config.yaml
@@ -1,6 +1,6 @@
 base_image:
   image: alphatozeta/llama-cpp-server:0.4
-build_commands: 
+build_commands:
 - pip install git+https://github.com/huggingface/transformers.git hf-xet
 model_metadata:
   repo_id: google/gemma-3-27b-it-qat-q4_0-gguf
3 changes: 1 addition & 2 deletions orpheus-best-performance/model/model.py
@@ -3,7 +3,6 @@
 import torch
 import fastapi
 from snac import SNAC
-import struct
 from pathlib import Path
 import numpy as np
 from fastapi.responses import StreamingResponse
@@ -276,7 +275,7 @@ async def predict(
 
         async def audio_stream(req_id: str):
             token_gen = await self._engine.predict(model_input, request)
-
+
             if isinstance(token_gen, StreamingResponse):
                 token_gen = token_gen.body_iterator
 