
Commit c4147c8

Merge pull request #382 from DefangLabs/lio/fix-for-provider
Fix provider llm sample
2 parents 765fe5c + ae3065e commit c4147c8

File tree (6 files changed, +19 -13 lines):

samples/managed-llm-provider/README.md
samples/managed-llm-provider/app/Dockerfile
samples/managed-llm-provider/app/app.py
samples/managed-llm-provider/compose.yaml
samples/managed-llm/app/Dockerfile
samples/managed-llm/compose.yaml

samples/managed-llm-provider/README.md

Lines changed: 5 additions & 4 deletions
@@ -6,11 +6,12 @@ This sample application demonstrates using Managed LLMs with a Docker Model Prov
 
 > Note: This version uses a [Docker Model Provider](https://docs.docker.com/compose/how-tos/model-runner/#provider-services) for managing LLMs. For the version with Defang's [OpenAI Access Gateway](https://docs.defang.io/docs/concepts/managed-llms/openai-access-gateway), please see our [*Managed LLM Sample*](https://github.com/DefangLabs/samples/tree/main/samples/managed-llm) instead.
 
-The Docker Model Provider allows users to use AWS Bedrock or Google Cloud Vertex AI models with their application. It is a service in the `compose.yaml` file.
+The Docker Model Provider allows users to run LLMs locally using `docker compose`. It is a service with `provider:` in the `compose.yaml` file.
+Defang will transparently fixup your project to use AWS Bedrock or Google Cloud Vertex AI models during deployment.
 
-You can configure the `MODEL` and `ENDPOINT_URL` for the LLM separately for local development and production environments.
-* The `MODEL` is the LLM Model ID you are using.
-* The `ENDPOINT_URL` is the bridge that provides authenticated access to the LLM model.
+You can configure the `LLM_MODEL` and `LLM_URL` for the LLM separately for local development and production environments.
+* The `LLM_MODEL` is the LLM Model ID you are using.
+* The `LLM_URL` will be set by Docker and during deployment Defang will provide authenticated access to the LLM model in the cloud.
 
 Ensure you have enabled model access for the model you intend to use. To do this, you can check your [AWS Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) or [GCP Vertex AI model access](https://cloud.google.com/vertex-ai/generative-ai/docs/control-model-access).
 
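For local development these renamed variables map directly onto what the sample app reads at startup. A minimal sketch of that lookup, mirroring the defaults used by `app.py` in this commit (the OpenAI values are only fallbacks for when neither Docker nor Defang has injected anything):

```python
import os

# LLM_URL is set by Docker Compose locally and by Defang in the cloud;
# the OpenAI base URL is only a fallback for running the app standalone.
LLM_URL = os.getenv("LLM_URL", "https://api.openai.com/v1/") + "chat/completions"

# LLM_MODEL is the model ID; gpt-4-turbo is the sample's fallback.
MODEL_ID = os.getenv("LLM_MODEL", "gpt-4-turbo")

print(f"Endpoint: {LLM_URL}")
print(f"Model:    {MODEL_ID}")
```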

samples/managed-llm-provider/app/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-FROM public.ecr.aws/docker/library/python:3.12-slim
+FROM python:3.12-slim
 
 # Set working directory
 WORKDIR /app

samples/managed-llm-provider/app/app.py

Lines changed: 4 additions & 4 deletions
@@ -12,9 +12,9 @@
 logging.basicConfig(level=logging.INFO)
 
 # Set the environment variables for the chat model
-ENDPOINT_URL = os.getenv("ENDPOINT_URL", "https://api.openai.com/v1/chat/completions")
+LLM_URL = os.getenv("LLM_URL", "https://api.openai.com/v1/") + "chat/completions"
 # Fallback to OpenAI Model if not set in environment
-MODEL_ID = os.getenv("MODEL", "gpt-4-turbo")
+MODEL_ID = os.getenv("LLM_MODEL", "gpt-4-turbo")
 
 # Get the API key for the LLM
 # For development, you can use your local API key. In production, the LLM gateway service will override the need for it.
@@ -60,14 +60,14 @@ async def ask(prompt: str = Form(...)):
     }
 
     # Log request details
-    logging.info(f"Sending POST to {ENDPOINT_URL}")
+    logging.info(f"Sending POST to {LLM_URL}")
     logging.info(f"Request Headers: {headers}")
     logging.info(f"Request Payload: {payload}")
 
     response = None
     reply = None
     try:
-        response = requests.post(f"{ENDPOINT_URL}", headers=headers, data=json.dumps(payload))
+        response = requests.post(f"{LLM_URL}", headers=headers, data=json.dumps(payload))
     except requests.exceptions.HTTPError as errh:
         reply = f"HTTP error:", errh
     except requests.exceptions.ConnectionError as errc:
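One subtlety of the rename: `LLM_URL` is now a base URL and the path is appended by plain string concatenation, so the value must end with a trailing slash. A hedged sketch of the resulting call, assuming the usual OpenAI-compatible payload shape (the sample's actual `headers` and `payload` construction sits outside this hunk):

```python
import json
import os

import requests

LLM_URL = os.getenv("LLM_URL", "https://api.openai.com/v1/") + "chat/completions"
MODEL_ID = os.getenv("LLM_MODEL", "gpt-4-turbo")
API_KEY = os.getenv("API_KEY")  # optional locally; the gateway handles auth in the cloud

def ask_llm(prompt: str) -> str:
    # Assumed OpenAI-style chat completions request; app.py builds its own
    # headers and payload elsewhere in the file.
    headers = {"Content-Type": "application/json"}
    if API_KEY:
        headers["Authorization"] = f"Bearer {API_KEY}"
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    response = requests.post(LLM_URL, headers=headers, data=json.dumps(payload))
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```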

samples/managed-llm-provider/compose.yaml

Lines changed: 5 additions & 2 deletions
@@ -7,19 +7,22 @@ services:
       - "8000:8000"
     restart: always
     environment:
-      - ENDPOINT_URL=http://llm/api/v1/chat/completions # endpoint to the Provider Service
-      - MODEL=default # LLM model ID used in the Provider Service
+      - LLM_MODEL # LLM model ID used
       # For other models, see https://docs.defang.io/docs/concepts/managed-llms/openai-access-gateway#model-mapping
     healthcheck:
       test: ["CMD", "python3", "-c", "import sys, urllib.request; urllib.request.urlopen(sys.argv[1]).read()", "http://localhost:8000/"]
       interval: 30s
       timeout: 5s
       retries: 3
       start_period: 5s
+    depends_on:
+      - llm
 
   # Provider Service
   # This service is used to route requests to the LLM API
   llm:
     provider:
       type: model
+      options:
+        model: ai/smollm2
     x-defang-llm: true
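With `depends_on: [llm]` added, Compose starts the `ai/smollm2` model provider before the app, and (as the README change above notes) `LLM_URL` is set by Docker while `LLM_MODEL` is passed through the `environment:` entry. A small, hypothetical startup check the app could run to confirm what was injected; the `/models` probe is an assumption based on the endpoint being OpenAI-compatible:

```python
import os
import urllib.request

# Expected to be populated by the Docker Model Provider locally,
# or rewritten by Defang (Bedrock / Vertex AI) after deployment.
llm_url = os.getenv("LLM_URL", "")
llm_model = os.getenv("LLM_MODEL", "")
print(f"LLM_URL={llm_url!r} LLM_MODEL={llm_model!r}")

if llm_url:
    try:
        # Hypothetical reachability probe against the OpenAI-compatible base URL;
        # the sample itself only POSTs to {LLM_URL}chat/completions.
        urllib.request.urlopen(llm_url.rstrip("/") + "/models", timeout=5).read()
        print("LLM endpoint reachable")
    except Exception as exc:
        print(f"LLM endpoint not reachable yet: {exc}")
```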

samples/managed-llm/app/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-FROM public.ecr.aws/docker/library/python:3.12-slim
+FROM python:3.12-slim
 
 # Set working directory
 WORKDIR /app

samples/managed-llm/compose.yaml

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
 services:
   app:
-    build:
+    build:
       context: ./app
       dockerfile: Dockerfile
     ports:
@@ -17,6 +17,8 @@ services:
       timeout: 5s
       retries: 3
       start_period: 5s
+    depends_on:
+      - llm
 
   # Defang OpenAI Access Gateway
   # This service is used to route requests to the LLM API
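The same `depends_on` addition applies here, but it only orders container startup; it does not wait for the OpenAI Access Gateway to be ready to serve requests. A hedged sketch of a retry loop the app could use at startup (the `LLM_URL` name and default are illustrative; this sample's own variable may differ):

```python
import os
import time

import requests

# Illustrative variable name and default; check this sample's compose.yaml for the real values.
GATEWAY_URL = os.getenv("LLM_URL", "http://llm/api/v1/")

def wait_for_gateway(retries: int = 10, delay: float = 1.0) -> bool:
    # depends_on only controls start order, so poll until the `llm`
    # gateway service accepts connections.
    for _ in range(retries):
        try:
            requests.get(GATEWAY_URL, timeout=2)  # any HTTP response counts as "up"
            return True
        except requests.exceptions.ConnectionError:
            time.sleep(delay)
    return False

if __name__ == "__main__":
    print("gateway ready" if wait_for_gateway() else "gateway still unreachable")
```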
