
Commit 6ca20e8

Fix provider llm sample
1 parent d88cfe4 commit 6ca20e8


6 files changed, +23 -17 lines changed


samples/managed-llm-provider/README.md

Lines changed: 8 additions & 7 deletions

@@ -6,11 +6,12 @@ This sample application demonstrates using Managed LLMs with a Docker Model Prov
 
 > Note: This version uses a [Docker Model Provider](https://docs.docker.com/compose/how-tos/model-runner/#provider-services) for managing LLMs. For the version with Defang's [OpenAI Access Gateway](https://docs.defang.io/docs/concepts/managed-llms/openai-access-gateway), please see our [*Managed LLM Sample*](https://github.com/DefangLabs/samples/tree/main/samples/managed-llm) instead.
 
-The Docker Model Provider allows users to use AWS Bedrock or Google Cloud Vertex AI models with their application. It is a service in the `compose.yaml` file.
+The Docker Model Provider allows users to run LLMs locally using `docker compose`. It is a service with `provider:` in the `compose.yaml` file.
+Defang will transparently fix up your project to use AWS Bedrock or Google Cloud Vertex AI models during deployment.
 
-You can configure the `MODEL` and `ENDPOINT_URL` for the LLM separately for local development and production environments.
-* The `MODEL` is the LLM Model ID you are using.
-* The `ENDPOINT_URL` is the bridge that provides authenticated access to the LLM model.
+You can configure the `LLM_MODEL` and `LLM_URL` for the LLM separately for local development and production environments.
+* The `LLM_MODEL` is the LLM Model ID you are using.
+* The `LLM_URL` is set by Docker; during deployment, Defang provides authenticated access to the LLM model in the cloud.
 
 Ensure you have enabled model access for the model you intend to use. To do this, you can check your [AWS Bedrock model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) or [GCP Vertex AI model access](https://cloud.google.com/vertex-ai/generative-ai/docs/control-model-access).
 

@@ -38,14 +39,14 @@ docker compose -f compose.dev.yaml up --build
 
 ## Configuration
 
-For this sample, you will need to provide the following [configuration](https://docs.defang.io/docs/concepts/configuration):
+For this sample, you will need to provide the following [configuration](https://docs.defang.io/docs/concepts/configuration):
 
 > Note that if you are using the 1-click deploy option, you can set these values as secrets in your GitHub repository and the action will automatically deploy them for you.
 
-### `MODEL`
+### `LLM_MODEL`
 The Model ID of the LLM you are using for your application. For example, `anthropic.claude-3-haiku-20240307-v1:0`.
 ```bash
-defang config set MODEL
+defang config set LLM_MODEL
 ```
 
 ## Deployment
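
The renamed variables keep the app's configuration down to two environment values, with the chat completions path derived from the base URL. A minimal sketch of how an app can consume them (the fallback values simply mirror the OpenAI defaults used in `app.py` below):

```python
import os

# LLM_MODEL is the model ID; LLM_URL is the OpenAI-compatible base URL that
# Docker sets locally and Defang points at the cloud provider on deployment.
# The fallbacks mirror the OpenAI defaults in app.py.
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4-turbo")
LLM_URL = os.getenv("LLM_URL", "https://api.openai.com/v1/")

# The chat completions endpoint hangs off the base URL, so LLM_URL must end
# with a trailing slash for the concatenation to produce a valid path.
CHAT_COMPLETIONS_URL = LLM_URL + "chat/completions"
```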

samples/managed-llm-provider/app/Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-FROM public.ecr.aws/docker/library/python:3.12-slim
+FROM python:3.12-slim
 
 # Set working directory
 WORKDIR /app

samples/managed-llm-provider/app/app.py

Lines changed: 4 additions & 4 deletions

@@ -12,9 +12,9 @@
 logging.basicConfig(level=logging.INFO)
 
 # Set the environment variables for the chat model
-ENDPOINT_URL = os.getenv("ENDPOINT_URL", "https://api.openai.com/v1/chat/completions")
+LLM_URL = os.getenv("LLM_URL", "https://api.openai.com/v1/") + "chat/completions"
 # Fallback to OpenAI Model if not set in environment
-MODEL_ID = os.getenv("MODEL", "gpt-4-turbo")
+MODEL_ID = os.getenv("LLM_MODEL", "gpt-4-turbo")
 
 # Get the API key for the LLM
 # For development, you can use your local API key. In production, the LLM gateway service will override the need for it.

@@ -60,14 +60,14 @@ async def ask(prompt: str = Form(...)):
     }
 
     # Log request details
-    logging.info(f"Sending POST to {ENDPOINT_URL}")
+    logging.info(f"Sending POST to {LLM_URL}")
     logging.info(f"Request Headers: {headers}")
     logging.info(f"Request Payload: {payload}")
 
     response = None
     reply = None
     try:
-        response = requests.post(f"{ENDPOINT_URL}", headers=headers, data=json.dumps(payload))
+        response = requests.post(f"{LLM_URL}", headers=headers, data=json.dumps(payload))
     except requests.exceptions.HTTPError as errh:
         reply = f"HTTP error:", errh
     except requests.exceptions.ConnectionError as errc:
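
One thing the hunk's context lines leave untouched: `reply = f"HTTP error:", errh` assigns a `(str, Exception)` tuple to `reply`, not a string. A minimal sketch of the same request flow with string-valued error handling (the `ask_llm` helper, the timeout, and `raise_for_status()` are illustrative additions, not part of the sample):

```python
import json
import logging
import os

import requests

LLM_URL = os.getenv("LLM_URL", "https://api.openai.com/v1/") + "chat/completions"
MODEL_ID = os.getenv("LLM_MODEL", "gpt-4-turbo")

def ask_llm(prompt: str, api_key: str) -> str:
    """Send a single-turn chat request and return the reply (or an error string)."""
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
    payload = {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    logging.info("Sending POST to %s", LLM_URL)
    try:
        response = requests.post(LLM_URL, headers=headers, data=json.dumps(payload), timeout=60)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.exceptions.RequestException as err:
        return f"Request failed: {err}"  # a string, not a (str, Exception) tuple
    # Assumes the standard OpenAI-style response shape.
    return response.json()["choices"][0]["message"]["content"]
```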

samples/managed-llm-provider/compose.yaml

Lines changed: 5 additions & 2 deletions

@@ -7,18 +7,21 @@ services:
       - "8000:8000"
     restart: always
     environment:
-      - ENDPOINT_URL=http://llm/api/v1/chat/completions # endpoint to the Provider Service
-      - MODEL=anthropic.claude-3-haiku-20240307-v1:0 # LLM model ID used in the Provider Service
+      - LLM_MODEL
     healthcheck:
       test: ["CMD", "python3", "-c", "import sys, urllib.request; urllib.request.urlopen(sys.argv[1]).read()", "http://localhost:8000/"]
       interval: 30s
       timeout: 5s
       retries: 3
       start_period: 5s
+    depends_on:
+      - llm
 
   # Provider Service
   # This service is used to route requests to the LLM API
   llm:
     provider:
       type: model
+      options:
+        model: ai/smollm2
     x-defang-llm: true
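
With `depends_on: llm`, Compose starts the provider first and supplies the connection details for the `llm` service to the app's environment, which is why `LLM_URL` no longer needs to be spelled out. A quick wiring check from inside the app container might look like this (a sketch; the exact URL Docker injects depends on your local Model Runner setup):

```python
import os

import requests

# Both values are expected to come from Docker Compose for services that
# depend on the `llm` provider service; a KeyError here means the wiring is off.
llm_url = os.environ["LLM_URL"]
llm_model = os.environ.get("LLM_MODEL", "ai/smollm2")  # ai/smollm2 matches compose.yaml

resp = requests.post(
    llm_url + "chat/completions",
    json={"model": llm_model, "messages": [{"role": "user", "content": "ping"}]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```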

samples/managed-llm/app/Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-FROM public.ecr.aws/docker/library/python:3.12-slim
+FROM python:3.12-slim
 
 # Set working directory
 WORKDIR /app

samples/managed-llm/compose.yaml

Lines changed: 4 additions & 2 deletions

@@ -1,21 +1,23 @@
 services:
   app:
-    build:
+    build:
       context: ./app
       dockerfile: Dockerfile
     ports:
       - "8000:8000"
     restart: always
     environment:
       - ENDPOINT_URL=http://llm/api/v1/chat/completions # endpoint to the gateway service
-      - MODEL=anthropic.claude-3-haiku-20240307-v1:0 # LLM model ID used for the gateway
+      - MODEL=us.amazon.nova-micro-v1:0 # LLM model ID used for the gateway
       - OPENAI_API_KEY=FAKE_TOKEN # the actual value will be ignored when using the gateway, but it should match the one in the llm service
     healthcheck:
       test: ["CMD", "python3", "-c", "import sys, urllib.request; urllib.request.urlopen(sys.argv[1]).read()", "http://localhost:8000/"]
       interval: 30s
       timeout: 5s
       retries: 3
       start_period: 5s
+    depends_on:
+      - llm
 
   # Defang OpenAI Access Gateway
   # This service is used to route requests to the LLM API
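
Because the gateway exposes an OpenAI-compatible API, the app could equally use the official `openai` client pointed at it; the placeholder key just has to match the one configured on the `llm` service. A sketch (the sample itself uses `requests`, so the client usage here is an assumption):

```python
from openai import OpenAI

# Base URL is ENDPOINT_URL minus the trailing "chat/completions" path.
client = OpenAI(
    base_url="http://llm/api/v1",
    api_key="FAKE_TOKEN",  # ignored by the gateway, but must match the llm service
)

resp = client.chat.completions.create(
    model="us.amazon.nova-micro-v1:0",  # the model ID the gateway routes to Bedrock
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```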
