4 changes: 1 addition & 3 deletions samples/managed-llm/compose.yaml
@@ -8,7 +8,7 @@ services:
     restart: always
     environment:
       - ENDPOINT_URL=http://llm/api/v1/chat/completions # endpoint to the gateway service
-      - MODEL=anthropic.claude-3-haiku-20240307-v1:0 # LLM model ID used for the gateway
+      - MODEL=us.amazon.nova-micro-v1:0 # LLM model ID used for the gateway
Member:
Shouldn't we use a cloud-agnostic model?

Contributor Author:
Not for this. This is an AWS-playground-only sample, so it makes more sense to be clear that the model is AWS-only. We will abstract this in the next step, where we have a model available on all platforms (likely Llama or Mistral).

Member:
ai/llama3.2 is available on all clouds, right? Why not use that?

Member:
I am skeptical that an "AWS playground only sample" is reasonable.

       - OPENAI_API_KEY=FAKE_TOKEN # the actual value will be ignored when using the gateway, but it should match the one in the llm service
     healthcheck:
       test: ["CMD", "python3", "-c", "import sys, urllib.request; urllib.request.urlopen(sys.argv[1]).read()", "http://localhost:8000/"]
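For reference, the inline healthcheck above is equivalent to this short script: with python3 -c, sys.argv[1] is the URL passed as the last list element, and urlopen raises on connection failures and HTTP errors, so the probe exits non-zero when the service is unhealthy.

import sys
import urllib.request

# Fetch the URL given on the command line; any connection or HTTP
# error raises, making the process exit non-zero and the probe fail.
urllib.request.urlopen(sys.argv[1]).read()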
@@ -29,8 +29,6 @@ services:
         mode: host
     environment:
       - OPENAI_API_KEY=FAKE_TOKEN # this value must match the one in the app service
-      - USE_MODEL_MAPPING=false
-      - DEBUG=true
       # if using GCP for BYOC deployment, add these environment variables:
       # - GCP_PROJECT_ID=${GCP_PROJECT_ID}
       # - GCP_REGION=${GCP_REGION}
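Taken together, the app service is expected to call the gateway through an OpenAI-compatible chat completions endpoint. Below is a minimal sketch of such a call, assuming the app reads ENDPOINT_URL, MODEL, and OPENAI_API_KEY from the environment as set in this compose file; the script and its chat helper are illustrative, not part of the sample.

import json
import os
import urllib.request

ENDPOINT_URL = os.environ["ENDPOINT_URL"]  # e.g. http://llm/api/v1/chat/completions
MODEL = os.environ["MODEL"]                # e.g. us.amazon.nova-micro-v1:0
API_KEY = os.environ["OPENAI_API_KEY"]     # ignored by the gateway, but must match the llm service

def chat(prompt: str) -> str:
    # Standard OpenAI-style chat completions payload.
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    request = urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply at choices[0].message.content.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Say hello in one sentence."))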