Commit 96e7a10

sfloresk and joozero authored
Triton ec2 examples (#274)
* Add initial files for vLLM with GPU
* Add files for triton ec2 example
* Fix docs
* Remove vllm-gpu files

---------

Co-authored-by: sfloresk <sfkanter@amazon.com>
Co-authored-by: Jooyoung Kim <59524380+joozero@users.noreply.github.com>
1 parent c029326 commit 96e7a10

4 files changed, +535 -0 lines changed

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Triton inference server in Amazon ECS

This solution blueprint runs a Triton Inference Server with a vLLM backend on Amazon ECS, using a g5.12xlarge instance.

## Deployment of TinyLlama/TinyLlama-1.1B-Chat-v1.0

1. Deploy core-infra resources

```shell
cd ./terraform/ec2-examples/core-infra
terraform init
terraform apply -target=module.vpc -target=aws_service_discovery_private_dns_namespace.this
```

2. Deploy this blueprint

```shell
cd ../inference-triton
terraform init
terraform apply
```
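
The two deployment steps above can also be chained without changing directories. This is a minimal sketch, assuming a Terraform version that supports the global `-chdir` option and that the function is called from the repository root. The function name `deploy_blueprint` and its optional `root` argument are hypothetical and not part of the blueprint.

```shell
# Run both deployment steps in order, stopping at the first failure.
# Usage: deploy_blueprint [ROOT]   (ROOT defaults to ./terraform/ec2-examples)
deploy_blueprint() {
  root=${1:-./terraform/ec2-examples}
  # Step 1: core-infra, limited to the VPC and the service discovery namespace
  terraform -chdir="$root/core-infra" init &&
    terraform -chdir="$root/core-infra" apply \
      -target=module.vpc \
      -target=aws_service_discovery_private_dns_namespace.this &&
    # Step 2: the Triton blueprint itself
    terraform -chdir="$root/inference-triton" init &&
    terraform -chdir="$root/inference-triton" apply
}
```

Each `terraform apply` still prompts for approval, so the function stays interactive. Because the shell never leaves the starting directory, later `terraform output` calls need the same `-chdir` value.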

## Example: Running TinyLlama/TinyLlama-1.1B-Chat-v1.0

Once the cluster and services are deployed, use the load balancer DNS name (printed as a Terraform output during the deployment) to send requests to the vLLM service. The Triton task can take several minutes to start. If the following command returns 5xx errors, the task might not have started yet.

```bash
# Read the load balancer DNS name from the Terraform outputs
ALB_NAME=$(terraform output -raw load_balancer_dns_name)

# Send a generation request to the vllm_model endpoint
curl -X POST "http://${ALB_NAME}:8000/v2/models/vllm_model/generate" \
  -d '{"text_input": "In summary, AWS ECS is", "parameters": {"max_tokens": 200, "temperature": 0}}'
```

Example Response:
```json
{"model_name":"vllm_model","model_version":"1","text_output":"In summary, AWS ECS is a container orchestration service that allows you to manage and scale your containerized applications. It provides a simple and intuitive interface for managing your containerized applications, as well as a range of features to help you manage your infrastructure and scale your applications. AWS ECS is a great choice for developers who want to build and manage containerized applications on AWS."}
```
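
Since the Triton task can take several minutes to start, polling for readiness may be more convenient than re-running the request by hand. The sketch below is not part of the blueprint. `wait_for_ready` and `extract_text_output` are hypothetical helper names. `/v2/health/ready` is Triton's standard readiness route, but it is an assumption that the load balancer forwards it on port 8000 the same way it forwards the generate route. The second helper also assumes `python3` is installed.

```shell
# Poll a URL until it answers HTTP 200 or the attempts run out.
# Usage: wait_for_ready URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ready() {
  url=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress, -o discards the body, -w prints only the status code
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    if [ "$status" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts: HTTP $status, retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempts" >&2
  return 1
}

# Read a generate response on stdin and print only its text_output field.
extract_text_output() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["text_output"])'
}
```

A possible use is `wait_for_ready "http://${ALB_NAME}:8000/v2/health/ready"` followed by the `curl` request piped into `extract_text_output`. Where `jq` is available, `jq -r '.text_output'` does the same extraction.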

## Clean up

1. Stop the tasks
```shell
# Scale the service down to zero tasks
aws ecs update-service --service triton-service \
  --desired-count 0 --cluster ecs-demo-triton-inference \
  --region us-west-2 --query 'service.serviceName'

# Give the tasks time to stop (plain seconds work on both GNU and BSD sleep)
sleep 30
```

2. Destroy this blueprint

```shell
terraform destroy
```

3. Destroy core-infra resources

```shell
cd ../core-infra
terraform destroy
```
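
The fixed pause in step 1 may be shorter or longer than the tasks actually need to stop. As an alternative sketch, the AWS CLI waiter `aws ecs wait services-stable` can block until the service settles at its new desired count. The function name `drain_service` is hypothetical. The cluster, service, and region values in the usage note are the ones from step 1.

```shell
# Scale an ECS service to zero tasks and block until the service is stable.
# Usage: drain_service CLUSTER SERVICE REGION
drain_service() {
  cluster=$1
  service=$2
  region=$3
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --desired-count 0 --region "$region" \
    --query 'service.serviceName' || return 1
  # The waiter polls the service until the running count matches the
  # desired count (now 0), or fails once its built-in retries run out
  aws ecs wait services-stable --cluster "$cluster" --services "$service" \
    --region "$region"
}
```

For this blueprint the call would be `drain_service ecs-demo-triton-inference triton-service us-west-2`, run before `terraform destroy`.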

## Support

Please open an issue for questions or unexpected behavior.
