Commit c19e67c

Update README.md
1 parent 93ddf0d commit c19e67c


cloud-infrastructure/ai-infra-gpu/AI Infrastructure/triton-mixtral/README.md

Lines changed: 2 additions & 2 deletions
@@ -238,7 +238,7 @@ and from within the container start the server by running the following python c
 ```
 python3 scripts/launch_triton_server.py --world_size=8 --model_repo=/tensorrtllm_backend/triton_model_repo
 ```
-where --world_size is the number of GPUs you want to use for serving.
+where `--world_size` is the number of GPUs you want to use for serving.
 If the deployment is successful you should get something like:
 ```
 I0919 14:52:10.475738 293 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
@@ -249,7 +249,7 @@ I0919 14:52:10.517138 293 http_server.cc:187] Started Metrics Service at 0.0.0.0
 
 To test the model, one can query the server endpoint, for example with:
 ```
-curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 512, "bad_words": "", "stop_words": ""}'
+curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is cloud computing?", "max_tokens": 512, "bad_words": "", "stop_words": ""}'
 ```
 
 # Resources
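
Before querying the model, one can confirm the deployment succeeded by probing Triton's standard HTTP health endpoint. A minimal sketch in Python, assuming the server launched above is reachable on localhost:8000 and the `requests` library is installed:

```
# Readiness probe against Triton's standard HTTP health endpoint.
# Assumes the server started with launch_triton_server.py runs locally.
import requests

resp = requests.get("http://localhost:8000/v2/health/ready", timeout=5)
print("server ready" if resp.status_code == 200 else f"not ready: HTTP {resp.status_code}")
```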
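
The updated curl test can also be scripted. A short sketch of the same request in Python, assuming the `requests` library and that the ensemble returns its completion in a `text_output` field, as in typical tensorrtllm_backend model repositories (an assumption, not taken from this commit):

```
# Same request as the curl example above, sent from Python.
import requests

payload = {
    "text_input": "What is cloud computing?",
    "max_tokens": 512,
    "bad_words": "",
    "stop_words": "",
}

resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json=payload,
    timeout=120,
)
resp.raise_for_status()
# `text_output` is where TensorRT-LLM ensembles usually place the generated text.
print(resp.json().get("text_output"))
```

Note that `--world_size` must match not only the GPU count but, typically, the parallelism the TensorRT-LLM engines were built with, so serving on a different number of GPUs usually means rebuilding the engines.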
