Commit 6e51324

Correct NIM online configurations
1 parent 20cc16d commit 6e51324


model-deployment/containers/nim/README-vanilla-containers.md

Lines changed: 2 additions & 2 deletions
@@ -70,15 +70,15 @@ Use any zip file to create a dummy model artifact. As we will be downloading mod
  * Key: `STORAGE_SIZE_IN_GB`, Value: `120`
  * Key: `NCCL_CUMEM_ENABLE`, Value: `0`
  * Key: `WEB_CONCURRENCY`, Value: `1`
+ * Key: `NGC_API_KEY`, Value: `<KEY_GENERATED_FROM_NGC>`
  * Under `Models` click on the `Select` button and select the Model Catalog entry we created earlier
  * Under `Compute` and then `Specialty and previous generation` select the `VM.GPU.A10.2` instance
  * Under `Networking` choose the `Custom Networking` option and bring the VCN and subnet, which allows Internet access.
  * Under `Logging` select the Log Group where you've created your predict and access log and select those correspondingly
  * Select the custom container option `Use a Custom Container Image` and click `Select`
  * Select the OCIR repository and image we pushed earlier
  * Leave the ports as the default port is 8080.
- * Leave CMD as below and Entrypoint as blank. Use `Add parameter` and populate each text field with comma separated values -
-   `python3, -m, vllm_nvext.entrypoints.openai.api_server, --enforce-eager, --gpu-memory-utilization, 0.85, --max-model-len, 2048`
+ * Leave CMD and Entrypoint as blank.
  * Click on `Create` button to create the model deployment

  * Once the model is deployed and shown as `Active`, you can execute inference against it.
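The last context line of the hunk notes that inference can be run once the deployment shows `Active`. As a minimal sketch (not part of this commit), and assuming the deployment's predict endpoint forwards to the NIM container's OpenAI-compatible chat API, the snippet below signs a request with the OCI Python SDK. The endpoint URL, model name, and payload fields are placeholders and assumptions; copy the real invoke URL from the model deployment's detail page.

```python
# Illustrative only: invoke the model deployment once it is Active.
# The endpoint URL, model name, and request body are assumptions about
# how the NIM container is exposed; adjust them to your own deployment.
import oci
import requests

config = oci.config.from_file()  # reads ~/.oci/config by default
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
    pass_phrase=config.get("pass_phrase"),
)

# Hypothetical endpoint; use the "Invoke your model" URL shown in the
# OCI Console for your deployment.
endpoint = (
    "https://modeldeployment.<region>.oci.customer-oci.com/"
    "<MODEL_DEPLOYMENT_OCID>/predict"
)

# Assumed OpenAI-style chat payload, since NIM typically serves an
# OpenAI-compatible API; the model name is a placeholder.
payload = {
    "model": "meta/llama3-8b-instruct",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}

response = requests.post(endpoint, json=payload, auth=signer)
response.raise_for_status()
print(response.json())
```

The example authenticates with API-key signing from `~/.oci/config`; if the client runs inside OCI, a resource principal or instance principal signer from the same SDK would work as well.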
