
Commit 8c2886b

Merge pull request #505 from oracle-samples/qq/aqua
Update AQUA model deployment to show inference mode.
2 parents: 24d88c5 + 514ea8f

File tree

2 files changed: +5 / -1 lines changed


ai-quick-actions/model-deployment-tips.md

Lines changed: 5 additions & 1 deletion
@@ -40,7 +40,11 @@ For a full list of shapes and their definitions see the [compute shape docs](htt
 The relationship between model parameter size and GPU memory is roughly 2x the parameter count in GB; for example, a model with 7B parameters will need a minimum of 14 GB for inference. At runtime the
 memory is used both for holding the weights and for the concurrent contexts of users' requests.

-The model will spin up and become available after some time, then you're able to try out the model
+The "inference mode" allows you to choose between the default completion endpoint (`/v1/completions`) and the chat endpoint (`/v1/chat/completions`).
+
+* The default completion endpoint is designed for text completion tasks. It's suitable for generating text based on a given prompt.
+* The chat endpoint is tailored for chatbot-like interactions. It allows for more dynamic and interactive conversations by using a list of messages with roles (system, user, assistant). This is ideal for applications that require back-and-forth dialogue and maintain context over multiple turns. It is recommended that you deploy chat models (e.g. `meta-llama/Llama-3.1-8B-Instruct`) using the chat endpoint.
+
+Once deployed, the model will spin up and become available after some time, then you're able to try out the model
 from the deployments tab using the test model, or programmatically.
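The 2x rule of thumb for GPU memory above can be sketched as a quick estimate. This is a minimal sketch, not part of the AQUA tooling: the helper name is hypothetical, and the 2x factor assumes roughly 2 bytes per parameter (16-bit weights) and does not account for KV-cache or activation overhead.

```python
def min_inference_memory_gb(params_billions: float) -> float:
    """Estimate the minimum GPU memory (GB) needed for inference using
    the ~2x-parameter-count rule of thumb: ~2 bytes per parameter for
    16-bit weights, before KV-cache and activation overhead."""
    return 2 * params_billions

# A 7B-parameter model needs at least ~14 GB of GPU memory.
print(min_inference_memory_gb(7))   # → 14.0
print(min_inference_memory_gb(13))  # → 26.0
```

In practice you should pick a shape with headroom beyond this minimum, since concurrent request contexts consume additional memory at runtime.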

![Try Model](web_assets/try-model.png)
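The difference between the two inference modes can be sketched by the request payloads each endpoint expects. This is a minimal sketch under stated assumptions: the base URL is a placeholder for your deployment's invocation endpoint, the `odsc-llm` model name is an assumption (check your deployment's configuration), and authentication (OCI request signing) is omitted for brevity.

```python
import json

# Placeholder: replace with your model deployment's invocation URL.
BASE_URL = "https://<model-deployment-url>"

def completion_payload(prompt: str, model: str = "odsc-llm") -> dict:
    """Payload shape for the default completion endpoint (/v1/completions):
    a single free-form prompt string."""
    return {"model": model, "prompt": prompt, "max_tokens": 128}

def chat_payload(messages: list, model: str = "odsc-llm") -> dict:
    """Payload shape for the chat endpoint (/v1/chat/completions):
    a list of {role, content} messages with roles system/user/assistant,
    which lets the server maintain conversational context across turns."""
    return {"model": model, "messages": messages, "max_tokens": 128}

# Completion mode: generate text from a single prompt.
print(json.dumps(completion_payload("Translate 'hello' to French:"), indent=2))

# Chat mode: role-tagged messages for back-and-forth dialogue.
print(json.dumps(chat_payload([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What shapes support 7B models?"},
]), indent=2))
```

To send these, POST the JSON body to `BASE_URL` plus the endpoint path matching the inference mode you selected at deployment time.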
