
Commit 3ff58a3

Update README.md
1 parent 6a3ffa7 commit 3ff58a3

1 file changed (+19 -0)

README.md

Lines changed: 19 additions & 0 deletions
@@ -163,6 +163,25 @@ response = client.chat.completions.create(
 )
 ```
 
+You can also use alternate decoding techniques like `cot_decoding` and `entropy_decoding` directly with the local inference server.
+
+```python
+response = client.chat.completions.create(
+    model="meta-llama/Llama-3.2-1B-Instruct",
+    messages=messages,
+    temperature=0.2,
+    extra_body={
+        "decoding": "cot_decoding",  # or "entropy_decoding"
+        # CoT-specific params
+        "k": 10,
+        "aggregate_paths": True,
+        # or entropy-specific params
+        "top_k": 27,
+        "min_p": 0.03,
+    }
+)
+```
+
 ### Starting the optillm proxy with an external server (e.g. llama.cpp or ollama)
 
 - Set the `OPENAI_API_KEY` env variable to a placeholder value
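
For context, the snippet added in this commit assumes an OpenAI `client` already pointed at optillm's local inference server and a prepared `messages` list. A minimal setup sketch, assuming the server is listening on optillm's default address (`http://localhost:8000/v1`) and that a placeholder API key is acceptable, as it is for external servers:

```python
import os
from openai import OpenAI

# Assumption: the optillm server is running on its default port 8000;
# adjust base_url if you started it elsewhere.
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "sk-placeholder"),
    base_url="http://localhost:8000/v1",
)

messages = [
    {"role": "user", "content": "How many r's are there in the word strawberry?"}
]
```

The OpenAI Python client forwards `extra_body` fields verbatim in the request JSON, which is how the `decoding` selector and its technique-specific parameters reach the optillm server.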
