Subcommands:
- `fast-agent model llamacpp list` prints discovered models; add `--json` for machine-readable output
- `fast-agent model llamacpp preview <model-id>` prints the generated overlay YAML without writing files
- `fast-agent model llamacpp import <model-id>` writes the overlay; add `--json` for machine-readable output
- `--include-sampling-defaults` persists the server's current sampling defaults into the overlay or preview output
- `fast-agent model llamacpp import <model-id> --start-now` writes the overlay and immediately launches `fast-agent go --model <overlay>`
- `fast-agent model llamacpp import <model-id> --start-now --with-shell` launches `fast-agent go -x --model <overlay>`
- `fast-agent model llamacpp import <model-id> --start-now --smart` launches `fast-agent go --smart -x --model <overlay>`

The generated overlay:

- uses `openresponses` as the provider
- stores the normalized `/v1` `base_url`
- records the selected auth mode
- records discovered runtime limits such as `max_tokens`
- records discovered metadata such as `context_window`, `max_output_tokens`, and `tokenizes`

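As a hedged illustration, a generated overlay might look like the sketch below. The exact schema and file layout are not specified here; the key names shown match the fields listed above, but the values and overall structure are assumptions:

```yaml
# Illustrative sketch of a generated llamacpp overlay (assumed values).
provider: openresponses
base_url: http://localhost:8080/v1   # normalized /v1 base URL
auth: none                           # recorded auth mode (assumed value)
max_tokens: 4096                     # discovered runtime limit (assumed value)
context_window: 8192                 # discovered metadata (assumed value)
max_output_tokens: 4096              # discovered metadata (assumed value)
tokenizes: true                      # discovered metadata (assumed value)
```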
By default, the import flow does not persist the server's current sampling defaults. Use
`--include-sampling-defaults` if you want to freeze the current llama.cpp sampling policy into the
generated `defaults` block.

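When the flag is passed, the frozen sampling policy could look like the following `defaults` block; the sampling keys are the ones the server reports (`temperature`, `top_k`, `top_p`, `min_p`), and the values shown are illustrative assumptions:

```yaml
# Sketch of a defaults block written with --include-sampling-defaults
# (illustrative values, not the server's actual policy).
defaults:
  temperature: 0.8
  top_k: 40
  top_p: 0.95
  min_p: 0.05
```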
Repeated unnamed imports of the same llama.cpp model on the same normalized base URL reuse the
existing generated `llamacpp-*` overlay instead of creating another suffixed file. Explicitly named
overlays are left alone.

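A minimal sketch of how such reuse could be keyed, assuming a hypothetical naming scheme: if the overlay filename is a deterministic function of the model id and the normalized base URL, repeated unnamed imports resolve to the same `llamacpp-*` file. The actual naming and lookup logic of `fast-agent` may differ:

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit


def normalize_base_url(url: str) -> str:
    """Normalize a llama.cpp server URL to scheme://host[:port]/v1."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/v1"


def unnamed_overlay_path(model_id: str, base_url: str, overlay_dir: Path) -> Path:
    """Deterministic path for an unnamed import (hypothetical scheme).

    Keying the filename on (model id, normalized base URL) means a
    second unnamed import of the same model on the same server maps to
    the same llamacpp-* file instead of a new suffixed one.
    """
    digest = hashlib.sha1(normalize_base_url(base_url).encode()).hexdigest()[:8]
    return overlay_dir / f"llamacpp-{model_id}-{digest}.yaml"
```

Explicitly named overlays would bypass this function entirely, so they are never reused or overwritten by the unnamed flow.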
---

</ModelOverlays>