
Conversation


@anivar anivar commented Jul 20, 2025

Summary

Adds support for the llama.cpp sampling arguments --min-p and --top-k, which were previously missing from llamafile.

Problem

Issue #715 reported that llamafile was missing several sampling arguments available in llama.cpp:

  • --min-p
  • --top-k
  • --samplers (noted for future work)

These arguments are needed for models like QwQ-32B to prevent response looping.
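For context, a minimal sketch of what these two samplers do, assuming the usual definitions (illustrative only, not llamafile's actual sampler code): top-k keeps only the k most probable tokens, and min-p discards tokens whose probability falls below min_p times the probability of the most likely token.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Illustrative sketch of min-p + top-k filtering over a (non-empty)
// probability vector. Not llamafile's implementation.
std::vector<double> filter_probs(std::vector<double> probs, double min_p,
                                 int top_k) {
    // Sort descending so the most probable token comes first.
    std::sort(probs.begin(), probs.end(), std::greater<double>());
    // top-k: keep only the k most probable tokens (0 = disabled).
    if (top_k > 0 && (int)probs.size() > top_k)
        probs.resize(top_k);
    // min-p: drop tokens below min_p times the top token's probability.
    double cutoff = min_p * probs.front();
    probs.erase(std::remove_if(probs.begin(), probs.end(),
                               [cutoff](double p) { return p < cutoff; }),
                probs.end());
    return probs;
}
```

With min_p = 0.05 a flat, repetitive tail of low-probability tokens gets pruned, which is why these flags help models like QwQ-32B avoid looping.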

Implementation

Following llamafile's established patterns:

Command Line Arguments:

if (!strcmp(flag, "--min-p")) {
    if (i == argc) missing("--min-p");
    FLAG_min_p = atof(argv[i++]);
    continue;
}
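The --top-k flag follows the same pattern; a self-contained sketch of the analogous parsing is below (the simplified loop and the early return in place of llamafile's missing() helper are stand-ins for illustration):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

static int FLAG_top_k = 40; // conservative default from this PR

// Simplified flag loop mirroring the --min-p pattern shown above.
void parse_flags(int argc, char *argv[]) {
    for (int i = 1; i < argc;) {
        const char *flag = argv[i++];
        if (!strcmp(flag, "--top-k")) {
            if (i == argc) return; // llamafile calls missing("--top-k") here
            FLAG_top_k = atoi(argv[i++]);
            continue;
        }
    }
}
```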

API Integration:

Json& min_p = json["min_p"];
if (!min_p.isNull()) {
    if (!min_p.isNumber()) return send_error(400, "min_p must be number");
    params->min_p = min_p.getNumber();
    if (!(0 <= params->min_p && params->min_p <= 1))
        return send_error(400, "min_p must be between 0 and 1");
}
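The top_k parameter gets the analogous treatment with its own bound (top_k >= 0). Sketched here as a standalone helper for clarity; the function name and use of std::optional are illustrative, since llamafile performs these checks inline against its Json class as shown above:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper capturing the top_k validation rule from this PR:
// returns an error message, or std::nullopt if the value is accepted.
std::optional<std::string> validate_top_k(int value) {
    if (value < 0)
        return "top_k must be >= 0";
    return std::nullopt; // accepted
}
```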

Files Changed

  • llamafile/llamafile.h: Added flag declarations
  • llamafile/flags.cpp: Added command line parsing and defaults
  • llamafile/server/v1_completions.cpp: Added API parameter support

Testing

Added comprehensive tests covering:

  • Command line argument parsing
  • JSON parameter validation
  • Default value handling
  • Integration testing

Usage Examples

Command Line:

./llamafile -m model.gguf --min-p 0.05 --top-k 40

API Request:

{
  "model": "model.gguf",
  "prompt": "Hello",
  "min_p": 0.05,
  "top_k": 40
}

Notes

  • Conservative defaults: min_p=0.05, top_k=40
  • Proper bounds checking: min_p in [0,1], top_k >= 0
  • --samplers requires complex parsing and is noted for future work

Resolves #715

Resolves issue mozilla-ai#715 by adding support for sampling arguments
that were available in llama.cpp but missing in llamafile.

Changes:
- Add FLAG_min_p and FLAG_top_k declarations and parsing
- Add JSON API parameter support with validation
- Connect command line flags to sampling defaults
- Support both CLI and API usage

Note: --samplers requires complex parsing, noted for future work.
@anivar force-pushed the add-missing-sampling-args-715 branch from dd983cf to 97aa444 on July 20, 2025 at 23:02
