chore: bump llama.cpp to support tool streaming #1438
rhatdan merged 2 commits into containers:main from
Conversation
Signed-off-by: Robert Sturla <robertsturla@outlook.com>
Reviewer's Guide

This PR updates the llama.cpp clone target to a newer commit that supports tool streaming (including PR #12379) and reapplies consistent formatting across the build_llama_and_whisper.sh script to standardize indentation, remove extraneous semicolons, and align multiline arrays.

Sequence diagram for conceptual tool streaming with updated llama.cpp:

```mermaid
sequenceDiagram
    actor User
    participant OllamaService as "Ollama Service\n(with updated llama.cpp)"
    participant LlamaCppInternal as "llama.cpp (b5499)"
    participant ExternalTool as "External Tool\n(e.g., Codex)"
    User->>OllamaService: Prompt requiring tool use
    OllamaService->>LlamaCppInternal: Process prompt
    LlamaCppInternal-->>ExternalTool: Call Tool API (e.g., code interpreter)
    ExternalTool-->>LlamaCppInternal: Tool Response
    LlamaCppInternal->>OllamaService: Formatted response incorporating tool output
    OllamaService->>User: Final Response
```
Hey @p5 - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
.github/workflows/ci-images.yml
Outdated
```diff
 /usr/share/dotnet /usr/local/lib/android /opt/ghc \
 /usr/local/share/powershell /usr/share/swift /usr/local/.ghcup \
-/usr/lib/jvm || true
+/usr/lib/jvm /opt/hostedtoolcache/CodeQL || true
```
Added a potential fix to the storage issues in the runner.
Removing CodeQL (which is only used when you explicitly call it) frees up an additional 5GB.
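For reference, the cleanup step in question boils down to deleting the large preinstalled toolchains on the runner. A minimal sketch, using the paths from the diff above; the `PREFIX` variable is an addition here so the snippet can be exercised safely outside a runner, where the real step targets `/` and usually needs sudo:

```shell
# Sketch of the CI disk-cleanup step. On a real runner these paths live
# under / and removal needs elevated permissions; PREFIX is a scratch
# directory added here so this demo never touches the host system.
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/usr/share/dotnet" "$PREFIX/opt/hostedtoolcache/CodeQL"

# Remove the big preinstalled toolchains; "|| true" keeps the step from
# failing when a path is absent on a given runner image.
rm -rf "$PREFIX/usr/share/dotnet" "$PREFIX/usr/local/lib/android" \
       "$PREFIX/opt/ghc" "$PREFIX/usr/local/share/powershell" \
       "$PREFIX/usr/share/swift" "$PREFIX/usr/local/.ghcup" \
       "$PREFIX/usr/lib/jvm" "$PREFIX/opt/hostedtoolcache/CodeQL" || true
```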
Unfortunately it didn't work. I'm unsure whether it helped at all, or whether it's freeing up space on the wrong disk.
The newest commit frees up an additional 8GB of storage, which I'm hoping is sufficient.
If not, the next option is probably to matrix these builds into separate jobs.
Signed-off-by: Robert Sturla <robertsturla@outlook.com>
LGTM

Awesome, thank you!
Closes #1431
Bumps llama.cpp to the commit sha in https://github.com/ggml-org/llama.cpp/releases/tag/b5499
These commits include ggml-org/llama.cpp#12379, plus all fixes related to this in order to support running AI code assistants like Codex.
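The bump itself amounts to pinning the clone in build_llama_and_whisper.sh to a fixed commit rather than following the default branch. A hedged sketch of that pattern (the upstream URL and tag b5499 come from the links above; the throwaway local repository below stands in for llama.cpp so the demo needs no network access):

```shell
# Pattern: clone a repository, then check out a pinned commit so builds
# are reproducible. Against the real upstream this would be roughly:
#   git clone https://github.com/ggml-org/llama.cpp
#   git -C llama.cpp checkout <sha-for-b5499>
# Below, a throwaway local repo with two commits plays the role of upstream.
work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "older commit"
git -C "$work/upstream" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "newer commit"

# Pretend the first commit is the release sha we want to pin to.
pin=$(git -C "$work/upstream" rev-parse HEAD~1)

git clone -q "$work/upstream" "$work/clone"
git -C "$work/clone" checkout -q "$pin"   # detached HEAD at the pinned commit
```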
While I was able to call Codex and get semi-sane responses from it after building the CUDA container, I am not familiar enough with it to demonstrate an AI assistant doing its magic.

And apologies for the unrelated changes: my IDE decided it wanted to format the code too. I checked through these and they don't appear to be functionally different; they just switch the script to a consistent number of spaces.
Summary by Sourcery
Bump llama.cpp to the latest commit to enable tool streaming support and update the build script indentation