Create Pmll.cpp #14981
Changes from all commits
@@ -0,0 +1,146 @@

```cpp
// SPDX-License-Identifier: MIT
// Persistent Memory Logic Loop adapter for llama.cpp
//
// © 2025 Dr. Josef Kurk Edwards & John Trompeter
// Simplified BSD-style—see LICENSE-PMLL.

#include "llama.h"
#include <filesystem>
#include <fstream>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

namespace pmll {

using fs = std::filesystem;

struct LoopHook {
    // Override this to inject your own logic each step.
    // Return false to abort generation.
    virtual bool operator()(const std::string& prompt,
                            const std::vector<llama_token>& last_out) = 0;
    virtual ~LoopHook() = default;
};

class Loop {
public:
    Loop(const std::string& model_path,
         const std::string& state_dir,
         uint32_t n_ctx = 4096,
         LoopHook* user_hook = nullptr)
        : model_path_(model_path),
          state_dir_(state_dir),
          user_hook_(user_hook) {

        fs::create_directories(state_dir_);

        llama_backend_init();                       // init ggml backend
        llama_model_params mp = llama_model_default_params();
        model_ = llama_model_load_from_file(model_path_.c_str(), mp);
        if (!model_) throw std::runtime_error("model load failed");

        llama_context_params cp = llama_context_default_params();
        cp.n_ctx = n_ctx;
        ctx_ = llama_init_from_model(model_, cp);
        if (!ctx_) throw std::runtime_error("context init failed");

        mem_ = llama_get_memory(ctx_);              // unified KV handle
    }

    ~Loop() {
        llama_free(ctx_);
        llama_model_free(model_);
        llama_backend_free();
    }

    /// Generate up to n_predict tokens, persisting state after each decode
    std::string generate(const std::string& prompt,
                         int n_predict = 128,
                         llama_seq_id seq = 0) {
        std::lock_guard<std::mutex> lock(mu_);
        restore(seq);                               // 1⃣ try resume

        // --- tokenize prompt --------------------------------------------------
        std::vector<llama_token> tokens(prompt.size() + 8);
        int n = llama_tokenize(model_, prompt.c_str(),
                               tokens.data(), tokens.size(), true, true);
        tokens.resize(n);

        llama_batch batch = llama_batch_init(n, 0, 1);
        for (int i = 0; i < n; ++i) {
            batch.token[i]  = tokens[i];
            batch.pos[i]    = i;
            batch.seq_id[i] = &seq;
            // … (remainder of the 146-line file is not shown in this view)
```
An inline review suggestion on the last line shown proposes replacing `batch.seq_id[i] = &seq;` with `batch.seq_id[i] = seq;`.
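For orientation, here is a hedged usage sketch of the adapter's public surface as posted. Everything in it (the hook subclass, model path, and state directory) is an assumption for illustration, and it presumes the truncated parts of `generate()` behave as the code comments describe.

```cpp
// Hedged usage sketch only; not part of the PR. Assumes the adapter above is
// visible in this translation unit (pulled in directly here for brevity).
#include "Pmll.cpp"

#include <iostream>
#include <string>
#include <vector>

// Hypothetical hook that logs each step and never aborts generation.
struct LogHook : pmll::LoopHook {
    bool operator()(const std::string& prompt,
                    const std::vector<llama_token>& last_out) override {
        std::cout << "[pmll] prompt bytes: " << prompt.size()
                  << ", tokens so far: " << last_out.size() << '\n';
        return true;                 // returning false would abort generation
    }
};

int main() {
    LogHook hook;
    // Placeholder paths: substitute a real GGUF model and a writable state dir.
    pmll::Loop loop("models/model.gguf", "pmll_state", /*n_ctx=*/4096, &hook);
    std::cout << loop.generate("Hello, persistent world!", /*n_predict=*/64) << '\n';
}
```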
Copilot AI · Jul 31, 2025

Same issue as line 75: taking the address of seq is incorrect. This should be `b1.seq_id[0] = seq;`, and proper sequence ID array handling should be implemented.

Suggested change: replace `b1.seq_id[0] = &seq;` with `b1.seq_id[0] = seq;`.
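On the sequence-id point: assuming the `llama_batch` layout in upstream llama.h (per-token `n_seq_id` counts plus per-token `seq_id` arrays allocated by `llama_batch_init`), a sketch of the "proper sequence ID array handling" the comment asks for might look like the following. The helper name `make_prompt_batch` is invented for illustration; `common_batch_add` in llama.cpp's common library performs equivalent bookkeeping.

```cpp
#include "llama.h"
#include <vector>

// Hedged sketch (assumes the upstream llama_batch layout, where seq_id is a
// per-token array of sequence ids sized by llama_batch_init): fill a prompt
// batch for a single sequence without taking the address of a local variable.
static llama_batch make_prompt_batch(const std::vector<llama_token>& tokens,
                                     llama_seq_id seq) {
    const int n = (int) tokens.size();
    llama_batch batch = llama_batch_init(n, 0, /*n_seq_max=*/1);
    batch.n_tokens = n;
    for (int i = 0; i < n; ++i) {
        batch.token[i]     = tokens[i];
        batch.pos[i]       = i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = seq;          // write into the pre-allocated array
        batch.logits[i]    = (i == n - 1); // logits only for the last prompt token
    }
    return batch;                          // caller frees with llama_batch_free()
}
```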
Copilot AI · Jul 31, 2025

The function llama_model_get_vocab() appears to be incorrect API usage. Based on llama.cpp's API, this should likely be llama_n_vocab(model_) directly, as llama_n_vocab typically takes the model pointer, not a vocab object.
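Which accessor is correct depends on the llama.cpp revision the PR is built against. As a sketch under the assumption of a 2025-era llama.h that exposes a separate vocab handle:

```cpp
#include "llama.h"

// Hedged sketch: vocab access under a 2025-era llama.h, where the vocab is a
// separate handle (assumption; older revisions exposed llama_n_vocab(model)).
static int32_t vocab_size(const llama_model * model) {
    const llama_vocab * vocab = llama_model_get_vocab(model);
    return llama_vocab_n_tokens(vocab);
}
```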
Copilot AI · Jul 31, 2025

The buffer size of 8 bytes is insufficient for token-to-string conversion. Some tokens can produce UTF-8 sequences longer than 8 bytes, which could lead to truncated output or buffer overflow. Consider using a larger buffer size (e.g., 32 or 64 bytes) or dynamically allocating based on the token.
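A sketch of the grow-on-demand conversion the comment suggests, assuming the `llama_token_to_piece` overload that takes a vocab handle and signals an undersized buffer with a negative return value (an assumption about the targeted llama.h revision):

```cpp
#include "llama.h"
#include <algorithm>
#include <string>
#include <vector>

// Hedged sketch: convert one token to text with a buffer that grows on demand,
// instead of a fixed 8-byte array.
static std::string token_to_piece(const llama_vocab * vocab, llama_token tok) {
    std::vector<char> buf(32);                   // comfortably above typical UTF-8 pieces
    int32_t n = llama_token_to_piece(vocab, tok, buf.data(), (int32_t) buf.size(),
                                     /*lstrip=*/0, /*special=*/false);
    if (n < 0) {                                 // buffer too small: retry with exact size
        buf.resize(-n);
        n = llama_token_to_piece(vocab, tok, buf.data(), (int32_t) buf.size(), 0, false);
    }
    return std::string(buf.data(), std::max<int32_t>(n, 0));
}
```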
Copilot AI · Jul 31, 2025

The mem_ member variable is assigned in the constructor but never used elsewhere in the class. Consider removing it if it's not needed, or document its intended purpose if it will be used in future functionality.

Suggested change: drop the unused `llama_memory_t mem_ = nullptr;` declaration.
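Since persist()/restore() fall outside the visible part of the diff, the following is only a hedged sketch of how per-sequence persistence is commonly wired against llama.h's `llama_state_seq_save_file` / `llama_state_seq_load_file`, which would not require the `mem_` handle at all. The names, signatures, and the omission of token bookkeeping are assumptions, not taken from the PR.

```cpp
#include "llama.h"
#include <string>

// Hedged sketch, not the PR's persist()/restore(): save and reload the cached
// state of one sequence to a file. Token bookkeeping is left empty for brevity;
// a real adapter would also store the tokens already decoded for that sequence.
static void save_seq(llama_context * ctx, const std::string & path, llama_seq_id seq) {
    llama_state_seq_save_file(ctx, path.c_str(), seq,
                              /*tokens=*/nullptr, /*n_token_count=*/0);
}

static bool load_seq(llama_context * ctx, const std::string & path, llama_seq_id seq) {
    size_t n_restored = 0;
    return llama_state_seq_load_file(ctx, path.c_str(), seq,
                                     /*tokens_out=*/nullptr, /*n_token_capacity=*/0,
                                     &n_restored) != 0;
}
```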
The batch allocation and deallocation happen multiple times in the generation loop. Consider reusing batch objects or allocating them once outside the loop to reduce memory allocation overhead during token generation.
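A sketch of the reuse pattern this comment describes: one single-token batch allocated before the loop and rewritten in place each step. The variable names mirror the `b1` mentioned above, but the surrounding function is invented for illustration and the sampling step is elided.

```cpp
#include "llama.h"

// Hedged sketch: allocate one single-token batch up front and rewrite its slot
// each step, instead of calling llama_batch_init/llama_batch_free per token.
static void decode_loop(llama_context * ctx, llama_seq_id seq,
                        llama_token first_token, llama_pos start_pos, int n_predict) {
    llama_batch b1 = llama_batch_init(/*n_tokens=*/1, /*embd=*/0, /*n_seq_max=*/1);
    llama_token next_token = first_token;
    llama_pos   pos        = start_pos;

    for (int i = 0; i < n_predict; ++i) {
        b1.n_tokens     = 1;
        b1.token[0]     = next_token;
        b1.pos[0]       = pos++;
        b1.n_seq_id[0]  = 1;
        b1.seq_id[0][0] = seq;
        b1.logits[0]    = true;

        if (llama_decode(ctx, b1) != 0) break;
        // ... sample next_token from llama_get_logits_ith(ctx, 0) here ...
    }
    llama_batch_free(b1);   // freed once, after the loop
}
```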