
Commit 585abd1

refactor: update to the new GitHub org used for llama.cpp
1 parent eba30d4 commit 585abd1

File tree: 13 files changed, +26 -26 lines changed


README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -94,7 +94,7 @@ console.log("AI: " + a2);
 To contribute to `node-llama-cpp` read the [contribution guide](https://node-llama-cpp.withcat.ai/guide/contributing).
 
 ## Acknowledgements
-* llama.cpp: [ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp)
+* llama.cpp: [ggml-org/llama.cpp](https://github.com/ggml-org/llama.cpp)
 
 
 <br />
```

docs/blog/v3.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -15,7 +15,7 @@ image:
 ---
 [`node-llama-cpp`](https://node-llama-cpp.withcat.ai) 3.0 is finally here.
 
-With [`node-llama-cpp`](https://node-llama-cpp.withcat.ai), you can run large language models locally on your machine using the power of [`llama.cpp`](https://github.com/ggerganov/llama.cpp) with a simple and easy-to-use API.
+With [`node-llama-cpp`](https://node-llama-cpp.withcat.ai), you can run large language models locally on your machine using the power of [`llama.cpp`](https://github.com/ggml-org/llama.cpp) with a simple and easy-to-use API.
 
 It includes everything you need, from downloading models, to running them in the most optimized way for your hardware, and integrating them in your projects.
 
@@ -43,7 +43,7 @@ While `llama.cpp` is an amazing project, it's also highly technical and can be c
 `node-llama-cpp` bridge that gap, making `llama.cpp` accessible to everyone, regardless of their experience level.
 
 ### Performance
-[`node-llama-cpp`](https://node-llama-cpp.withcat.ai) is built on top of [`llama.cpp`](https://github.com/ggerganov/llama.cpp), a highly optimized C++ library for running large language models.
+[`node-llama-cpp`](https://node-llama-cpp.withcat.ai) is built on top of [`llama.cpp`](https://github.com/ggml-org/llama.cpp), a highly optimized C++ library for running large language models.
 
 `llama.cpp` supports many compute backends, including Metal, CUDA, and Vulkan. It also uses [Accelerate](https://developer.apple.com/accelerate/) on Mac.
 
@@ -116,7 +116,7 @@ npx -y node-llama-cpp chat
 Check out the [getting started guide](../guide/index.md) to learn how to use `node-llama-cpp`.
 
 ## Thank You
-`node-llama-cpp` is only possible thanks to the amazing work done on [`llama.cpp`](https://github.com/ggerganov/llama.cpp) by [Georgi Gerganov](https://github.com/ggerganov), [Slaren](https://github.com/slaren) and all the contributors from the community.
+`node-llama-cpp` is only possible thanks to the amazing work done on [`llama.cpp`](https://github.com/ggml-org/llama.cpp) by [Georgi Gerganov](https://github.com/ggerganov), [Slaren](https://github.com/slaren) and all the contributors from the community.
 
 ## What's next?
 Version 3.0 is a major milestone, but there's plenty more planned for the future.
```

docs/guide/Vulkan.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -135,7 +135,7 @@ watch -d "npx --no node-llama-cpp inspect gpu"
 ```
 
 ## Vulkan Caveats
-[At the moment](https://github.com/ggerganov/llama.cpp/issues/7575),
+[At the moment](https://github.com/ggml-org/llama.cpp/issues/7575),
 Vulkan doesn't work well when using multiple contexts at the same time,
 so it's recommended to use a single context with Vulkan,
 and to manually dispose a context (using [`.dispose()`](../api/classes/LlamaContext.md#dispose)) before creating a new one.
````

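As a side note on the Vulkan caveat above: the recommended pattern of disposing one context before creating another might look roughly like the sketch below. This is not part of the commit; the model path is a placeholder, and it assumes the `getLlama()` / `loadModel()` / `createContext()` / `.dispose()` API documented by `node-llama-cpp`.

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// Keep only one context alive at a time when running on Vulkan.
let context = await model.createContext();
// ... run inference with `context` ...

// Dispose the old context before creating a new one.
await context.dispose();
context = await model.createContext();
```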
docs/guide/building-from-source.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -172,7 +172,7 @@ or pass the code snippet that is printed after the build finishes.
 Every new release of `node-llama-cpp` ships with the latest release of `llama.cpp` that was available at the time of the release,
 so relying on the latest version of `node-llama-cpp` should be enough for most use cases.
 
-However, you may want to download a newer release of `llama.cpp` ([`llama.cpp` releases](https://github.com/ggerganov/llama.cpp/releases))
+However, you may want to download a newer release of `llama.cpp` ([`llama.cpp` releases](https://github.com/ggml-org/llama.cpp/releases))
 and build it from source to get the latest features and bug fixes before a new version of `node-llama-cpp` is released.
 
 A new release may contain breaking changes, so it won't necessarily work properly or even compile at all, so do this with caution.
@@ -182,7 +182,7 @@ You can do this by specifying the `--release` option with the release tag you wa
 npx --no node-llama-cpp source download --release "b1350"
 ```
 
-> You can find the release tag on the [`llama.cpp` releases page](https://github.com/ggerganov/llama.cpp/releases):
+> You can find the release tag on the [`llama.cpp` releases page](https://github.com/ggml-org/llama.cpp/releases):
 
 You can also opt to download the latest release available:
 ```shell
````

docs/guide/choosing-a-model.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -142,7 +142,7 @@ Here are a few concepts to be aware of when choosing a model:
 If you plan to feed the model with a lot of data at once, you'll need a model that supports a large context size.
 The larger the context size is, the more data the model can process at once.
 
-You can only create a context with a size that is smaller or equal to the context size the model was trained on (although there are techniques around that, like [RoPE](https://github.com/ggerganov/llama.cpp/discussions/1965)).
+You can only create a context with a size that is smaller or equal to the context size the model was trained on (although there are techniques around that, like [RoPE](https://github.com/ggml-org/llama.cpp/discussions/1965)).
 The larger the context size is, the more memory the model will require to run.
 If you plan to feed the model with a lot of data at once, you may want to choose a smaller model that uses less memory, so you can create a larger context.
 
```

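For illustration of the context-size note in the hunk above (not part of the commit): a rough sketch of clamping the requested context size to the size the model was trained on, assuming the `trainContextSize` property and the `contextSize` option exposed by `node-llama-cpp`; the model path is a placeholder.

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"});

// Request a context no larger than the size the model was trained on;
// a smaller context also uses less memory.
const contextSize = Math.min(8192, model.trainContextSize);
const context = await model.createContext({contextSize});

console.log("Context size:", context.contextSize);
```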
docs/guide/cmakeOptions.data.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@ const loader = {
 const clonedRepoReleaseInfo = await getClonedLlamaCppRepoReleaseInfo();
 const release = clonedRepoReleaseInfo?.tag ?? await getBinariesGithubRelease();
 
-const githubFileUrl = `https://github.com/ggerganov/llama.cpp/blob/${encodeURIComponent(release)}/ggml/CMakeLists.txt`;
+const githubFileUrl = `https://github.com/ggml-org/llama.cpp/blob/${encodeURIComponent(release)}/ggml/CMakeLists.txt`;
 
 return {
 cmakeOptionsFileUrl: githubFileUrl,
```

docs/guide/grammar.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -26,7 +26,7 @@ so it's recommended to use it together with `maxTokens` set to the context size
 ## Using a Builtin Grammar {#builtin-grammar}
 The [`llama.getGrammarFor("<format>")`](../api/classes/Llama.md#getgrammarfor) method reads a GBNF grammar file that's originally provided by `llama.cpp` and is included inside of `node-llama-cpp`.
 
-You can see the full list of supported grammar files [here](https://github.com/ggerganov/llama.cpp/tree/master/grammars).
+You can see the full list of supported grammar files [here](https://github.com/ggml-org/llama.cpp/tree/master/grammars).
 
 ```typescript
 import {fileURLToPath} from "url";
@@ -174,7 +174,7 @@ so there's no need to explain the schema in the prompt.
 :::
 
 ## Creating Your Own Grammar {#custom-grammar}
-To create your own grammar, read the [GBNF guide](https://github.com/ggerganov/llama.cpp/blob/f5fe98d11bdf9e7797bcfb05c0c3601ffc4b9d26/grammars/README.md) to create a GBNF grammar file.
+To create your own grammar, read the [GBNF guide](https://github.com/ggml-org/llama.cpp/blob/f5fe98d11bdf9e7797bcfb05c0c3601ffc4b9d26/grammars/README.md) to create a GBNF grammar file.
 
 To use your custom grammar file, load it via the [`llama.createGrammar(...)`](../api/classes/Llama.md#creategrammar) method:
 ```typescript
````

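For context on the grammar API referenced in this diff (`llama.createGrammar(...)`): a minimal sketch, not part of the commit, of loading a custom GBNF file and constraining a chat response with it; the model and grammar file paths are placeholders.

```typescript
import path from "path";
import fs from "fs/promises";
import {fileURLToPath} from "url";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({modelPath: path.join(__dirname, "my-model.gguf")});

// Load a custom GBNF grammar file and use it to constrain the model's output.
const grammarText = await fs.readFile(path.join(__dirname, "my-grammar.gbnf"), "utf8");
const grammar = await llama.createGrammar({grammar: grammarText});

const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const answer = await session.prompt("List three fruits", {
    grammar,
    maxTokens: context.contextSize // a grammar can lead to unbounded output, so cap the tokens
});
console.log(answer);
```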
docs/guide/tips-and-tricks.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -67,8 +67,8 @@ or provide additional information regarding flash attention when used.
 
 OpenMP can help improve inference performance on Linux and Windows, but requires additional installation and setup.
 
-The performance improvement can be [up to 8% faster](https://github.com/ggerganov/llama.cpp/pull/7606) inference times (on specific conditions).
-Setting the `OMP_PROC_BIND` environment variable to `TRUE` on systems that support many threads (assume 36 as the minimum) can improve performance [by up to 23%](https://github.com/ggerganov/llama.cpp/pull/7606).
+The performance improvement can be [up to 8% faster](https://github.com/ggml-org/llama.cpp/pull/7606) inference times (on specific conditions).
+Setting the `OMP_PROC_BIND` environment variable to `TRUE` on systems that support many threads (assume 36 as the minimum) can improve performance [by up to 23%](https://github.com/ggml-org/llama.cpp/pull/7606).
 
 The pre-built binaries are compiled without OpenMP since OpenMP isn't always available on all systems, and has to be installed separately.
 
```

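Regarding the `OMP_PROC_BIND` tip above: the variable has to be set in the environment of the process that runs the OpenMP-enabled build. One hypothetical way to do that from Node is sketched below (not part of the commit; the script name is a placeholder).

```typescript
import {spawn} from "node:child_process";

// Launch the inference script with OMP_PROC_BIND enabled so OpenMP pins
// threads to cores (only relevant when node-llama-cpp was built with OpenMP).
const child = spawn(process.execPath, ["./my-inference-script.js"], {
    env: {...process.env, OMP_PROC_BIND: "TRUE"},
    stdio: "inherit"
});

child.on("exit", (code) => process.exit(code ?? 0));
```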
src/config.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -36,7 +36,7 @@ export const localXpacksStoreDirectory = path.join(xpackDirectory, "store");
 export const localXpacksCacheDirectory = path.join(xpackDirectory, "cache");
 export const buildMetadataFileName = "_nlcBuildMetadata.json";
 export const xpmVersion = "^0.16.3";
-export const builtinLlamaCppGitHubRepo = "ggerganov/llama.cpp";
+export const builtinLlamaCppGitHubRepo = "ggml-org/llama.cpp";
 export const builtinLlamaCppRelease = await getBinariesGithubRelease();
 
 export const isCI = env.get("CI")
```

src/evaluator/LlamaGrammar.ts

Lines changed: 2 additions & 2 deletions
```diff
@@ -39,8 +39,8 @@ export class LlamaGrammar {
 /**
  * > GBNF files are supported.
  * > More info here: [
- * github:ggerganov/llama.cpp:grammars/README.md
- * ](https://github.com/ggerganov/llama.cpp/blob/f5fe98d11bdf9e7797bcfb05c0c3601ffc4b9d26/grammars/README.md)
+ * github:ggml-org/llama.cpp:grammars/README.md
+ * ](https://github.com/ggml-org/llama.cpp/blob/f5fe98d11bdf9e7797bcfb05c0c3601ffc4b9d26/grammars/README.md)
  *
  * Prefer to create a new instance of this class by using `llama.createGrammar(...)`.
  * @deprecated Use `llama.createGrammar(...)` instead.
```
