Commit 0429cdc

Authored by ngxson, Vaibhavs10, and pcuenca
Add @huggingface/ollama-utils (#1111)
With the Ollama compatibility layer having been up and running on the HF Hub for a while now, we want to open-source part of the integration to (1) provide more transparency, and (2) encourage the community to contribute. Therefore, we decided to publish `@huggingface/ollama-utils`, a package containing the tools that power this integration. For now, the only tool we provide is `chat-template.ts`, which takes a parsed GGUF config from `@huggingface/gguf` as input and returns the corresponding Ollama Go template.

# chat-template.ts

This module exposes a single function, `convertGGUFTemplateToOllama`. It works by trying the following mechanisms, in order:

1. Look for a matching template that already exists on the Ollama hub. To make this possible, we regularly dump a list of public templates from ollama.com using `scripts/generate-automap.ts`.
2. If no exact match is found, check against the list of official templates again, but this time match only the list of special tokens. These tokens are extracted using the regex `RE_SPECIAL_TOKEN`.
3. If we still can't find a match, search `CUSTOM_TEMPLATE_MAPPING`, a list of hand-picked template mappings.
4. If all of the above fail, parse the Jinja template and try converting it to a Go template. This works almost all of the time, except when the template is not supported by `@huggingface/jinja`. See `convertJinjaToGoTemplate`.

---------

Co-authored-by: vb <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
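The four-step fallback chain above can be sketched roughly as follows. This is an illustrative sketch only: `CUSTOM_TEMPLATE_MAPPING` and `RE_SPECIAL_TOKEN` are real names from the package, but the table shapes, the helper name `convertSketch`, and the matching details are stand-ins, not the library's actual implementation.

```typescript
// Illustrative sketch of the four-step fallback chain (not the package's real code).
type OllamaTemplate = { template: string; tokens: string[] };

// Step 1 data: templates regularly dumped from ollama.com, keyed by Jinja template.
const AUTOMAP = new Map<string, OllamaTemplate>();
// Step 3 data: hand-picked mappings.
const CUSTOM_TEMPLATE_MAPPING = new Map<string, OllamaTemplate>();
// Simplified stand-in for the special-token regex used in step 2.
const RE_SPECIAL_TOKEN = /<[|_A-Za-z0-9]+>|\[[A-Z]+\]/g;

function convertSketch(jinjaTmpl: string): OllamaTemplate | undefined {
  // 1. Exact match against templates already on the Ollama hub.
  const exact = AUTOMAP.get(jinjaTmpl);
  if (exact) return exact;
  // 2. Fall back to matching only the set of special tokens.
  const tokens = Array.from(new Set(jinjaTmpl.match(RE_SPECIAL_TOKEN) ?? []));
  for (const candidate of AUTOMAP.values()) {
    if (tokens.length > 0 && tokens.every((t) => candidate.tokens.includes(t))) {
      return candidate;
    }
  }
  // 3. Hand-picked custom mappings.
  const custom = CUSTOM_TEMPLATE_MAPPING.get(jinjaTmpl);
  if (custom) return custom;
  // 4. Last resort: parse the Jinja template and convert it to Go
  //    (convertJinjaToGoTemplate in the real package; stubbed out here).
  return undefined;
}
```

The ordering matters: exact matches are cheapest and most reliable, token-set matches tolerate whitespace and formatting drift in the Jinja source, and the Jinja-to-Go conversion is the slowest and most fragile path.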
1 parent 38d13dd commit 0429cdc

17 files changed (+1678, −2 lines)

.github/workflows/documentation.yml

Lines changed: 1 addition & 1 deletion
@@ -22,6 +22,6 @@ jobs:
     package: huggingface.js
     path_to_docs: huggingface.js/docs
     additional_args: --not_python_module
-    pre_command: corepack enable && cd huggingface.js && pnpm install && pnpm -r build && pnpm --filter doc-internal start
+    pre_command: npm install -g corepack@latest && corepack enable && cd huggingface.js && pnpm install && pnpm -r build && pnpm --filter doc-internal start
     secrets:
       hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

.github/workflows/pr-documentation.yml

Lines changed: 1 addition & 1 deletion
@@ -22,5 +22,5 @@ jobs:
     pr_number: ${{ github.event.number }}
     package: huggingface.js
     path_to_docs: huggingface.js/docs
-    pre_command: corepack enable && cd huggingface.js && pnpm install && pnpm -r build && pnpm --filter doc-internal start
+    pre_command: npm install -g corepack@latest && corepack enable && cd huggingface.js && pnpm install && pnpm -r build && pnpm --filter doc-internal start
     additional_args: --not_python_module

README.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ This is a collection of JS libraries to interact with the Hugging Face API, with
 - [@huggingface/tasks](packages/tasks/README.md): The definition files and source-of-truth for the Hub's main primitives like pipeline tasks, model libraries, etc.
 - [@huggingface/jinja](packages/jinja/README.md): A minimalistic JS implementation of the Jinja templating engine, to be used for ML chat templates.
 - [@huggingface/space-header](packages/space-header/README.md): Use the Space `mini_header` outside Hugging Face
+- [@huggingface/ollama-utils](packages/ollama-utils/README.md): Various utilities for maintaining Ollama compatibility with models on the Hugging Face Hub.
 
 
 We use modern features to avoid polyfills and dependencies, so the libraries will only work on modern browsers / Node.js >= 18 / Bun / Deno.

packages/ollama-utils/.eslintignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+dist

packages/ollama-utils/.prettierignore

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pnpm-lock.yaml
+# In order to avoid code samples to have tabs, they don't display well on npm
+README.md
+dist
+src/automap.ts

packages/ollama-utils/README.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# `@huggingface/ollama-utils`
+
+Various utilities for maintaining [Ollama compatibility with GGUF models on the Hugging Face Hub](https://huggingface.co/docs/hub/en/ollama).
+
+For now, we are exposing chat template conversion to the Go format used by Ollama.
+
+## Chat template converter
+
+```ts
+import { convertJinjaToGoTemplate } from "@huggingface/ollama-utils";
+
+const MODEL_INFO_URL = "https://huggingface.co/api/models/bartowski/Llama-3.2-3B-Instruct-GGUF?expand[]=gguf";
+const modelInfo = await (await fetch(MODEL_INFO_URL)).json();
+console.log(modelInfo);
+/**
+ * {
+ *   gguf: {
+ *     chat_template: "here is the Jinja chat template",
+ *     bos_token: "...",
+ *     eos_token: "...",
+ *     [...]
+ *   }
+ * }
+ */
+const convertedTemplate = convertJinjaToGoTemplate(modelInfo.gguf);
+if (convertedTemplate) {
+  console.log(convertedTemplate.ollama);
+  /**
+   * {
+   *   template: "this is the converted template, compatible with Ollama",
+   *   tokens: [... list of special tokens],
+   *   params: {
+   *     stop: [... list of stop tokens or stop words]
+   *   }
+   * }
+   */
+} else {
+  console.error("Conversion is not successful");
+}
+```
+
+## How can I add a custom template?
+
+Most templates will be converted automatically. You can debug the output template using:
+- This space to retrieve the converted template: https://huggingface.co/spaces/ngxson/debug_ollama_manifest
+- And this space to apply the Go template into a list of messages: https://huggingface.co/spaces/ngxson/ollama_template_test
+
+Please only add a new template when the conversion process above is not successful. Cases that are acceptable include:
+- The converted template is wrong
+- The Jinja template is not compatible with `@huggingface/jinja`
+- The Jinja template is not "linear," meaning it can modify the content of other messages or append dynamic postfixes. For instance, the DeepSeek template removes `<think>...</think>` from previous messages in a conversation, making it non-linear. Another example is a template that adds the EOS token `</s>` when `add_generation_prompt=False`.
+
+To add a new custom handler:
+1. Edit the list of `CUSTOM_TEMPLATE_MAPPING` inside `chat-template.ts`
+2. Add a new test case in `chat-template.spec.ts`
+3. Push your change to a new PR.
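For reference, the shape of a mapping entry can be inferred from the `OutputItem` interface used by the automap generator script in this commit (`gguf` template, `ollama.template`, `ollama.tokens`, optional `ollama.params`). The interface name `ChatTemplateEntry` and the example values below are hypothetical; the real type (`OllamaChatTemplateMapEntry`) lives in the package's `types.ts`.

```typescript
// Hypothetical shape for a hand-picked template mapping entry, mirroring
// the OutputItem interface from scripts/generate-automap.ts.
interface ChatTemplateEntry {
  gguf: string; // the Jinja template as found in GGUF metadata
  ollama: {
    template: string; // the equivalent Go template
    tokens: string[]; // special tokens appearing in the template
    params?: { stop?: string[] }; // extra Ollama params, e.g. stop words
  };
}

// Example entry (values are placeholders, not a real conversion):
const exampleEntry: ChatTemplateEntry = {
  gguf: "{% for message in messages %}...{% endfor %}",
  ollama: {
    template: "{{ range .Messages }}...{{ end }}",
    tokens: ["<|im_start|>", "<|im_end|>"],
    params: { stop: ["<|im_end|>"] },
  },
};
console.log(exampleEntry.ollama.tokens);
```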

packages/ollama-utils/package.json

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+{
+  "name": "@huggingface/ollama-utils",
+  "packageManager": "[email protected]",
+  "version": "0.0.1",
+  "description": "Various utilities for maintaining Ollama compatibility with models on Hugging Face hub",
+  "repository": "https://github.com/huggingface/huggingface.js.git",
+  "publishConfig": {
+    "access": "public"
+  },
+  "main": "./dist/index.js",
+  "module": "./dist/index.mjs",
+  "types": "./dist/index.d.ts",
+  "exports": {
+    ".": {
+      "types": "./dist/index.d.ts",
+      "require": "./dist/index.js",
+      "import": "./dist/index.mjs"
+    }
+  },
+  "browser": {
+    "./src/utils/FileBlob.ts": false,
+    "./dist/index.js": "./dist/browser/index.js",
+    "./dist/index.mjs": "./dist/browser/index.mjs"
+  },
+  "engines": {
+    "node": ">=20"
+  },
+  "source": "index.ts",
+  "scripts": {
+    "lint": "eslint --quiet --fix --ext .cjs,.ts .",
+    "lint:check": "eslint --ext .cjs,.ts .",
+    "format": "prettier --write .",
+    "format:check": "prettier --check .",
+    "prepublishOnly": "pnpm run build",
+    "build": "tsup src/index.ts --format cjs,esm --clean && tsc --emitDeclarationOnly --declaration",
+    "build:automap": "tsx scripts/generate-automap.ts && prettier --write ./src/chat-template-automap.ts",
+    "test": "vitest run",
+    "check": "tsc"
+  },
+  "files": [
+    "dist",
+    "src",
+    "tsconfig.json"
+  ],
+  "keywords": [
+    "huggingface",
+    "hub",
+    "gguf"
+  ],
+  "author": "Hugging Face",
+  "license": "MIT",
+  "dependencies": {
+    "@huggingface/jinja": "workspace:^"
+  },
+  "devDependencies": {
+    "@types/node": "^20.12.8"
+  }
+}

packages/ollama-utils/pnpm-lock.yaml

Lines changed: 27 additions & 0 deletions
packages/ollama-utils/scripts/generate-automap.ts

Lines changed: 207 additions & 0 deletions
@@ -0,0 +1,207 @@
+/**
+ * Script for generating llm.ts
+ * The source data is taken from llama.cpp
+ */
+
+import type { GGUFParseOutput } from "../../gguf/src/gguf";
+import { gguf } from "../../gguf/src/gguf";
+import { appendFileSync, writeFileSync, existsSync } from "node:fs";
+import path from "node:path";
+
+const DEBUG = process.env.DEBUG;
+const RE_SPECIAL_TOKEN = /<[|_A-Za-z0-9]+>|\[[A-Z]+\]|<\uFF5C[\u2581A-Za-z]+\uFF5C>/g;
+const MAX_NUMBER_OF_TAGS_PER_MODEL = 5;
+const N_WORKERS = 16;
+const OUTPUT_FILE = path.join(__dirname, "../src/chat-template-automap.ts");
+const BLACKLISTED_MODELS = (model: string, tag: string) => {
+  // some models are know to give ServiceUnavailable
+  return model === "library/deepseek-r1" && tag === "7b";
+};
+
+interface OutputItem {
+  model: string;
+  gguf: string;
+  ollama: {
+    template: string;
+    tokens: string[];
+    // eslint-disable-next-line
+    params?: any;
+  };
+}
+
+interface OllamaManifestLayer {
+  digest: string;
+  mediaType: string;
+  size: number;
+}
+
+interface OllamaManifest {
+  layers: OllamaManifestLayer[];
+}
+
+const getSpecialTokens = (tmpl: string): string[] => {
+  const matched = tmpl.match(RE_SPECIAL_TOKEN);
+  const tokens = Array.from(matched || []);
+  return Array.from(new Set(tokens)); // deduplicate
+};
+
+(async () => {
+  if (DEBUG) writeFileSync("ollama_tmp.jsonl", ""); // clear the file
+
+  const models: string[] = [];
+  const output: OutputItem[] = [];
+
+  const html = await (await fetch("https://ollama.com/library")).text();
+  const matched = html.match(/href="\/library\/[^"]+/g);
+  if (!matched) {
+    throw new Error("cannot find any model url");
+  }
+  for (let i = 0; i < matched.length; i++) {
+    models.push(matched[i].replace('href="/', ""));
+  }
+  console.log({ models });
+
+  //////// Get tags ////////
+
+  let nDoing = 0;
+  let nAll = models.length;
+  const modelsWithTag: string[] = [];
+  const workerGetTags = async () => {
+    while (true) {
+      const model = models.shift();
+      if (!model) return;
+      nDoing++;
+      console.log(`Getting tags ${nDoing} / ${nAll}`);
+      const html = await (await fetch(`https://ollama.com/${model}`)).text();
+      const matched = html.match(/href="\/library\/[^"]+/g);
+      if (!matched) {
+        throw new Error("cannot find any tag url");
+      }
+      for (let i = 0; i < matched.length && i < MAX_NUMBER_OF_TAGS_PER_MODEL; i++) {
+        const midAndTag: string = matched[i].replace('href="/', "");
+        if (midAndTag.match(/:/) && !midAndTag.match(/\/blobs/)) {
+          modelsWithTag.push(midAndTag);
+        }
+      }
+    }
+  };
+  await Promise.all(
+    Array(N_WORKERS)
+      .fill(null)
+      .map(() => workerGetTags())
+  );
+  console.log({ modelsWithTag });
+
+  //////// merging with old file if necessary ////////
+
+  const seenGGUFTemplate = new Set<string>();
+  if (existsSync(OUTPUT_FILE)) {
+    const oldOutput = await import(OUTPUT_FILE);
+    oldOutput.OLLAMA_CHAT_TEMPLATE_MAPPING.forEach((item: OutputItem) => {
+      seenGGUFTemplate.add(item.gguf);
+      output.push(item);
+    });
+  }
+
+  //////// Get template ////////
+
+  nDoing = 0;
+  nAll = modelsWithTag.length;
+  const workerGetTemplate = async () => {
+    while (true) {
+      const modelWithTag = modelsWithTag.shift();
+      if (!modelWithTag) return;
+
+      nDoing++;
+      const [model, tag] = modelWithTag.split(":");
+      console.log(`Fetch template ${nDoing} / ${nAll} | model=${model} tag=${tag}`);
+      const getBlobUrl = (digest: string) => `https://registry.ollama.com/v2/${model}/blobs/${digest}`;
+      const manifest: OllamaManifest = await (
+        await fetch(`https://registry.ollama.com/v2/${model}/manifests/${tag}`)
+      ).json();
+      if (!manifest.layers) {
+        console.log("  --> [X] No layers");
+        continue;
+      }
+      const layerModelUrl = manifest.layers.find((l) => l.mediaType.match(/\.model/));
+      if (!layerModelUrl) {
+        console.log("  --> [X] No model is found");
+        continue;
+      }
+      const modelUrl = getBlobUrl(layerModelUrl.digest);
+      let ggufData: GGUFParseOutput;
+      if (BLACKLISTED_MODELS(model, tag)) {
+        console.log("  --> [X] Blacklisted model, skip");
+        continue;
+      }
+      try {
+        ggufData = await gguf(modelUrl);
+      } catch (e) {
+        console.log("  --> [X] FATAL: GGUF error", { model, tag, modelUrl });
+        throw e; // rethrow
+      }
+      const { metadata } = ggufData;
+      const ggufTmpl = metadata["tokenizer.chat_template"];
+      if (ggufTmpl) {
+        if (seenGGUFTemplate.has(ggufTmpl)) {
+          console.log("  --> Already seen this GGUF template, skip...");
+          continue;
+        }
+        seenGGUFTemplate.add(ggufTmpl);
+        console.log("  --> GGUF chat template OK");
+        const tmplBlob = manifest.layers.find((l) => l.mediaType.match(/\.template/));
+        if (!tmplBlob) continue;
+        const ollamaTmplUrl = getBlobUrl(tmplBlob.digest);
+        if (!ollamaTmplUrl) {
+          console.log("  --> [X] No ollama template");
+          continue;
+        }
+        const ollamaTmpl = await (await fetch(ollamaTmplUrl)).text();
+        console.log("  --> All OK");
+        const record: OutputItem = {
+          model: modelWithTag,
+          gguf: ggufTmpl,
+          ollama: {
+            template: ollamaTmpl,
+            tokens: getSpecialTokens(ggufTmpl),
+          },
+        };
+        // get params
+        const ollamaParamsBlob = manifest.layers.find((l) => l.mediaType.match(/\.params/));
+        const ollamaParamsUrl = ollamaParamsBlob ? getBlobUrl(ollamaParamsBlob.digest) : null;
+        if (ollamaParamsUrl) {
+          console.log("  --> Got params");
+          record.ollama.params = await (await fetch(ollamaParamsUrl)).json();
+        }
+        output.push(record);
+        if (DEBUG) appendFileSync("ollama_tmp.jsonl", JSON.stringify(record) + "\n");
+      } else {
+        console.log("  --> [X] No GGUF template");
+        continue;
+      }
+      //console.log({modelUrl, ggufData});
+      //break;
+    }
+  };
+
+  await Promise.all(
+    Array(N_WORKERS)
+      .fill(null)
+      .map(() => workerGetTemplate())
+  );
+
+  console.log("DONE");
+  output.sort((a, b) => a.model.localeCompare(b.model));
+
+  writeFileSync(
+    OUTPUT_FILE,
+    `
+// This file is auto generated, please do not modify manually
+// To update it, run "pnpm run build:automap"
+
+import { OllamaChatTemplateMapEntry } from "./types";
+
+export const OLLAMA_CHAT_TEMPLATE_MAPPING: OllamaChatTemplateMapEntry[] = ${JSON.stringify(output, null, "\t")};
+`.trim()
+  );
+})();
