Commit 8db9fd9

Merge branch 'main' into xsn/ollama_utils

2 parents f259715 + cf160c7

161 files changed: +6994 −1916 lines


.github/workflows/inference-publish.yml

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ jobs:
           git tag "inference-v$BUMPED_VERSION"

       - name: "Check Deps are published before publishing this package"
-        run: pnpm -w check-deps gguf
+        run: pnpm -w check-deps tasks

       - run: pnpm publish --no-git-checks .
         env:

.github/workflows/test.yml

Lines changed: 12 additions & 0 deletions
@@ -41,6 +41,10 @@ jobs:
         run: VCR_MODE=playback pnpm --filter ...[${{ steps.since.outputs.SINCE }}] test
         env:
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          HF_FAL_KEY: dummy
+          HF_REPLICATE_KEY: dummy
+          HF_SAMBANOVA_KEY: dummy
+          HF_TOGETHER_KEY: dummy

   browser:
     runs-on: ubuntu-latest
@@ -77,6 +81,10 @@ jobs:
         run: VCR_MODE=playback pnpm --filter ...[${{ steps.since.outputs.SINCE }}] test:browser
         env:
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          HF_FAL_KEY: dummy
+          HF_REPLICATE_KEY: dummy
+          HF_SAMBANOVA_KEY: dummy
+          HF_TOGETHER_KEY: dummy

   e2e:
     runs-on: ubuntu-latest
@@ -140,3 +148,7 @@ jobs:
         env:
           NPM_CONFIG_REGISTRY: http://localhost:4874/
           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+          HF_FAL_KEY: dummy
+          HF_REPLICATE_KEY: dummy
+          HF_SAMBANOVA_KEY: dummy
+          HF_TOGETHER_KEY: dummy

CODEOWNERS

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Ownership for the Inference Package

-/packages/inference/ @vvmnnnkv @radames
+/packages/inference/ @julien-c @hanouticelina @SBrandeis @coyotte508

 # Ownership for the Tasks Package
README.md

Lines changed: 32 additions & 17 deletions
@@ -13,7 +13,7 @@
 // Programatically interact with the Hub

 await createRepo({
-  repo: {type: "model", name: "my-user/nlp-model"},
+  repo: { type: "model", name: "my-user/nlp-model" },
   accessToken: HF_TOKEN
 });

@@ -27,7 +27,7 @@ await uploadFile({
   }
 });

-// Use Inference API
+// Use HF Inference API, or external Inference Providers!

 await inference.chatCompletion({
   model: "meta-llama/Llama-3.1-8B-Instruct",
@@ -39,6 +39,7 @@ await inference.chatCompletion({
   ],
   max_tokens: 512,
   temperature: 0.5,
+  provider: "sambanova", // or together, fal-ai, replicate, …
 });

 await inference.textToImage({
@@ -53,11 +54,13 @@ await inference.textToImage({
 This is a collection of JS libraries to interact with the Hugging Face API, with TS types included.

-- [@huggingface/inference](packages/inference/README.md): Use Inference Endpoints (dedicated) and Inference API (serverless) to make calls to 100,000+ Machine Learning models
+- [@huggingface/inference](packages/inference/README.md): Use Inference API (serverless), Inference Endpoints (dedicated) and third-party Inference providers to make calls to 100,000+ Machine Learning models
 - [@huggingface/hub](packages/hub/README.md): Interact with huggingface.co to create or delete repos and commit / download files
 - [@huggingface/agents](packages/agents/README.md): Interact with HF models through a natural language interface
 - [@huggingface/gguf](packages/gguf/README.md): A GGUF parser that works on remotely hosted files.
+- [@huggingface/dduf](packages/dduf/README.md): Similar package for DDUF (DDUF Diffusers Unified Format)
 - [@huggingface/tasks](packages/tasks/README.md): The definition files and source-of-truth for the Hub's main primitives like pipeline tasks, model libraries, etc.
+- [@huggingface/jinja](packages/jinja/README.md): A minimalistic JS implementation of the Jinja templating engine, to be used for ML chat templates.
 - [@huggingface/space-header](packages/space-header/README.md): Use the Space `mini_header` outside Hugging Face
 - [@huggingface/ollama-utils](packages/ollama-utils/README.md): Various utilities for maintaining Ollama compatibility with models on Hugging Face hub.

@@ -93,7 +96,7 @@ You can run our packages with vanilla JS, without any bundler, by using a CDN or
 ```html
 <script type="module">
-  import { HfInference } from 'https://cdn.jsdelivr.net/npm/@huggingface/inference@2.8.1/+esm';
+  import { HfInference } from 'https://cdn.jsdelivr.net/npm/@huggingface/inference@3.1.2/+esm';
   import { createRepo, commit, deleteRepo, listFiles } from "https://cdn.jsdelivr.net/npm/@huggingface/[email protected]/+esm";
 </script>
 ```
@@ -143,6 +146,22 @@ for await (const chunk of inference.chatCompletionStream({
   console.log(chunk.choices[0].delta.content);
 }

+/// Using a third-party provider:
+await inference.chatCompletion({
+  model: "meta-llama/Llama-3.1-8B-Instruct",
+  messages: [{ role: "user", content: "Hello, nice to meet you!" }],
+  max_tokens: 512,
+  provider: "sambanova", // or together, fal-ai, replicate, …
+})
+
+await inference.textToImage({
+  model: "black-forest-labs/FLUX.1-dev",
+  inputs: "a picture of a green bird",
+  provider: "fal-ai",
+})
+
+
 // You can also omit "model" to use the recommended model for the task
 await inference.translation({
   inputs: "My name is Wolfgang and I live in Amsterdam",
@@ -152,28 +171,24 @@ await inference.translation({
   },
 });

-await inference.textToImage({
-  model: 'black-forest-labs/FLUX.1-dev',
-  inputs: 'a picture of a green bird',
-})
-
+// pass multimodal files or URLs as inputs
 await inference.imageToText({
+  model: 'nlpconnect/vit-gpt2-image-captioning',
   data: await (await fetch('https://picsum.photos/300/300')).blob(),
-  model: 'nlpconnect/vit-gpt2-image-captioning',
 })

 // Using your own dedicated inference endpoint: https://hf.co/docs/inference-endpoints/
 const gpt2 = inference.endpoint('https://xyz.eu-west-1.aws.endpoints.huggingface.cloud/gpt2');
 const { generated_text } = await gpt2.textGeneration({inputs: 'The answer to the universe is'});

-//Chat Completion
+// Chat Completion
 const llamaEndpoint = inference.endpoint(
   "https://api-inference.huggingface.co/models/meta-llama/Llama-3.1-8B-Instruct"
 );
 const out = await llamaEndpoint.chatCompletion({
-    model: "meta-llama/Llama-3.1-8B-Instruct",
-    messages: [{ role: "user", content: "Hello, nice to meet you!" }],
-    max_tokens: 512,
+  model: "meta-llama/Llama-3.1-8B-Instruct",
+  messages: [{ role: "user", content: "Hello, nice to meet you!" }],
+  max_tokens: 512,
 });
 console.log(out.choices[0].message);
 ```
@@ -186,7 +201,7 @@ import { createRepo, uploadFile, deleteFiles } from "@huggingface/hub";
 const HF_TOKEN = "hf_...";

 await createRepo({
-  repo: "my-user/nlp-model", // or {type: "model", name: "my-user/nlp-test"},
+  repo: "my-user/nlp-model", // or { type: "model", name: "my-user/nlp-test" },
   accessToken: HF_TOKEN
 });

@@ -201,7 +216,7 @@ await uploadFile({
 });

 await deleteFiles({
-  repo: {type: "space", name: "my-user/my-space"}, // or "spaces/my-user/my-space"
+  repo: { type: "space", name: "my-user/my-space" }, // or "spaces/my-user/my-space"
   accessToken: HF_TOKEN,
   paths: ["README.md", ".gitattributes"]
 });
@@ -210,7 +225,7 @@ await deleteFiles({
 ### @huggingface/agents example

 ```ts
-import {HfAgent, LLMFromHub, defaultTools} from '@huggingface/agents';
+import { HfAgent, LLMFromHub, defaultTools } from '@huggingface/agents';

 const HF_TOKEN = "hf_...";
packages/agents/pnpm-lock.yaml

Lines changed: 12 additions & 1 deletion
(Generated file; diff not rendered by default.)

packages/gguf/src/gguf.spec.ts

Lines changed: 5 additions & 0 deletions
@@ -283,4 +283,9 @@ describe("gguf", () => {
 		expect(parseGGUFQuantLabel("Codestral-22B-v0.1-IQ3_XS.gguf")).toEqual(undefined); // TODO: investigate IQ3_XS
 		expect(parseGGUFQuantLabel("Codestral-22B-v0.1-Q4_0_4_4.gguf")).toEqual("Q4_0"); // TODO: investigate Q4_0_4_4
 	});
+
+	it("calculate tensor data offset", async () => {
+		const { tensorDataOffset } = await gguf(URL_LLAMA);
+		expect(tensorDataOffset).toEqual(741056n);
+	});
 });

packages/gguf/src/gguf.ts

Lines changed: 13 additions & 4 deletions
@@ -10,6 +10,8 @@ export { parseGGUFQuantLabel, GGUF_QUANT_RE, GGUF_QUANT_RE_GLOBAL } from "@huggi

 export const RE_GGUF_FILE = /\.gguf$/;
 export const RE_GGUF_SHARD_FILE = /^(?<prefix>.*?)-(?<shard>\d{5})-of-(?<total>\d{5})\.gguf$/;
+const GGUF_DEFAULT_ALIGNMENT = 32; // defined in ggml.h
+const GGML_PAD = (x: number, n: number) => (x + n - 1) & ~(n - 1); // defined in ggml.h
 const PARALLEL_DOWNLOADS = 20;

 export interface GgufShardFileInfo {
@@ -384,14 +386,18 @@ export async function gguf(
 		});
 	}

+	// calculate absolute offset of tensor data
+	const alignment: number = Number(metadata["general.alignment"] ?? GGUF_DEFAULT_ALIGNMENT);
+	const tensorDataOffset = BigInt(GGML_PAD(offset, alignment));
+
 	if (params?.computeParametersCount) {
 		const parameterCount = tensorInfos
 			.map(({ shape }) => shape.reduce((acc, val) => acc * Number(val), 1))
 			.reduce((acc, val) => acc + val, 0);

-		return { metadata, tensorInfos, parameterCount };
+		return { metadata, tensorInfos, tensorDataOffset, parameterCount };
 	} else {
-		return { metadata, tensorInfos };
+		return { metadata, tensorInfos, tensorDataOffset };
 	}
 }
@@ -429,7 +435,10 @@ export async function ggufAllShards(
 			parameterCount: shards.map(({ parameterCount }) => parameterCount).reduce((acc, val) => acc + val, 0),
 		};
 	} else {
-		const { metadata, tensorInfos, parameterCount } = await gguf(url, { ...params, computeParametersCount: true });
-		return { shards: [{ metadata, tensorInfos }], parameterCount };
+		const { metadata, tensorInfos, tensorDataOffset, parameterCount } = await gguf(url, {
+			...params,
+			computeParametersCount: true,
+		});
+		return { shards: [{ metadata, tensorInfos, tensorDataOffset }], parameterCount };
 	}
 }
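The new `tensorDataOffset` above is just the byte position where header parsing ended, rounded up to the model's alignment. A standalone sketch of that arithmetic (the `741_030` header-end value is hypothetical, chosen here to illustrate rounding up to the `741056n` value the new spec test expects; the real value comes from parsing the GGUF header):

```typescript
// Bit-trick round-up to the next multiple of n (n must be a power of two),
// mirroring the GGML_PAD macro from ggml.h used in the diff above.
const GGML_PAD = (x: number, n: number) => (x + n - 1) & ~(n - 1);

const GGUF_DEFAULT_ALIGNMENT = 32;

// Hypothetical end-of-header offset, not aligned to 32 bytes:
const headerEnd = 741_030;
const tensorDataOffset = BigInt(GGML_PAD(headerEnd, GGUF_DEFAULT_ALIGNMENT));

console.log(tensorDataOffset); // 741056n
```

The bit trick works because for a power-of-two `n`, `~(n - 1)` masks off the low bits, so adding `n - 1` first rounds any non-multiple up rather than down.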

packages/gguf/src/types.ts

Lines changed: 1 addition & 0 deletions
@@ -155,4 +155,5 @@ export interface GGUFTensorInfo {
 export interface GGUFParseOutput<Options extends GGUFMetadataOptions = { strict: true }> {
 	metadata: GGUFMetadata<Options>;
 	tensorInfos: GGUFTensorInfo[];
+	tensorDataOffset: bigint;
 }
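One way a caller might use the new field: since per-tensor offsets in GGUF are relative to the start of the tensor data section, `tensorDataOffset` lets you compute absolute byte positions, e.g. for an HTTP Range request against the remote file. A hedged sketch (the helper name and the byte length are hypothetical; a real byte length would be derived from the tensor's shape and dtype):

```typescript
// Hypothetical helper: build an HTTP Range header value for one tensor's bytes.
// Assumes the tensor's `offset` is relative to the start of the tensor data
// section, as in the GGUF spec; `tensorDataOffset` is the new absolute field.
const tensorRangeHeader = (tensorDataOffset: bigint, offset: bigint, byteLength: bigint): string => {
	const start = tensorDataOffset + offset;
	const end = start + byteLength - 1n; // HTTP Range ends are inclusive
	return `bytes=${start}-${end}`;
};

// e.g. a first tensor starting at the data section, assumed 1024 bytes long:
console.log(tensorRangeHeader(741056n, 0n, 1024n)); // "bytes=741056-742079"
```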

packages/inference/LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2022 Tim Mikeladze
+Copyright (c) 2022 Tim Mikeladze and the Hugging Face team

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

packages/inference/README.md

Lines changed: 37 additions & 4 deletions
@@ -1,7 +1,7 @@
-# 🤗 Hugging Face Inference Endpoints
+# 🤗 Hugging Face Inference

-A Typescript powered wrapper for the Hugging Face Inference Endpoints API. Learn more about Inference Endpoints at [Hugging Face](https://huggingface.co/inference-endpoints).
-It works with both [Inference API (serverless)](https://huggingface.co/docs/api-inference/index) and [Inference Endpoints (dedicated)](https://huggingface.co/docs/inference-endpoints/index).
+A Typescript powered wrapper for the Hugging Face Inference API (serverless), Inference Endpoints (dedicated), and third-party Inference Providers.
+It works with [Inference API (serverless)](https://huggingface.co/docs/api-inference/index) and [Inference Endpoints (dedicated)](https://huggingface.co/docs/inference-endpoints/index), and even with supported third-party Inference Providers.

 Check out the [full documentation](https://huggingface.co/docs/huggingface.js/inference/README).

@@ -42,7 +42,40 @@ const hf = new HfInference('your access token')
 Your access token should be kept private. If you need to protect it in front-end applications, we suggest setting up a proxy server that stores the access token.

-#### Tree-shaking
+### Third-party inference providers
+
+You can send inference requests to third-party providers with the inference client.
+
+Currently, we support the following providers: [Fal.ai](https://fal.ai), [Replicate](https://replicate.com), [Together](https://together.xyz) and [Sambanova](https://sambanova.ai).
+
+To send requests to a third-party provider, you have to pass the `provider` parameter to the inference function. Make sure your request is authenticated with an access token.
+```ts
+const accessToken = "hf_..."; // Either a HF access token, or an API key from the third-party provider (Replicate in this example)
+
+const client = new HfInference(accessToken);
+await client.textToImage({
+  provider: "replicate",
+  model: "black-forest-labs/Flux.1-dev",
+  inputs: "A black forest cake"
+})
+```
+
+When authenticated with a Hugging Face access token, the request is routed through https://huggingface.co.
+When authenticated with a third-party provider key, the request is made directly against that provider's inference API.
+
+Only a subset of models are supported when requesting third-party providers. You can check the list of supported models per pipeline tasks here:
+- [Fal.ai supported models](./src/providers/fal-ai.ts)
+- [Replicate supported models](./src/providers/replicate.ts)
+- [Sambanova supported models](./src/providers/sambanova.ts)
+- [Together supported models](./src/providers/together.ts)
+- [HF Inference API (serverless)](https://huggingface.co/models?inference=warm&sort=trending)
+
+**Important note:** To be compatible, the third-party API must adhere to the "standard" shape API we expect on HF model pages for each pipeline task type.
+This is not an issue for LLMs as everyone converged on the OpenAI API anyways, but can be more tricky for other tasks like "text-to-image" or "automatic-speech-recognition" where there exists no standard API. Let us know if any help is needed or if we can make things easier for you!
+
+👋 **Want to add another provider?** Get in touch if you'd like to add support for another Inference provider, and/or request it on https://huggingface.co/spaces/huggingface/HuggingDiscussions/discussions/49
+
+### Tree-shaking

 You can import the functions you need directly from the module instead of using the `HfInference` class.