Merged

Changes from 1 commit
README.md (1 addition, 1 deletion)

@@ -183,7 +183,7 @@ const { generated_text } = await gpt2.textGeneration({inputs: 'The answer to the

// Chat Completion
const llamaEndpoint = inference.endpoint(
"https://api-inference.huggingface.co/models/meta-llama/Llama-3.1-8B-Instruct"
"https://router.huggingface.co/together/models/meta-llama/Llama-3.1-8B-Instruct"
@hanouticelina (Contributor) commented on Feb 6, 2025:

> hf-inference instead of together here? since we have models/

);
const out = await llamaEndpoint.chatCompletion({
model: "meta-llama/Llama-3.1-8B-Instruct",
packages/inference/README.md (2 additions, 2 deletions)

@@ -117,7 +117,7 @@ for await (const output of hf.textGenerationStream({

### Text Generation (Chat Completion API Compatible)

- Using the `chatCompletion` method, you can generate text with models compatible with the OpenAI Chat Completion API. All models served by [TGI](https://api-inference.huggingface.co/framework/text-generation-inference) on Hugging Face support Messages API.
Member Author commented:

> kinda killed that URL (the /framework one), prob not really relevant anymore anyways cc @XciD @Wauplin

Contributor replied:

> let's keep it for a bit at least for backward compatibility 🙏 (but yeah, no need for it anymore)
>
> huggingface_hub has a method relying on it

+ Using the `chatCompletion` method, you can generate text with models compatible with the OpenAI Chat Completion API. All models served by [TGI](https://huggingface.co/docs/text-generation-inference/) on Hugging Face support Messages API.

[Demo](https://huggingface.co/spaces/huggingfacejs/streaming-chat-completion)
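For context, a minimal non-streaming `chatCompletion` call against this API might look like the following sketch (the token and prompt are placeholders, not part of this diff):

```ts
import { HfInference } from "@huggingface/inference";

const hf = new HfInference("hf_..."); // placeholder access token

const out = await hf.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "Complete the equation 1 + 1 =" }],
  max_tokens: 512,
});

// the response mirrors the OpenAI Chat Completion shape
console.log(out.choices[0].message.content);
```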

@@ -611,7 +611,7 @@ const { generated_text } = await gpt2.textGeneration({inputs: 'The answer to the

// Chat Completion Example
const ep = hf.endpoint(
"https://api-inference.huggingface.co/models/meta-llama/Llama-3.1-8B-Instruct"
"https://router.huggingface.co/together/models/meta-llama/Llama-3.1-8B-Instruct"
);
const stream = ep.chatCompletionStream({
model: "tgi",
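A sketch of consuming the stream from the example above (endpoint URL taken from the diff; the token and prompt are placeholders):

```ts
import { HfInference } from "@huggingface/inference";

const hf = new HfInference("hf_..."); // placeholder access token
const ep = hf.endpoint(
  "https://router.huggingface.co/together/models/meta-llama/Llama-3.1-8B-Instruct"
);

const stream = ep.chatCompletionStream({
  model: "tgi",
  messages: [{ role: "user", content: "Can you help me solve an equation?" }],
  max_tokens: 512,
});

for await (const chunk of stream) {
  // each chunk carries an incremental delta of the assistant message
  if (chunk.choices && chunk.choices.length > 0) {
    process.stdout.write(chunk.choices[0].delta.content ?? "");
  }
}
```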
packages/inference/src/config.ts (1 addition)
@@ -1 +1,2 @@
export const HF_HUB_URL = "https://huggingface.co";
+ export const HF_ROUTER_URL = "https://router.huggingface.co";
packages/inference/src/lib/makeRequestOptions.ts (2 additions, 2 deletions)
@@ -1,4 +1,4 @@
- import { HF_HUB_URL } from "../config";
+ import { HF_HUB_URL, HF_ROUTER_URL } from "../config";
import { FAL_AI_API_BASE_URL } from "../providers/fal-ai";
import { REPLICATE_API_BASE_URL } from "../providers/replicate";
import { SAMBANOVA_API_BASE_URL } from "../providers/sambanova";
@@ -9,7 +9,7 @@ import { isUrl } from "./isUrl";
import { version as packageVersion, name as packageName } from "../../package.json";
import { getProviderModelId } from "./getProviderModelId";

- const HF_HUB_INFERENCE_PROXY_TEMPLATE = `${HF_HUB_URL}/api/inference-proxy/{{PROVIDER}}`;
+ const HF_HUB_INFERENCE_PROXY_TEMPLATE = `${HF_ROUTER_URL}/{{PROVIDER}}`;

/**
* Lazy-loaded from huggingface.co/api/tasks when needed
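In effect, a provider's proxy base URL now resolves on the router host instead of under `/api/inference-proxy` on the Hub. A quick sketch of the substitution (`buildProxyUrl` is a hypothetical helper for illustration; the real substitution happens inside `makeRequestOptions`):

```ts
const HF_ROUTER_URL = "https://router.huggingface.co";
const HF_HUB_INFERENCE_PROXY_TEMPLATE = `${HF_ROUTER_URL}/{{PROVIDER}}`;

// hypothetical helper: expand the template for a given provider
function buildProxyUrl(provider: string): string {
  return HF_HUB_INFERENCE_PROXY_TEMPLATE.replace("{{PROVIDER}}", provider);
}

console.log(buildProxyUrl("together"));
// before this PR: https://huggingface.co/api/inference-proxy/together
// after this PR:  https://router.huggingface.co/together
```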
packages/inference/test/vcr.ts (2 additions, 2 deletions)
@@ -1,5 +1,5 @@
import { omit } from "../src/utils/omit";
- import { HF_HUB_URL } from "../src/config";
+ import { HF_HUB_URL, HF_ROUTER_URL } from "../src/config";
import { isBackend } from "../src/utils/isBackend";
import { isFrontend } from "../src/utils/isFrontend";

@@ -117,7 +117,7 @@ async function vcr(

const { default: tapes } = await import(TAPES_FILE);

- const cacheCandidate = !url.startsWith(HF_HUB_URL) || url.startsWith(`${HF_HUB_URL}/api/inference-proxy/`);
+ const cacheCandidate = !url.startsWith(HF_HUB_URL) || url.startsWith(HF_ROUTER_URL);

if (VCR_MODE === MODE.PLAYBACK && cacheCandidate) {
if (!tapes[hash]) {
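The updated predicate treats router traffic as cacheable for VCR tapes. A small self-contained sketch of the check (constants copied from `config.ts`; `isCacheCandidate` is a hypothetical wrapper around the inline check in `vcr.ts`):

```ts
const HF_HUB_URL = "https://huggingface.co";
const HF_ROUTER_URL = "https://router.huggingface.co";

// cache everything that leaves the Hub, plus requests going through the router
function isCacheCandidate(url: string): boolean {
  return !url.startsWith(HF_HUB_URL) || url.startsWith(HF_ROUTER_URL);
}

console.log(isCacheCandidate("https://huggingface.co/api/models")); // false
console.log(isCacheCandidate("https://router.huggingface.co/together")); // true
console.log(isCacheCandidate("https://api.together.xyz/v1/chat")); // true
```

Since the router lives on its own subdomain, a router URL already fails the `HF_HUB_URL` prefix check, so the second clause is technically redundant; presumably it stays to keep the intent explicit.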