[Inference Providers] revamp documentation #1457
Conversation
2. [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index): a product to easily deploy models to production (see the usage sketch below). Inference is run by Hugging Face in a dedicated, fully managed infrastructure on a cloud provider of your choice.
3. Local endpoints: you can also run inference with local inference servers like [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.com/), [vLLM](https://github.com/vllm-project/vllm), [LiteLLM](https://docs.litellm.ai/docs/simple_proxy), or [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) by connecting the client to these local endpoints.

You can also try out a live [interactive notebook](https://observablehq.com/@huggingface/hello-huggingface-js-inference), see some demos on [hf.co/huggingfacejs](https://huggingface.co/huggingfacejs), or watch a [Scrimba tutorial that explains how Inference Endpoints works](https://scrimba.com/scrim/cod8248f5adfd6e129582c523).
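For option 2 above, here is a minimal sketch (not part of the diff) of pointing `InferenceClient` at a dedicated Inference Endpoint via the `endpointUrl` option used elsewhere in this PR; the endpoint URL and the `HF_TOKEN` environment variable are placeholders, not values from the documentation:

```typescript
import { InferenceClient } from '@huggingface/inference';

// Placeholder URL: copy the real one from your Inference Endpoint's dashboard.
const client = new InferenceClient(process.env.HF_TOKEN, {
  endpointUrl: "https://your-endpoint-id.endpoints.huggingface.cloud",
});

const response = await client.chatCompletion({
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

console.log(response.choices[0].message.content);
```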
I removed the broken/outdated demos link
This task reads some text and outputs raw float values that are usually consumed as part of a semantic database/semantic search.
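As an illustration (not part of the diff), a minimal sketch of calling this task through `InferenceClient`; the model name and token variable are assumptions, not values from the documentation:

```typescript
import { InferenceClient } from '@huggingface/inference';

const hf = new InferenceClient(process.env.HF_TOKEN);

// Feature extraction returns raw float values (an embedding) for the input text,
// which can then be stored in a vector database for semantic search.
const embedding = await hf.featureExtraction({
  model: "sentence-transformers/all-MiniLM-L6-v2", // assumed model; any feature-extraction model works
  inputs: "That is a happy person",
});

console.log(embedding.length);
```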
```typescript
const openai = new InferenceClient(OPENAI_TOKEN).endpoint("https://api.openai.com");
```
No need for this IMO + I added a tip at line 635
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
looks great!
## Using local endpoints

You can use `InferenceClient` to run chat completion with local inference servers (llama.cpp, vllm, litellm server, TGI, mlx, etc.) running on your own machine. The API should be OpenAI API-compatible.

```typescript
import { InferenceClient } from '@huggingface/inference';

const hf = new InferenceClient(undefined, {
  endpointUrl: "http://localhost:8080",
});

const response = await hf.chatCompletion({
  messages: [
    {
      role: "user",
      content: "What is the capital of France?",
    },
  ],
});

console.log(response.choices[0].message.content);
```

<Tip>

Similarly to the OpenAI JS client, `InferenceClient` can be used to run Chat Completion inference with any OpenAI REST API-compatible endpoint.

</Tip>
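To illustrate the tip above, here is a hedged sketch (not part of the diff) of streaming a chat completion from the same local, OpenAI-compatible endpoint, assuming the server supports streaming:

```typescript
import { InferenceClient } from '@huggingface/inference';

const hf = new InferenceClient(undefined, {
  endpointUrl: "http://localhost:8080", // assumed local OpenAI-compatible server
});

// Stream tokens as they are generated instead of waiting for the full answer.
for await (const chunk of hf.chatCompletionStream({
  messages: [{ role: "user", content: "What is the capital of France?" }],
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```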
looks great
Very nice improvement!!!
🔥
Similar to huggingface/huggingface_hub#3085.