@Wauplin (Contributor) commented on Jun 4, 2025

Follow-up PR after huggingface/huggingface.js#1514

SBrandeis added a commit to huggingface/huggingface.js that referenced this pull request on Jun 4, 2025:
Solve #1361.

Long-awaited feature for @gary149. I did not go for the cleanest
solution, but it works well and should be robust and flexible enough if
we need to fix something in the future.

## EDIT: breaking change

The access token must now be passed as `opts.accessToken` in `snippets.getInferenceSnippets`.
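As a rough sketch of the migration (everything here except the `accessToken` field name is a hypothetical stand-in; check the PR diff for the real signature and types):

```typescript
// Sketch only: `GetInferenceSnippetsOpts` is a hypothetical stand-in for the
// real options type; only the `accessToken` field name comes from this PR.
interface GetInferenceSnippetsOpts {
  accessToken?: string;
}

// Before (sketch): the token was a dedicated argument.
//   getInferenceSnippets(model, accessToken, ...)
// After (sketch): the token travels in the options bag.
//   getInferenceSnippets(model, ..., { accessToken })
function makeOpts(token: string | undefined): GetInferenceSnippetsOpts {
  // Omit the field entirely when no token is available, so generated
  // snippets can fall back to a placeholder instead of embedding `undefined`.
  return token ? { accessToken: token } : {};
}

console.log(JSON.stringify(makeOpts("hf_xxx"))); // {"accessToken":"hf_xxx"}
console.log(JSON.stringify(makeOpts(undefined))); // {}
```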

## TODO

Once merged:
- [ ] adapt in moon-landing for snippets on the model page huggingface-internal/moon-landing#13964
- [ ] adapt in doc-builder for the `<inferencesnippet>` HTML tag (used in hub-docs) huggingface/doc-builder#570
- [ ] update hardcoded examples in hub-docs huggingface/hub-docs#1764

## Some examples:

### JS client
```js
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const chatCompletion = await client.chatCompletion({
    provider: "hf-inference",
    model: "meta-llama/Llama-3.1-8B-Instruct",
    messages: [
        {
            role: "user",
            content: "What is the capital of France?",
        },
    ],
});

console.log(chatCompletion.choices[0].message);
```

### Python client
```py
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)
```

### OpenAI client
```py
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.1-8B-Instruct/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
)

print(completion.choices[0].message)
```

### curl
```sh
curl https://router.huggingface.co/hf-inference/models/meta-llama/Llama-3.1-8B-Instruct/v1/chat/completions \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ],
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "stream": false
    }'
```

### Check out the PR diff for more examples

---------

Co-authored-by: Simon Brandeis <[email protected]>
Wauplin marked this pull request as ready for review on June 4, 2025 12:22
@Wauplin (Author) commented on Jun 4, 2025

Let's merge now that @huggingface/inference is shipped :)

Wauplin merged commit 0967fa8 into main on Jun 4, 2025 (5 checks passed)
Wauplin deleted the access-token-from-env-in-snippets branch on June 4, 2025 12:26