diff --git a/docs/source/en/guides/inference.md b/docs/source/en/guides/inference.md
index b1af63fa0c..4e145e4a1a 100644
--- a/docs/source/en/guides/inference.md
+++ b/docs/source/en/guides/inference.md
@@ -248,36 +248,36 @@ You might wonder why using [`InferenceClient`] instead of OpenAI's client? There
 [`InferenceClient`]'s goal is to provide the easiest interface to run inference on Hugging Face models, on any provider. It has a simple API that supports the most common tasks. Here is a table showing which providers support which tasks:
 
-| Domain | Task | Black Forest Labs | Cohere | fal-ai | Fireworks AI | HF Inference | Hyperbolic | Nebius AI Studio | Novita AI | Replicate | Sambanova | Together |
-| ------------------- | --------------------------------------------------- | ----------------- | ------ | ------ | ------------ | ------------ | ---------- | ---------------- | --------- | --------- | --------- | -------- |
-| **Audio** | [`~InferenceClient.audio_classification`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.audio_to_audio`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.automatic_speech_recognition`] | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.text_to_speech`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
-| **Computer Vision** | [`~InferenceClient.image_classification`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.image_segmentation`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.image_to_image`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.image_to_text`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.object_detection`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.text_to_image`] | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
-| | [`~InferenceClient.text_to_video`] | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
-| | [`~InferenceClient.zero_shot_image_classification`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **Multimodal** | [`~InferenceClient.document_question_answering`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.visual_question_answering`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **NLP** | [`~InferenceClient.chat_completion`] | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
-| | [`~InferenceClient.feature_extraction`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.fill_mask`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.question_answering`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.sentence_similarity`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.summarization`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.table_question_answering`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.text_classification`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.text_generation`] | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
-| | [`~InferenceClient.token_classification`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.translation`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.zero_shot_classification`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| **Tabular** | [`~InferenceClient.tabular_classification`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
-| | [`~InferenceClient.tabular_regression`] | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| Domain | Task | Black Forest Labs | Cerebras | Cohere | fal-ai | Fireworks AI | HF Inference | Hyperbolic | Nebius AI Studio | Novita AI | Replicate | Sambanova | Together |
+| ------------------- | --------------------------------------------------- | ----------------- | -------- | ------ | ------ | ------------ | ------------ | ---------- | ---------------- | --------- | --------- | --------- | -------- |
+| **Audio** | [`~InferenceClient.audio_classification`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.audio_to_audio`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.automatic_speech_recognition`] | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.text_to_speech`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
+| **Computer Vision** | [`~InferenceClient.image_classification`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.image_segmentation`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.image_to_image`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.image_to_text`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.object_detection`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.text_to_image`] | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ |
+| | [`~InferenceClient.text_to_video`] | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
+| | [`~InferenceClient.zero_shot_image_classification`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| **Multimodal** | [`~InferenceClient.document_question_answering`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.visual_question_answering`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| **NLP** | [`~InferenceClient.chat_completion`] | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
+| | [`~InferenceClient.feature_extraction`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.fill_mask`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.question_answering`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.sentence_similarity`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.summarization`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.table_question_answering`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.text_classification`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.text_generation`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
+| | [`~InferenceClient.token_classification`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.translation`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.zero_shot_classification`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| **Tabular** | [`~InferenceClient.tabular_classification`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
+| | [`~InferenceClient.tabular_regression`] | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
diff --git a/src/huggingface_hub/inference/_client.py b/src/huggingface_hub/inference/_client.py
index 90ab19c7f0..1ff056952d 100644
--- a/src/huggingface_hub/inference/_client.py
+++ b/src/huggingface_hub/inference/_client.py
@@ -133,7 +133,7 @@ class InferenceClient:
             path will be appended to the base URL (see the [TGI Messages API](https://huggingface.co/docs/text-generation-inference/en/messages_api)
             documentation for details). When passing a URL as `model`, the client will not append any suffix path to it.
         provider (`str`, *optional*):
-            Name of the provider to use for inference. Can be `"black-forest-labs"`, `"cohere"`, `"fal-ai"`, `"fireworks-ai"`, `"hf-inference"`, `"hyperbolic"`, `"nebius"`, `"novita"`, `"replicate"`, "sambanova"` or `"together"`.
+            Name of the provider to use for inference. Can be `"black-forest-labs"`, `"cerebras"`, `"cohere"`, `"fal-ai"`, `"fireworks-ai"`, `"hf-inference"`, `"hyperbolic"`, `"nebius"`, `"novita"`, `"replicate"`, `"sambanova"` or `"together"`.
             defaults to hf-inference (Hugging Face Serverless Inference API).
             If model is a URL or `base_url` is passed, then `provider` is not used.
         token (`str`, *optional*):
diff --git a/src/huggingface_hub/inference/_generated/_async_client.py b/src/huggingface_hub/inference/_generated/_async_client.py
index 1625b0749f..3f05d130e8 100644
--- a/src/huggingface_hub/inference/_generated/_async_client.py
+++ b/src/huggingface_hub/inference/_generated/_async_client.py
@@ -121,7 +121,7 @@ class AsyncInferenceClient:
             path will be appended to the base URL (see the [TGI Messages API](https://huggingface.co/docs/text-generation-inference/en/messages_api)
             documentation for details). When passing a URL as `model`, the client will not append any suffix path to it.
         provider (`str`, *optional*):
-            Name of the provider to use for inference. Can be `"black-forest-labs"`, `"cohere"`, `"fal-ai"`, `"fireworks-ai"`, `"hf-inference"`, `"hyperbolic"`, `"nebius"`, `"novita"`, `"replicate"`, "sambanova"` or `"together"`.
+            Name of the provider to use for inference. Can be `"black-forest-labs"`, `"cerebras"`, `"cohere"`, `"fal-ai"`, `"fireworks-ai"`, `"hf-inference"`, `"hyperbolic"`, `"nebius"`, `"novita"`, `"replicate"`, `"sambanova"` or `"together"`.
             defaults to hf-inference (Hugging Face Serverless Inference API).
             If model is a URL or `base_url` is passed, then `provider` is not used.
         token (`str`, *optional*):
diff --git a/src/huggingface_hub/inference/_providers/__init__.py b/src/huggingface_hub/inference/_providers/__init__.py
index cfb1a6985d..3400312553 100644
--- a/src/huggingface_hub/inference/_providers/__init__.py
+++ b/src/huggingface_hub/inference/_providers/__init__.py
@@ -2,6 +2,7 @@
 from ._common import TaskProviderHelper
 from .black_forest_labs import BlackForestLabsTextToImageTask
+from .cerebras import CerebrasConversationalTask
 from .cohere import CohereConversationalTask
 from .fal_ai import (
     FalAIAutomaticSpeechRecognitionTask,
@@ -21,6 +22,7 @@
 PROVIDER_T = Literal[
     "black-forest-labs",
+    "cerebras",
     "cohere",
     "fal-ai",
     "fireworks-ai",
@@ -37,6 +39,9 @@
     "black-forest-labs": {
         "text-to-image": BlackForestLabsTextToImageTask(),
     },
+    "cerebras": {
+        "conversational": CerebrasConversationalTask(),
+    },
     "cohere": {
         "conversational": CohereConversationalTask(),
     },
diff --git a/src/huggingface_hub/inference/_providers/_common.py b/src/huggingface_hub/inference/_providers/_common.py
index 58bfe5a830..a30b5cf3b9 100644
--- a/src/huggingface_hub/inference/_providers/_common.py
+++ b/src/huggingface_hub/inference/_providers/_common.py
@@ -17,6 +17,7 @@
     #
     # Example:
     # "Qwen/Qwen2.5-Coder-32B-Instruct": "Qwen2.5-Coder-32B-Instruct",
+    "cerebras": {},
     "cohere": {},
     "fal-ai": {},
     "fireworks-ai": {},
diff --git a/src/huggingface_hub/inference/_providers/cerebras.py b/src/huggingface_hub/inference/_providers/cerebras.py
new file mode 100644
index 0000000000..12b1815832
--- /dev/null
+++ b/src/huggingface_hub/inference/_providers/cerebras.py
@@ -0,0 +1,6 @@
+from huggingface_hub.inference._providers._common import BaseConversationalTask
+
+
+class CerebrasConversationalTask(BaseConversationalTask):
+    def __init__(self):
+        super().__init__(provider="cerebras", base_url="https://api.cerebras.ai")
diff --git a/tests/cassettes/TestInferenceClient.test_chat_completion_no_stream[cerebras,conversational].yaml
b/tests/cassettes/TestInferenceClient.test_chat_completion_no_stream[cerebras,conversational].yaml new file mode 100644 index 0000000000..050850a4db --- /dev/null +++ b/tests/cassettes/TestInferenceClient.test_chat_completion_no_stream[cerebras,conversational].yaml @@ -0,0 +1,140 @@ +interactions: +- request: + body: null + headers: + Accept: + - '*/*' + Accept-Encoding: + - gzip, deflate, br + Connection: + - keep-alive + X-Amzn-Trace-Id: + - b575973c-6ae8-4bb8-a0eb-5271474af638 + method: GET + uri: https://huggingface.co/api/models/meta-llama/Llama-3.3-70B-Instruct?expand=inferenceProviderMapping + response: + body: + string: '{"_id":"6745f28f9333dfcc06268b1e","id":"meta-llama/Llama-3.3-70B-Instruct","inferenceProviderMapping":{"fireworks-ai":{"status":"live","providerId":"accounts/fireworks/models/llama-v3p3-70b-instruct","task":"conversational"},"sambanova":{"status":"live","providerId":"Meta-Llama-3.3-70B-Instruct","task":"conversational"},"together":{"status":"live","providerId":"meta-llama/Llama-3.3-70B-Instruct-Turbo","task":"conversational"},"hf-inference":{"status":"live","providerId":"meta-llama/Llama-3.3-70B-Instruct","task":"conversational"},"nebius":{"status":"live","providerId":"meta-llama/Llama-3.3-70B-Instruct-fast","task":"conversational"},"novita":{"status":"live","providerId":"meta-llama/llama-3.3-70b-instruct","task":"conversational"},"hyperbolic":{"status":"live","providerId":"meta-llama/Llama-3.3-70B-Instruct","task":"conversational"},"cerebras":{"status":"live","providerId":"llama-3.3-70b","task":"conversational"}}}' + headers: + Access-Control-Allow-Origin: + - https://huggingface.co + Access-Control-Expose-Headers: + - X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Xet-Access-Token,X-Xet-Token-Expiration,X-Xet-Refresh-Route,X-Xet-Cas-Url,X-Xet-Hash + Connection: + - keep-alive + Content-Length: + - '928' + Content-Type: + - application/json; charset=utf-8 + Date: + - Mon, 10 
Mar 2025 17:28:50 GMT + ETag: + - W/"3a0-pTpGiEPtnOCnGv7CMdM+ILBj9Zg" + Referrer-Policy: + - strict-origin-when-cross-origin + Vary: + - Origin + Via: + - 1.1 a462d9473c62e045cd7ca3144781eb10.cloudfront.net (CloudFront) + X-Amz-Cf-Id: + - Kazv7qUyDEkhlL6xy2ktWDuC38NCV1swSx2Hhc4StAOfxAAVG5bEFQ== + X-Amz-Cf-Pop: + - CDG52-P4 + X-Cache: + - Miss from cloudfront + X-Powered-By: + - huggingface-moon + X-Request-Id: + - Root=1-67cf2152-59357293520f047c7cf179c0;b575973c-6ae8-4bb8-a0eb-5271474af638 + cross-origin-opener-policy: + - same-origin + status: + code: 200 + message: OK +- request: + body: '{"messages": [{"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "What is deep learning?"}], "model": "llama-3.3-70b", + "stream": false}' + headers: + Accept: + - '*/*' + Accept-Encoding: + - gzip, deflate, br + Connection: + - keep-alive + Content-Length: + - '175' + Content-Type: + - application/json + X-Amzn-Trace-Id: + - fc4502a6-fe0f-45d0-b7ae-27638a6283ac + method: POST + uri: https://router.huggingface.co/cerebras/v1/chat/completions + response: + body: + string: '{"id":"chatcmpl-dd4a6858-2b21-4c71-bf21-39380aacc820","choices":[{"finish_reason":"stop","index":0,"message":{"content":"**Deep Learning: An Overview**\\n================================\\n\\nDeep learning is a subset of machine learning that involves the use of artificial neural networks to analyze and interpret data.","role":"assistant"}}],"created":1741627730,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion","usage":{"prompt_tokens":46,"completion_tokens":637,"total_tokens":683},"time_info":{"queue_time":9.163e-05,"prompt_time":0.002692986,"completion_time":0.302867283,"total_time":0.30690574645996094,"created":1741627730}}' + headers: + Access-Control-Allow-Origin: + - '*' + Access-Control-Expose-Headers: + - 
X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Xet-Access-Token,X-Xet-Token-Expiration,X-Xet-Refresh-Route,X-Xet-Cas-Url,X-Xet-Hash + Connection: + - keep-alive + Content-Type: + - application/json + Date: + - Mon, 10 Mar 2025 17:28:51 GMT + Transfer-Encoding: + - chunked + Vary: + - Origin + Via: + - 1.1 4de8cc07f214b77e50bb78828ddb1362.cloudfront.net (CloudFront) + X-Amz-Cf-Id: + - L04j1Fze8sxg0XZni-BqnC9Hhn26aGFTJU8Uh6FYesKJ93fI-d1hQw== + X-Amz-Cf-Pop: + - CDG55-P3 + X-Cache: + - Miss from cloudfront + X-Powered-By: + - huggingface-moon + X-Robots-Tag: + - none + cf-cache-status: + - DYNAMIC + cf-ray: + - 91e487e45df9c979-IAD + content-encoding: + - chunked + cross-origin-opener-policy: + - same-origin + referrer-policy: + - strict-origin-when-cross-origin + server: + - cloudflare + set-cookie: + - __cf_bm=Ex.P6kz3ZWxSm3_7XXpS_MfBLqtWCdLbVvZJFXuEKqc-1741627731-1.0.1.1-AXhI9crMi6CwtvwvyzS0x6gyV5A47ajIdoGLTg6_8KaDYFUhpH4t_0emcb6SQEVXbDjj26xmN82_Od5RBdP8XAoQqzpPq.QnmJxGhBQjxus; + path=/; expires=Mon, 10-Mar-25 17:58:51 GMT; domain=.api.cerebras.ai; HttpOnly; + Secure; SameSite=None + strict-transport-security: + - max-age=3600; includeSubDomains + x-content-type-options: + - nosniff + x-ratelimit-limit-requests-day: + - '216000' + x-ratelimit-limit-tokens-minute: + - '160000' + x-ratelimit-remaining-requests-day: + - '215964' + x-ratelimit-remaining-tokens-minute: + - '159468' + x-ratelimit-reset-requests-day: + - '23468.998601913452' + x-ratelimit-reset-tokens-minute: + - '8.998601913452148' + x-request-id: + - 91e487e45df9c979-IAD + status: + code: 200 + message: OK +version: 1 diff --git a/tests/cassettes/TestInferenceClient.test_chat_completion_with_stream[cerebras,conversational].yaml b/tests/cassettes/TestInferenceClient.test_chat_completion_with_stream[cerebras,conversational].yaml new file mode 100644 index 0000000000..bf9724a226 --- /dev/null +++ 
b/tests/cassettes/TestInferenceClient.test_chat_completion_with_stream[cerebras,conversational].yaml @@ -0,0 +1,220 @@ +interactions: +- request: + body: null + headers: + Accept: + - '*/*' + Accept-Encoding: + - gzip, deflate, br + Connection: + - keep-alive + X-Amzn-Trace-Id: + - 0fbd6ec7-0e60-42fa-9b1b-2a577ed44f23 + method: GET + uri: https://huggingface.co/api/models/meta-llama/Llama-3.3-70B-Instruct?expand=inferenceProviderMapping + response: + body: + string: '{"_id":"6745f28f9333dfcc06268b1e","id":"meta-llama/Llama-3.3-70B-Instruct","inferenceProviderMapping":{"fireworks-ai":{"status":"live","providerId":"accounts/fireworks/models/llama-v3p3-70b-instruct","task":"conversational"},"sambanova":{"status":"live","providerId":"Meta-Llama-3.3-70B-Instruct","task":"conversational"},"together":{"status":"live","providerId":"meta-llama/Llama-3.3-70B-Instruct-Turbo","task":"conversational"},"hf-inference":{"status":"live","providerId":"meta-llama/Llama-3.3-70B-Instruct","task":"conversational"},"nebius":{"status":"live","providerId":"meta-llama/Llama-3.3-70B-Instruct-fast","task":"conversational"},"novita":{"status":"live","providerId":"meta-llama/llama-3.3-70b-instruct","task":"conversational"},"hyperbolic":{"status":"live","providerId":"meta-llama/Llama-3.3-70B-Instruct","task":"conversational"},"cerebras":{"status":"live","providerId":"llama-3.3-70b","task":"conversational"}}}' + headers: + Access-Control-Allow-Origin: + - https://huggingface.co + Access-Control-Expose-Headers: + - X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Xet-Access-Token,X-Xet-Token-Expiration,X-Xet-Refresh-Route,X-Xet-Cas-Url,X-Xet-Hash + Connection: + - keep-alive + Content-Length: + - '928' + Content-Type: + - application/json; charset=utf-8 + Date: + - Mon, 10 Mar 2025 17:25:25 GMT + ETag: + - W/"3a0-pTpGiEPtnOCnGv7CMdM+ILBj9Zg" + Referrer-Policy: + - strict-origin-when-cross-origin + Vary: + - Origin + Via: + - 1.1 
52804153974851170879aec22b7dcd28.cloudfront.net (CloudFront) + X-Amz-Cf-Id: + - 3ivBuK5GjgD6tci85JcHJCryWN-Pj75Dd1IFUxH6xwBU9zBdC5GK4w== + X-Amz-Cf-Pop: + - CDG52-P4 + X-Cache: + - Miss from cloudfront + X-Powered-By: + - huggingface-moon + X-Request-Id: + - Root=1-67cf2085-4ca2cff1156a1e9a614a548e;0fbd6ec7-0e60-42fa-9b1b-2a577ed44f23 + cross-origin-opener-policy: + - same-origin + status: + code: 200 + message: OK +- request: + body: '{"messages": [{"role": "system", "content": "You are a helpful assistant."}, + {"role": "user", "content": "What is deep learning?"}], "model": "llama-3.3-70b", + "max_tokens": 20, "stream": true}' + headers: + Accept: + - '*/*' + Accept-Encoding: + - gzip, deflate, br + Connection: + - keep-alive + Content-Length: + - '192' + Content-Type: + - application/json + X-Amzn-Trace-Id: + - 11dd08f3-fa62-4be5-8165-1d307b2911c1 + method: POST + uri: https://router.huggingface.co/cerebras/v1/chat/completions + response: + body: + string: 'data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"role":"assistant"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":"Deep"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + learning"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + is"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: 
{"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + a"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + subset"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + of"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + machine"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + learning"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":","},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + which"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + is"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + 
a"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + field"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + of"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + artificial"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + intelligence"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + ("},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":"AI"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":")"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{"content":" + 
that"},"index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk"} + + + data: {"id":"chatcmpl-deed8a97-8f08-4f41-bdba-d6b9015b1090","choices":[{"delta":{},"finish_reason":"length","index":0}],"created":1741627526,"model":"llama-3.3-70b","system_fingerprint":"fp_5fb719f1e4","object":"chat.completion.chunk","usage":{"prompt_tokens":46,"completion_tokens":20,"total_tokens":66},"time_info":{"queue_time":0.000094581,"prompt_time":0.002329582,"completion_time":0.01709422,"total_time":0.02123236656188965,"created":1741627526}} + + + ' + headers: + Access-Control-Allow-Origin: + - '*' + Access-Control-Expose-Headers: + - X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Xet-Access-Token,X-Xet-Token-Expiration,X-Xet-Refresh-Route,X-Xet-Cas-Url,X-Xet-Hash + Connection: + - keep-alive + Content-Type: + - text/event-stream; charset=utf-8 + Date: + - Mon, 10 Mar 2025 17:25:26 GMT + Transfer-Encoding: + - chunked + Vary: + - Origin + Via: + - 1.1 1fa1875b2f656fdf295eee39e2e48938.cloudfront.net (CloudFront) + X-Amz-Cf-Id: + - ku7zGnpRLe-R76-88kUVc35GBPsLmIueitahq3tzcqsj92yTzY0RNA== + X-Amz-Cf-Pop: + - CDG55-P3 + X-Cache: + - Miss from cloudfront + X-Powered-By: + - huggingface-moon + X-Robots-Tag: + - none + cf-cache-status: + - DYNAMIC + cf-ray: + - 91e482e57f974286-EWR + cross-origin-opener-policy: + - same-origin + referrer-policy: + - strict-origin-when-cross-origin + server: + - cloudflare + set-cookie: + - __cf_bm=kRg5YuliGeK9lIRxkpA_eHZuZi.W7WemGqHmoQj9ww4-1741627526-1.0.1.1-KkiCPhrz_3ZCzIBu08GEiGPQAnysgeWnZAcpttYB4D56is1kX0_Yqa2Q1BRv2Y8RxyHHQZ5SS1zx_cOkn36iM.yvMZU0nl_.xl8yRiXL_eU; + path=/; expires=Mon, 10-Mar-25 17:55:26 GMT; domain=.api.cerebras.ai; HttpOnly; + Secure; SameSite=None + strict-transport-security: + - max-age=3600; includeSubDomains + x-content-type-options: + - nosniff + x-ratelimit-limit-requests-day: + - '216000' + 
x-ratelimit-limit-tokens-minute: + - '160000' + x-ratelimit-remaining-requests-day: + - '215967' + x-ratelimit-remaining-tokens-minute: + - '159327' + x-ratelimit-reset-requests-day: + - '23673.933695316315' + x-ratelimit-reset-tokens-minute: + - '33.9336953163147' + x-request-id: + - 91e482e57f974286-SJC + status: + code: 200 + message: OK +version: 1 diff --git a/tests/test_inference_client.py b/tests/test_inference_client.py index 07648ef618..da72ae8979 100644 --- a/tests/test_inference_client.py +++ b/tests/test_inference_client.py @@ -63,6 +63,9 @@ "black-forest-labs": { "text-to-image": "black-forest-labs/FLUX.1-dev", }, + "cerebras": { + "conversational": "meta-llama/Llama-3.3-70B-Instruct", + }, "together": { "conversational": "meta-llama/Meta-Llama-3-8B-Instruct", "text-generation": "meta-llama/Llama-2-70b-hf",
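The new provider plugs in through a one-line subclass: `BaseConversationalTask` supplies the OpenAI-compatible chat-completions route and request handling, while `CerebrasConversationalTask` only pins the provider name and base URL. A minimal self-contained sketch of that composition (the `BaseConversationalSketch`/`build_url` names are hypothetical stand-ins, not the library's API):

```python
# Sketch of how a conversational provider helper composes its request URL:
# the subclass contributes only the provider name and base URL, while the
# shared base class supplies the OpenAI-compatible chat-completions route.
class BaseConversationalSketch:
    ROUTE = "/v1/chat/completions"  # OpenAI-compatible chat route

    def __init__(self, provider: str, base_url: str):
        self.provider = provider
        self.base_url = base_url

    def build_url(self) -> str:
        # Direct-call URL; HF-routed requests instead go through the router.
        return self.base_url + self.ROUTE


class CerebrasSketch(BaseConversationalSketch):
    def __init__(self):
        super().__init__(provider="cerebras", base_url="https://api.cerebras.ai")


print(CerebrasSketch().build_url())  # → https://api.cerebras.ai/v1/chat/completions
```

End to end, users reach the provider as `InferenceClient(provider="cerebras").chat_completion(model="meta-llama/Llama-3.3-70B-Instruct", messages=...)`; as the cassettes above record, requests routed through a Hugging Face token go to `https://router.huggingface.co/cerebras/v1/chat/completions`, with the Hub mapping the model id to the Cerebras-side `llama-3.3-70b`.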