diff --git a/docs/inference-providers/_toctree.yml b/docs/inference-providers/_toctree.yml index e1a2c9591..7b9093114 100644 --- a/docs/inference-providers/_toctree.yml +++ b/docs/inference-providers/_toctree.yml @@ -11,6 +11,13 @@ - local: security title: Security +- title: Guides + sections: + - local: guides/first-api-call + title: Your First API Call + - local: guides/building-first-app + title: Building Your First AI App + - title: Providers sections: - local: providers/cerebras diff --git a/docs/inference-providers/guides/building-first-app.md b/docs/inference-providers/guides/building-first-app.md new file mode 100644 index 000000000..b69d567ee --- /dev/null +++ b/docs/inference-providers/guides/building-first-app.md @@ -0,0 +1,581 @@ +# Building Your First AI App with Inference Providers + +You've learned the basics and understand the provider ecosystem. Now let's build something practical: an **AI Meeting Notes** app that transcribes audio files and generates summaries with action items. + +This project demonstrates real-world AI orchestration using multiple specialized providers within a single application. + +## Project Overview + +Our app will: +1. **Accept audio** as a microphone input through a web interface +2. **Transcribe speech** using a fast speech-to-text model +3. **Generate summaries** using a powerful language model +4. **Deploy to the web** for easy sharing + + + + +**Tech Stack**: Gradio (for the UI) + Inference Providers (for the AI) + + + + +**Tech Stack**: HTML/JavaScript (for the UI) + Inference Providers (for the AI) + +We'll use HTML and JavaScript for the UI just to keep things simple and agnostic, but if you want to see more mature examples, you can check out the [Hugging Face JS spaces](https://huggingface.co/huggingfacejs/spaces) page. + + + + +## Step 1: Set Up Authentication + + + + +Before we start coding, authenticate with Hugging Face using the CLI: + +```bash +pip install huggingface_hub +huggingface-cli login +``` + +When prompted, paste your Hugging Face token. This handles authentication automatically for all your inference calls. You can generate one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). + + + + +You'll need your Hugging Face token. Get one from [your settings page](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained). We can set it as an environment variable in our app. + +```bash +export HF_TOKEN="your_token_here" +``` + +```javascript +// Add your token at the top of your script +const HF_TOKEN = process.env.HF_TOKEN; +``` + + + +When we deploy our app to Hugging Face Spaces, we'll need to add our token as a secret. This is a secure way to handle the token and avoid exposing it in the code. + + + + + + +## Step 2: Build the User Interface + + + + +Now let's create a simple web interface using Gradio: + +```python +import gradio as gr +from huggingface_hub import InferenceClient + +def process_meeting_audio(audio_file): + """Process uploaded audio file and return transcript + summary""" + if audio_file is None: + return "Please upload an audio file.", "" + + # We'll implement the AI logic next + return "Transcript will appear here...", "Summary will appear here..." 

# Create the Gradio interface
app = gr.Interface(
    fn=process_meeting_audio,
    inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcript", lines=10),
        gr.Textbox(label="Summary & Action Items", lines=8)
    ],
    title="🎤 AI Meeting Notes",
    description="Upload an audio file to get an instant transcript and summary with action items."
)

if __name__ == "__main__":
    app.launch()
```

Here we're using Gradio's `gr.Audio` component to either upload an audio file or use the microphone input. We're keeping things simple with two outputs: a transcript and a summary with action items.

For JavaScript, we'll create a clean HTML interface with native file upload and a simple loading state:

```html
<style>
    .hidden { display: none; }
    .upload-area { border: 2px dashed #ccc; border-radius: 8px; padding: 2rem; text-align: center; }
    .result-section { margin-top: 1.5rem; }
</style>

<h1>🎤 AI Meeting Notes</h1>

<!-- Native file upload -->
<div class="upload-area">
    <label for="file">Upload audio file</label>
    <input type="file" id="file" accept="audio/*" />
</div>

<!-- Loading and error states -->
<div id="loading" class="hidden">Processing...</div>
<div id="error" class="hidden"></div>

<!-- Results -->
<div id="results" class="hidden">
    <div class="result-section">
        <h3>📝 Transcript</h3>
        <div id="transcript"></div>
    </div>
    <div class="result-section">
        <h3>📋 Summary</h3>
        <div id="summary"></div>
    </div>
</div>
```

This creates a clean upload interface with styled results sections for the transcript and summary.

Our application can then use the `InferenceClient` from `huggingface.js` to call the transcription and summarization functions.

```javascript
import { InferenceClient } from 'https://esm.sh/@huggingface/inference';

// Access the token from Hugging Face Spaces secrets
const HF_TOKEN = window.huggingface?.variables?.HF_TOKEN;
// Or if you're running locally, you can set it as an environment variable
// const HF_TOKEN = process.env.HF_TOKEN;

document.getElementById('file').onchange = async (e) => {
    if (!e.target.files[0]) return;

    const file = e.target.files[0];

    show(document.getElementById('loading'));
    hide(document.getElementById('results'), document.getElementById('error'));

    try {
        const transcript = await transcribe(file);
        const summary = await summarize(transcript);

        document.getElementById('transcript').textContent = transcript;
        document.getElementById('summary').textContent = summary;

        hide(document.getElementById('loading'));
        show(document.getElementById('results'));
    } catch (error) {
        hide(document.getElementById('loading'));
        showError(`Error: ${error.message}`);
    }
};
```

We'll also need to implement the `transcribe` and `summarize` functions (Steps 3 and 4), plus the small `show`, `hide`, and `showError` DOM helpers used above.
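The helpers aren't part of any library, so here's a minimal sketch of what they might look like, assuming a `.hidden` CSS class that sets `display: none` (as in the markup above):

```javascript
// Show/hide elements by toggling an assumed `.hidden` CSS class
const show = (...els) => els.forEach(el => el.classList.remove('hidden'));
const hide = (...els) => els.forEach(el => el.classList.add('hidden'));

// Render an error message in the #error element
const showError = (message) => {
    const error = document.getElementById('error');
    error.textContent = message;
    show(error);
};
```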
+
## Step 3: Add Speech Transcription

Now let's implement the transcription using OpenAI's `whisper-large-v3` model for fast, reliable speech processing.

We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.

```python
def transcribe_audio(audio_file_path):
    """Transcribe audio using an Inference Provider"""
    client = InferenceClient(provider="auto")

    # Pass the file path directly - the client handles file reading
    transcript = client.automatic_speech_recognition(
        audio=audio_file_path,
        model="openai/whisper-large-v3"
    )

    return transcript.text
```

Now let's implement the transcription using OpenAI's `whisper-large-v3-turbo` model for fast, reliable speech processing.

We'll use the `auto` provider to automatically select the first available provider for the model. You can define your own priority list of providers in the [Inference Providers](https://huggingface.co/settings/inference-providers) page.

```javascript
import { InferenceClient } from 'https://esm.sh/@huggingface/inference';

async function transcribe(file) {
    const client = new InferenceClient(HF_TOKEN);

    const output = await client.automaticSpeechRecognition({
        data: file,
        model: "openai/whisper-large-v3-turbo",
        provider: "auto"
    });

    return output.text || output || 'Transcription completed';
}
```

## Step 4: Add AI Summarization

Next, we'll use a powerful language model such as DeepSeek's `deepseek-ai/DeepSeek-R1-0528` via an Inference Provider. Just like in the previous step, we'll use the `auto` provider to automatically select the first available provider for the model. We'll define a custom prompt to ensure the output is formatted as a summary with action items and decisions made:

```python
def generate_summary(transcript):
    """Generate summary using an Inference Provider"""
    client = InferenceClient(provider="auto")

    prompt = f"""
    Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: {transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made
    """

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-0528",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1000
    )

    return response.choices[0].message.content
```

Next, we'll use a powerful language model such as DeepSeek's `deepseek-ai/DeepSeek-R1-0528` via an Inference Provider. Just like in the previous step, we'll use the `auto` provider to automatically select the first available provider for the model. We'll define a custom prompt to ensure the output is formatted as a summary with action items and decisions made:

```javascript
async function summarize(transcript) {
    const client = new InferenceClient(HF_TOKEN);

    const prompt = `Analyze this meeting transcript and provide:
    1. A concise summary of key points
    2. Action items with responsible parties
    3. Important decisions made

    Transcript: ${transcript}

    Format with clear sections:
    ## Summary
    ## Action Items
    ## Decisions Made`;

    const response = await client.chatCompletion({
        model: "deepseek-ai/DeepSeek-R1-0528",
        messages: [
            {
                role: "user",
                content: prompt
            }
        ],
        max_tokens: 1000
    }, {
        provider: "auto"
    });

    return response.choices?.[0]?.message?.content || response || 'No summary available';
}
```

## Step 5: Deploy on Hugging Face Spaces

To deploy, we'll need to create an `app.py` file and upload it to Hugging Face Spaces.
+📋 Click to view the complete app.py file + +```python +import gradio as gr +from huggingface_hub import InferenceClient + + +def transcribe_audio(audio_file_path): + """Transcribe audio using an Inference Provider""" + client = InferenceClient(provider="auto") + + # Pass the file path directly - the client handles file reading + transcript = client.automatic_speech_recognition( + audio=audio_file_path, model="openai/whisper-large-v3" + ) + + return transcript.text + + +def generate_summary(transcript): + """Generate summary using an Inference Provider""" + client = InferenceClient(provider="auto") + + prompt = f""" + Analyze this meeting transcript and provide: + 1. A concise summary of key points + 2. Action items with responsible parties + 3. Important decisions made + + Transcript: {transcript} + + Format with clear sections: + ## Summary + ## Action Items + ## Decisions Made + """ + + response = client.chat.completions.create( + model="deepseek-ai/DeepSeek-R1-0528", + messages=[{"role": "user", "content": prompt}], + max_tokens=1000, + ) + + return response.choices[0].message.content + + +def process_meeting_audio(audio_file): + """Main processing function""" + if audio_file is None: + return "Please upload an audio file.", "" + + try: + # Step 1: Transcribe + transcript = transcribe_audio(audio_file) + + # Step 2: Summarize + summary = generate_summary(transcript) + + return transcript, summary + + except Exception as e: + return f"Error processing audio: {str(e)}", "" + + +# Create Gradio interface +app = gr.Interface( + fn=process_meeting_audio, + inputs=gr.Audio(label="Upload Meeting Audio", type="filepath"), + outputs=[ + gr.Textbox(label="Transcript", lines=10), + gr.Textbox(label="Summary & Action Items", lines=8), + ], + title="🎤 AI Meeting Notes", + description="Upload audio to get instant transcripts and summaries.", +) + +if __name__ == "__main__": + app.launch() +``` + +Our app will run on port 7860 and look like this: + +![Gradio app](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/gradio-app.png) + +
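To try the app locally before deploying, install the two libraries the script imports and run it:

```bash
pip install gradio huggingface_hub
python app.py
```

The app will then be available at `http://localhost:7860`.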
+ +To deploy, we'll need to create a new Space and upload our files. + +1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space) +2. **Choose Gradio SDK** and make it public +3. **Upload your files**: Upload `app.py` +4. **Add your token**: In Space settings, add `HF_TOKEN` as a secret (get it from [your settings](https://huggingface.co/settings/tokens)) +5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name` + +> **Note**: While we used CLI authentication locally, Spaces requires the token as a secret for the deployment environment. + +
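If you'd rather script the deployment than click through the web UI, the same `huggingface_hub` library can create the Space, upload `app.py`, and set the secret. A minimal sketch — the Space name below is a placeholder:

```python
from huggingface_hub import HfApi

api = HfApi()  # reuses the token from `huggingface-cli login`
repo_id = "your-username/ai-meeting-notes"  # placeholder Space name

# Create a public Gradio Space (no-op if it already exists)
api.create_repo(repo_id=repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)

# Upload the app file
api.upload_file(
    path_or_fileobj="app.py",
    path_in_repo="app.py",
    repo_id=repo_id,
    repo_type="space",
)

# Add the HF_TOKEN secret the app needs at runtime
api.add_space_secret(repo_id=repo_id, key="HF_TOKEN", value="your_token_here")
```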
+ + +For JavaScript deployment, create a simple static HTML file: + +
📋 Click to view the complete index.html file

```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8" />
    <title>🎤 AI Meeting Notes</title>
    <style>
        body { font-family: sans-serif; max-width: 720px; margin: 2rem auto; padding: 0 1rem; }
        .hidden { display: none; }
        .upload-area { border: 2px dashed #ccc; border-radius: 8px; padding: 2rem; text-align: center; }
        .result-section { margin-top: 1.5rem; }
    </style>
</head>
<body>
    <h1>🎤 AI Meeting Notes</h1>

    <div class="upload-area">
        <label for="file">Upload audio file</label>
        <input type="file" id="file" accept="audio/*" />
    </div>

    <div id="loading" class="hidden">Processing...</div>
    <div id="error" class="hidden"></div>

    <div id="results" class="hidden">
        <div class="result-section">
            <h3>📝 Transcript</h3>
            <div id="transcript"></div>
        </div>
        <div class="result-section">
            <h3>📋 Summary</h3>
            <div id="summary"></div>
        </div>
    </div>

    <script type="module">
        import { InferenceClient } from 'https://esm.sh/@huggingface/inference';

        // Access the token from Hugging Face Spaces secrets
        const HF_TOKEN = window.huggingface?.variables?.HF_TOKEN;

        async function transcribe(file) {
            const client = new InferenceClient(HF_TOKEN);

            const output = await client.automaticSpeechRecognition({
                data: file,
                model: "openai/whisper-large-v3-turbo",
                provider: "auto"
            });

            return output.text || output || 'Transcription completed';
        }

        async function summarize(transcript) {
            const client = new InferenceClient(HF_TOKEN);

            const prompt = `Analyze this meeting transcript and provide:
1. A concise summary of key points
2. Action items with responsible parties
3. Important decisions made

Transcript: ${transcript}

Format with clear sections:
## Summary
## Action Items
## Decisions Made`;

            const response = await client.chatCompletion({
                model: "deepseek-ai/DeepSeek-R1-0528",
                messages: [{ role: "user", content: prompt }],
                max_tokens: 1000
            }, {
                provider: "auto"
            });

            return response.choices?.[0]?.message?.content || response || 'No summary available';
        }

        const show = (...els) => els.forEach(el => el.classList.remove('hidden'));
        const hide = (...els) => els.forEach(el => el.classList.add('hidden'));
        const showError = (message) => {
            const error = document.getElementById('error');
            error.textContent = message;
            show(error);
        };

        document.getElementById('file').onchange = async (e) => {
            if (!e.target.files[0]) return;

            const file = e.target.files[0];

            show(document.getElementById('loading'));
            hide(document.getElementById('results'), document.getElementById('error'));

            try {
                const transcript = await transcribe(file);
                const summary = await summarize(transcript);

                document.getElementById('transcript').textContent = transcript;
                document.getElementById('summary').textContent = summary;

                hide(document.getElementById('loading'));
                show(document.getElementById('results'));
            } catch (error) {
                hide(document.getElementById('loading'));
                showError(`Error: ${error.message}`);
            }
        };
    </script>
</body>
</html>
```

We can run our app locally by opening the file in a browser.

![Local app](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/js-app.png)
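If your browser blocks the remote module import from a `file://` page, serve the folder over HTTP instead — any static file server works, for example:

```bash
# assumes Python is installed; any static server will do
python -m http.server 8000
```

Then open `http://localhost:8000` in your browser.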
+ +To deploy: + +1. **Create a new Space**: Go to [huggingface.co/new-space](https://huggingface.co/new-space) +2. **Choose Static SDK** and make it public +3. **Upload your file**: Upload `index.html` +4. **Add your token as a secret**: In Space settings, add `HF_TOKEN` as a **Secret** +5. **Launch**: Your app will be live at `https://huggingface.co/spaces/your-username/your-space-name` + +> **Note**: The token is securely managed by Hugging Face Spaces and accessed via `window.huggingface.variables.HF_TOKEN`. + +
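Once the Space exists, you can also push updates from the terminal with the `huggingface_hub` CLI — a sketch with a placeholder Space name:

```bash
huggingface-cli upload your-username/your-space-name index.html --repo-type=space
```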
+
## Next Steps

Congratulations! You've created a production-ready AI application that handles real-world tasks, provides a professional interface, scales automatically, and keeps costs under control. If you want to explore more providers, you can check out the [Inference Providers](https://huggingface.co/inference-providers) page. Here are some ideas for next steps:

- **Improve your prompt**: Try different prompts to improve the quality for your use case
- **Try different models**: Experiment with various speech and text models
- **Compare performance**: Benchmark speed vs. accuracy across providers

diff --git a/docs/inference-providers/guides/first-api-call.md b/docs/inference-providers/guides/first-api-call.md
new file mode 100644
index 000000000..24dd39649
--- /dev/null
+++ b/docs/inference-providers/guides/first-api-call.md
@@ -0,0 +1,240 @@

# Your First Inference Provider Call

In this guide, we'll help you make your first API call with Inference Providers.

Many developers avoid using open source AI models because they assume deployment is complex. This guide will show you how to use a state-of-the-art model in under five minutes, with no infrastructure setup required.

We're going to use the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) model, which is a powerful text-to-image model.

This guide assumes you have a Hugging Face account. If you don't have one, you can create one for free at [huggingface.co](https://huggingface.co).

## Step 1: Find a Model on the Hub

Visit the [Hugging Face Hub](https://huggingface.co/models?pipeline_tag=text-to-image&inference_provider=fal-ai,hf-inference,nebius,nscale,replicate,together&sort=trending) and look for models with the "Inference Providers" filter; you can select the provider you want. We'll go with `fal`.

![search image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/search.png)

For this example, we'll use [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell), a powerful text-to-image model. Next, navigate to the model page and scroll down to find the inference widget on the right side.

## Step 2: Try the Interactive Widget

Before writing any code, try the widget directly on the [model page](https://huggingface.co/black-forest-labs/FLUX.1-schnell?inference_provider=fal-ai):

![widget image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/widget.png)

Here, you can test the model directly in the browser from any of the available providers. You can also copy relevant code snippets to use in your own projects.

1. Enter a prompt like "A serene mountain landscape at sunset"
2. Click **"Generate"**
3. Watch as the model creates an image in seconds

This widget uses the same endpoint you're about to implement in code.

You'll need a Hugging Face account (free at [huggingface.co](https://huggingface.co)) and available credits to use the model.

## Step 3: From Clicks to Code

Now let's replicate this in code. Click the **"View Code Snippets"** button in the widget to see the [generated code snippets](https://huggingface.co/black-forest-labs/FLUX.1-schnell?inference_api=true&language=python&inference_provider=auto).
![code snippets image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/inference-providers-guides/code-snippets.png)

You'll need to populate this snippet with a valid Hugging Face User Access Token, which you can create on your [settings page](https://huggingface.co/settings/tokens).

Set your token as an environment variable:

```bash
export HF_TOKEN="your_token_here"
```

The Python or TypeScript code snippet will use the token from the environment variable.

Install the required package:

```bash
pip install huggingface_hub
```

You can now use the code snippet to generate an image in your app.

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="auto",
    api_key=os.environ["HF_TOKEN"],
)

# output is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-schnell",
)
```

Install the required package:

```bash
npm install @huggingface/inference
```

Then, you can use the code snippet to generate an image in your app.

```typescript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

const image = await client.textToImage({
    provider: "auto",
    model: "black-forest-labs/FLUX.1-schnell",
    inputs: "Astronaut riding a horse",
    parameters: { num_inference_steps: 5 },
});
// Use the generated image (it's a Blob)
```

## What Just Happened?

Nice work! You've successfully used a production-grade AI model without any complex setup. In just a few lines of code, you:

- Connected to a powerful text-to-image model
- Generated a custom image from text
- Received the result as an image object you can save or display

The model you just used runs on professional infrastructure, handling scaling, optimization, and reliability automatically.

## Dive Deeper: Provider Selection

You might have noticed the `provider="auto"` parameter in the code examples above. This is a key feature of Inference Providers that gives you control over which infrastructure provider handles your request.

`auto` is powerful because:

1. It makes it easy to switch between providers and to test different providers' performance for your use case.
2. It provides a fallback mechanism in case a provider is unavailable.

But if you want to be more specific, you can also specify a provider. Let's see how.

### Understanding Provider Selection

When you use `provider="auto"` (which is the default), the system automatically selects the first available provider for your chosen model based on your preference order in your [Inference Provider settings](https://hf.co/settings/inference-providers).
This provides: + +- **Automatic failover**: If one provider is unavailable, the system tries the next one +- **Simplified setup**: No need to research which providers support your model +- **Optimal routing**: The system handles provider selection for you + +### Specifying a Specific Provider + +Alternatively, you can explicitly choose a provider if you have specific requirements: + + + + + +```python +import os +from huggingface_hub import InferenceClient + +client = InferenceClient(api_key=os.environ["HF_TOKEN"]) + +# Using automatic provider selection (default) +image_auto = client.text_to_image( + "Astronaut riding a horse", + model="black-forest-labs/FLUX.1-schnell", + provider="auto" # This is the default +) + +# Using a specific provider +image_fal = client.text_to_image( + "Astronaut riding a horse", + model="black-forest-labs/FLUX.1-schnell", + provider="fal-ai" # Explicitly use Fal AI +) + +# Using another specific provider +image_replicate = client.text_to_image( + "Astronaut riding a horse", + model="black-forest-labs/FLUX.1-schnell", + provider="replicate" # Explicitly use Replicate +) +``` + + + + + +```typescript +import { InferenceClient } from "@huggingface/inference"; + +const client = new InferenceClient(process.env.HF_TOKEN); + +// Using automatic provider selection (default) +const imageAuto = await client.textToImage({ + model: "black-forest-labs/FLUX.1-schnell", + inputs: "Astronaut riding a horse", + provider: "auto", // This is the default + parameters: { num_inference_steps: 5 }, +}); + +// Using a specific provider +const imageFal = await client.textToImage({ + model: "black-forest-labs/FLUX.1-schnell", + inputs: "Astronaut riding a horse", + provider: "fal-ai", // Explicitly use Fal AI + parameters: { num_inference_steps: 5 }, +}); + +// Using another specific provider +const imageReplicate = await client.textToImage({ + model: "black-forest-labs/FLUX.1-schnell", + inputs: "Astronaut riding a horse", + provider: "replicate", // Explicitly use Replicate + parameters: { num_inference_steps: 5 }, +}); +``` + + + + + +### When to Use Each Approach + +**Use `provider="auto"` when:** +- You're just getting started with Inference Providers +- You want the simplest setup and maximum reliability +- You don't have specific infrastructure requirements +- You want automatic failover if a provider is unavailable + +**Use a specific provider when:** +- You need consistent performance characteristics +- You have specific billing or cost requirements +- You want to test different providers' performance for your use case + +## Next Steps + +Now that you've seen how easy it is to use AI models, you might wonder: +- What was that "provider" system doing behind the scenes? +- How does billing work? +- What other models can you use? + +Continue to the next guide to understand the provider ecosystem and make informed choices about authentication and billing. 
\ No newline at end of file diff --git a/docs/inference-providers/index.md b/docs/inference-providers/index.md index c3970c044..adec7b7a9 100644 --- a/docs/inference-providers/index.md +++ b/docs/inference-providers/index.md @@ -13,22 +13,22 @@ To learn more about the launch of Inference Providers, check out our [announceme Here is the complete list of partners integrated with Inference Providers, and the supported tasks for each of them: -| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to video | -| ---------------------------------------- | :-------------------: | :-------------------: | :----------------: | :-----------: | :-----------: | -| [Cerebras](./providers/cerebras) | ✅ | | | | | -| [Cohere](./providers/cohere) | ✅ | ✅ | | | | -| [Fal AI](./providers/fal-ai) | | | | ✅ | ✅ | -| [Featherless AI](./providers/featherless-ai) | ✅ | ✅ | | | | -| [Fireworks](./providers/fireworks-ai) | ✅ | ✅ | | | | -| [Groq](./providers/groq) | ✅ | | | | | -| [HF Inference](./providers/hf-inference) | ✅ | ✅ | ✅ | ✅ | | -| [Hyperbolic](./providers/hyperbolic) | ✅ | ✅ | | | | -| [Nebius](./providers/nebius) | ✅ | ✅ | ✅ | ✅ | | -| [Novita](./providers/novita) | ✅ | ✅ | | | ✅ | -| [Nscale](./providers/nscale) | ✅ | ✅ | | ✅ | | -| [Replicate](./providers/replicate) | | | | ✅ | ✅ | -| [SambaNova](./providers/sambanova) | ✅ | | ✅ | | | -| [Together](./providers/together) | ✅ | ✅ | | ✅ | | +| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to video | Speech to text | +| ---------------------------------------- | :-------------------: | :-------------------: | :----------------: | :-----------: | :-----------: | :-----------: | +| [Cerebras](./providers/cerebras) | ✅ | | | | | | +| [Cohere](./providers/cohere) | ✅ | ✅ | | | | | +| [Fal AI](./providers/fal-ai) | | | | ✅ | ✅ | ✅ | +| [Featherless AI](./providers/featherless-ai) | ✅ | ✅ | | | | | +| [Fireworks](./providers/fireworks-ai) | ✅ | ✅ | | | | | +| [Groq](./providers/groq) | ✅ | | | | | | +| [HF Inference](./providers/hf-inference) | ✅ | ✅ | ✅ | ✅ | | ✅ | +| [Hyperbolic](./providers/hyperbolic) | ✅ | ✅ | | | | | +| [Nebius](./providers/nebius) | ✅ | ✅ | ✅ | ✅ | | | +| [Novita](./providers/novita) | ✅ | ✅ | | | ✅ | | +| [Nscale](./providers/nscale) | ✅ | ✅ | | ✅ | | | +| [Replicate](./providers/replicate) | | | | ✅ | ✅ | ✅ | +| [SambaNova](./providers/sambanova) | ✅ | | ✅ | | | | +| [Together](./providers/together) | ✅ | ✅ | | ✅ | | | ## Why use Inference Providers?