gradio.App is a lower-level entry point for Gradio that gives you direct access to the underlying FastAPI application. Use it when you need full control: add custom routes that return pages written entirely in HTML/JS/CSS, define REST endpoints with Pydantic validation and dependency injection, or mix standard web routes with Gradio's backend features such as GPU management, request queuing, batching, and streaming.
from gradio import App
app = App()
@app.gpu()
@app.api(name="generate", concurrency_limit=2)
async def generate(prompt: str):
    # `model` is a placeholder for your own model object
    for token in model.generate(prompt):
        yield token  # each token streams to the client via SSE
@app.get("/")
async def root(prompt: str):
    # `prompt` is read from the query string; collect every streamed token
    tokens = [token async for token in generate(prompt)]
    return {"message": tokens}
app.launch()
Since App extends FastAPI, all the standard FastAPI features work: @app.get(), @app.post(), path params, query params, Depends(), APIRouter, Pydantic models, /docs, and /openapi.json.
Serve full HTML pages alongside your Gradio backend:
from gradio import App
from fastapi.responses import HTMLResponse
app = App()
@app.get("/", response_class=HTMLResponse)
async def homepage():
    # /static/app.js stands for your own frontend bundle (serving it is not shown here)
    return """
    <html>
      <head><script src="/static/app.js"></script></head>
      <body>
        <h1>My App</h1>
        <div id="root"></div>
      </body>
    </html>
    """
@app.gpu()
@app.api(name="predict", concurrency_limit=2)
def predict(text: str):
    return model(text)
app.launch()
Your HTML/JS frontend can call the generated /api/predict endpoint directly, giving you full control over the UI while Gradio's backend handles GPU management and queuing.
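Batch concurrent requests into a single call with batch_size and batch_timeout; the function receives a list of inputs and returns a list of outputs:
@app.gpu(batch_size=8, batch_timeout=0.05)
@app.api(name="predict")
def predict(images: list[bytes]) -> list[str]:
    return model(images)  # called once with up to 8 inputs
Assign functions to specific GPUs with device:
@app.gpu(device=0)
@app.api(name="model_a")
def model_a(text: str):
    ...
@app.gpu(device=1)
@app.api(name="model_b")
def model_b(text: str):
    ...
All decorated functions go through the queue with position tracking and ETA: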
@app.gpu()
@app.api(name="predict", concurrency_limit=2)
def predict(text: str):
    return model(text)
Clients interact with the queue via two endpoints:
# 1. Join the queue
POST /queue/join {"endpoint": "predict", "data": {"text": "hello"}}
# Returns: {"event_id": "abc123"}
# 2. Listen for updates via SSE
GET /queue/data?event_id=abc123
# Stream of events:
# {"msg": "estimation", "rank": 2, "queue_size": 5, "rank_eta": 3.4}
# {"msg": "process_starts", "eta": 1.2}
# {"msg": "process_completed", "output": {"data": "result"}, "success": true}
The direct endpoint (POST /api/predict) still works for non-queued access.
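A minimal client for the two-step queue protocol above can be written with only the standard library. This is a sketch under assumptions: it assumes the /queue/data stream uses standard SSE framing (each event arrives as a data: <json> line followed by a blank line), and the helper names parse_sse, wait_for_result, join_queue and stream_events are hypothetical, not part of Gradio. The message fields it reads (msg, rank, queue_size, rank_eta, output, success) are the ones shown in the example stream.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON payload per SSE event (standard data: framing assumed)."""
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line ends the current event
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
        # other SSE fields (event:, id:, retry:) and ":" comments are ignored
    if data:  # stream closed without a trailing blank line
        yield json.loads("\n".join(data))


def wait_for_result(events: Iterable[Dict[str, Any]]) -> Any:
    """Print queue progress and return the output data of the completed job."""
    for event in events:
        msg = event.get("msg")
        if msg == "estimation":
            print(f"rank={event['rank']} queue_size={event['queue_size']} rank_eta={event['rank_eta']}")
        elif msg == "process_starts":
            print("processing started")
        elif msg == "process_completed":
            if not event.get("success"):
                raise RuntimeError(f"job failed: {event}")
            return event["output"]["data"]
    raise RuntimeError("stream ended before process_completed")


def join_queue(base_url: str, endpoint: str, data: Dict[str, Any]) -> str:
    """Step 1: POST /queue/join and return the event_id."""
    body = json.dumps({"endpoint": endpoint, "data": data}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/queue/join",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["event_id"]


def stream_events(base_url: str, event_id: str) -> Iterator[Dict[str, Any]]:
    """Step 2: GET /queue/data and decode the SSE stream line by line."""
    url = f"{base_url}/queue/data?event_id={urllib.parse.quote(event_id)}"
    with urllib.request.urlopen(url) as resp:
        yield from parse_sse(line.decode("utf-8") for line in resp)
```

Against a running app, usage would look like event_id = join_queue(base_url, "predict", {"text": "hello"}) followed by wait_for_result(stream_events(base_url, event_id)), where base_url is wherever your app is listening. Keeping the SSE parsing separate from the HTTP calls lets you test it against a recorded stream without a server.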
Mount a Gradio app alongside your custom routes:
import gradio as gr
from gradio import App
app = App()
@app.gpu()
@app.api(name="predict", concurrency_limit=2)
def predict(text: str):
    return model(text)
demo = gr.Interface(predict, "text", "text")
gr.mount_gradio_app(app, demo, path="/demo")
app.launch()
Mark functions as MCP tools with @app.mcp() so AI agents and LLM clients can discover and call them:
from gradio import App
app = App()
@app.gpu()
@app.mcp(name="generate")
async def generate(prompt: str):
    result = model.generate(prompt)
    return result
@app.gpu()
@app.mcp(name="summarize")
def summarize(text: str) -> str:
    return model.summarize(text)
app.launch()
Each @app.mcp() function becomes an MCP tool. Its tool schema is generated automatically from the function name, parameters, and type hints.
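To make the schema generation concrete, the sketch below derives a tool description from a function's signature and type hints using only inspect and typing. It illustrates the idea and is not Gradio's actual schema generator: the tool_schema helper, the Python-to-JSON-Schema type mapping, the fallback to "string" for unannotated parameters, and the extra max_words parameter are all hypothetical. The name / description / inputSchema layout follows the MCP tool definition format.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional, get_type_hints

# Hypothetical mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn: Callable[..., Any], name: Optional[str] = None) -> Dict[str, Any]:
    """Build an MCP-style tool description from a function's signature (sketch)."""
    hints = get_type_hints(fn)
    properties: Dict[str, Dict[str, str]] = {}
    required: List[str] = []
    for pname, param in inspect.signature(fn).parameters.items():
        # Unannotated or unmapped parameters fall back to "string" (an assumption).
        properties[pname] = {"type": _JSON_TYPES.get(hints.get(pname), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(pname)  # no default value, so the caller must supply it
    return {
        "name": name or fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


# Stand-in for the summarize tool above; max_words is a hypothetical extra parameter.
def summarize(text: str, max_words: int = 50) -> str:
    """Summarize a passage of text."""
    return " ".join(text.split()[:max_words])


if __name__ == "__main__":
    print(tool_schema(summarize))
```

Parameters without defaults end up in required, so an agent knows it must supply text but may omit max_words. The docstring becomes the tool description, which is one reason to document MCP functions carefully.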