Creating Your First AIPerf Plugin

This tutorial walks you through creating a custom AIPerf endpoint plugin from scratch. By the end, you'll have a working plugin package and a pattern you can adapt to benchmark your own custom API.

**Contributing directly to AIPerf?** The endpoint class (Step 2) and manifest format (Step 3) are the same, but you can skip the external packaging:

- Add your class under `src/aiperf/` instead of a separate package
- Register it in the existing `src/aiperf/plugin/plugins.yaml` instead of creating a new one
- Skip: Project Structure, Step 1 (pyproject.toml/entry points), Step 4 (install)

What You'll Build

We'll create a plugin for a hypothetical "Echo API" that returns the input text with some metadata. This simple example demonstrates all the core concepts you need to build more complex plugins.

Prerequisites

Python 3.10+
AIPerf installed (pip install aiperf)
Basic understanding of Python async/await and Pydantic

Key Concepts

Before diving in, understand the plugin system terminology:

| Term | What It Is |
| --- | --- |
| Package | Your Python package that provides plugins (e.g., `my-aiperf-plugins`) |
| Manifest | The `plugins.yaml` file declaring your plugins |
| Category | A type of plugin (e.g., `endpoint`, `transport`, `timing_strategy`) |
| Entry | A single registered plugin within a category |
| Class | The Python class implementing your plugin |
| Metadata | Configuration describing your plugin's capabilities |

What you're building:

Package (my-aiperf-plugins)
└── Manifest (plugins.yaml)
    └── Category (endpoint)
        └── Entry (echo)
            ├── Class (EchoEndpoint)
            └── Metadata (supports_streaming: true, ...)

For complete plugin system documentation, see the Plugin System Reference.

Project Structure

Create a new directory for your plugin package:

PKG=my-aiperf-plugins
SRC=$PKG/src/my_plugins

mkdir -p $SRC/endpoints $PKG/tests
touch $PKG/pyproject.toml \
      $PKG/echo_server.py \
      $SRC/__init__.py \
      $SRC/plugins.yaml \
      $SRC/endpoints/__init__.py \
      $SRC/endpoints/echo_endpoint.py \
      $PKG/tests/test_echo_endpoint.py
tree $PKG
cd $PKG

You should see:

my-aiperf-plugins/
├── echo_server.py
├── pyproject.toml
├── src/
│   └── my_plugins/
│       ├── __init__.py
│       ├── plugins.yaml
│       └── endpoints/
│           ├── __init__.py
│           └── echo_endpoint.py
└── tests/
    └── test_echo_endpoint.py
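If `mkdir -p`, `touch`, or `tree` are not available on your system, the same layout can be created from Python. This is a minimal sketch using only `pathlib`; the `scaffold` helper and `FILES` list are illustrative names, not part of AIPerf.

```python
from pathlib import Path

# Files from the layout above, relative to the package root.
FILES = [
    "pyproject.toml",
    "echo_server.py",
    "src/my_plugins/__init__.py",
    "src/my_plugins/plugins.yaml",
    "src/my_plugins/endpoints/__init__.py",
    "src/my_plugins/endpoints/echo_endpoint.py",
    "tests/test_echo_endpoint.py",
]


def scaffold(root: Path) -> list[Path]:
    """Create empty files (and their parent directories) under root."""
    created = []
    for rel in FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # leaves existing files untouched
        created.append(path)
    return created
```

Calling `scaffold(Path("my-aiperf-plugins"))` from the parent directory produces the tree shown above.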

Now fill in each file in the steps below.

Step 1: Create the Project Files

pyproject.toml

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "my-aiperf-plugins"
version = "0.1.0"
description = "Custom AIPerf plugins for my use case"
requires-python = ">=3.10"
dependencies = [
    "aiperf",
]

[project.entry-points."aiperf.plugins"]
my-plugins = "my_plugins:plugins.yaml"

[tool.hatch.build.targets.wheel]
packages = ["src/my_plugins"]

The key part is the `[project.entry-points."aiperf.plugins"]` section, which tells AIPerf where to find your plugin manifest. The value `my_plugins:plugins.yaml` names the package and the manifest file inside it.

src/my_plugins/__init__.py

"""My custom AIPerf plugins."""

src/my_plugins/endpoints/__init__.py

"""Custom endpoint implementations."""

from my_plugins.endpoints.echo_endpoint import EchoEndpoint

__all__ = ["EchoEndpoint"]

Step 2: Create the Endpoint Class

src/my_plugins/endpoints/echo_endpoint.py

Your endpoint needs two methods: format_payload() and parse_response().

"""Echo endpoint for demonstration purposes."""
from __future__ import annotations
from typing import Any

from aiperf.common.models import ParsedResponse, RequestInfo, TextResponseData, InferenceServerResponse
from aiperf.endpoints.base_endpoint import BaseEndpoint


class EchoEndpoint(BaseEndpoint):
    """Echo endpoint that sends text and receives it back."""

    # ─────────────────────────────────────────────────────────────────────────
    # REQUIRED: Format outgoing request
    # ─────────────────────────────────────────────────────────────────────────
    def format_payload(self, request_info: RequestInfo) -> dict[str, Any]:
        turn = request_info.turns[-1]
        model_endpoint = request_info.model_endpoint
        texts = [content for text in turn.texts for content in text.contents if content]
        return {
            "text": texts[0] if texts else "",
            "model": turn.model or model_endpoint.primary_model_name,
            "max_tokens": turn.max_tokens,
            "stream": model_endpoint.endpoint.streaming,
        }

    # ─────────────────────────────────────────────────────────────────────────
    # REQUIRED: Parse incoming response
    # ─────────────────────────────────────────────────────────────────────────
    def parse_response(self, response: InferenceServerResponse) -> ParsedResponse | None:
        if json_obj := response.get_json():
            if text := json_obj.get("echo") or json_obj.get("text"):
                return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=text))
            # Fallback: auto-detect common response formats
            if data := self.auto_detect_and_extract(json_obj):
                return ParsedResponse(perf_ns=response.perf_ns, data=data)
        if text := response.get_text():
            return ParsedResponse(perf_ns=response.perf_ns, data=TextResponseData(text=text))
        return None

What's happening: format_payload() converts AIPerf's RequestInfo into your API's format. parse_response() extracts the response text into a ParsedResponse.
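To see the two transformations without any AIPerf objects, here is a standalone sketch of the same logic on plain Python values. The function names are hypothetical and the functions take simple arguments instead of `RequestInfo` and `InferenceServerResponse`; they only mirror the selection rules used in `EchoEndpoint`.

```python
from typing import Any, Optional


def build_echo_payload(
    texts: list[str], model: str, max_tokens: Optional[int], stream: bool
) -> dict[str, Any]:
    """Mirror format_payload(): send the first non-empty text plus model and options."""
    non_empty = [t for t in texts if t]
    return {
        "text": non_empty[0] if non_empty else "",
        "model": model,
        "max_tokens": max_tokens,
        "stream": stream,
    }


def extract_echo_text(json_obj: Optional[dict[str, Any]]) -> Optional[str]:
    """Mirror the first branch of parse_response(): prefer 'echo', then 'text'."""
    if not json_obj:
        return None
    return json_obj.get("echo") or json_obj.get("text") or None
```

Note that an empty `"echo"` value is treated as missing and falls through to `"text"`, just as the walrus checks in `parse_response()` do.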

Step 3: Create the Plugin Manifest

src/my_plugins/plugins.yaml

# yaml-language-server: $schema=https://raw.githubusercontent.com/ai-dynamo/aiperf/refs/heads/main/src/aiperf/plugin/schema/plugins.schema.json
schema_version: "1.0"

# Register your endpoint
# Note: Package metadata (name, version, author) comes from pyproject.toml,
# not from this file. AIPerf reads it via importlib.metadata.
endpoint:
  echo:
    class: my_plugins.endpoints.echo_endpoint:EchoEndpoint
    description: |
      Echo endpoint for testing. Sends text to an Echo API and receives it back.
      Useful for testing connectivity and basic benchmarking.
    metadata:
      endpoint_path: /echo
      supports_streaming: true
      produces_tokens: true
      tokenizes_input: true
      metrics_title: Echo Metrics

Step 4: Install Your Plugin

From your plugin directory, install into the same Python environment where AIPerf is installed. AIPerf discovers plugins via entry points, which only works when both packages share the same environment.

pip install -e .

You should see:

Successfully installed my-aiperf-plugins-0.1.0

Important: If you use uv, virtual environments, or conda, make sure you activate the environment where AIPerf is installed before running pip install.

Step 5: Verify Installation

Confirm both packages are installed in the same environment:

pip show aiperf my-aiperf-plugins

Both packages should be listed, with `my-aiperf-plugins` shown under `Required-by` for `aiperf`:

Name: aiperf
Version: 0.7.0
Location: ...
Requires: ...
Required-by: my-aiperf-plugins
---
Name: my-aiperf-plugins
Version: 0.1.0
Location: ...
Requires: aiperf
Required-by:
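The same check can be scripted. This sketch uses `importlib.metadata.version` from the standard library and reports `None` for anything that is not installed in the interpreter running the script; `installed_versions` is an illustrative helper name.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(names: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # sys.prefix identifies the environment this interpreter belongs to.
    print("environment:", sys.prefix)
    for name, ver in installed_versions(["aiperf", "my-aiperf-plugins"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Run it with the same `python` you use for AIPerf; a `NOT INSTALLED` line means the two packages do not share that environment.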

Check that AIPerf discovers your plugin:

# List all plugins - your echo endpoint should appear
aiperf plugins endpoint

You should see your plugin in the table:

Endpoint Types
┌──────────────┬──────────────────────────────────────────────────────────────┐
│ Type         │ Description                                                  │
├──────────────┼──────────────────────────────────────────────────────────────┤
│ chat         │ OpenAI Chat Completions endpoint...                          │
│ ...          │ ...                                                          │
│ echo         │ Echo endpoint for testing. Sends text to an Echo API...      │
└──────────────┴──────────────────────────────────────────────────────────────┘

# View details about your endpoint
aiperf plugins endpoint echo

You should see:

╭──────────────────────────── endpoint:echo ─────────────────────────────╮
│ Type: echo                                                             │
│ Category: endpoint                                                     │
│ Package: my-plugins                                                    │
│ Class: my_plugins.endpoints.echo_endpoint:EchoEndpoint                 │
│                                                                        │
│ Echo endpoint for testing. Sends text to an Echo API and receives it   │
│ back. Useful for testing connectivity and basic benchmarking.          │
╰────────────────────────────────────────────────────────────────────────╯

# Validate your plugin
aiperf plugins --validate

You should see:

Validating plugins...

✓ Class paths

All checks passed

Step 6: Create a Test Server

To test your plugin end-to-end, create a minimal Echo API server. Save this as echo_server.py in your project root:

"""Minimal Echo API server for testing the EchoEndpoint plugin."""
from __future__ import annotations

import asyncio

import cyclopts
import orjson
import uvicorn
from fastapi import FastAPI
from fastapi.responses import ORJSONResponse, StreamingResponse

app = FastAPI()
cli = cyclopts.App()

@app.post("/echo", response_model=None)
async def echo(body: dict) -> ORJSONResponse | StreamingResponse:
    echo_text = f"[echo] {body.get('text', '')}"
    model = body.get("model", "echo-model")

    if not body.get("stream"):
        return ORJSONResponse({"echo": echo_text, "model": model})

    async def sse():
        for i, word in enumerate(echo_text.split()):
            chunk = orjson.dumps({"echo": word if i == 0 else f" {word}", "model": model})
            yield b"data: " + chunk + b"\n\n"
            await asyncio.sleep(0.02)
        yield b"data: [DONE]\n\n"

    return StreamingResponse(sse(), media_type="text/event-stream")


@cli.default
def main(host: str = "127.0.0.1", port: int = 8000) -> None:
    uvicorn.run(app, host=host, port=port)


if __name__ == "__main__":
    cli()

Start the server:

pip install fastapi uvicorn orjson cyclopts
python echo_server.py &

You should see:

INFO:     Started server process
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
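When `stream` is true, the server above sends Server-Sent Events: one `data: {...}` line per word, then `data: [DONE]`. AIPerf handles the stream for you during a benchmark, but if you want to inspect the server by hand, the following sketch shows how the chunks can be reassembled. It assumes the input is an iterable of byte lines (for example, an HTTP response read line by line); the function names are illustrative.

```python
import json
from collections.abc import Iterable, Iterator


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the payload of each 'data:' line, stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # blank separator lines and other SSE fields are skipped
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield payload


def join_echo_stream(lines: Iterable[bytes]) -> str:
    """Concatenate the 'echo' field of every JSON chunk in the stream."""
    return "".join(json.loads(p).get("echo", "") for p in iter_sse_data(lines))
```

Because the server prefixes every word after the first with a space, plain concatenation restores the original spacing.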

Step 7: Use Your Plugin

With the test server running, use your endpoint with AIPerf:

# Basic usage (endpoint_path: /echo from metadata is appended automatically)
aiperf profile \
  --model echo-model \
  --url http://localhost:8000 \
  --endpoint-type echo \
  --tokenizer gpt2 \
  --synthetic-input-tokens-mean 100 \
  --request-count 10

# With custom configuration
aiperf profile \
  --model echo-model \
  --url http://localhost:8000 \
  --endpoint-type echo \
  --tokenizer gpt2 \
  --extra-inputs echo_prefix:"[ECHO] " \
  --synthetic-input-tokens-mean 100 \
  --concurrency 4 \
  --request-count 100

You should see output similar to the following (exact values depend on your machine):

                            NVIDIA AIPerf | Echo Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃                           Metric ┃       avg ┃    min ┃    max ┃    p99 ┃  std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│             Request Latency (ms) │      2.05 │   0.29 │  15.42 │  14.18 │ 4.47 │
│  Output Sequence Length (tokens) │    104.00 │ 104.00 │ 104.00 │ 104.00 │ 0.00 │
│   Input Sequence Length (tokens) │    100.00 │ 100.00 │ 100.00 │ 100.00 │ 0.00 │
│          Output Token Throughput │ 40,850.61 │    N/A │    N/A │    N/A │  N/A │
│                     (tokens/sec) │           │        │        │        │      │
│               Request Throughput │    392.79 │    N/A │    N/A │    N/A │  N/A │
│                   (requests/sec) │           │        │        │        │      │
│         Request Count (requests) │     10.00 │    N/A │    N/A │    N/A │  N/A │
└──────────────────────────────────┴───────────┴────────┴────────┴────────┴──────┘
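As a sanity check on the table, output token throughput should be roughly request throughput multiplied by the mean output sequence length. That relationship is an assumption about how the two rows relate, not a description of AIPerf's metric code; the sketch below simply applies it to the sample values above.

```python
def estimate_token_throughput(requests_per_sec: float, mean_output_tokens: float) -> float:
    """Approximate tokens/sec as requests/sec times mean output tokens per request."""
    return requests_per_sec * mean_output_tokens


# Values copied from the sample table above.
estimate = estimate_token_throughput(392.79, 104.00)
reported = 40_850.61
relative_gap = abs(estimate - reported) / reported
```

The estimate lands within a small fraction of a percent of the reported figure, which is consistent with the displayed request throughput being rounded to two decimals.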

Step 8: Add Tests

tests/test_echo_endpoint.py

"""Tests for the Echo endpoint."""
import pytest
from my_plugins.endpoints.echo_endpoint import EchoEndpoint


class TestEchoEndpoint:
    def test_format_payload(self, mock_model_endpoint, mock_request_info):
        endpoint = EchoEndpoint(model_endpoint=mock_model_endpoint)
        payload = endpoint.format_payload(mock_request_info)
        assert "text" in payload and "model" in payload

    def test_parse_response(self, mock_model_endpoint, mock_response):
        endpoint = EchoEndpoint(model_endpoint=mock_model_endpoint)
        result = endpoint.parse_response(mock_response)
        assert result is not None and result.data.text

Fixtures: Create conftest.py with mock_model_endpoint, mock_request_info, and mock_response fixtures. See AIPerf's test utilities for examples.
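As a starting point for those fixtures, the sketch below builds lightweight stand-ins that expose only the attributes `EchoEndpoint` reads in Step 2. The factory names are hypothetical, and whether `BaseEndpoint` and `ParsedResponse` accept such stand-ins in place of real AIPerf models is an assumption you should verify; if they do not, construct the real model objects instead.

```python
from types import SimpleNamespace
from typing import Any, Optional


def make_model_endpoint(model: str = "echo-model", streaming: bool = False) -> SimpleNamespace:
    """Stand-in exposing primary_model_name and endpoint.streaming."""
    return SimpleNamespace(
        primary_model_name=model,
        endpoint=SimpleNamespace(streaming=streaming),
    )


def make_request_info(text: str = "hello", max_tokens: Optional[int] = 16) -> SimpleNamespace:
    """Stand-in exposing turns[-1].texts[*].contents, model, and max_tokens."""
    turn = SimpleNamespace(
        texts=[SimpleNamespace(contents=[text])],
        model=None,  # forces the fallback to primary_model_name
        max_tokens=max_tokens,
    )
    return SimpleNamespace(turns=[turn], model_endpoint=make_model_endpoint())


def make_response(json_obj: Optional[dict[str, Any]], perf_ns: int = 0) -> SimpleNamespace:
    """Stand-in exposing perf_ns, get_json(), and get_text()."""
    return SimpleNamespace(
        perf_ns=perf_ns,
        get_json=lambda: json_obj,
        get_text=lambda: None,
    )
```

In `conftest.py`, each factory can be wrapped in a `@pytest.fixture` function named `mock_model_endpoint`, `mock_request_info`, and `mock_response` so the tests above pick them up.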

Understanding the Code

Component Summary

| Component | What It Does | You Provide |
| --- | --- | --- |
| `BaseEndpoint` | Logging, `auto_detect_and_extract()`, config access | Inherit from it |
| `format_payload()` | Converts `RequestInfo` → API request | Your API format |
| `parse_response()` | Converts API response → `ParsedResponse` | Your parsing logic |

Data Flow

RequestInfo.turns[-1]  →  format_payload()  →  HTTP Request  →  Your API
                                                                    ↓
ParsedResponse         ←  parse_response()  ←  HTTP Response ←────┘

Response Types

| Type | Use Case | Key Field |
| --- | --- | --- |
| `TextResponseData` | LLM completions | `text: str` |
| `EmbeddingResponseData` | Embeddings | `embeddings: list[list[float]]` |
| `RankingsResponseData` | Reranking | `rankings: list[dict[str, Any]]` |

Metadata Fields

| Field | Required | Purpose |
| --- | --- | --- |
| `endpoint_path` | Yes (nullable) | Default API path (e.g., `/v1/chat/completions`) |
| `supports_streaming` | Yes | SSE streaming support |
| `produces_tokens` | Yes | Enables token metrics |
| `tokenizes_input` | Yes | Enables input tokenization |
| `metrics_title` | Yes (nullable) | Dashboard display name |
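Since every field in the table is required, a quick pre-check on a metadata mapping can catch omissions before you install. This sketch is not a replacement for AIPerf's own schema validation; the helper name is illustrative and the field list is copied from the table above.

```python
from typing import Any

# Field names taken from the Metadata Fields table.
REQUIRED_METADATA_FIELDS = (
    "endpoint_path",
    "supports_streaming",
    "produces_tokens",
    "tokenizes_input",
    "metrics_title",
)


def missing_metadata_fields(metadata: dict[str, Any]) -> list[str]:
    """Return required field names that are absent (a None value still counts as present)."""
    return [name for name in REQUIRED_METADATA_FIELDS if name not in metadata]
```

For the echo entry in Step 3 the result is an empty list; nullable fields such as `endpoint_path` must still be present as keys, even when their value is null.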

Next Steps

| Goal | Action |
| --- | --- |
| Multiple endpoints | Add more entries under `endpoint:` in `plugins.yaml` |
| Other plugin types | Use same pattern for `timing_strategy`, `data_exporter`, `dataset_composer` |
| Publish | `python -m build && twine upload dist/*` to PyPI |

Troubleshooting

Plugin not found

TypeNotFoundError: Type 'echo' not found for category 'endpoint'.

Solutions:

Ensure pip install -e . completed successfully
Check the entry point in pyproject.toml matches your package structure
Run aiperf plugins --validate to check for errors

Import errors

ImportError: Failed to import module for endpoint:echo from 'my_plugins.endpoints.echo_endpoint:EchoEndpoint'
Reason: ...
Tip: Check that the module is installed and importable

Solutions:

Verify the class path format: module.path:ClassName
Check all imports in your endpoint file work: python -c "from my_plugins.endpoints.echo_endpoint import EchoEndpoint"
Ensure all dependencies are installed
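To test a class path string in isolation, you can resolve it the way `module.path:ClassName` strings are conventionally resolved in Python. This is a generic sketch built on `importlib`; AIPerf's own loader may differ in details, and `resolve_class_path` is an illustrative name.

```python
import importlib
from typing import Any


def resolve_class_path(class_path: str) -> Any:
    """Import 'module.path:ClassName' and return the named attribute."""
    module_name, sep, attr_name = class_path.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module.path:ClassName', got {class_path!r}")
    module = importlib.import_module(module_name)  # raises ImportError if not importable
    return getattr(module, attr_name)  # raises AttributeError if the class is missing
```

Calling `resolve_class_path("my_plugins.endpoints.echo_endpoint:EchoEndpoint")` in the environment where the plugin is installed surfaces the underlying `ImportError` or `AttributeError` directly, which is often more informative than the wrapped message.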

Response parsing fails

Solutions:

Use -vv flag to see raw responses in debug logs
Check that your parse_response handles your API's actual response format
Use auto_detect_and_extract() as a fallback for unknown formats

Reference

Plugin System Documentation - Complete plugin system reference
Template Endpoint Tutorial - Using templates for custom payloads
