Tool Middleware / Interceptor API #151

@christophercolumbusdog

Problem

When building MCP servers that wrap external APIs (REST services, databases, GraphQL endpoints), the raw responses frequently contain far more data than an LLM needs or can effectively use. This leads to context pollution — wasted tokens on irrelevant fields, verbose nested structures, or responses large enough to overwhelm the model's context window.

Today, Helidon MCP provides no mechanism to intercept or transform tool results. The tool function (Function<McpRequest, McpToolResult>) is invoked directly and its return value is serialized as-is. Every tool author must independently implement their own truncation, field filtering, and response size management — leading to duplicated boilerplate across every tool in every server.
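To make the duplication concrete, here is the kind of boilerplate each tool currently carries. The type names are simplified stand-ins for illustration, not the real Helidon MCP API:

```java
import java.util.function.Function;

// Simplified stand-ins for McpRequest/McpToolResult, for illustration only.
record McpRequest(String query) { }
record McpToolResult(String text) { }

class ManualBoilerplateTool {
    static final int MAX_CHARS = 50_000;

    // Every tool interleaves the same cross-cutting code with its business logic.
    static Function<McpRequest, McpToolResult> searchTool(Function<String, String> apiClient) {
        return request -> {
            System.out.println("tool call: search, query=" + request.query()); // logging
            String raw = apiClient.apply(request.query());                     // business logic
            if (raw.length() > MAX_CHARS) {                                    // truncation
                raw = raw.substring(0, MAX_CHARS) + " …[truncated]";
            }
            return new McpToolResult(raw);
        };
    }
}
```

The logging and truncation lines are identical across tools; only the middle line is actual business logic.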

Concrete scenarios where this hurts

  1. Response size explosion — A tool queries an API that returns 200KB of JSON. The LLM only needs a summary. Without framework-level truncation, each tool must manually implement size checks, and if a developer forgets, a single tool call can blow the context window.

  2. Cross-cutting response concerns — Logging tool inputs/outputs, adding timing metadata, enforcing response size limits, stripping internal fields — these are concerns shared by all tools on a server, yet each tool must implement them individually.

  3. Schema augmentation — A common pattern is adding a max_results or max_chars parameter to every tool so the LLM can control response verbosity. Today this must be manually added to every @Mcp.Tool method signature, and the truncation logic manually applied in every method body.

  4. Response normalization — Different upstream APIs return data in different shapes. A middleware layer could normalize responses into consistent, LLM-friendly formats without polluting tool business logic.

Proposed Solution

1. McpToolInterceptor interface

A composable interceptor that can wrap tool execution:

@FunctionalInterface
public interface McpToolInterceptor {
    McpToolResult intercept(McpRequest request, McpTool tool, McpToolChain chain);
}

public interface McpToolChain {
    McpToolResult proceed(McpRequest request);
}

This follows the classic chain-of-responsibility pattern (similar to servlet filters, gRPC interceptors, and Helidon's own Handler chain in WebServer). Interceptors can:

  • Modify the request before execution (pre-processing)
  • Modify the result after execution (post-processing)
  • Short-circuit execution entirely (e.g., caching, validation)
  • Wrap execution with cross-cutting concerns (timing, logging, error handling)
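As a concrete sketch of the wrap-execution case, here is a timing interceptor against the proposed interfaces (repeated as stand-alone copies so the sketch compiles by itself; the eventual signatures may differ):

```java
// Stand-alone copies of the proposed interfaces, for illustration only.
interface McpRequest { }
record McpToolResult(String text) { }
interface McpTool { String name(); }

interface McpToolChain {
    McpToolResult proceed(McpRequest request);
}

@FunctionalInterface
interface McpToolInterceptor {
    McpToolResult intercept(McpRequest request, McpTool tool, McpToolChain chain);
}

// Wraps tool execution with timing: pre-processing happens before proceed(),
// post-processing after it. Returning without calling proceed() would
// short-circuit the tool entirely (e.g. to serve a cached result).
class TimingInterceptor implements McpToolInterceptor {
    @Override
    public McpToolResult intercept(McpRequest request, McpTool tool, McpToolChain chain) {
        long start = System.nanoTime();
        McpToolResult result = chain.proceed(request);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(tool.name() + " completed in " + elapsedMs + " ms");
        return result;
    }
}
```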

2. Registration via McpServerConfig.Builder

McpServerConfig.builder()
    .addInterceptor(new ResponseTruncationInterceptor(50_000))
    .addInterceptor(new TimingMetadataInterceptor())
    .addTool(myTool)
    .build();

Interceptors execute in registration order, wrapping the actual tool invocation.
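One plausible way the builder could assemble that chain is to fold the interceptor list in reverse, so the first registered interceptor ends up outermost. This is a sketch against stand-in types, not committed Helidon code:

```java
import java.util.List;
import java.util.function.Function;

// Stand-in types mirroring the proposed interfaces, for illustration only.
interface McpRequest { }
record McpToolResult(String text) { }
interface McpTool { }
interface McpToolChain { McpToolResult proceed(McpRequest request); }
interface McpToolInterceptor {
    McpToolResult intercept(McpRequest request, McpTool tool, McpToolChain chain);
}

class InterceptorChains {
    /**
     * Folds interceptors around the tool function in reverse registration
     * order, so that interceptors.get(0) becomes the outermost wrapper.
     */
    static McpToolChain compose(List<McpToolInterceptor> interceptors,
                                McpTool tool,
                                Function<McpRequest, McpToolResult> toolFn) {
        McpToolChain chain = toolFn::apply;            // innermost link: the tool itself
        for (int i = interceptors.size() - 1; i >= 0; i--) {
            McpToolInterceptor interceptor = interceptors.get(i);
            McpToolChain next = chain;                 // capture the current tail
            chain = request -> interceptor.intercept(request, tool, next);
        }
        return chain;
    }
}
```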

3. Declarative support via @Mcp.Interceptor

For the annotation-driven (declarative) API, interceptors should be discoverable and composable:

@Mcp.Server("my-server")
@Mcp.Interceptor(ResponseTruncationInterceptor.class)
@Mcp.Interceptor(TimingMetadataInterceptor.class)
public class MyMcpServer {

    @Mcp.Tool("Search records")
    public List<McpToolContent> search(String query) {
        // Business logic only — truncation handled by interceptor
        return McpToolContents.textContent(apiClient.search(query));
    }
}

Ideally, @Mcp.Interceptor could also be placed on individual @Mcp.Tool methods for per-tool interceptors, with server-level interceptors applying to all tools.
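Under that model, scoping might look like the following fragment. The @Mcp.Interceptor annotation does not exist yet; this is only the proposed shape, not runnable code:

```java
@Mcp.Server("my-server")
@Mcp.Interceptor(TimingMetadataInterceptor.class)          // server-level: applies to all tools
public class MyMcpServer {

    @Mcp.Tool("Search records")
    @Mcp.Interceptor(ResponseTruncationInterceptor.class)  // per-tool: applies to this tool only
    public List<McpToolContent> search(String query) {
        return McpToolContents.textContent(apiClient.search(query));
    }
}
```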

4. Schema augmentation hook

Interceptors that need to inject additional parameters into a tool's JSON schema should be able to do so:

public interface McpToolInterceptor {
    McpToolResult intercept(McpRequest request, McpTool tool, McpToolChain chain);

    /** Override to augment the tool's input schema. */
    default String augmentSchema(String schema, McpTool tool) {
        return schema;
    }
}

This enables patterns like: an interceptor adds max_chars (with a default of 50,000) to every tool's schema, then in its intercept() method reads that parameter from the request and truncates the result accordingly — all without the tool author writing a single line of truncation code.

Example: Response Truncation Interceptor

To illustrate the value, here's what a truncation interceptor would look like:

public class ResponseTruncationInterceptor implements McpToolInterceptor {
    private static final String MAX_CHARS_PARAM = "max_chars";
    private static final int DEFAULT_MAX_CHARS = 50_000;

    @Override
    public McpToolResult intercept(McpRequest request, McpTool tool, McpToolChain chain) {
        McpToolResult result = chain.proceed(request);
        int maxChars = request.parameters()
                .get(MAX_CHARS_PARAM).asInt().orElse(DEFAULT_MAX_CHARS);
        return truncateIfNeeded(result, maxChars);
    }

    @Override
    public String augmentSchema(String schema, McpTool tool) {
        // Inject max_chars parameter into the tool's JSON schema
        return SchemaUtils.addIntegerProperty(schema, MAX_CHARS_PARAM,
                "Maximum characters in response before truncation", DEFAULT_MAX_CHARS);
    }
}
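SchemaUtils.addIntegerProperty above is hypothetical. A real implementation would manipulate the schema with a proper JSON library (e.g. Jakarta JSON-P); this naive string splice only illustrates the intended transformation, and assumes the schema already has a non-empty "properties" object:

```java
// Hypothetical helper, sketched with a naive string splice for illustration.
// A production version should parse and rebuild the schema with a JSON library.
class SchemaUtils {
    static String addIntegerProperty(String schema, String name,
                                     String description, int defaultValue) {
        String property = "\"" + name + "\":{\"type\":\"integer\","
                + "\"description\":\"" + description + "\","
                + "\"default\":" + defaultValue + "},";
        int idx = schema.indexOf("\"properties\":{");
        if (idx < 0) {
            return schema;                 // no properties object: leave schema unchanged
        }
        int insertAt = idx + "\"properties\":{".length();
        return schema.substring(0, insertAt) + property + schema.substring(insertAt);
    }
}
```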

Alternatives Considered

Manual wrapping in each tool method

This is what we do today. It works but violates DRY — every tool repeats the same truncation/logging/timing boilerplate. It's error-prone (easy to forget) and couples cross-cutting concerns to business logic.

Helidon WebServer filters at the HTTP level

Helidon's HTTP filters operate on the raw HTTP request/response, not on the typed MCP domain objects (McpRequest, McpToolResult). They can't inspect or modify tool parameters, can't augment JSON schemas, and can't distinguish between different MCP methods (tool call vs. prompt get vs. resource read). The interception needs to happen at the MCP tool execution layer, not the HTTP transport layer.

Wrapping Function<McpRequest, McpToolResult> manually

One could wrap each tool's function with a decorator before passing it to the builder. This works for the programmatic API but doesn't help with the declarative (@Mcp.Tool) API at all, since the codegen generates the function directly. It also requires each server author to build their own composition mechanism.
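For completeness, that decorator approach looks roughly like this (stand-in types again). It works for builder-registered tools, but there is no hook for applying it to codegen-generated ones:

```java
import java.util.function.Function;

// Simplified stand-ins for the real Helidon MCP types.
record McpRequest(String query) { }
record McpToolResult(String text) { }

class ToolDecorators {
    // Each server author must hand-roll composition like this, and then
    // remember to apply it to every tool passed to the builder.
    static Function<McpRequest, McpToolResult> withTruncation(
            Function<McpRequest, McpToolResult> tool, int maxChars) {
        return request -> {
            McpToolResult result = tool.apply(request);
            return result.text().length() <= maxChars
                    ? result
                    : new McpToolResult(result.text().substring(0, maxChars));
        };
    }
}
```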

Impact

This would make Helidon MCP significantly more practical for production MCP servers that wrap real APIs. Context pollution is one of the top operational challenges in MCP server development — LLM performance degrades sharply when tool responses are too large or too noisy. A framework-level solution benefits the entire ecosystem.
