docs: add initial payload processing proposal #8
# Payload Processing

* Authors: @shaneutt, @kflynn

# What?

Define standards for declaratively adding processing steps to HTTP requests and
responses in Kubernetes across the entire payload, including the body.

# Why?

Modern workloads require the ability to process the full payload of an HTTP
request and response, including both headers and body:

* **AI Inference Security**: Guard against bad prompts for inference requests,
  or misaligned responses.
* **AI Inference Optimization**: Route requests based on semantics. Enable
  caching based on semantic similarity to reduce inference costs and enable
  faster response times for common requests. Enable RAG systems to supplement
  inference requests with additional context to get better results.
* **Web Application Security**: Enforce signature-based detection rules, anomaly
  detection systems, scan uploads, call external auth with payload data, etc.
Payload processing can also encompass use cases beyond AI, such as external
authorization or rate limiting. Despite this breadth of use cases, payload
processing is not standardized in Kubernetes today.
## Definitions

* **Payload Processors**: Features capable of processing the full payload of
  requests and/or responses (including headers and body). Payload processors
  may be implemented natively or as extensions. Many existing API gateways
  (including Envoy and NGINX) include filter mechanisms which fit this
  definition, but we are not limiting discussion to only these existing
  mechanisms.
## User Stories

* As a developer of an application that performs AI inference as part of its
  function:

  * I want routing decisions for inference requests to be dynamically adapted
    based on the content of each request, targeting the most suitable models
    to improve the quality of inference results that my application receives.

  * I want declarative configuration of failure modes for processing steps
    (fail-open, fail-closed, fallback, etc.) to ensure safe and efficient
    runtime behavior of my application.

  * I want predictable ordering of all payload processing steps to ensure
    safe and consistent runtime behavior.

* As a security engineer, I want to be able to add a detection engine which
  scans requests to identify malicious or anomalous request payloads and
  block, sanitize, and/or report them before they reach backends.

* As a cluster admin, I want to be able to add semantic caching to inference
  requests in order to detect repeated requests and return cached results,
  reducing overall inference costs and improving latency for common requests.

* As a compliance officer:

  * I want to be able to add processors that examine inference **requests**
    for personally identifiable information (PII) so that any PII can result
    in the request being blocked, sanitized, or reported before sending it to
    the inference backend.

  * I want to be able to add processors that examine inference **responses**
    for malicious or misaligned results so that any such results can be
    dropped, sanitized, or reported before the response is sent to the
    requester.
## Goals

* Ensure that declarative APIs, standards, and guidance on best practices
  exist for adding Payload Processors to HTTP requests and responses on
  Kubernetes.
* Ensure that there is adequate documentation for developers to be able to
  easily build implementations of Payload Processors according to the
  standards.
* Support composability, pluggability, and ordered processing of Payload
  Processors.
* Ensure the APIs can provide clear and easily observable defaulting behavior.
* Ensure the APIs can provide clear and obvious runtime behavior.
* Provide failure mode options for Payload Processors.
## Non-Goals

* Requiring every request or response to be processed by a payload processor.
  The mechanisms described in this proposal are intended to be optional
  extensions.
# How?

TODO in a later PR.

> This should be left blank until the "What?" and "Why?" are agreed upon,
> as defining "How?" the goals are accomplished is not important unless we can
> first agree on what the problem is, and why we want to solve it.
>
> This section is fairly freeform, because (again) these proposals will
> eventually find their way into any number of different final proposal formats
> in other projects. However, the general guidance is to break things down into
> highly focused sections as much as possible to help make things easier to
> read and review. Long, unbroken walls of code and YAML in this document are
> not advisable as that may increase the time it takes to review.
# Relevant Links

* [Original Slack Discussion](https://kubernetes.slack.com/archives/C09EJTE0LV9/p1757621006832049)
* [Document: Extended Body-Based Routing (BBR) in Gateway API Inference Extension](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg)