Inference Extension: Documentation #3844

@sjberman

Description

As a user, I want to know how to use NGF with the inference extension, so I can route traffic intelligently to my AI workloads in Kubernetes.

Acceptance Criteria:

  • Add a user guide on how to route traffic to AI workloads using NGF.
  • Cover how to install the Gateway API Inference Extension CRDs and how to deploy NGF with the feature flag enabled (see the Helm values sketch after this list).
  • Cover how to deploy an InferencePool and EPP, and how to configure an HTTPRoute that references the InferencePool (see the manifest sketch after this list).
  • Explain how to secure traffic between the NGINX pod and the EPP using cert-manager, mentioning that by default we create self-signed certs (see the Certificate sketch after this list).
  • Link to the Gateway API Inference Extension docs where it makes sense (for example, those docs may better describe the InferenceObjective CRD and how a user should handle it).
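
The install bullet above will likely pair a CRD install step with a Helm value that turns the feature on. A minimal sketch of such a values file is below; the `gwAPIInferenceExtension.enable` key is a hypothetical placeholder for whatever flag NGF actually exposes, and the real key should be taken from the NGF Helm chart documentation.

```yaml
# values-inference.yaml -- sketch only.
# The value key below is a hypothetical placeholder; check the NGF Helm chart
# documentation for the actual flag that enables Inference Extension support.
nginxGateway:
  gwAPIInferenceExtension:
    enable: true
```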
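
For the InferencePool/HTTPRoute bullet, the guide will need a manifest pair along these lines. This is a sketch against the alpha inference.networking.x-k8s.io API; the pool name, selector labels, port, EPP Service name, and Gateway name are illustrative assumptions, and field names should be verified against the upstream Inference Extension docs.

```yaml
# Sketch: an InferencePool that fronts a set of model-server pods and points
# at an EPP (endpoint picker) Service, plus an HTTPRoute that routes to the
# pool instead of a regular Service. Names and field values are illustrative.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-8b-instruct
spec:
  selector:
    app: vllm-llama3-8b-instruct        # labels on the model-server pods
  targetPortNumber: 8000                # port the model server listens on
  extensionRef:
    name: vllm-llama3-8b-instruct-epp   # the EPP (endpoint picker) Service
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway             # Gateway managed by NGF
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
```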
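
For the cert-manager bullet, the guide could show a Certificate that replaces the default self-signed certs on the NGINX-to-EPP gRPC connection. This is a sketch only: the issuer, secret name, namespace, and EPP Service DNS name are assumptions, and the EPP deployment would have to mount the resulting Secret.

```yaml
# Sketch: a cert-manager Certificate for the EPP endpoint. The issuer, secret
# name, namespace, and DNS name below are illustrative assumptions.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: epp-tls
  namespace: default
spec:
  secretName: epp-tls               # Secret the EPP deployment would mount
  issuerRef:
    name: selfsigned-issuer         # any existing Issuer or ClusterIssuer
    kind: Issuer
  dnsNames:
  - vllm-llama3-8b-instruct-epp.default.svc   # EPP Service DNS name
```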

Additional acceptance criteria update:

  • Add legal text specifying that NGF is not responsible for any threats or risks associated with using a third-party EPP.
  • Document the insecure gRPC connection shortfall, and note that the Gateway API Inference Extension is in Alpha and should not be used in production environments.

Metadata

Labels

  • area/inference-extension: Related to the Gateway API Inference Extension
  • documentation: Improvements or additions to documentation
  • refined: Requirements are refined and the issue is ready to be implemented.
  • size/medium: Estimated to be completed within a week

Status

🏗 In Progress
