Merged
293 changes: 293 additions & 0 deletions deploy/kubernetes/crds/vllm.ai_semanticroutes.yaml
@@ -0,0 +1,293 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: semanticroutes.vllm.ai
spec:
  group: vllm.ai
  names:
    kind: SemanticRoute
    listKind: SemanticRouteList
    plural: semanticroutes
    shortNames:
    - sr
    singular: semanticroute
  scope: Namespaced
  versions:
  - additionalPrinterColumns:
    - description: Number of routing rules
      jsonPath: .spec.rules
      name: Rules
      type: integer
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    name: v1alpha1
    schema:
      openAPIV3Schema:
        description: SemanticRoute defines a semantic routing rule for LLM requests
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: SemanticRouteSpec defines the desired state of SemanticRoute
            properties:
              rules:
                description: Rules defines the routing rules to be applied
                items:
                  description: RouteRule defines a single routing rule
                  properties:
                    defaultModel:
                      description: DefaultModel defines the fallback model if no modelRefs
                        are available
                      properties:
                        address:
                          description: Address defines the endpoint address
                          maxLength: 255
                          minLength: 1
                          type: string
                        modelName:
                          description: ModelName defines the name of the model
                          maxLength: 100
                          minLength: 1
                          type: string
                        port:
                          description: Port defines the endpoint port
                          format: int32
                          maximum: 65535
                          minimum: 1
                          type: integer
                        priority:
                          description: Priority defines the priority of this model
                            reference (higher values = higher priority)
                          format: int32
                          maximum: 1000
                          minimum: 0
                          type: integer
                        weight:
                          default: 100
                          description: Weight defines the traffic weight for this
                            model (0-100)
                          format: int32
                          maximum: 100
                          minimum: 0
                          type: integer
                      required:
                      - address
                      - modelName
                      - port
                      type: object
                    filters:
                      description: Filters defines the optional filters to be applied
                        to requests matching this rule
                      items:
                        description: Filter defines a filter to be applied to requests
                        properties:
                          config:
                            description: Config defines the filter-specific configuration
                            type: object
                            x-kubernetes-preserve-unknown-fields: true
                          enabled:
                            default: true
                            description: Enabled defines whether this filter is enabled
                            type: boolean
                          type:
                            allOf:
                            - enum:
                              - PIIDetection
                              - PromptGuard
                              - SemanticCache
                              - ReasoningControl
                              - ToolSelection
                            - enum:
                              - PIIDetection
                              - PromptGuard
                              - SemanticCache
                              - ReasoningControl
                            description: Type defines the filter type
                            type: string
                        required:
                        - type
                        type: object
                      maxItems: 20
                      type: array
                    intents:
                      description: Intents defines the intent categories that this
                        rule should match
                      items:
                        description: Intent defines an intent category for routing
                        properties:
                          category:
                            description: Category defines the intent category name
                              (e.g., "math", "computer science", "creative")
                            maxLength: 100
                            minLength: 1
                            pattern: ^[a-zA-Z0-9\s\-_]+$
                            type: string
                          description:
                            description: Description provides an optional description
                              of this intent category
                            maxLength: 500
                            type: string
                          threshold:
                            default: 0.7
                            description: Threshold defines the confidence threshold
                              for this intent (0.0-1.0)
                            maximum: 1
                            minimum: 0
                            type: number
                        required:
                        - category
                        type: object
                      maxItems: 50
                      minItems: 1
                      type: array
                    modelRefs:
                      description: ModelRefs defines the target models for this routing
                        rule
                      items:
                        description: ModelRef defines a reference to a model endpoint
                        properties:
                          address:
                            description: Address defines the endpoint address
                            maxLength: 255
                            minLength: 1
                            type: string
                          modelName:
                            description: ModelName defines the name of the model
                            maxLength: 100
                            minLength: 1
                            type: string
                          port:
                            description: Port defines the endpoint port
                            format: int32
                            maximum: 65535
                            minimum: 1
                            type: integer
                          priority:
                            description: Priority defines the priority of this model
                              reference (higher values = higher priority)
                            format: int32
                            maximum: 1000
                            minimum: 0
                            type: integer
                          weight:
                            default: 100
                            description: Weight defines the traffic weight for this
                              model (0-100)
                            format: int32
                            maximum: 100
                            minimum: 0
                            type: integer
                        required:
                        - address
                        - modelName
                        - port
                        type: object
                      maxItems: 10
                      minItems: 1
                      type: array
                  required:
                  - intents
                  - modelRefs
                  type: object
                maxItems: 100
                minItems: 1
                type: array
            required:
            - rules
            type: object
          status:
            description: SemanticRouteStatus defines the observed state of SemanticRoute
            properties:
              activeRules:
                description: ActiveRules indicates the number of currently active
                  routing rules
                format: int32
                type: integer
              conditions:
                description: Conditions represent the latest available observations
                  of the SemanticRoute's current state
                items:
                  description: Condition contains details for one aspect of the current
                    state of this API Resource.
                  properties:
                    lastTransitionTime:
                      description: |-
                        lastTransitionTime is the last time the condition transitioned from one status to another.
                        This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
                      format: date-time
                      type: string
                    message:
                      description: |-
                        message is a human readable message indicating details about the transition.
                        This may be an empty string.
                      maxLength: 32768
                      type: string
                    observedGeneration:
                      description: |-
                        observedGeneration represents the .metadata.generation that the condition was set based upon.
                        For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
                        with respect to the current state of the instance.
                      format: int64
                      minimum: 0
                      type: integer
                    reason:
                      description: |-
                        reason contains a programmatic identifier indicating the reason for the condition's last transition.
                        Producers of specific condition types may define expected values and meanings for this field,
                        and whether the values are considered a guaranteed API.
                        The value should be a CamelCase string.
                        This field may not be empty.
                      maxLength: 1024
                      minLength: 1
                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
                      type: string
                    status:
                      description: status of the condition, one of True, False, Unknown.
                      enum:
                      - "True"
                      - "False"
                      - Unknown
                      type: string
                    type:
                      description: type of condition in CamelCase or in foo.example.com/CamelCase.
                      maxLength: 316
                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
                      type: string
                  required:
                  - lastTransitionTime
                  - message
                  - reason
                  - status
                  - type
                  type: object
                type: array
              observedGeneration:
                description: ObservedGeneration reflects the generation of the most
                  recently observed SemanticRoute
                format: int64
                type: integer
            type: object
        type: object
    served: true
    storage: true
    subresources:
      status: {}
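
For orientation, here is a minimal sketch of a SemanticRoute manifest that should validate against the schema above. It is not part of this change. The resource name, namespace, model names, and service addresses are hypothetical placeholders chosen for illustration. One detail worth noting from the schema as written: the filter `type` field combines two enums with `allOf`, so only values present in both lists are accepted, which means `ToolSelection` would be rejected.

```yaml
apiVersion: vllm.ai/v1alpha1
kind: SemanticRoute
metadata:
  name: math-route                 # hypothetical name
  namespace: default               # hypothetical namespace
spec:
  rules:                           # required, 1-100 items
  - intents:                       # required, 1-50 items
    - category: math               # must match ^[a-zA-Z0-9\s\-_]+$
      description: Mathematical and quantitative questions
      threshold: 0.8               # 0.0-1.0, defaults to 0.7 when omitted
    modelRefs:                     # required, 1-10 items
    - modelName: example-math-model                    # hypothetical model
      address: math-model.default.svc.cluster.local    # hypothetical endpoint
      port: 8000
      weight: 80                   # 0-100, defaults to 100 when omitted
      priority: 100                # 0-1000, higher values = higher priority
    - modelName: example-general-model                 # hypothetical model
      address: general-model.default.svc.cluster.local # hypothetical endpoint
      port: 8000
      weight: 20
      priority: 50
    filters:                       # optional, at most 20 items
    - type: PIIDetection           # must appear in both enum lists
      enabled: true                # defaults to true when omitted
    defaultModel:                  # optional fallback if no modelRefs are available
      modelName: example-general-model                 # hypothetical model
      address: general-model.default.svc.cluster.local # hypothetical endpoint
      port: 8000
```

The `config` field on a filter is left out here because the schema marks it `x-kubernetes-preserve-unknown-fields: true` and does not define its keys; whatever a given filter expects there is decided by the controller, not by this CRD.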