Commit a09c988

api: add semantic route support (#147)
Signed-off-by: bitliu <[email protected]>
1 parent f90fbb6 commit a09c988

18 files changed: 2099 additions, 5 deletions
Lines changed: 293 additions & 0 deletions
@@ -0,0 +1,293 @@
+---
+apiVersion: apiextensions.k8s.io/v1
+kind: CustomResourceDefinition
+metadata:
+  annotations:
+    controller-gen.kubebuilder.io/version: v0.19.0
+  name: semanticroutes.vllm.ai
+spec:
+  group: vllm.ai
+  names:
+    kind: SemanticRoute
+    listKind: SemanticRouteList
+    plural: semanticroutes
+    shortNames:
+    - sr
+    singular: semanticroute
+  scope: Namespaced
+  versions:
+  - additionalPrinterColumns:
+    - description: Number of routing rules
+      jsonPath: .spec.rules
+      name: Rules
+      type: integer
+    - jsonPath: .metadata.creationTimestamp
+      name: Age
+      type: date
+    name: v1alpha1
+    schema:
+      openAPIV3Schema:
+        description: SemanticRoute defines a semantic routing rule for LLM requests
+        properties:
+          apiVersion:
+            description: |-
+              APIVersion defines the versioned schema of this representation of an object.
+              Servers should convert recognized schemas to the latest internal value, and
+              may reject unrecognized values.
+              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
+            type: string
+          kind:
+            description: |-
+              Kind is a string value representing the REST resource this object represents.
+              Servers may infer this from the endpoint the client submits requests to.
+              Cannot be updated.
+              In CamelCase.
+              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
+            type: string
+          metadata:
+            type: object
+          spec:
+            description: SemanticRouteSpec defines the desired state of SemanticRoute
+            properties:
+              rules:
+                description: Rules defines the routing rules to be applied
+                items:
+                  description: RouteRule defines a single routing rule
+                  properties:
+                    defaultModel:
+                      description: DefaultModel defines the fallback model if no modelRefs
+                        are available
+                      properties:
+                        address:
+                          description: Address defines the endpoint address
+                          maxLength: 255
+                          minLength: 1
+                          type: string
+                        modelName:
+                          description: ModelName defines the name of the model
+                          maxLength: 100
+                          minLength: 1
+                          type: string
+                        port:
+                          description: Port defines the endpoint port
+                          format: int32
+                          maximum: 65535
+                          minimum: 1
+                          type: integer
+                        priority:
+                          description: Priority defines the priority of this model
+                            reference (higher values = higher priority)
+                          format: int32
+                          maximum: 1000
+                          minimum: 0
+                          type: integer
+                        weight:
+                          default: 100
+                          description: Weight defines the traffic weight for this
+                            model (0-100)
+                          format: int32
+                          maximum: 100
+                          minimum: 0
+                          type: integer
+                      required:
+                      - address
+                      - modelName
+                      - port
+                      type: object
+                    filters:
+                      description: Filters defines the optional filters to be applied
+                        to requests matching this rule
+                      items:
+                        description: Filter defines a filter to be applied to requests
+                        properties:
+                          config:
+                            description: Config defines the filter-specific configuration
+                            type: object
+                            x-kubernetes-preserve-unknown-fields: true
+                          enabled:
+                            default: true
+                            description: Enabled defines whether this filter is enabled
+                            type: boolean
+                          type:
+                            allOf:
+                            - enum:
+                              - PIIDetection
+                              - PromptGuard
+                              - SemanticCache
+                              - ReasoningControl
+                              - ToolSelection
+                            - enum:
+                              - PIIDetection
+                              - PromptGuard
+                              - SemanticCache
+                              - ReasoningControl
+                            description: Type defines the filter type
+                            type: string
+                        required:
+                        - type
+                        type: object
+                      maxItems: 20
+                      type: array
+                    intents:
+                      description: Intents defines the intent categories that this
+                        rule should match
+                      items:
+                        description: Intent defines an intent category for routing
+                        properties:
+                          category:
+                            description: Category defines the intent category name
+                              (e.g., "math", "computer science", "creative")
+                            maxLength: 100
+                            minLength: 1
+                            pattern: ^[a-zA-Z0-9\s\-_]+$
+                            type: string
+                          description:
+                            description: Description provides an optional description
+                              of this intent category
+                            maxLength: 500
+                            type: string
+                          threshold:
+                            default: 0.7
+                            description: Threshold defines the confidence threshold
+                              for this intent (0.0-1.0)
+                            maximum: 1
+                            minimum: 0
+                            type: number
+                        required:
+                        - category
+                        type: object
+                      maxItems: 50
+                      minItems: 1
+                      type: array
+                    modelRefs:
+                      description: ModelRefs defines the target models for this routing
+                        rule
+                      items:
+                        description: ModelRef defines a reference to a model endpoint
+                        properties:
+                          address:
+                            description: Address defines the endpoint address
+                            maxLength: 255
+                            minLength: 1
+                            type: string
+                          modelName:
+                            description: ModelName defines the name of the model
+                            maxLength: 100
+                            minLength: 1
+                            type: string
+                          port:
+                            description: Port defines the endpoint port
+                            format: int32
+                            maximum: 65535
+                            minimum: 1
+                            type: integer
+                          priority:
+                            description: Priority defines the priority of this model
+                              reference (higher values = higher priority)
+                            format: int32
+                            maximum: 1000
+                            minimum: 0
+                            type: integer
+                          weight:
+                            default: 100
+                            description: Weight defines the traffic weight for this
+                              model (0-100)
+                            format: int32
+                            maximum: 100
+                            minimum: 0
+                            type: integer
+                        required:
+                        - address
+                        - modelName
+                        - port
+                        type: object
+                      maxItems: 10
+                      minItems: 1
+                      type: array
+                  required:
+                  - intents
+                  - modelRefs
+                  type: object
+                maxItems: 100
+                minItems: 1
+                type: array
+            required:
+            - rules
+            type: object
+          status:
+            description: SemanticRouteStatus defines the observed state of SemanticRoute
+            properties:
+              activeRules:
+                description: ActiveRules indicates the number of currently active
+                  routing rules
+                format: int32
+                type: integer
+              conditions:
+                description: Conditions represent the latest available observations
+                  of the SemanticRoute's current state
+                items:
+                  description: Condition contains details for one aspect of the current
+                    state of this API Resource.
+                  properties:
+                    lastTransitionTime:
+                      description: |-
+                        lastTransitionTime is the last time the condition transitioned from one status to another.
+                        This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
+                      format: date-time
+                      type: string
+                    message:
+                      description: |-
+                        message is a human readable message indicating details about the transition.
+                        This may be an empty string.
+                      maxLength: 32768
+                      type: string
+                    observedGeneration:
+                      description: |-
+                        observedGeneration represents the .metadata.generation that the condition was set based upon.
+                        For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
+                        with respect to the current state of the instance.
+                      format: int64
+                      minimum: 0
+                      type: integer
+                    reason:
+                      description: |-
+                        reason contains a programmatic identifier indicating the reason for the condition's last transition.
+                        Producers of specific condition types may define expected values and meanings for this field,
+                        and whether the values are considered a guaranteed API.
+                        The value should be a CamelCase string.
+                        This field may not be empty.
+                      maxLength: 1024
+                      minLength: 1
+                      pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
+                      type: string
+                    status:
+                      description: status of the condition, one of True, False, Unknown.
+                      enum:
+                      - "True"
+                      - "False"
+                      - Unknown
+                      type: string
+                    type:
+                      description: type of condition in CamelCase or in foo.example.com/CamelCase.
+                      maxLength: 316
+                      pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
+                      type: string
+                  required:
+                  - lastTransitionTime
+                  - message
+                  - reason
+                  - status
+                  - type
+                  type: object
+                type: array
+              observedGeneration:
+                description: ObservedGeneration reflects the generation of the most
+                  recently observed SemanticRoute
+                format: int64
+                type: integer
+            type: object
+        type: object
+    served: true
+    storage: true
+    subresources:
+      status: {}
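
For orientation, here is a minimal sketch of a SemanticRoute object that should satisfy the schema added in this file. It is an illustration only, not a manifest taken from the commit: the resource name, namespace, model names, service addresses, and the `action` key under `config` are all hypothetical. Only the field names, required fields, and value ranges come from the CRD above.

```yaml
# Hypothetical example resource; every concrete value below is an assumption.
apiVersion: vllm.ai/v1alpha1
kind: SemanticRoute
metadata:
  name: math-route            # hypothetical name
  namespace: default
spec:
  rules:                      # required, 1 to 100 items
  - intents:                  # required, 1 to 50 items
    - category: math          # must match ^[a-zA-Z0-9\s\-_]+$
      description: Arithmetic and symbolic math questions
      threshold: 0.8          # 0 to 1; defaults to 0.7 when omitted
    modelRefs:                # required, 1 to 10 items
    - modelName: example-math-model                    # hypothetical model
      address: math-model.default.svc.cluster.local    # hypothetical address
      port: 8000              # 1 to 65535
      weight: 80              # 0 to 100; defaults to 100 when omitted
      priority: 100           # 0 to 1000; higher values = higher priority
    - modelName: example-general-model
      address: general-model.default.svc.cluster.local
      port: 8000
      weight: 20
    filters:                  # optional, at most 20 items
    - type: PIIDetection
      enabled: true           # defaults to true when omitted
      config:                 # free-form object (unknown fields preserved)
        action: redact        # hypothetical key, not defined by the CRD
    defaultModel:             # optional fallback if no modelRefs are available
      modelName: example-general-model
      address: general-model.default.svc.cluster.local
      port: 8000
```

One detail worth noting when choosing a filter `type`: the schema combines two enum lists under `allOf`, and only the first list contains `ToolSelection`. Because `allOf` requires a value to satisfy every listed subschema, `ToolSelection` would be rejected by validation as the schema is written here.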
