
Commit 2097aeb

Refactor failure injection and update simulator error handling
Failure injection is now controlled by a dedicated 'failure-injection-rate' parameter instead of a separate 'failure' mode. Failure type constants are centralized, and error handling in the simulator is refactored to use a unified method for sending error responses. Documentation and tests are updated to reflect these changes, and the OpenAI error response format now includes an 'object' field.

Signed-off-by: Sergey Marunich <[email protected]>
1 parent 09ab37b commit 2097aeb
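
The commit message states that the OpenAI-compatible error response now carries an 'object' field. As a minimal sketch of what that envelope might look like in Go (the struct and field names here are assumptions for illustration; the actual definition lives elsewhere in the repository):

	// Hypothetical error envelope; exact names in the repository may differ.
	type ErrorResponse struct {
		Object string    `json:"object"` // the newly added field, e.g. "error"
		Error  ErrorInfo `json:"error"`
	}

	type ErrorInfo struct {
		Message string  `json:"message"`
		Type    string  `json:"type"`
		Param   *string `json:"param"`
		Code    string  `json:"code"`
	}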

File tree

9 files changed (+133, -111 lines)

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ COPY . .
 
 # HuggingFace tokenizer bindings
 RUN mkdir -p lib
-RUN curl -L https://github.com/daulet/tokenizers/releases/download/v1.20.2/libtokenizers.${TARGETOS}-${TARGETARCH}.tar.gz | tar -xz -C lib
+RUN curl -L https://github.com/daulet/tokenizers/releases/download/v1.22.1/libtokenizers.${TARGETOS}-${TARGETARCH}.tar.gz | tar -xz -C lib
 RUN ranlib lib/*.a
 
 # Build

README.md

Lines changed: 3 additions & 3 deletions
@@ -29,10 +29,11 @@ In addition, it supports a subset of vLLM's Prometheus metrics. These metrics ar
 
 The simulated inference has no connection with the model and LoRA adapters specified in the command line parameters or via the /v1/load_lora_adapter HTTP REST endpoint. The /v1/models endpoint returns simulated results based on those same command line parameters and those loaded via the /v1/load_lora_adapter HTTP REST endpoint.
 
-The simulator supports three modes of operation:
+The simulator supports two modes of operation:
 - `echo` mode: the response contains the same text that was received in the request. For `/v1/chat/completions` the last message for the role=`user` is used.
 - `random` mode: the response is randomly chosen from a set of pre-defined sentences.
-- `failure` mode: randomly injects OpenAI API compatible error responses for testing error handling.
+
+Additionally, the simulator can inject OpenAI API compatible error responses for testing error handling using the `failure-injection-rate` parameter.
 
 Timing of the response is defined by the `time-to-first-token` and `inter-token-latency` parameters. In case P/D is enabled for a request, `kv-cache-transfer-latency` will be used instead of `time-to-first-token`.
 
@@ -102,7 +103,6 @@ For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started
 - `mode`: the simulator mode, optional, by default `random`
   - `echo`: returns the same text that was sent in the request
   - `random`: returns a sentence chosen at random from a set of pre-defined sentences
-  - `failure`: randomly injects OpenAI API compatible error responses
 - `time-to-first-token`: the time to the first token (in milliseconds), optional, by default zero
 - `time-to-first-token-std-dev`: standard deviation for time before the first token will be returned, in milliseconds, optional, default is 0, can't be more than 30% of `time-to-first-token`, will not cause the actual time to first token to differ by more than 70% from `time-to-first-token`
 - `inter-token-latency`: the time to 'generate' each additional token (in milliseconds), optional, by default zero
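
As a usage sketch, failure injection now composes with either mode rather than replacing them. A minimal example of the resulting configuration in Go, using the fields and constants introduced in this commit (the specific values are illustrative):

	// Inject OpenAI-style errors into ~20% of responses while running in
	// random mode; an empty FailureTypes slice would allow all failure types.
	config := &common.Configuration{
		Mode:                 common.ModeRandom,
		FailureInjectionRate: 20,
		FailureTypes: []string{
			common.FailureTypeRateLimit,
			common.FailureTypeServerError,
		},
	}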

go.sum

Lines changed: 4 additions & 4 deletions
@@ -1,3 +1,5 @@
+github.com/alicebob/miniredis/v2 v2.35.0 h1:QwLphYqCEAo1eu1TqPRN2jgVMPBweeQcR21jeqDCONI=
+github.com/alicebob/miniredis/v2 v2.35.0/go.mod h1:TcL7YfarKPGDAthEtl5NBeHZfeUQj6OXMm/+iu5cLMM=
 github.com/andybalholm/brotli v1.1.1 h1:PR2pgnyFznKEugtsUo0xLdDop5SKXd5Qf5ysW+7XdTA=
 github.com/andybalholm/brotli v1.1.1/go.mod h1:05ib4cKhjx3OQYUY22hTVd34Bc8upXjOLL2rKwwZBoA=
 github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
@@ -11,8 +13,6 @@ github.com/buaazp/fasthttprouter v0.1.1/go.mod h1:h/Ap5oRVLeItGKTVBb+heQPks+HdIU
 github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
 github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
 github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
-github.com/daulet/tokenizers v1.20.2 h1:tlq/vIOiBTKDPets3596aFvmJYLn3XI6LFKq4q9LKhQ=
-github.com/daulet/tokenizers v1.20.2/go.mod h1:tGnMdZthXdcWY6DGD07IygpwJqiPvG85FQUnhs/wSCs=
 github.com/daulet/tokenizers v1.22.1 h1:3wzAFIxfgRuqGKka8xdkeTbctDmmqOOs12GofqdorpM=
 github.com/daulet/tokenizers v1.22.1/go.mod h1:tGnMdZthXdcWY6DGD07IygpwJqiPvG85FQUnhs/wSCs=
 github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
@@ -68,8 +68,6 @@ github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
 github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
 github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
 github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
-github.com/llm-d/llm-d-kv-cache-manager v0.2.0 h1:7MXFPjy3P8nZ7HbB1LWhhVLHvNTLbZglkD/ZcT7UU1k=
-github.com/llm-d/llm-d-kv-cache-manager v0.2.0/go.mod h1:ZTqwsnIVC6R5YuTUrYofPIUnCeZ9RvXn1UQAdxLYl1Y=
 github.com/llm-d/llm-d-kv-cache-manager v0.2.2-0.20250810103202-0adf0940f60a h1:PXR37HLgYYfolzWQA2uQOEiJlj3IV9YSvgaEFqCRSa8=
 github.com/llm-d/llm-d-kv-cache-manager v0.2.2-0.20250810103202-0adf0940f60a/go.mod h1:g2UlYKNJ4S860SAQ/QoRnytAFfnp8f1luW4IuZSMwCE=
 github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0=
@@ -147,6 +145,8 @@ github.com/xyproto/randomstring v1.0.5 h1:YtlWPoRdgMu3NZtP45drfy1GKoojuR7hmRcnhZ
 github.com/xyproto/randomstring v1.0.5/go.mod h1:rgmS5DeNXLivK7YprL0pY+lTuhNQW3iGxZ18UQApw/E=
 github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
 github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
+github.com/yuin/gopher-lua v1.1.1 h1:kYKnWBjvbNP4XLT3+bPEwAXJx262OhaHDWDVOPjL46M=
+github.com/yuin/gopher-lua v1.1.1/go.mod h1:GBR0iDaNXjAgGg9zfCvksxSRnQx76gclCIb7kdAd1Pw=
 go.uber.org/automaxprocs v1.6.0 h1:O3y2/QNTOdbF+e/dpXNNW7Rx2hZ4sTIPyybbxyNqTUs=
 go.uber.org/automaxprocs v1.6.0/go.mod h1:ifeIMSnPZuznNm6jmdzmU3/bfk01Fe2fotchwEFJ8r8=
 go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=

pkg/common/config.go

Lines changed: 17 additions & 10 deletions
@@ -34,7 +34,14 @@ const (
 	vLLMDefaultPort = 8000
 	ModeRandom      = "random"
 	ModeEcho        = "echo"
-	ModeFailure     = "failure"
+
+	// Failure type constants
+	FailureTypeRateLimit      = "rate_limit"
+	FailureTypeInvalidAPIKey  = "invalid_api_key"
+	FailureTypeContextLength  = "context_length"
+	FailureTypeServerError    = "server_error"
+	FailureTypeInvalidRequest = "invalid_request"
+	FailureTypeModelNotFound  = "model_not_found"
 )
 
 type Configuration struct {
@@ -221,8 +228,8 @@ func (c *Configuration) validate() error {
 		c.ServedModelNames = []string{c.Model}
 	}
 
-	if c.Mode != ModeEcho && c.Mode != ModeRandom && c.Mode != ModeFailure {
-		return fmt.Errorf("invalid mode '%s', valid values are 'random', 'echo', and 'failure'", c.Mode)
+	if c.Mode != ModeEcho && c.Mode != ModeRandom {
+		return fmt.Errorf("invalid mode '%s', valid values are 'random' and 'echo'", c.Mode)
 	}
 	if c.Port <= 0 {
 		return fmt.Errorf("invalid port '%d'", c.Port)
@@ -313,12 +320,12 @@ func (c *Configuration) validate() error {
 	}
 
 	validFailureTypes := map[string]bool{
-		"rate_limit":      true,
-		"invalid_api_key": true,
-		"context_length":  true,
-		"server_error":    true,
-		"invalid_request": true,
-		"model_not_found": true,
+		FailureTypeRateLimit:      true,
+		FailureTypeInvalidAPIKey:  true,
+		FailureTypeContextLength:  true,
+		FailureTypeServerError:    true,
+		FailureTypeInvalidRequest: true,
+		FailureTypeModelNotFound:  true,
 	}
 	for _, failureType := range c.FailureTypes {
 		if !validFailureTypes[failureType] {
@@ -353,7 +360,7 @@ func ParseCommandParamsAndLoadConfig() (*Configuration, error) {
 	f.IntVar(&config.MaxCPULoras, "max-cpu-loras", config.MaxCPULoras, "Maximum number of LoRAs to store in CPU memory")
 	f.IntVar(&config.MaxModelLen, "max-model-len", config.MaxModelLen, "Model's context window, maximum number of tokens in a single request including input and output")
 
-	f.StringVar(&config.Mode, "mode", config.Mode, "Simulator mode: echo - returns the same text that was sent in the request, for chat completion returns the last message; random - returns random sentence from a bank of pre-defined sentences; failure - randomly injects API errors")
+	f.StringVar(&config.Mode, "mode", config.Mode, "Simulator mode: echo - returns the same text that was sent in the request, for chat completion returns the last message; random - returns random sentence from a bank of pre-defined sentences")
 	f.IntVar(&config.InterTokenLatency, "inter-token-latency", config.InterTokenLatency, "Time to generate one token (in milliseconds)")
 	f.IntVar(&config.TimeToFirstToken, "time-to-first-token", config.TimeToFirstToken, "Time to first token (in milliseconds)")
 	f.IntVar(&config.KVCacheTransferLatency, "kv-cache-transfer-latency", config.KVCacheTransferLatency, "Time for KV-cache transfer from a remote vLLM (in milliseconds)")
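
The hunk above only removes `failure` from the mode help text; the flag registration for the new parameter is not shown in this diff. A plausible sketch of how it would sit alongside the other `f.IntVar` calls (assumed, not taken from this commit):

	// Assumed registration of the dedicated failure-injection parameter.
	f.IntVar(&config.FailureInjectionRate, "failure-injection-rate",
		config.FailureInjectionRate,
		"Probability (0-100) of injecting an OpenAI API compatible error response")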

pkg/common/failures.go renamed to pkg/llm-d-inference-sim/failures.go

Lines changed: 26 additions & 22 deletions
@@ -14,12 +14,18 @@ See the License for the specific language governing permissions and
 limitations under the License.
 */
 
-package common
+package llmdinferencesim
 
 import (
 	"fmt"
-	"math/rand"
-	"time"
+
+	"github.com/llm-d/llm-d-inference-sim/pkg/common"
+)
+
+const (
+	// Error message templates
+	RateLimitMessageTemplate     = "Rate limit reached for %s in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1."
+	ModelNotFoundMessageTemplate = "The model '%s-nonexistent' does not exist"
 )
 
 type FailureSpec struct {
@@ -31,42 +37,42 @@ type FailureSpec struct {
 }
 
 var predefinedFailures = map[string]FailureSpec{
-	"rate_limit": {
+	common.FailureTypeRateLimit: {
 		StatusCode: 429,
 		ErrorType:  "rate_limit_exceeded",
 		ErrorCode:  "rate_limit_exceeded",
 		Message:    "Rate limit reached for model in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1.",
 		Param:      nil,
 	},
-	"invalid_api_key": {
+	common.FailureTypeInvalidAPIKey: {
 		StatusCode: 401,
 		ErrorType:  "invalid_request_error",
 		ErrorCode:  "invalid_api_key",
 		Message:    "Incorrect API key provided",
 		Param:      nil,
 	},
-	"context_length": {
+	common.FailureTypeContextLength: {
 		StatusCode: 400,
 		ErrorType:  "invalid_request_error",
 		ErrorCode:  "context_length_exceeded",
 		Message:    "This model's maximum context length is 4096 tokens. However, your messages resulted in 4500 tokens.",
 		Param:      stringPtr("messages"),
 	},
-	"server_error": {
+	common.FailureTypeServerError: {
 		StatusCode: 503,
 		ErrorType:  "server_error",
 		ErrorCode:  "server_error",
 		Message:    "The server is overloaded or not ready yet.",
 		Param:      nil,
 	},
-	"invalid_request": {
+	common.FailureTypeInvalidRequest: {
 		StatusCode: 400,
 		ErrorType:  "invalid_request_error",
 		ErrorCode:  "invalid_request_error",
 		Message:    "Invalid request: missing required parameter 'model'.",
 		Param:      stringPtr("model"),
 	},
-	"model_not_found": {
+	common.FailureTypeModelNotFound: {
 		StatusCode: 404,
 		ErrorType:  "invalid_request_error",
 		ErrorCode:  "model_not_found",
@@ -76,19 +82,16 @@ var predefinedFailures = map[string]FailureSpec{
 }
 
 // ShouldInjectFailure determines whether to inject a failure based on configuration
-func ShouldInjectFailure(config *Configuration) bool {
-	if config.Mode != ModeFailure {
+func ShouldInjectFailure(config *common.Configuration) bool {
+	if config.FailureInjectionRate == 0 {
 		return false
 	}
 
-	rand.Seed(time.Now().UnixNano())
-	return rand.Intn(100) < config.FailureInjectionRate
+	return common.RandomInt(1, 100) <= config.FailureInjectionRate
}
 
 // GetRandomFailure returns a random failure from configured types or all types if none specified
-func GetRandomFailure(config *Configuration) FailureSpec {
-	rand.Seed(time.Now().UnixNano())
-
+func GetRandomFailure(config *common.Configuration) FailureSpec {
 	var availableFailures []string
 	if len(config.FailureTypes) == 0 {
 		// Use all failure types if none specified
@@ -101,17 +104,18 @@ func GetRandomFailure(config *common.Configuration) FailureSpec {
 
 	if len(availableFailures) == 0 {
 		// Fallback to server_error if no valid types
-		return predefinedFailures["server_error"]
+		return predefinedFailures[common.FailureTypeServerError]
 	}
 
-	randomType := availableFailures[rand.Intn(len(availableFailures))]
+	randomIndex := common.RandomInt(0, len(availableFailures)-1)
+	randomType := availableFailures[randomIndex]
 
 	// Customize message with current model name
 	failure := predefinedFailures[randomType]
-	if randomType == "rate_limit" && config.Model != "" {
-		failure.Message = fmt.Sprintf("Rate limit reached for %s in organization org-xxx on requests per min (RPM): Limit 3, Used 3, Requested 1.", config.Model)
-	} else if randomType == "model_not_found" && config.Model != "" {
-		failure.Message = fmt.Sprintf("The model '%s-nonexistent' does not exist", config.Model)
+	if randomType == common.FailureTypeRateLimit && config.Model != "" {
+		failure.Message = fmt.Sprintf(RateLimitMessageTemplate, config.Model)
+	} else if randomType == common.FailureTypeModelNotFound && config.Model != "" {
+		failure.Message = fmt.Sprintf(ModelNotFoundMessageTemplate, config.Model)
 	}
 
 	return failure
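
Note the probability semantics: common.RandomInt(1, 100) <= FailureInjectionRate fires with probability rate/100, so a rate of 0 never injects and 100 always does. For orientation, a hypothetical call site combining these helpers with the unified error-sending method the commit message mentions (the simulator type and sendErrorResponse are assumptions, not code from this commit):

	// Hypothetical wiring; only ShouldInjectFailure and GetRandomFailure
	// come from this file, the rest is assumed.
	func (s *simulator) handleRequest(ctx *fasthttp.RequestCtx) {
		if ShouldInjectFailure(s.config) {
			failure := GetRandomFailure(s.config)
			// Unified error path: writes failure.StatusCode and an
			// OpenAI-compatible JSON error body built from the FailureSpec.
			s.sendErrorResponse(ctx, failure)
			return
		}
		// ... normal echo/random handling ...
	}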

pkg/common/failures_test.go renamed to pkg/llm-d-inference-sim/failures_test.go

Lines changed: 20 additions & 26 deletions
@@ -14,7 +14,7 @@ See the License for the specific language governing permissions and
 limitations under the License.
 */
 
-package common_test
+package llmdinferencesim_test
 
 import (
 	"strings"
@@ -23,33 +23,27 @@ import (
 	. "github.com/onsi/gomega"
 
 	"github.com/llm-d/llm-d-inference-sim/pkg/common"
+	llmdinferencesim "github.com/llm-d/llm-d-inference-sim/pkg/llm-d-inference-sim"
 )
 
 var _ = Describe("Failures", func() {
 	Describe("ShouldInjectFailure", func() {
-		It("should not inject failure when not in failure mode", func() {
+		It("should not inject failure when injection rate is 0", func() {
 			config := &common.Configuration{
 				Mode:                 common.ModeRandom,
-				FailureInjectionRate: 100,
-			}
-			Expect(common.ShouldInjectFailure(config)).To(BeFalse())
-		})
-
-		It("should not inject failure when rate is 0", func() {
-			config := &common.Configuration{
-				Mode:                 common.ModeFailure,
 				FailureInjectionRate: 0,
 			}
-			Expect(common.ShouldInjectFailure(config)).To(BeFalse())
+			Expect(llmdinferencesim.ShouldInjectFailure(config)).To(BeFalse())
 		})
 
-		It("should inject failure when in failure mode with 100% rate", func() {
+		It("should inject failure when injection rate is 100", func() {
 			config := &common.Configuration{
-				Mode:                 common.ModeFailure,
+				Mode:                 common.ModeRandom,
 				FailureInjectionRate: 100,
 			}
-			Expect(common.ShouldInjectFailure(config)).To(BeTrue())
+			Expect(llmdinferencesim.ShouldInjectFailure(config)).To(BeTrue())
 		})
+
 	})
 
 	Describe("GetRandomFailure", func() {
@@ -58,7 +52,7 @@ var _ = Describe("Failures", func() {
 				Model:        "test-model",
 				FailureTypes: []string{},
 			}
-			failure := common.GetRandomFailure(config)
+			failure := llmdinferencesim.GetRandomFailure(config)
 			Expect(failure.StatusCode).To(BeNumerically(">=", 400))
 			Expect(failure.Message).ToNot(BeEmpty())
 			Expect(failure.ErrorType).ToNot(BeEmpty())
@@ -67,9 +61,9 @@ var _ = Describe("Failures", func() {
 		It("should return rate limit failure when specified", func() {
 			config := &common.Configuration{
 				Model:        "test-model",
-				FailureTypes: []string{"rate_limit"},
+				FailureTypes: []string{common.FailureTypeRateLimit},
 			}
-			failure := common.GetRandomFailure(config)
+			failure := llmdinferencesim.GetRandomFailure(config)
 			Expect(failure.StatusCode).To(Equal(429))
 			Expect(failure.ErrorType).To(Equal("rate_limit_exceeded"))
 			Expect(failure.ErrorCode).To(Equal("rate_limit_exceeded"))
@@ -78,9 +72,9 @@ var _ = Describe("Failures", func() {
 
 		It("should return invalid API key failure when specified", func() {
 			config := &common.Configuration{
-				FailureTypes: []string{"invalid_api_key"},
+				FailureTypes: []string{common.FailureTypeInvalidAPIKey},
 			}
-			failure := common.GetRandomFailure(config)
+			failure := llmdinferencesim.GetRandomFailure(config)
 			Expect(failure.StatusCode).To(Equal(401))
 			Expect(failure.ErrorType).To(Equal("invalid_request_error"))
 			Expect(failure.ErrorCode).To(Equal("invalid_api_key"))
@@ -89,9 +83,9 @@ var _ = Describe("Failures", func() {
 
 		It("should return context length failure when specified", func() {
 			config := &common.Configuration{
-				FailureTypes: []string{"context_length"},
+				FailureTypes: []string{common.FailureTypeContextLength},
 			}
-			failure := common.GetRandomFailure(config)
+			failure := llmdinferencesim.GetRandomFailure(config)
 			Expect(failure.StatusCode).To(Equal(400))
 			Expect(failure.ErrorType).To(Equal("invalid_request_error"))
 			Expect(failure.ErrorCode).To(Equal("context_length_exceeded"))
@@ -101,9 +95,9 @@ var _ = Describe("Failures", func() {
 
 		It("should return server error when specified", func() {
 			config := &common.Configuration{
-				FailureTypes: []string{"server_error"},
+				FailureTypes: []string{common.FailureTypeServerError},
 			}
-			failure := common.GetRandomFailure(config)
+			failure := llmdinferencesim.GetRandomFailure(config)
 			Expect(failure.StatusCode).To(Equal(503))
 			Expect(failure.ErrorType).To(Equal("server_error"))
 			Expect(failure.ErrorCode).To(Equal("server_error"))
@@ -112,9 +106,9 @@ var _ = Describe("Failures", func() {
 		It("should return model not found failure when specified", func() {
 			config := &common.Configuration{
 				Model:        "test-model",
-				FailureTypes: []string{"model_not_found"},
+				FailureTypes: []string{common.FailureTypeModelNotFound},
 			}
-			failure := common.GetRandomFailure(config)
+			failure := llmdinferencesim.GetRandomFailure(config)
 			Expect(failure.StatusCode).To(Equal(404))
 			Expect(failure.ErrorType).To(Equal("invalid_request_error"))
 			Expect(failure.ErrorCode).To(Equal("model_not_found"))
@@ -126,7 +120,7 @@ var _ = Describe("Failures", func() {
 				FailureTypes: []string{},
 			}
 			// This test is probabilistic since it randomly selects, but we can test structure
-			failure := common.GetRandomFailure(config)
+			failure := llmdinferencesim.GetRandomFailure(config)
 			Expect(failure.StatusCode).To(BeNumerically(">=", 400))
 			Expect(failure.ErrorType).ToNot(BeEmpty())
 		})
