tests/fault_tolerance/deploy/README.md: 13 additions & 1 deletion
@@ -119,6 +119,17 @@ The following failure types are defined in `scenarios.py`:
|`sglang_prefill_scheduler`| Terminate SGLang prefill scheduler process. |`SIGKILL` to `sglang::scheduler`| sglang only |
|`sglang_prefill_detokenizer`| Terminate SGLang prefill detokenizer process. |`SIGKILL` to `sglang::detokenizer`| sglang only |

#### Token Overflow Tests
In addition to process and pod failures, the suite includes tests for **token overflow**, where the model receives an input prompt larger than its configured `max_seq_len`. These tests are crucial for verifying that the system can gracefully reject invalid requests without crashing.

- **Failure Injection**: Unlike other tests, this failure is injected from the **client side**. The `aiperf` client is configured to send a batch of requests with oversized token lengths.
- **Two-Phase Execution**: These tests run in two distinct phases, creating separate log directories for each:
  1. **`overflow` Phase**: Sends oversized requests. The expected outcome is a high rate of failed requests (rejections) as the server correctly identifies and blocks them.
  2. **`recovery` Phase**: Immediately after the overflow phase, sends valid, normal-sized requests. The expected outcome is a high success rate, confirming that the system has recovered and remains operational.

The combined results of these two phases demonstrate both the system's ability to reject invalid inputs and its stability after handling them.
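
The pass criterion for the two phases can be sketched roughly as follows. This is a minimal illustration, not the suite's actual checker: the `PhaseResult` class, the `token_overflow_passed` function, and the 0.9 thresholds are all assumptions chosen to make "high rate of failed requests" and "high success rate" concrete.

```python
from dataclasses import dataclass


@dataclass
class PhaseResult:
    """Request counts for one phase (hypothetical structure, not from the suite)."""

    name: str
    succeeded: int
    failed: int

    @property
    def total(self) -> int:
        return self.succeeded + self.failed

    @property
    def success_rate(self) -> float:
        # An empty phase is reported as 0.0; callers check `total` separately.
        return self.succeeded / self.total if self.total else 0.0

    @property
    def failure_rate(self) -> float:
        return self.failed / self.total if self.total else 0.0


def token_overflow_passed(
    overflow: PhaseResult,
    recovery: PhaseResult,
    min_rejection_rate: float = 0.9,  # assumed threshold for the overflow phase
    min_recovery_rate: float = 0.9,   # assumed threshold for the recovery phase
) -> bool:
    """Return True only if oversized requests were mostly rejected AND
    normal-sized requests mostly succeeded afterwards."""
    # A phase that sent no requests proves nothing, so treat it as a failure.
    if overflow.total == 0 or recovery.total == 0:
        return False
    rejected_enough = overflow.failure_rate >= min_rejection_rate
    recovered_enough = recovery.success_rate >= min_recovery_rate
    return rejected_enough and recovered_enough


if __name__ == "__main__":
    overflow = PhaseResult("overflow", succeeded=2, failed=98)
    recovery = PhaseResult("recovery", succeeded=99, failed=1)
    print(token_overflow_passed(overflow, recovery))
```

Requiring both conditions together reflects the point above: a server that rejects oversized prompts but then stops serving valid ones should fail the test, as should one that silently accepts the oversized prompts.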