Modify v1 completions_stream logic to raise most exceptions before async streaming inference response (#534)
* consolidate streaming response logic into a separate inline function; call execute() synchronously and call the inline function async
* iterate
* refactor: pull inference result status/empty check outside of framework conditionals to dedupe code. put logic for unsuccessful/empty results before other handling logic for readability. add some commenting and other small edits.
* formatting fixes
* improve commenting
* fix and re-enable 404 unit test
* fix stream success unit test, add async test client fixture
* move _response_chunk_generator() from an inline def in execute() to a separate private method for the use case (see the sketch below)
* fix issue with streaming tests interacting by defining a per-session event loop fixture and reconfiguring test_create_streaming_task_success as an async test
* one more unit test
* update llm-engine Completions docs with details on streaming error handling
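The core pattern behind these commits, as a minimal sketch rather than the actual llm-engine code: `execute()` and `_response_chunk_generator()` echo the names mentioned in the commits above, but the signatures, the exception class, and the fake inference loop are all illustrative.

```python
# Minimal sketch of the pattern: do all fallible setup in execute() before
# returning, so exceptions still surface as proper HTTP errors; only failures
# inside the generator arrive mid-stream, after the HTTP 200 is committed.
from typing import AsyncIterator


class ObjectNotFoundException(Exception):
    """Assumed to map to an HTTP 404 in the route handler."""


async def execute(model: str, prompt: str) -> AsyncIterator[str]:
    # Validation and lookups happen here, before any response bytes are sent,
    # so raising still produces an appropriate HTTP status code.
    if model != "llama-2-7b":
        raise ObjectNotFoundException(model)
    # Hand back the async generator only after setup succeeds; errors raised
    # inside it can no longer change the status code.
    return _response_chunk_generator(prompt)


async def _response_chunk_generator(prompt: str) -> AsyncIterator[str]:
    try:
        for token in prompt.split():  # stand-in for real streaming inference
            yield token + " "
    except Exception as exc:
        # Too late for an HTTP error: report in-band as a plain-text message.
        yield f"error: {exc}"
```

With this split, a FastAPI-style route handler can `await execute(...)` first, letting any exception map to an HTTP error status, and only then wrap the returned generator in a streaming response.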
Changed file: `docs/guides/completions.md` (7 additions & 2 deletions)
```diff
@@ -67,7 +67,11 @@ applications. When streaming, tokens will be sent as data-only
 
 To enable token streaming, pass `stream=True` to either [Completion.create](../../api/python_client/#llmengine.completion.Completion.create) or [Completion.acreate](../../api/python_client/#llmengine.completion.Completion.acreate).
 
-Note that errors from streaming calls are returned back to the user as plain-text messages and currently need to be handled by the client.
+### Streaming Error Handling
+
+Note: Error handling semantics are mixed for streaming calls:
+- Errors that arise *before* streaming begins are returned back to the user as `HTTP` errors with the appropriate status code.
+- Errors that arise *after* streaming begins within a `HTTP 200` response are returned back to the user as plain-text messages and currently need to be handled by the client.
 
 An example of token streaming using the synchronous Completions API looks as follows:
 
```
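Viewed at the wire level, the two bullets above translate to a status-code check before consuming the stream and in-band error text afterwards. A hedged sketch with `httpx` follows; the endpoint URL and payload shape are placeholders, not llm-engine's documented REST API.

```python
# Hedged sketch: the URL and payload below are placeholders, not the
# documented llm-engine REST API.
import httpx

with httpx.stream(
    "POST",
    "https://llm-engine.example.com/v1/completions-stream",  # placeholder
    json={"model": "llama-2-7b", "prompt": "Hello", "stream": True},
) as response:
    if response.status_code != 200:
        # Errors before streaming began: a real HTTP error status.
        raise RuntimeError(f"request failed with HTTP {response.status_code}")
    for line in response.iter_lines():
        # Errors after streaming began: plain-text messages inside the 200.
        print(line)
```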
```diff
@@ -78,6 +82,7 @@ import sys
 
 from llmengine import Completion
 
+# errors occurring before streaming begins will be thrown here
 stream = Completion.create(
     model="llama-2-7b",
     prompt="Give me a 200 word summary on the current economic events in the US.",
```
```diff
@@ -90,7 +95,7 @@ for response in stream:
     if response.output:
         print(response.output.text, end="")
         sys.stdout.flush()
-    else: # an error occurred
+    else: # an error occurred after streaming began
         print(response.error) # print the error message out
```
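Pieced together from the hunks above, the complete documented example would look roughly like this. The `try/except` wrapper is an illustration of the pre-stream error path; the exact exception class the client raises is not shown in this diff, so the catch is deliberately broad.

```python
import sys

from llmengine import Completion

try:
    # errors occurring before streaming begins will be thrown here
    stream = Completion.create(
        model="llama-2-7b",
        prompt="Give me a 200 word summary on the current economic events in the US.",
        stream=True,
    )
except Exception as exc:  # broad on purpose; the real type isn't in this diff
    print(f"request failed before streaming began: {exc}", file=sys.stderr)
    sys.exit(1)

for response in stream:
    if response.output:
        print(response.output.text, end="")
        sys.stdout.flush()
    else:  # an error occurred after streaming began
        print(response.error)  # print the error message out
```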