-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[ML] Refactor SSE Parsing #125959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Refactor SSE Parsing #125959
Conversation
ServerSentEvent is now a record with `event` and `data`, rather than it being a record for value with a separate `ServerSentEventField`. - `value` was renamed to `data` - `hasValue` was renamed to `hasData` - Parsing was refactored to read more like its spec: writing to a buffer and flushing when we reach a blank newline - We now support multiline data payloads
|
|
||
| public void testInfer_StreamRequest() throws Exception { | ||
| String responseJson = """ | ||
| event: message_start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tested Anthropic, I noticed they started sending event in the payload, so I added it to the test just in case it breaks (it doesn't)
|
Pinging @elastic/ml-core (Team:ML) |
davidkyle
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| var fieldStr = lineWithColon.substring(0, firstColon).toLowerCase(Locale.ROOT); | ||
|
|
||
| var value = lineWithColon.substring(firstColon + 1); | ||
| var trimmedValue = value.length() > 0 && value.charAt(0) == ' ' ? value.substring(1) : value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not String::trim() or String:: stripLeading()?
Is the idea to literally remove the first space char only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's literally remove the first space char only:
Collect the characters on the line after the first U+003A COLON character (:), and let value be that string. If value starts with a U+0020 SPACE character, remove it from value.
https://html.spec.whatwg.org/multipage/server-sent-events.html#event-stream-interpretation
Or at least I'm interpreting that as "if there are two or more spaces, only remove one space"
| var payload = """ | ||
| event: message | ||
| data: hello | ||
| data: there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| */ | ||
| public class ServerSentEventParser { | ||
| private static final Pattern END_OF_LINE_REGEX = Pattern.compile("\\n|\\r|\\r\\n"); | ||
| private static final Pattern END_OF_LINE_REGEX = Pattern.compile("\\r\\n|\\n|\\r"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the String class docs for a while and was pleased to find there is actually a method for splitting a string like this
https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/String.html#lines()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I didn't know that existed. Let's use that, it does change the logic a bit though
ServerSentEvent is now a record with `event` and `data`, rather than it being a record for value with a separate `ServerSentEventField`. - `value` was renamed to `data` - `hasValue` was renamed to `hasData` - Parsing was refactored to read more like its spec: writing to a buffer and flushing when we reach a blank newline - We now support multiline data payloads
💚 Backport successful
|
ServerSentEvent is now a record with `event` and `data`, rather than it being a record for value with a separate `ServerSentEventField`. - `value` was renamed to `data` - `hasValue` was renamed to `hasData` - Parsing was refactored to read more like its spec: writing to a buffer and flushing when we reach a blank newline - We now support multiline data payloads
ServerSentEvent is now a record with `event` and `data`, rather than it being a record for value with a separate `ServerSentEventField`. - `value` was renamed to `data` - `hasValue` was renamed to `hasData` - Parsing was refactored to read more like its spec: writing to a buffer and flushing when we reach a blank newline - We now support multiline data payloads
ServerSentEvent is now a record with
eventanddata, rather than it being a record for value with a separateServerSentEventField.valuewas renamed todatahasValuewas renamed tohasData