
Commit 21f053a

increase timeout to receive first token (#263)
Based on discussion from Slack: https://allenai.slack.com/archives/C07530Y79Q9/p1743707972179919?thread_ts=1743706445.356419&cid=C07530Y79Q9

The first-token timeout we set for the Tulu3 405B release was too short to work for every model. It was causing Molmo to get "overloaded" errors when it shouldn't have. @codeviking said that Molmo's TTFT p99 was 10s, so we're raising the timeout from 5s to 15s, a little above that, to play it safe.
1 parent 4eb0dd1 commit 21f053a

1 file changed: +6 -7 lines changed

src/message/create_message_service.py

Lines changed: 6 additions & 7 deletions
@@ -126,13 +126,12 @@ def upload_request_files(
         filename = f"{root_message_id}/{message_id}-{i}{file_extension}"
 
         if file.content_type is None:
-            file_url = storage_client.upload_content(filename=filename, content=file.stream.read(), is_anonymous=is_anonymous)
+            file_url = storage_client.upload_content(
+                filename=filename, content=file.stream.read(), is_anonymous=is_anonymous
+            )
         else:
             file_url = storage_client.upload_content(
-                filename=filename,
-                content=file.stream.read(),
-                content_type=file.content_type,
-                is_anonymous=is_anonymous
+                filename=filename, content=file.stream.read(), content_type=file.content_type, is_anonymous=is_anonymous
             )
 
         # since we read from the file we need to rewind it so the next consumer can read it
@@ -287,7 +286,7 @@ def stream_new_message(
             message_id=msg.id,
             storage_client=storage_client,
             root_message_id=message_chain[0].id,
-            is_anonymous=agent.is_anonymous_user
+            is_anonymous=agent.is_anonymous_user,
         )
 
     chain: list[InferenceEngineMessage] = [
@@ -386,7 +385,7 @@ def map_chunk(chunk: InferenceEngineChunk):
     results = pool.apply_async(lambda: next(message_generator))
 
     # We handle the first chunk differently since we want to timeout if it takes longer than 5 seconds
-    first_chunk = results.get(5.0)
+    first_chunk = results.get(15.0)
     yield map_chunk(first_chunk)
 
     for chunk in message_generator:
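
The last hunk bounds only the wait for the first chunk: `next(message_generator)` runs through `pool.apply_async`, and `results.get(15.0)` gives up if nothing arrives in time. Note that the unchanged comment above that line still says 5 seconds. The following is a minimal standalone sketch of that pattern, not the service's actual code. It assumes the pool is a `multiprocessing.pool.ThreadPool` (the diff does not show how `pool` is created), and `FirstChunkTimeout`, `stream_with_first_chunk_timeout`, and `slow_chunks` are hypothetical names introduced for illustration.

```python
import time
from multiprocessing import TimeoutError as PoolTimeoutError
from multiprocessing.pool import ThreadPool
from typing import Iterator

# 15.0 is the value this commit sets; every name in this sketch is hypothetical.
FIRST_CHUNK_TIMEOUT_SECONDS = 15.0


class FirstChunkTimeout(Exception):
    """Raised when no first chunk arrives within the timeout."""


def stream_with_first_chunk_timeout(
    chunks: Iterator[str], timeout: float = FIRST_CHUNK_TIMEOUT_SECONDS
) -> Iterator[str]:
    pool = ThreadPool(processes=1)  # assumption: a thread pool, as in the sketch's lead-in
    try:
        # Fetch only the first chunk on a worker thread so the wait can be bounded.
        pending = pool.apply_async(lambda: next(chunks))
        try:
            first_chunk = pending.get(timeout)
        except PoolTimeoutError:
            raise FirstChunkTimeout(f"no first chunk within {timeout}s") from None
        except StopIteration:
            return  # the stream ended before producing anything
        yield first_chunk
        # Later chunks are read directly, with no per-chunk timeout.
        yield from chunks
    finally:
        pool.terminate()


def slow_chunks(delay: float, items: list[str]) -> Iterator[str]:
    """Stand-in for a model whose first token takes `delay` seconds."""
    time.sleep(delay)
    yield from items
```

With this shape, a caller would catch `FirstChunkTimeout` and report the model as overloaded. A model whose first token regularly takes around 10s would trip a 5s limit but pass a 15s one, which matches the motivation given in the commit message.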
