[WIP] Fix for TTFT bug #264

Merged
cnewell merged 5 commits into main from chrisn/ttft_logging_bug
Apr 8, 2025

Conversation

@cnewell
Contributor

@cnewell cnewell commented Apr 7, 2025

Fixes https://github.com/allenai/reviz-modal/issues/52

The problem is that we sometimes get ridiculously negative times to first token on the OLMo API Dashboard, which then squashes all the actual data into a flat line on the graph. The underlying issue is that in some cases we short-circuit the streaming before getting the first token, leaving the TTFT timestamp set to its initial value of 0, which of course gives a large negative number when subtracting the overall start timestamp.

I'm debating between not logging an event at all in these cases versus logging a new inference.failure event. For now I'm calling it a success if finish_reason is either Stop (known to be good) or None, which is to say only treating known error finish_reason values as failures. Thoughts? Anything else I may be missing here?

@cnewell cnewell requested review from a team and mtblanton April 7, 2025 16:58
Contributor

@mtblanton mtblanton left a comment


I think this makes sense as a quick fix. It makes our existing dashboards that track inference.timing work properly while still giving us good timing info with inference.failure.

I really want to get OTEL into this repo, which would be able to handle these conditions nicely.

@codeviking

> The underlying issue is that in some cases we short-circuit the streaming before getting the first token

Do you mind elaborating on that a bit? It sounds like in that case first_ns is not set, and this only happens when finish_reason is set and indicates it stopped for some other reason?

IIUC maybe we just do something like:

if first_ns > start_all:
    entry["ttft_ms"] = (first_ns - start_all) // 1_000_000

...which only adds the value if it's non-negative, with a comment indicating that it can sometimes be negative?

But again, I'd love to understand better about how/why it can be negative/why first_ns isn't being set as expected.

@cnewell
Contributor Author

cnewell commented Apr 7, 2025

Sure. The way it works is:

  1. On line 150, at the start of the function stream_new_message, we're capturing a start time.
  2. At line 345 we initialize first_ns to 0.
  3. At lines 350-373 we define a nested function map_chunk that will be applied to each chunk, or token, of a streamed response, and if first_ns is still 0 when it runs, meaning this is the first token, it captures the current timestamp.
  4. At lines 387-392 we map map_chunk over the token-by-token chunks from the generator.
  5. And finally, starting at line 500, we build our timing log event, using first_ns - start_all as the TTFT.

Returning to step 4, where we map map_chunk over the results: there's now a special step for the first token, where we apply a 15-second timeout and throw an exception if the token doesn't arrive in time (a more recent addition). When that timeout fires, map_chunk never runs, so first_ns never gets updated from its initial value of 0. BUT the exception is caught and handled at lines 399-400 and we continue, so the timing log entry still gets created, and at that point first_ns - start_all is negative.
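To make the failure mode concrete, here's a minimal, hypothetical sketch of the flow described in the steps above. The names (stream_new_message, map_chunk, first_ns, start_all) follow the comment, but everything else is simplified and the real function in this repo is far more involved:

```python
import time

FIRST_TOKEN_TIMEOUT_S = 15  # the timeout mentioned above


def stream_new_message(chunks, simulate_timeout=False):
    start_all = time.monotonic_ns()  # step 1: capture a start time
    first_ns = 0                     # step 2: sentinel initial value

    def map_chunk(chunk):
        # step 3: record the timestamp of the first token only
        nonlocal first_ns
        if first_ns == 0:
            first_ns = time.monotonic_ns()
        return chunk

    try:
        if simulate_timeout:
            # step 4: the first-token timeout fires before map_chunk
            # ever runs, so first_ns stays 0
            raise TimeoutError(f"no token within {FIRST_TOKEN_TIMEOUT_S}s")
        for chunk in chunks:
            map_chunk(chunk)
    except TimeoutError:
        pass  # the exception is caught and we continue (lines 399-400)

    # step 5: build the timing value; if first_ns is still 0 this is a
    # hugely negative number, which is the dashboard-squashing bug
    return (first_ns - start_all) // 1_000_000
```

Running the timeout path yields a large negative "TTFT" because 0 minus a monotonic-clock reading in nanoseconds is enormous before the millisecond division.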

@codeviking

Thanks. That's helpful.

I think it makes sense to only log this information for things that succeed. I don't think we want timing information for errors.

But I worry this comparison doesn't quite capture things:

if finish_reason is None or finish_reason == FinishReason.Stop:

...as I think things can finish for a number of reasons successfully.

I haven't looked at the code, but maybe you can adjust it so that when we catch an exception we don't emit this message?

@mtblanton
Contributor

Logging timing for failures may be useful for debugging. Knowing how long it takes for something to stop working could be helpful?

@codeviking

codeviking commented Apr 7, 2025

> logging timing for failures may be useful for debugging. knowing how long it takes for something to stop working could be helpful?

That's fair, good point. Though I think for monitoring purposes (alarms) we'll want to only consider the success case.

So I think there are two things here: (1) we shouldn't emit TTFT when we can't, because we never received a token, and (2) we should distinguish errors from successes, and only alarm on the latter.

Maybe we focus this PR on (1). @cnewell I think the code could be amended a bit in that case, so only add the TTFT metric when first_ns isn't 0?

@cnewell
Contributor Author

cnewell commented Apr 8, 2025

@codeviking @mtblanton Updated this to just omit the inference.timing log if first_ns is still 0, and not log inference.failure pending more thought about it.
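A minimal sketch of what that change amounts to, under the assumption that the timing event is built from first_ns and start_all as described earlier (the function name build_timing_entry is hypothetical, not from the repo):

```python
def build_timing_entry(first_ns: int, start_all: int):
    """Return an inference.timing entry, or None if no token arrived."""
    if first_ns == 0:
        # Streaming short-circuited before the first token: skip the
        # inference.timing event entirely rather than log a negative TTFT.
        return None
    return {
        "event": "inference.timing",
        "ttft_ms": (first_ns - start_all) // 1_000_000,
    }
```

The caller then only logs when the entry is not None, so dashboards built on inference.timing never see the bogus negative values.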


@codeviking codeviking left a comment


I left one idea that would guard against any negative value. But I suspect the two are equivalent in practice.

lgtm -- thx for thinking through this w/ me!

Co-authored-by: Sam Skjonsberg <sams@allenai.org>
@cnewell cnewell merged commit 1c36f74 into main Apr 8, 2025
3 checks passed
@cnewell cnewell deleted the chrisn/ttft_logging_bug branch April 8, 2025 18:59
mtblanton pushed a commit that referenced this pull request Jun 13, 2025
Fixes https://github.com/allenai/reviz-modal/issues/52
Co-authored-by: Sam Skjonsberg <sams@allenai.org>