[WIP] Fix for TTFT bug #264

Merged
cnewell merged 5 commits into main from chrisn/ttft_logging_bug
Apr 8, 2025

Conversation

@cnewell
Contributor

@cnewell cnewell commented Apr 7, 2025

Fixes https://github.com/allenai/reviz-modal/issues/52

The problem is that we sometimes get ridiculously negative times to first token on the OLMo API Dashboard, which then squashes all the actual data into a flat line on the graph. The underlying issue is that in some cases we short-circuit the streaming before getting the first token, leaving the TTFT timestamp set to its initial value of 0, which of course gives a large negative number when subtracting the overall start timestamp.

I'm debating between not logging an event at all in these cases versus logging a new inference.failure event. For now I'm calling it a success if finish_reason is either Stop (known to be good) or None, which is to say only treating known error finish_reason values as failures. Thoughts? Anything else I may be missing here?

@cnewell cnewell requested review from a team and mtblanton April 7, 2025 16:58
Contributor

@mtblanton mtblanton left a comment


I think this makes sense as a quick fix. It makes our existing dashboards that track inference.timing work properly while still giving us good timing info with inference.failure.

I really want to get OTEL into this repo, which would be able to handle these conditions nicely.

@codeviking

> The underlying issue is that in some cases we short-circuit the streaming before getting the first token

Do you mind elaborating on that a bit? It sounds like in that case first_ns is not set, and this only happens when finish_reason is set and indicates it stopped for some other reason?

IIUC maybe we just do something like:

if first_ns > start_all:
    entry["ttft_ms"] = (first_ns - start_all) // 1_000_000

...which only adds the value if it's non-negative, with a comment indicating that it can sometimes be negative?

But again, I'd love to understand better about how/why it can be negative/why first_ns isn't being set as expected.

@cnewell
Contributor Author

cnewell commented Apr 7, 2025

Sure. The way it works is:

  1. On line 150, at the start of the function stream_new_message, we're capturing a start time.
  2. At line 345 we initialize first_ns to 0.
  3. At lines 350-373 we define a nested function map_chunk that will be applied to each chunk, or token, of a streamed response, and if first_ns is still 0 when it runs, meaning this is the first token, it captures the current timestamp.
  4. At lines 387-392 we map map_chunk over the token-by-token chunks from the generator.
  5. And finally, starting at line 500, we build our timing log event, using first_ns - start_all as the TTFT.

Returning to step 4, where we map map_chunk over the results: there's now a special step for the first token, where we apply a 15-second timeout and throw an exception if the token doesn't arrive in time (a more recent addition). When that timeout fires, map_chunk never runs, so first_ns never gets updated from its initial value of 0. BUT the exception is caught and handled at lines 399-400 and we continue, so the timing log entry still gets created, and at that point first_ns - start_all is negative.
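To make the failure mode concrete, here's a minimal, hypothetical sketch of the flow described in the steps above. The names (stream_new_message, map_chunk, first_ns, start_all) follow the comment, but everything else is simplified and the real function in this repo is far more involved:

```python
import time

FIRST_TOKEN_TIMEOUT_S = 15  # the timeout mentioned above


def stream_new_message(chunks, simulate_timeout=False):
    start_all = time.monotonic_ns()  # step 1: capture a start time
    first_ns = 0                     # step 2: sentinel initial value

    def map_chunk(chunk):
        # step 3: record the timestamp of the first token only
        nonlocal first_ns
        if first_ns == 0:
            first_ns = time.monotonic_ns()
        return chunk

    try:
        if simulate_timeout:
            # step 4: the first-token timeout fires before map_chunk
            # ever runs, so first_ns stays 0
            raise TimeoutError(f"no token within {FIRST_TOKEN_TIMEOUT_S}s")
        for chunk in chunks:
            map_chunk(chunk)
    except TimeoutError:
        pass  # the exception is caught and we continue (lines 399-400)

    # step 5: build the timing value; if first_ns is still 0 this is a
    # hugely negative number, which is the dashboard-squashing bug
    return (first_ns - start_all) // 1_000_000
```

Running the timeout path yields a large negative "TTFT" because 0 minus a monotonic-clock reading in nanoseconds is enormous before the millisecond division.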

@codeviking

Thanks. That's helpful.

I think it makes sense to only log this information for things that succeed. I don't think we want timing information for errors.

But I worry this comparison doesn't quite capture things:

if finish_reason is None or finish_reason == FinishReason.Stop:

...as I think things can finish for a number of reasons successfully.

I haven't looked at the code, but maybe you can adjust it so that when we catch an exception we don't emit this message?

@mtblanton
Contributor

Logging timing for failures may be useful for debugging. Knowing how long it takes for something to stop working could be helpful?

@codeviking

codeviking commented Apr 7, 2025

> logging timing for failures may be useful for debugging. knowing how long it takes for something to stop working could be helpful?

That's fair, good point. Though I think for monitoring purposes (alarms) we'll want to only consider the success case.

So I think there are two things here: (1) we shouldn't emit TTFT when we can't, because we never received a token, and (2) we should distinguish errors from successes, and only alarm on the latter.

Maybe we focus this PR on (1). @cnewell I think the code could be amended a bit in that case, so only add the TTFT metric when first_ns isn't 0?

@cnewell
Contributor Author

cnewell commented Apr 8, 2025

@codeviking @mtblanton Updated this to just omit the inference.timing log if first_ns is still 0, and not log inference.failure pending more thought about it.
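A minimal sketch of what that change amounts to, under the assumption that the timing event is built from first_ns and start_all as described earlier (the function name build_timing_entry is hypothetical, not from the repo):

```python
def build_timing_entry(first_ns: int, start_all: int):
    """Return an inference.timing entry, or None if no token arrived."""
    if first_ns == 0:
        # Streaming short-circuited before the first token: skip the
        # inference.timing event entirely rather than log a negative TTFT.
        return None
    return {
        "event": "inference.timing",
        "ttft_ms": (first_ns - start_all) // 1_000_000,
    }
```

The caller then only logs when the entry is not None, so dashboards built on inference.timing never see the bogus negative values.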


@codeviking codeviking left a comment


I left one idea that would guard against any negative value. But I suspect the two are equivalent in practice.

lgtm -- thx for thinking through this w/ me!

Co-authored-by: Sam Skjonsberg <sams@allenai.org>
@cnewell cnewell merged commit 1c36f74 into main Apr 8, 2025
3 checks passed
@cnewell cnewell deleted the chrisn/ttft_logging_bug branch April 8, 2025 18:59
mtblanton pushed a commit that referenced this pull request Jun 13, 2025
Fixes https://github.com/allenai/reviz-modal/issues/52
Co-authored-by: Sam Skjonsberg <sams@allenai.org>