Conversation
mtblanton
left a comment
I think this makes sense as a quick fix. It makes our existing dashboards that track inference.timing work properly while still giving us good timing info with inference.failure.
I really want to get OTEL into this repo, which would be able to handle these conditions nicely.
Do you mind elaborating on that a bit? IIUC, in that case maybe we just do something like: ...which only adds the value if it's non-negative, with a comment indicating that it can sometimes be negative? But again, I'd love to understand better how and why it can be negative.
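If it helps, here's a minimal sketch of the non-negative guard being suggested. Note that `emit_ttft`, `metrics`, `start_ns`, and `first_token_ns` are all hypothetical names for illustration; the repo's actual identifiers may differ.

```python
def emit_ttft(metrics: dict, start_ns: int, first_token_ns: int) -> None:
    """Record time-to-first-token only when the value is meaningful."""
    ttft_ns = first_token_ns - start_ns
    # Guard: if streaming short-circuited before the first token arrived,
    # first_token_ns may still hold its initial value of 0, which would
    # make ttft_ns hugely negative. Skip emitting in that case.
    if ttft_ns >= 0:
        metrics["ttft_ms"] = ttft_ns / 1e6
```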
Sure. The way it works is:
Returning to step 4 where we map …
Thanks, that's helpful. I think it makes sense to only log this information for things that succeed; I don't think we want timing information for errors. But I worry this comparison doesn't quite capture things: ...as I think things can finish successfully for a number of reasons. I haven't looked at the code, but maybe you can adjust it so that when we catch an exception we don't emit this message?
Logging timing for failures may be useful for debugging, though. Knowing how long it takes for something to stop working could be helpful?
So I think there are two things here: (1) we shouldn't emit TTFT if we can't, because we never received a token, and (2) we should distinguish errors from successes, and only alarm on the latter. Maybe we focus this PR on (1). @cnewell I think the code could be amended a bit in that case, so only add the TTFT metric when …rst_ns instead of finish_reason
@codeviking @mtblanton Updated this to just omit the inference.timing log if |
codeviking
left a comment
I left one idea that would guard against any negative value. But I suspect the two are equivalent in practice.
lgtm -- thx for thinking through this w/ me!
Fixes https://github.com/allenai/reviz-modal/issues/52
Co-authored-by: Sam Skjonsberg <sams@allenai.org>
The problem is that we sometimes get ridiculously negative times to first token on the OLMo API Dashboard, which then squashes all the actual data into a flat line on the graph. The underlying issue is that in some cases we short-circuit the streaming before getting the first token, leaving the TTFT timestamp set to its initial value of 0, which of course gives a large negative number when subtracting the overall start timestamp.
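The failure mode described above can be reproduced in a few lines. Names here are illustrative only; `log_metric` is a stand-in for the real metrics logger.

```python
import time

def log_metric(name: str, value_ns: int) -> None:
    # Stand-in for the real metrics logger.
    print(f"{name}: {value_ns / 1e6:.2f} ms")

start_ns = time.monotonic_ns()  # overall request start timestamp
first_token_ns = 0              # initial value; set when the first token arrives

# Suppose streaming short-circuits before any token is received,
# so first_token_ns is still 0:
ttft_ns = first_token_ns - start_ns
assert ttft_ns < 0  # hugely negative, which squashes the dashboard graph

# The fix: only emit inference.timing when the timestamp was actually set.
if first_token_ns != 0:
    log_metric("inference.timing", ttft_ns)
```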
I'm debating between just not logging an event at all in these cases, versus logging a new inference.failure event. Currently I'm calling it a success if finish_reason is either Stop (known to be good) or None, which is to say failing only on a known-error finish_reason. Thoughts? Anything else I may be missing here?
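A sketch of that success/failure split, under the assumption that the two event names are inference.timing and inference.failure (the actual enum values and event names in this repo may differ):

```python
# Finish reasons we know indicate a clean completion.
KNOWN_GOOD = {"Stop"}

def event_name(finish_reason) -> str:
    # Treat Stop (known good) and None (no known error) as success;
    # anything else is a known-error finish_reason.
    if finish_reason is None or finish_reason in KNOWN_GOOD:
        return "inference.timing"
    return "inference.failure"
```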