
update(ai): update batch AI to use new Selector and add sort by current latency #3415

Merged
leszko merged 3 commits into livepeer:master from ad-astra-video:add-sort-by-current-latency on Feb 27, 2025


@ad-astra-video
Collaborator

What does this pull request do? Explain your changes. (required)

Adds an option to sort the new Selector's sessions by LatencyScore instead of InitialLatency. Batch AI jobs have no tracking or data for InitialLatency, but we would like them to use the new Selector, which drops the knownSessions logic.

This allows the selection algorithm (price/random/stake weights and minimum perf score) to be used to select Orchestrator sessions for batch AI jobs. The next step is to allow the minimum perf score to be set per pipeline; until then, Gateways that need different settings can run a separate instance for each pipeline.

Specific updates (required)

  • Adds a parameter to NewSelector that chooses which metric Selector sessions are sorted by.
  • Updates the batch AI selectors to use the new Selector.
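The "choose which metric to sort by" parameter can be pictured as a small enum consulted by the sort. This is a minimal sketch assuming lower values are better for both metrics; `SortBy`, its constants, `Session`, and `sortSessions` are hypothetical names, not the actual NewSelector signature.

```go
package main

import (
	"fmt"
	"sort"
)

// SortBy is a hypothetical name for the metric used to order sessions.
type SortBy int

const (
	SortByInitialLatency SortBy = iota // when initial latency is tracked
	SortByLatencyScore                 // batch AI: no initial latency data
)

// Session is a hypothetical, trimmed-down Orchestrator session.
type Session struct {
	Addr           string
	InitialLatency float64 // seconds; lower is better
	LatencyScore   float64 // assumed: lower is better
}

// sortSessions orders sessions ascending by the chosen metric.
func sortSessions(sessions []Session, by SortBy) {
	key := func(s Session) float64 {
		if by == SortByLatencyScore {
			return s.LatencyScore
		}
		return s.InitialLatency
	}
	sort.SliceStable(sessions, func(i, j int) bool {
		return key(sessions[i]) < key(sessions[j])
	})
}

func main() {
	sessions := []Session{
		{Addr: "orch-a", LatencyScore: 0.9},
		{Addr: "orch-b", LatencyScore: 0.3},
	}
	sortSessions(sessions, SortByLatencyScore)
	fmt.Println(sessions[0].Addr) // orch-b
}
```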

How did you test each of these updates (required)

Does this pull request close any open issues?

No

Checklist:

@github-actions bot added the go (Pull requests that update Go code) and AI (Issues and PR related to the AI-video branch) labels on Feb 23, 2025
@codecov

codecov bot commented Feb 23, 2025

Codecov Report

Attention: Patch coverage is 72.00000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 32.14704%. Comparing base (de523e8) to head (40d137f).
Report is 1 commits behind head on master.

Files with missing lines    Patch %     Lines
server/ai_session.go        0.00000%    7 Missing ⚠️

@@                 Coverage Diff                 @@
##              master       #3415         +/-   ##
===================================================
+ Coverage   32.14042%   32.14704%   +0.00662%     
===================================================
  Files            147         147                 
  Lines          41020       41024          +4     
===================================================
+ Hits           13184       13188          +4     
  Misses         27060       27060                 
  Partials         776         776                 
Files with missing lines    Coverage Δ
server/selection.go         93.47826% <100.00000%> (+0.14493%) ⬆️
server/ai_session.go        2.05882% <0.00000%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update de523e8...40d137f.


Contributor

@leszko leszko left a comment


Two questions:

  1. Why not include the LatencyScore in the selection algorithm, as I suggested here?
  2. Aren't you using the selection algorithm at all for batch AI jobs? This sorting is only used if the selection algorithm is disabled; otherwise, no matter how you sort, you'll use the result from the selection algorithm.
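The second question describes a control flow where sorting only decides the outcome when no selection algorithm is configured. A minimal sketch of that flow as the reviewer describes it, with `Session`, `SelectFunc`, `pick`, and the toy `lastByName` algorithm all hypothetical and much simpler than the real Selector:

```go
package main

import (
	"fmt"
	"sort"
)

// Session is a hypothetical, trimmed-down session record.
type Session struct {
	Addr         string
	LatencyScore float64 // assumed: lower is better
}

// SelectFunc stands in for a selection algorithm (hypothetical signature).
type SelectFunc func(addrs []string) string

// pick sorts sessions by LatencyScore. With no algorithm configured it
// returns the head of the sorted list; otherwise the algorithm decides.
func pick(sessions []Session, algo SelectFunc) string {
	if len(sessions) == 0 {
		return ""
	}
	sort.SliceStable(sessions, func(i, j int) bool {
		return sessions[i].LatencyScore < sessions[j].LatencyScore
	})
	if algo == nil {
		return sessions[0].Addr
	}
	addrs := make([]string, len(sessions))
	for i, s := range sessions {
		addrs[i] = s.Addr
	}
	return algo(addrs)
}

// lastByName is a toy, order-independent "algorithm" used only for illustration.
func lastByName(addrs []string) string {
	best := addrs[0]
	for _, a := range addrs[1:] {
		if a > best {
			best = a
		}
	}
	return best
}

func main() {
	sessions := []Session{{Addr: "orch-z", LatencyScore: 0.9}, {Addr: "orch-a", LatencyScore: 0.2}}
	fmt.Println(pick(sessions, nil))        // orch-a: sorting decides
	fmt.Println(pick(sessions, lastByName)) // orch-z: the algorithm decides
}
```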

@ad-astra-video
Collaborator Author

> Why not include the LatencyScore in the selection algorithm, as I suggested in #3402 (comment)?

I want to do this in a separate PR later this week or early next week. Are you thinking of feeding the LatencyScore to the selection algorithm as the perfScores, or of adding a weight to the selection algorithm for known latency when available? I think the easier first step is to use the LatencyScore as the perfScores: it is a much lighter lift and will provide most of the desired behavior. We also need a way to set the minPerfScore by capability/model ID, so doing this in a separate PR made sense to me.
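Using the LatencyScore as the perfScores implies a conversion step, because a minimum perf score treats higher values as better. A minimal sketch, assuming LatencyScore is lower-is-better and that the fastest Orchestrator should map to 1.0; `latencyToPerfScores` is a hypothetical helper, and omitting Orchestrators with no latency data is only one possible choice.

```go
package main

import "fmt"

// latencyToPerfScores converts latency scores (assumed lower is better) into
// perf-score-like values in (0, 1], where the fastest Orchestrator gets 1.0.
// Addresses without a positive latency score are omitted.
func latencyToPerfScores(latency map[string]float64) map[string]float64 {
	best := 0.0
	for _, l := range latency {
		if l > 0 && (best == 0 || l < best) {
			best = l
		}
	}
	out := make(map[string]float64, len(latency))
	if best == 0 {
		return out
	}
	for addr, l := range latency {
		if l > 0 {
			out[addr] = best / l
		}
	}
	return out
}

func main() {
	perf := latencyToPerfScores(map[string]float64{"orch-a": 0.5, "orch-b": 1.0})
	fmt.Println(perf["orch-a"], perf["orch-b"]) // 1 0.5
}
```

With this mapping, a minPerfScore of 0.6 would exclude any Orchestrator more than about 1.67× slower than the fastest one, which is one reason a per-capability threshold would be useful.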

> Aren't you using the selection algorithm at all for batch AI jobs? This sorting is only used if the selection algorithm is disabled; otherwise, no matter how you sort, you'll use the result from the selection algorithm.

Sorting still puts the fastest Orchestrators at the front of the line when the probabilities are applied. I think this provides some advantage, but I have not run extensive scenario testing yet.
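Whether list order matters once probabilities are applied depends on how a random draw is mapped onto the list. Assuming plain cumulative-weight (roulette-wheel) sampling, which this sketch does not verify against the real Selector, order changes which Orchestrator a particular draw lands on but not how often each one is picked overall. `pickAt` and `frequency` are hypothetical helpers.

```go
package main

import "fmt"

type weighted struct {
	Addr   string
	Weight float64
}

// pickAt maps a draw r in [0, 1) onto the list by cumulative weight.
func pickAt(list []weighted, r float64) string {
	total := 0.0
	for _, w := range list {
		total += w.Weight
	}
	target := r * total
	acc := 0.0
	for _, w := range list {
		acc += w.Weight
		if target < acc {
			return w.Addr
		}
	}
	return list[len(list)-1].Addr
}

// frequency sweeps n evenly spaced draws and returns the fraction that pick addr.
func frequency(list []weighted, addr string, n int) float64 {
	hits := 0
	for i := 0; i < n; i++ {
		if pickAt(list, (float64(i)+0.5)/float64(n)) == addr {
			hits++
		}
	}
	return float64(hits) / float64(n)
}

func main() {
	fastFirst := []weighted{{"fast", 3}, {"slow", 2}}
	slowFirst := []weighted{{"slow", 2}, {"fast", 3}}
	fmt.Println(pickAt(fastFirst, 0.1), pickAt(slowFirst, 0.1))                         // fast slow
	fmt.Println(frequency(fastFirst, "fast", 1000), frequency(slowFirst, "fast", 1000)) // 0.6 0.6
}
```

The sketch models a single draw only; it says nothing about what the real Selector does with the remaining sessions after a pick.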

@leszko
Contributor

leszko commented Feb 26, 2025

> Why not include the LatencyScore in the selection algorithm, as I suggested in #3402 (comment)?

> I want to do this in a separate PR later this week or early next week. Are you thinking of feeding the LatencyScore to the selection algorithm as the perfScores, or of adding a weight to the selection algorithm for known latency when available? I think the easier first step is to use the LatencyScore as the perfScores: it is a much lighter lift and will provide most of the desired behavior. We also need a way to set the minPerfScore by capability/model ID, so doing this in a separate PR made sense to me.

This is up to you. I think the "clearer" solution would be to add the latency score as a parameter to the selection algorithm here, so it becomes one more input to the algorithm. You could also override perfScores, but I guess at some point you may want a selection algorithm that is based on both perf scores and latency scores.
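Treating latency as "one more input" rather than a replacement for perf scores could look like an extra weighted term. A minimal sketch, assuming every input is already normalized to [0, 1] with higher meaning better; `Inputs`, `Weights`, and `score` are hypothetical, and how to treat Orchestrators with no latency data is a design choice left open here.

```go
package main

import "fmt"

// Inputs holds per-Orchestrator signals, each assumed normalized to [0, 1]
// with higher meaning better. All names here are hypothetical.
type Inputs struct {
	Stake, Price, Random, Perf float64
	Latency                    float64 // e.g. a normalized LatencyScore
	HasLatency                 bool    // false if no latency data has been collected
}

// Weights gives latency its own knob next to the existing inputs.
type Weights struct {
	Stake, Price, Random, Perf, Latency float64
}

// score combines all inputs. Orchestrators without latency data get no latency
// contribution here; a neutral default would be an alternative design.
func score(in Inputs, w Weights) float64 {
	s := w.Stake*in.Stake + w.Price*in.Price + w.Random*in.Random + w.Perf*in.Perf
	if in.HasLatency {
		s += w.Latency * in.Latency
	}
	return s
}

func main() {
	w := Weights{Stake: 0.2, Price: 0.3, Random: 0.1, Perf: 0.2, Latency: 0.2}
	known := Inputs{Stake: 0.5, Price: 0.5, Random: 0.5, Perf: 1, Latency: 1, HasLatency: true}
	unknown := known
	unknown.HasLatency = false
	fmt.Println(score(known, w) > score(unknown, w)) // true
}
```

Keeping latency as a separate weight means perf scores and latency scores can both influence selection, whereas overriding perfScores makes latency the only performance signal.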

> Aren't you using the selection algorithm at all for batch AI jobs? This sorting is only used if the selection algorithm is disabled; otherwise, no matter how you sort, you'll use the result from the selection algorithm.

> Sorting still puts the fastest Orchestrators at the front of the line when the probabilities are applied. I think this provides some advantage, but I have not run extensive scenario testing yet.

Yeah, that makes it (even) more confusing to reason about. So, if you plan to use Latency Scores in the selection algorithm, maybe we should park this PR and just send the one you planned?
I mention this mostly because the selection logic is already complex, and using Latency Scores in two places adds one more strand to this spaghetti.

@leszko
Contributor

leszko commented Feb 27, 2025

Merging.

@leszko leszko merged commit 59ea865 into livepeer:master Feb 27, 2025
17 checks passed
@leszko
Contributor

leszko commented Feb 27, 2025

FYI: I'll do a small refactoring in #3428.


Labels

AI: Issues and PR related to the AI-video branch.
go: Pull requests that update Go code

Projects

None yet


2 participants