update(ai): update batch AI to use new Selector and add sort by current latency by ad-astra-video · Pull Request #3415 · livepeer/go-livepeer

ad-astra-video · 2025-02-23T16:18:01Z

What does this pull request do? Explain your changes. (required)

Adds an option to sort the new Selector's sessions by LatencyScore instead of InitialLatency. Batch AI jobs have no tracking or data for InitialLatency, but they should still be able to use the new Selector, which drops the knownSessions logic.

This allows batch AI to use the selection algorithm (price/random/stake weights and min perf score) to select Orchestrator sessions. The next step is to make the min perf score configurable per pipeline. Until then, Gateways that need different values can run a separate instance for each pipeline.

Specific updates (required)

Add a parameter to NewSelector that chooses which field the Selector sessions are sorted by.
Update the batch AI selectors to use the new Selector.

How did you test each of these updates (required)

Does this pull request close any open issues?

No

Checklist:

Read the contribution guide
make runs successfully
All tests in ./test.sh pass
README and other documentation updated
Pending changelog updated

codecov · 2025-02-23T16:28:47Z

Codecov Report

Attention: Patch coverage is 72.00000% with 7 lines in your changes missing coverage. Please review.

Project coverage is 32.14704%. Comparing base (de523e8) to head (40d137f).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
server/ai_session.go	0.00000%	7 Missing ⚠️

Additional details and impacted files

@@                 Coverage Diff                 @@
##              master       #3415         +/-   ##
===================================================
+ Coverage   32.14042%   32.14704%   +0.00662%     
===================================================
  Files            147         147                 
  Lines          41020       41024          +4     
===================================================
+ Hits           13184       13188          +4     
  Misses         27060       27060                 
  Partials         776         776

Files with missing lines	Coverage Δ
server/selection.go	`93.47826% <100.00000%> (+0.14493%)`	⬆️
server/ai_session.go	`2.05882% <0.00000%> (ø)`

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de523e8...40d137f. Read the comment docs.

leszko

Two questions:

Why not include LatencyScore in the selection algorithm, as I suggested here?
Aren't you using the selection algorithm for batch AI jobs at all? This sorting is only used if the selection algorithm is disabled; otherwise, no matter how you sort, you'll use the result from the selection algorithm.

ad-astra-video · 2025-02-25T18:21:51Z

> Why not include LatencyScore in the selection algorithm, as I suggested #3402 (comment)?

I want to do this in a separate PR later this week or early next week. Are you thinking of feeding LatencyScore to the selection algorithm as the perfScores, or of adding a weight in the selection algorithm for known latency when available? I think the easier first step is to use LatencyScore as the perfScores: it is a much lighter lift and will provide most of the experience we want. We also need a way to set minPerfScore per capability/model ID, so I thought doing this in a separate PR made sense.

> Aren't you using the selection algorithm for batch AI jobs at all? This sorting is only used if the selection algorithm is disabled; otherwise, no matter how you sort, you'll use the result from the selection algorithm.

Sorting still puts the fastest orchestrators in the front of the line when the probabilities are applied. I think this provides some advantage but have not run extensive scenario testing yet.
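
As an illustration of the point under discussion, the sketch below models the behaviour leszko describes: the sort order decides the outcome only when no selection algorithm is configured. It reflects the reviewer's description rather than the code in server/selection.go; `chooser` and `pick` are hypothetical names. Whether ordering still influences a weighted draw, as suggested in the reply above, depends on the real implementation and is not modelled here.

```go
package main

import "fmt"

// chooser is a hypothetical stand-in for a selection algorithm: given the
// candidate names, it returns the one it selects.
type chooser func(candidates []string) string

// pick returns the algorithm's choice when one is configured. Only when the
// algorithm is nil (disabled) does it fall back to the head of the sorted list.
func pick(sorted []string, algo chooser) string {
	if len(sorted) == 0 {
		return ""
	}
	if algo != nil {
		return algo(sorted)
	}
	return sorted[0]
}

func main() {
	// Candidates already sorted fastest-first.
	sorted := []string{"fast", "medium", "slow"}

	// With the algorithm disabled, the sort order decides the result.
	fmt.Println(pick(sorted, nil))

	// With an algorithm enabled, its result is used regardless of position.
	lastOne := func(c []string) string { return c[len(c)-1] }
	fmt.Println(pick(sorted, lastOne))
}
```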

leszko · 2025-02-26T13:29:10Z

> > Why not include LatencyScore in the selection algorithm, as I suggested #3402 (comment)?

> I want to do this in a separate PR later this week or early next week. Are you thinking of feeding LatencyScore to the selection algorithm as the perfScores, or of adding a weight in the selection algorithm for known latency when available? I think the easier first step is to use LatencyScore as the perfScores: it is a much lighter lift and will provide most of the experience we want. We also need a way to set minPerfScore per capability/model ID, so I thought doing this in a separate PR made sense.

This is up to you. I think the "clearer" solution would be to add the latency score as a param in the selection algorithm here. Then it's one more input to the selection algorithm. You could also override perfScores, but I guess at some point you may want a selection algorithm that is based on both perf scores and the latency score.

> > Aren't you using the selection algorithm for batch AI jobs at all? This sorting is only used if the selection algorithm is disabled; otherwise, no matter how you sort, you'll use the result from the selection algorithm.

> Sorting still puts the fastest orchestrators in the front of the line when the probabilities are applied. I think this provides some advantage but have not run extensive scenario testing yet.

Yeah, it's (even) more confusing to reason about. So if you plan to use Latency Scores in the selection algorithm, maybe we should park this PR and just send the one you planned?
I mention this mostly because the selection logic is already complex, and if we add the usage of Latency Scores in two places, it's one more strand in this spaghetti.
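
To make the alternative in this comment concrete, the sketch below treats the latency score as one more weighted input to a probabilistic selection. It is only a sketch under stated assumptions, not the go-livepeer selection algorithm: the `candidate` and `weights` types, the `1/(1+score)` speed transform, and the linear blend are all hypothetical, and it assumes a non-negative LatencyScore where lower is faster.

```go
package main

import (
	"fmt"
	"math/rand"
)

// candidate is a hypothetical, simplified Orchestrator record.
type candidate struct {
	Name         string
	Stake        float64 // any non-negative unit
	LatencyScore float64 // assumed non-negative, lower is faster
}

// weights are hypothetical knobs, not go-livepeer flags.
// At least one of them must be positive.
type weights struct {
	Stake, Latency, Random float64
}

// share normalizes vals so they sum to 1, falling back to a uniform split
// when every value is zero.
func share(vals []float64) []float64 {
	out := make([]float64, len(vals))
	var sum float64
	for _, v := range vals {
		sum += v
	}
	for i, v := range vals {
		if sum > 0 {
			out[i] = v / sum
		} else {
			out[i] = 1 / float64(len(vals))
		}
	}
	return out
}

// probabilities blends stake, latency, and a uniform term into one
// selection probability per candidate. The result sums to 1.
func probabilities(cs []candidate, w weights) []float64 {
	stakes := make([]float64, len(cs))
	speeds := make([]float64, len(cs))
	for i, c := range cs {
		stakes[i] = c.Stake
		speeds[i] = 1 / (1 + c.LatencyScore) // higher means faster
	}
	stakeShare, speedShare := share(stakes), share(speeds)
	total := w.Stake + w.Latency + w.Random
	uniform := 1 / float64(len(cs))
	probs := make([]float64, len(cs))
	for i := range cs {
		probs[i] = (w.Stake*stakeShare[i] + w.Latency*speedShare[i] + w.Random*uniform) / total
	}
	return probs
}

// draw maps a random number r in [0, 1) to an index using cumulative probabilities.
func draw(probs []float64, r float64) int {
	var cum float64
	for i, p := range probs {
		cum += p
		if r < cum {
			return i
		}
	}
	return len(probs) - 1 // guards against floating-point rounding
}

func main() {
	cs := []candidate{
		{Name: "orch-a", Stake: 100, LatencyScore: 0.2},
		{Name: "orch-b", Stake: 100, LatencyScore: 1.5},
	}
	probs := probabilities(cs, weights{Stake: 0.3, Latency: 0.6, Random: 0.1})
	fmt.Println(probs, cs[draw(probs, rand.Float64())].Name)
}
```

With latency folded into the probabilities, the ordering of the input list no longer matters, which is the simplification the comment above is asking for.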

leszko · 2025-02-27T10:34:37Z

Merging.

leszko · 2025-02-27T10:39:42Z

FYI: I'll make a small refactoring here: #3428

ad-astra-video added 2 commits February 23, 2025 07:49

add sort by latency option to new selector

651f196

cleanup

16ae186

github-actions bot added the go (Pull requests that update Go code) and AI (Issues and PR related to the AI-video branch) labels Feb 23, 2025

leszko reviewed Feb 24, 2025

Merge branch 'master' into add-sort-by-current-latency

40d137f

leszko approved these changes Feb 27, 2025

leszko merged commit 59ea865 into livepeer:master Feb 27, 2025
17 checks passed

leszko merged 3 commits into livepeer:master from ad-astra-video:add-sort-by-current-latency

2 participants
