Conversation
|
I present you an interesting problem we're facing: static default elo. The way elo is calculated allows for negative elo, which is theoretically valid as far as elo goes. In our code base, it is almost necessary, because:
Now, this also means that the median elo constantly changes. There are cases where the entire leaderboard goes through a lull period, which presents an interesting situation where the default elo is higher than everybody on the leaderboard. This, obviously, skews calculations. One way to fix this is to inject all new users into the median of the leaderboard. Do you folks have any thoughts? |
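The median-injection idea is easy to prototype. Here's a minimal python sketch of it, assuming we just have a flat list of the current elo scores (`default_elo` and the fallback constant are hypothetical names for illustration, not our actual API):

```python
import statistics

def default_elo(leaderboard_elos: list[float], fallback: float = 1000.0) -> float:
    """Starting elo for a brand-new user: the current leaderboard median,
    or a fallback constant when the leaderboard is empty."""
    if not leaderboard_elos:
        return fallback
    return statistics.median(leaderboard_elos)

# During a lull period the whole board may sit below any static default,
# but new users still land in the middle of the pack:
print(default_elo([-120.0, -80.0, -40.0, 10.0, 30.0]))  # -40.0
```

This way the default can never sit above (or below) everybody on the leaderboard, because it is defined relative to the leaderboard itself.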
|
(fyi bilibili is broken for roughly the same reason) |
|
Thanks for the data @owobred. I think this has given me enough visibility to say that we're not using elo correctly. To understand this, let's break down the ideal scoring process and define some scoring requirements, and then talk about why our current implementation of elo won't work at all. Our scoring process:
Our scoring requirements:
Even though we've implemented the core scoring algorithm practically the same way as chess, the way we're using it is not the same, because each user could be battling people way outside their league, very unlike chess.
So, this causes the median to become more and more negative, which violates our scoring requirements because it becomes hard not to go negative. To fix this, we need to work under the same assumptions chess elo was designed for. That is to say: for each user, find maybe the 10 closest users to them, and then do battle against those elo scores. Do not commit the elo on each iteration. This would pit each user against others in the same league, and conform more closely to how elo is meant to be used. It would also allow us to implement a minimum elo, since battling is done within a user's rank locality. |
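To make the locality idea concrete, here's a rough python sketch. The names (`local_elo_round`, `expected`) are made up, and the rule that a battle's outcome comes from comparing the two users' chat scores is my assumption; the point is that each user battles only their k nearest neighbours by elo, and every update is buffered so nothing is committed mid-round:

```python
def expected(r_a: float, r_b: float) -> float:
    """Standard elo expectation of A scoring against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def local_elo_round(users: list[tuple[str, float, float]],
                    k: int = 10, k_factor: float = 32.0) -> dict[str, float]:
    """One localised round. `users` is (name, elo, score); each user battles
    only their k nearest neighbours by elo, and all deltas are committed
    together at the end, never per iteration."""
    deltas = {name: 0.0 for name, _, _ in users}
    for name, elo, score in users:
        # k nearest neighbours in elo space, excluding the user themselves
        neighbours = sorted(
            (u for u in users if u[0] != name),
            key=lambda u: abs(u[1] - elo),
        )[:k]
        for _, n_elo, n_score in neighbours:
            actual = 1.0 if score > n_score else 0.5 if score == n_score else 0.0
            deltas[name] += k_factor * (actual - expected(elo, n_elo))
    # commit everything at once
    return {name: elo + deltas[name] for name, elo, _ in users}
```

Because opponents are always elo-neighbours, a user can only drift a bounded distance per round, which is what makes a minimum elo enforceable.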
|
pinging @Gaijutsu for visibility |
|
(Time-dependent) may try to make some changes and test them on the stream tonight |
|
That's a well-reasoned analysis, and I completely agree with the idea of localised elo matches. The proposed scoring requirements are good, but they remove a dynamic element from the leaderboard. The changing of positions in the current scoring system is an issue; however, we might benefit from some sort of per-stream leaderboard based purely on that stream's ephemeral rank, or maybe a 'Top X of Today's Stream'. Implementing a local method will make the leaderboard largely settle, assuming users' interaction rates average out across all streams. This is based on how VOD elo does a single recalculation at the end of a stream, so only one round of local elo matches occurs; that may not be the case depending on how live elo does its recalculations. Localised elo matches would also reduce elo recalculation by 100x (for the closest 10 users at least), which is awesome for live elo. |
These aren't "proposed" scoring requirements; they've always been that way. They are the whole reason the elo scoring method is used in the first place. Seems like it has truly failed its purpose if the dev who ported it over to Rust didn't know 😅 Also, I'm not certain I understand why conforming to the scoring requirements would remove the dynamic element from the leaderboard.
This is a good feature request, but it feels largely irrelevant to this discussion.
Really? The way I see it, it'll actually vary more compared to the previous method. In fact, in the graphs you've posted it does indeed vary more; fewer people are at the median.
Having worked a little on Live Elo (to try and fix a bug), I can say that live elo does two kinds of calculations:
Between
Thanks for this! ❤️ I'm implementing something myself, so it might be good to commit your changes into a branch somewhere for inspection.
I need slightly more details here. How are the groups formed? Are they overlapping windows, or separate chunks of 9? i.e. suppose I am a user in the middle of the list. Do I battle 8 * 8 times = 64 times, or only 8 times ever? In my own implementation, I have a "partial window" of users centered around the user I want to force a battle with: (few edits here because I got confused about my own algorithm) So in each slice, only the center user actually gets their elo updated in the battle. In effect, each user will only have
Yep this seems strange to me, although the general shape seems right.
So a range of at most 9 users? (i.e. the same as the above partial window?) If so, the graph looks weird to me; I'd expect a more distributed graph. I have some changes stashed, so I'll probably compare it shortly |
|
Sorry, said the wrong thing 😅, my brain grabbed the first word it thought of. I shouldn't have used 'proposed'; I just meant 'the requirements as written'. In any case, my point about the leaderboard being dynamic was more that someone could rise or fall a significant number of places in a short period. This is very much not intended on our leaderboards as they are now, but it was fun to suddenly see new names at the top from time to time. That's a separate discussion though, my bad 🙏

As for the implementation details: the grouped implementation uses non-overlapping groups of 9, and each user fights 8 battles, one with each other member of their group. I'm dropping this in favour of the partial window approach. My sliding partial approach is very similar to yours, except it maintains a fixed window size. As in yours, only the center user gets updated. These updates are stored in a HashMap and applied only after all battles have been fought. The group size was 9 for both test runs.

The vast majority of users ended up on integers for some reason ¯\\\_(ツ)\_/¯. I'll keep looking into this.

I did my testing in a fork. It is VERY scuffed, so not sure how helpful it can be for now. The per-user-range branch is the only one vaguely worth looking at |
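For reference, my reading of the sliding partial-window variant described above, as a python sketch. The exact clamping behaviour at the list edges is my guess at how a "fixed window size" would be kept, and the `pending` dict stands in for the HashMap of deferred updates:

```python
def window_elo_round(ranked: list[tuple[str, float, float]],
                     window: int = 9, k_factor: float = 32.0) -> dict[str, float]:
    """`ranked` is (name, elo, score), sorted by elo. For each user, take a
    fixed-size window centred on them, fight everyone in it, but only the
    centre user's elo changes. Updates are buffered in `pending` and applied
    only after every battle has been fought."""
    half = window // 2
    pending: dict[str, float] = {}
    for i, (name, elo, score) in enumerate(ranked):
        # shift the window at the list edges so it keeps a fixed size
        start = max(0, min(i - half, len(ranked) - window))
        new_elo = elo
        for j in range(start, min(start + window, len(ranked))):
            if j == i:
                continue  # nobody battles themselves
            _, opp_elo, opp_score = ranked[j]
            actual = 1.0 if score > opp_score else 0.5 if score == opp_score else 0.0
            exp = 1.0 / (1.0 + 10 ** ((opp_elo - elo) / 400.0))
            new_elo += k_factor * (actual - exp)
        pending[name] = new_elo
    return pending
```

Each user gets exactly one entry in `pending`, i.e. one committed update per round, regardless of how many windows they appear in as an opponent.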
|
Alright, based on today's stream, I think we need a slight adjustment to how we're selecting people for the window. It looks like we might do better treating people with the same elo as one person when adding them to the window; otherwise we might experience a cold-start issue. |
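One cheap way to sketch the "same elo counts as one person" selection, assuming the fix is simply to build the window over distinct elo values (`window_candidates` is a made-up name for illustration):

```python
def window_candidates(elos: list[float], centre: float, window: int = 9) -> list[float]:
    """Pick the window of opponent elos nearest to `centre`, collapsing
    users that share an elo into a single entry. A cold-start clump of
    users all parked on the default elo then fills one slot, not nine."""
    distinct = sorted(set(elos), key=lambda e: abs(e - centre))
    return distinct[:window]
```

With, say, 50 users still sitting on the default elo, a plain nearest-k window would contain nothing but that clump; collapsing duplicates lets the window reach users at genuinely different ratings.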
|
I have a potential idea on how to approach this (though I haven't tested it at all, might be a terrible idea). Essentially, each player has a "matchmaking budget" which dictates how many players they can match against, based on how far apart they are. I wrote some python (on my phone, idk if it's valid 🥺) that kinda conveys my idea. It also includes the `Player` class:

```python
class Player:
    elo: float
    score: float

current_player: Player  # not in `players` list
players: list[Player]

players.sort(key=lambda p: abs(p.elo - current_player.elo))

budget = 100
opponents: list[Player] = []
for opponent in players:
    if budget <= 0:
        break
    budget -= 1
    budget -= abs(opponent.elo - current_player.elo)  # could be scaled in some way to favour closer players
    opponents.append(opponent)

# ... calculate new elo against opponents
```
|
This looks pretty interesting. Might implement a test along these lines just to check it out |
|
New idea. Instead of using Elo, which can take up to 3 minutes to calculate, we want to update a distribution based on observed data. This is useful especially in live-elo, since we now have the element of time to play with. Here's the idea:
Ideally this solves two problems:
Essaying was the easy part; actually writing the code / doing the math would be difficult :^) |







Related cards
Neuro Chat Elo Development