adapt scoring for user-submitted models #76
# https://github.com/BerriAI/litellm/blob/b9621c760d3355e06dd17ec89b9eb6776755392e/litellm/litellm_core_utils/get_model_cost_map.py#L16
# See the Development.md before changing.
- desired_model_costs_url = "https://raw.githubusercontent.com/BerriAI/litellm/eb66daeef740947c0326826817cf68fb56a8b931/litellm/model_prices_and_context_window_backup.json"
+ desired_model_costs_url = "https://raw.githubusercontent.com/BerriAI/litellm/9a5c778f1824641fe9f6c8dcc1d096fd9d8ef9f0/litellm/model_prices_and_context_window_backup.json"
how'd you choose this one? i ended up in the same place for running some other cost calcs. think we should take whatever the latest is
I have a further request if it makes sense. Put the costs used in the data. https://github.com/allenai/agent-eval/compare/update-cost-map-gpt54
That makes sense to me...do you want me to bring those changes into this PR?
yes please :) I'm trying to figure out the specific model names for all the newer runs we're trying to get and see if they're actually in that version of the costs file. If that file is really only a few days old, then it's probably(?) fine

I am attempting to score some recently arrived external submissions for AstaBench whose model usage our current code cannot calculate costs for.
https://huggingface.co/datasets/allenai/asta-bench-submissions/tree/main/1.0.0/test/EvoScientist_EvoScientist_Coder_2026-03-19_16-22-34
The solver args show this provider, openrouter/moonshotai/kimi-k2.5, which comes through in the inspect model usage objects like this:
Moonshot is supported as an inference provider in litellm, and some cost objects have been added to https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
but are not yet in a released version. This PR adds pricing in local_cost to handle this provider/model.
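As a rough illustration of the local-cost idea, here is a minimal sketch, not the code in this PR. It assumes the local table reuses litellm's per-token key names (`input_cost_per_token`, `output_cost_per_token`) and that the upstream map is consulted first, with the local table as a fallback. `LOCAL_COSTS`, `Usage`, and `compute_cost` are hypothetical names, and the prices are placeholders, not real Moonshot pricing.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical local override table. Key names mirror litellm's cost-map
# schema; the numeric values are PLACEHOLDERS, not real pricing.
LOCAL_COSTS = {
    "openrouter/moonshotai/kimi-k2.5": {
        "input_cost_per_token": 1e-06,   # placeholder value
        "output_cost_per_token": 2e-06,  # placeholder value
    },
}


@dataclass
class Usage:
    """Hypothetical stand-in for a model usage record."""
    input_tokens: int
    output_tokens: int


def compute_cost(model: str, usage: Usage, cost_map: dict,
                 local_costs: dict = LOCAL_COSTS) -> Optional[float]:
    """Return the cost in USD, or None if the model has no pricing anywhere.

    Assumption: the upstream cost map wins when it has an entry, and the
    local table only fills in models that upstream does not know yet.
    """
    entry = cost_map.get(model) or local_costs.get(model)
    if entry is None:
        return None
    return (usage.input_tokens * entry["input_cost_per_token"]
            + usage.output_tokens * entry["output_cost_per_token"])
```

Returning `None` for an unknown model, rather than raising, lets a scorer report which submissions still lack pricing instead of failing on the first one.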
1.0.0/test/Distyl_AI_Button_2026-03-23_18-54-16
This cost information can be added by bumping the litellm version to 1.82.3 and updating the desired_model_costs_url to match the sha for this release.
According to litellm, the compromised PyPI packages were litellm==1.82.7 and litellm==1.82.8; 1.82.3 is not one of them.
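To check whether a pinned costs file actually covers the model names in new runs, something like the following sketch could help. It assumes the raw GitHub URL layout seen in the diff; `pinned_cost_map_url`, `load_cost_map`, and `missing_models` are hypothetical helpers, not functions in agent-eval or litellm.

```python
import json
import urllib.request

# URL layout copied from the desired_model_costs_url values in the diff.
RAW_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/BerriAI/litellm/{sha}"
    "/litellm/model_prices_and_context_window_backup.json"
)


def pinned_cost_map_url(sha: str) -> str:
    """Build the raw URL of the costs file at a specific litellm commit."""
    return RAW_URL_TEMPLATE.format(sha=sha)


def load_cost_map(url: str) -> dict:
    """Download and parse the costs file (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def missing_models(model_names, cost_map: dict) -> list:
    """Return the sorted, de-duplicated model names absent from the cost map."""
    return sorted(name for name in set(model_names) if name not in cost_map)
```

A possible use: call `load_cost_map(pinned_cost_map_url(sha))` with the sha chosen for the release, then pass the model names collected from the submissions to `missing_models` to see which ones would still need a local cost entry.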
Verified that scoring these two submissions is possible with these changes.