Berkeley Function Calling Leaderboard Updates (v1.3) #1119
ShishirPatil
announced in
Announcements
Replies: 1 comment
-
|
Please add Nemotron 3 Nano and GPT-OSS into the leaderboard! We need more evaluations of SLMs! Bonus: include Diffusion LLMs to the equation cus they might be better? vectara/hallucination-leaderboard#164 |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Highlights
🏆 Stable release of Berkeley Function Calling Leaderboard V3 with Multi-step and Multi-turn function call evaluation
What's Changed
live_parallel_multiple_9-8-0copy-paste issue by @pkesseli in [BFCL] Fixlive_parallel_multiple_9-8-0copy-paste issue #865multi_turn_base_34Ground Truth by @HuanzhiMao in [BFCL] Fix Typo inmulti_turn_base_34Ground Truth #876retry_with_backofffor Amazon Nova Handler by @HuanzhiMao in [BFCL Chore] Implementretry_with_backofffor Amazon Nova Handler #880live_simple_183-108-0by @pkesseli in [BFCL] Fixlive_simple_183-108-0#872live_simple_44-18-0andlive_simple_45-18-1by @pkesseli in [BFCL] Fixlive_simple_44-18-0andlive_simple_45-18-1#870idwith Result File Test Case IDs by @HuanzhiMao in [BFCL Chore] Align Score Fileidwith Result File Test Case IDs #893o3-mini-2025-01-31ando3-mini-2025-01-31-FCby @HuanzhiMao in [BFCL] Add New Modelo3-mini-2025-01-31ando3-mini-2025-01-31-FC#898gemini-2.0-flash-001,gemini-2.0-flash-lite-preview-02-05,gemini-2.0-pro-exp-02-05. by @HuanzhiMao in [BFCL] Add New Modelgemini-2.0-flash-001,gemini-2.0-flash-lite-preview-02-05,gemini-2.0-pro-exp-02-05. #902gpt-4.5-preview-2025-02-27,gpt-4.5-preview-2025-02-27-FCby @HuanzhiMao in [BFCL] Add New Modelgpt-4.5-preview-2025-02-27,gpt-4.5-preview-2025-02-27-FC#922DeepSeek-R1by @HuanzhiMao in [BFCL] Add New ModelDeepSeek-R1#901requirements.txtLocation to Remove Global Dependency Confusion by @HuanzhiMao in Fix Gorilla Paperrequirements.txtLocation to Remove Global Dependency Confusion #937deepseek-ai/DeepSeek-R1by @HuanzhiMao in [BFCL] Support Local Inference fordeepseek-ai/DeepSeek-R1#926Qwen2.5Models in Function Calling Mode by @HuanzhiMao in [BFCL] Add Support forQwen2.5Models in Function Calling Mode #925claude-3-7-sonnet-20250219,claude-3-7-sonnet-20250219-FCby @HuanzhiMao in [BFCL] Add New Modelclaude-3-7-sonnet-20250219,claude-3-7-sonnet-20250219-FC#923constant.pyFiles to aconstantsFolder by @catherineruoxiwu in [BFCL] Reorganized Allconstant.pyFiles to aconstantsFolder #944gemini-2.0-flash-lite-001,gemini-2.0-flash-thinking-exp-01-21by @HuanzhiMao in [BFCL] Add New Modelsgemini-2.0-flash-lite-001,gemini-2.0-flash-thinking-exp-01-21#942Gemma-3Series Models by @HuanzhiMao in [BFCL] Add GoogleGemma-3Series Models #939model_metadata.pytoconstantsfolder by @catherineruoxiwu in [BFCL] Movemodel_metadata.pytoconstantsfolder #949./data/possible_answerFolder by @catherineruoxiwu in [BFCL] Moved Ground Truths for Executable Tests to./data/possible_answerFolder #953./bfcl/eval_checker/executable_eval/data/by @catherineruoxiwu in [BFCL] Reorganizing Codes in./bfcl/eval_checker/executable_eval/data/#954gemini-2.5-proto the Leaderboard by @catherineruoxiwu in [BFCL] Addgemini-2.5-proto the Leaderboard #974multi_turn_base_166Ground Truth. by @HuanzhiMao in [BFCL] Fix Typo inmulti_turn_base_166Ground Truth. #979Llama-4-Scout,Llama-4-Maverickby @HuanzhiMao in [BFCL] Add New ModelsLlama-4-Scout,Llama-4-Maverick#981--local-model-pathby @catherineruoxiwu in [BFCL] Add Support for Fully Offline Model Inference via--local-model-path#985xLAM-2-8b-fc-rby @HuanzhiMao in Fix Typo in Model Name forxLAM-2-8b-fc-r#992microsoft/phi-4to the Leaderboard by @catherineruoxiwu in [BFCL] Addmicrosoft/phi-4to the Leaderboard #1000writer-sdkDependency Version by @HuanzhiMao in Bumpwriter-sdkDependency Version #1006live_multiple_1052-279-0by @itea1001 in [BFCL] fix entry id typo inlive_multiple_1052-279-0#1022versiontobfclCLI by @ShishirPatil in [BFCL] AddversiontobfclCLI #1038Qwen3Series by @HuanzhiMao in [BFCL] Support DashScope API Inference forQwen3Series #1061systemrole withdeveloperrole for OpenAI models by @errorfourten in [BFCL] Replacesystemrole withdeveloperrole for OpenAI models #1090New Contributors
live_parallel_multiple_9-8-0copy-paste issue #865constant.pyFiles to aconstantsFolder #944Full Changelog: v1.2...v1.3
This discussion was created from the release Berkeley Function Calling Leaderboard Updates (v1.2).
Beta Was this translation helpful? Give feedback.
All reactions