
Add MiniMax as LLM evaluation provider #32

Open
octo-patch wants to merge 1 commit into snap-research:main from octo-patch:feature/add-minimax-provider

Conversation

@octo-patch

Summary

  • Add MiniMax (M2.5, M2.5-highspeed, M2.7) as a new LLM provider for the LoCoMo long-term conversational memory benchmark
  • Follow the existing provider pattern (GPT, Claude, Gemini) with run_minimax() in global_methods.py and task_eval/minimax_utils.py
  • MiniMax models offer 204K-1M-token context windows via an OpenAI-compatible API, making them well suited to long-term conversation evaluation
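The temperature clamping and think-tag stripping mentioned above could look roughly like the sketch below. This is not the PR's diff: the base URL, the clamping bounds, and the helper shapes are assumptions, and only the function names `run_minimax()`/`set_minimax_key()` come from the PR description.

```python
import os
import re

# Hypothetical endpoint; the real base URL is defined in the PR's diff.
_MINIMAX_BASE_URL = "https://api.minimax.io/v1"

_THINK_TAG_RE = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> reasoning blocks emitted by M2.5 models."""
    return _THINK_TAG_RE.sub("", text).strip()


def clamp_temperature(t: float, low: float = 0.01, high: float = 1.0) -> float:
    """Clamp temperature into the range the API accepts (bounds assumed here)."""
    return max(low, min(high, t))


def run_minimax(prompt: str, model: str = "MiniMax-M2.5", temperature: float = 0.0) -> str:
    """Query a MiniMax model through the OpenAI-compatible chat endpoint."""
    from openai import OpenAI  # OpenAI-compatible client, as the PR describes

    client = OpenAI(api_key=os.environ["MINIMAX_API_KEY"], base_url=_MINIMAX_BASE_URL)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=clamp_temperature(temperature),
    )
    return strip_think_tags(resp.choices[0].message.content)
```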

Changes

| File | Description |
| --- | --- |
| `global_methods.py` | Add `run_minimax()` and `set_minimax_key()` with OpenAI-compatible API, temperature clamping, and think-tag stripping |
| `task_eval/minimax_utils.py` | New MiniMax evaluation utils with `get_minimax_answers()`, following the `claude_utils.py` pattern |
| `task_eval/evaluate_qa.py` | Add MiniMax model dispatch alongside GPT/Claude/Gemini |
| `scripts/evaluate_minimax.sh` | Evaluation script for MiniMax-M2.5 and MiniMax-M2.5-highspeed |
| `scripts/env.sh` | Add `MINIMAX_API_KEY` environment variable |
| `README.MD` | Add MiniMax evaluation instructions |
| `tests/test_minimax_unit.py` | 27 unit tests covering API calls, model mapping, temperature clamping, think-tag stripping, and answer parsing |
| `tests/test_minimax_integration.py` | 3 integration tests with real MiniMax API calls |
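The model dispatch added to `task_eval/evaluate_qa.py` presumably mirrors the existing GPT/Claude/Gemini branches. A sketch under that assumption follows; the mapping table and helper names are illustrative, not taken from the diff:

```python
# Hypothetical mapping from CLI model arguments to API model identifiers,
# mirroring the provider-dispatch pattern the PR describes.
SUPPORTED_MINIMAX_MODELS = {
    "minimax-m2.5": "MiniMax-M2.5",
    "minimax-m2.5-highspeed": "MiniMax-M2.5-highspeed",
    "minimax-m2.7": "MiniMax-M2.7",
}


def is_minimax_model(name: str) -> bool:
    """Route any minimax-prefixed model name to the MiniMax branch."""
    return name.lower().startswith("minimax")


def resolve_minimax_model(name: str) -> str:
    """Normalize a CLI model argument to the API's model identifier (assumed mapping)."""
    try:
        return SUPPORTED_MINIMAX_MODELS[name.lower()]
    except KeyError:
        raise ValueError(f"Unsupported MiniMax model: {name}") from None
```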

Usage

Set your MiniMax API key in `scripts/env.sh`, then run:

```bash
bash scripts/evaluate_minimax.sh
```

Test plan

  • 27 unit tests passing (mocked API calls)
  • 3 integration tests passing (real API calls)
  • Verify that the evaluation output format matches existing providers
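The think-tag stripping behavior covered by the unit tests could be exercised roughly as below. The stand-in helper reimplements the behavior the PR describes; the real function lives in `global_methods.py`, and its exact signature is an assumption here.

```python
import re


def strip_think_tags(text: str) -> str:
    """Stand-in for the PR's helper: drop <think>...</think> reasoning blocks."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


def test_strips_single_think_block():
    assert strip_think_tags("<think>chain of thought</think>final answer") == "final answer"


def test_strips_multiple_think_blocks():
    assert strip_think_tags("<think>a</think><think>b</think>x") == "x"


def test_passthrough_without_tags():
    assert strip_think_tags("plain answer") == "plain answer"
```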

Add MiniMax (M2.5, M2.5-highspeed, M2.7) as a new LLM provider for the
LoCoMo long-term conversational memory benchmark, following the existing
provider pattern (GPT, Claude, Gemini).

- Add run_minimax() to global_methods.py with OpenAI-compatible API,
  temperature clamping, and think-tag stripping for M2.5 models
- Create task_eval/minimax_utils.py with get_minimax_answers() following
  the claude_utils.py pattern (large context, no truncation)
- Update evaluate_qa.py with MiniMax model dispatch
- Add scripts/evaluate_minimax.sh evaluation script
- Add MINIMAX_API_KEY to scripts/env.sh
- Update README.MD with MiniMax evaluation instructions
- Add 27 unit tests and 3 integration tests
