Commit ddaa1d5
refactor inference core, unify error handling, and enhance judge flexibility
Major changes to the inference architecture and evaluation pipeline:
1. Core Architecture & Error Handling:
- Centralized retry and exception handling logic in `BaseAPI` and
`BaseModel`.
- Implemented a `fail-fast` mechanism that exits immediately on
critical errors (OOM, authentication failures).
- Introduced `ignore-patterns` ("Green Light" mechanism) to
gracefully handle and record specific errors (e.g., policy
violations, content filtering) as valid responses.
- Cleaned up `generate_inner` across all API wrappers by removing
redundant try-except blocks and loops.
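A minimal sketch of how centralized retry, fail-fast and ignore-patterns can live in one loop. Everything here is illustrative: `call_with_retry`, `FatalError` and both pattern lists are hypothetical names, and classifying errors by regex-matching the exception message is an assumption, not necessarily how `BaseAPI`/`BaseModel` do it. A fail-fast error is raised as `FatalError` so the caller can exit the process. An ignored error is returned as a recorded response instead of being retried.

```python
import re
import time
from typing import Callable, Sequence


class FatalError(Exception):
    """Hypothetical marker for errors that must abort the run (OOM, auth failure)."""


# Hypothetical pattern lists; real values would come from the project's config.
FAIL_FAST_PATTERNS = (r"out of memory", r"invalid api key", r"\b401\b")
IGNORE_PATTERNS = (r"content[_ ]filter", r"policy violation")


def _matches(msg: str, patterns: Sequence[str]) -> bool:
    return any(re.search(p, msg, re.IGNORECASE) for p in patterns)


def call_with_retry(fn: Callable[[], str], retry: int = 3, wait: float = 0.0,
                    fail_fast: Sequence[str] = FAIL_FAST_PATTERNS,
                    ignore: Sequence[str] = IGNORE_PATTERNS) -> str:
    """Call `fn` up to `retry` times, classifying each failure by its message."""
    last_err = None
    for _ in range(retry):
        try:
            return fn()
        except Exception as err:
            msg = str(err)
            if _matches(msg, fail_fast):
                # Critical: stop now instead of burning the remaining retries.
                raise FatalError(msg) from err
            if _matches(msg, ignore):
                # "Green light": record the error text as a valid response.
                return f"[IGNORED] {msg}"
            last_err = err  # transient error: wait, then try again
            time.sleep(wait)
    raise RuntimeError(f"failed after {retry} attempts") from last_err
```

Because the loop lives in one place, each `generate_inner` only has to make a single request and let exceptions propagate.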
2. Streaming & Performance:
- Added configurable `--stream` support for major API wrappers
(OpenAI, Claude, Gemini, etc.).
- Implemented `image_mem` support in wrappers to allow zero-IO
in-memory Base64 image passing, bypassing temporary file
creation.
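A sketch of the zero-IO idea behind `image_mem`: when the wrapper already holds the image bytes, it can Base64-encode them straight into a data URL instead of writing a temporary file and reading it back. The helper names, the `image_path`/`image_mem` keyword signature and the fixed `image/png` MIME type are assumptions. The returned dict follows the OpenAI chat `image_url` content-part shape; other providers (Claude, Gemini) expect different payloads.

```python
import base64
from typing import Optional


def encode_image_bytes(image_mem: bytes, mime: str = "image/png") -> str:
    """Hypothetical helper: build a data URL directly from in-memory bytes."""
    b64 = base64.b64encode(image_mem).decode("ascii")
    return f"data:{mime};base64,{b64}"


def build_image_part(image_path: Optional[str] = None,
                     image_mem: Optional[bytes] = None) -> dict:
    """Prefer in-memory bytes; fall back to reading the file from disk."""
    if image_mem is None:
        if image_path is None:
            raise ValueError("need image_path or image_mem")
        with open(image_path, "rb") as f:
            image_mem = f.read()
    # OpenAI-style content part; the shape each wrapper really uses is an assumption.
    return {"type": "image_url", "image_url": {"url": encode_image_bytes(image_mem)}}
```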
3. Judge & Model Initialization:
- Refactored `build_judge_model` to support dynamic class
loading, config-based routing, and fallback to
OpenAI-compatible protocols.
- Unified model initialization logic to respect `Config >
CLI` priority for parameters like `retry`, `verbose`,
and `stream`.
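The `Config > CLI` rule and config-based judge routing could look roughly like the sketch below. `resolve_params`, `load_class`, `OpenAICompatibleJudge`, `build_judge` and the `class` config key are hypothetical and do not reproduce the signature of the real `build_judge_model`. A config value wins whenever it is set (including `False`). A CLI value only fills keys the config leaves unset. When the config names no class, the judge falls back to a generic OpenAI-compatible wrapper.

```python
import importlib
from typing import Any, Dict, Optional, Tuple


def resolve_params(config: Dict[str, Any], cli: Dict[str, Any],
                   keys: Tuple[str, ...] = ("retry", "verbose", "stream")) -> Dict[str, Any]:
    """Config > CLI: a value set in the config wins; the CLI only fills gaps."""
    out = {}
    for k in keys:
        if config.get(k) is not None:
            out[k] = config[k]
        elif cli.get(k) is not None:
            out[k] = cli[k]
    return out


def load_class(dotted: str) -> type:
    """Import 'package.module.ClassName' at runtime (dynamic class loading)."""
    module_name, _, cls_name = dotted.rpartition(".")
    return getattr(importlib.import_module(module_name), cls_name)


class OpenAICompatibleJudge:
    """Hypothetical stand-in for a generic OpenAI-protocol wrapper."""

    def __init__(self, model: str, **kwargs: Any):
        self.model, self.kwargs = model, kwargs


def build_judge(model: str, config: Optional[Dict[str, Any]] = None,
                cli: Optional[Dict[str, Any]] = None) -> Any:
    config, cli = config or {}, cli or {}
    params = resolve_params(config, cli)
    class_path = config.get("class")
    if class_path:
        # Config-based routing: the config names the wrapper class explicitly.
        return load_class(class_path)(model=model, **params)
    # Fallback: treat any unrouted judge as an OpenAI-compatible endpoint.
    return OpenAICompatibleJudge(model, **params)
```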
4. Utilities & Environment:
- Implemented a lazy-loading proxy for datasets with heavy dependencies
(e.g., AstroVisBench), so those dependencies are imported only when the
dataset is actually used, avoiding import hell.
- Added an environment-variable isolation context that
supports `_EVAL`-suffixed env vars (e.g.,
`OPENAI_API_KEY_EVAL`) to override settings during
evaluation only.

1 parent 4bcce23
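Both utilities from item 4 can be sketched with the standard library alone. `LazyClass` defers `importlib.import_module` until the proxy is first called or an attribute is read, so importing the dataset registry no longer pulls in heavy dependencies. `eval_env` copies every `FOO_EVAL` variable onto `FOO` for the duration of a `with` block and restores (or deletes) the originals afterwards. Both names and the exact override rule are assumptions, not the commit's actual code.

```python
import importlib
import os
from contextlib import contextmanager
from typing import Any, Iterator


class LazyClass:
    """Hypothetical proxy that imports `module.attr` only on first use."""

    def __init__(self, module: str, attr: str):
        self._module, self._attr, self._target = module, attr, None

    def _load(self) -> Any:
        if self._target is None:
            self._target = getattr(importlib.import_module(self._module), self._attr)
        return self._target

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self._load()(*args, **kwargs)

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not set on the proxy itself; private names
        # are refused to avoid recursion before __init__ has run.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._load(), name)


@contextmanager
def eval_env(suffix: str = "_EVAL") -> Iterator[None]:
    """Hypothetical helper: let FOO_EVAL override FOO inside the block only."""
    overrides = {k[: -len(suffix)]: v for k, v in os.environ.items()
                 if k.endswith(suffix) and len(k) > len(suffix)}
    saved = {k: os.environ.get(k) for k in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for k, old in saved.items():
            if old is None:
                os.environ.pop(k, None)  # variable did not exist before
            else:
                os.environ[k] = old
```

`os.environ` is process-global, so this isolation holds per process, not per thread. The proxy also cannot be used as the second argument of `isinstance` until it is replaced by the real class.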
698 files changed, +1561 / -3164 lines

File tree:
- docs
- en
- ja
- zh-CN
- scieval
- api
- dataset
- AstroVisBench
- CGAVCounting
- CMPhysBench
- ChemBench
- EgoExoBench
- GUI
- OmniDocBench
- PHYSICS
- Math-Verify
- assets
- examples
- scripts
- src/math_verify
- tests
- latex2sympy2_extended
- sandbox
- scripts
- src/latex2sympy2_extended
- tests
- Researchbench
- SciCode
- eval
- src/scicode
- compare
- gen
- parse
- utils
- utils
- Ocrbench_v2
- spotting_eval
- ccocr_evaluator
- chartmimic
- eval_configs
- evaluator
- clima_qa
- megabench
- aggregation
- parsing
- common
- scoring
- common
- tools
- mmhelix
- evaluators
- utils
- mmif
- vcrbench
- vgrpbench
- puzzles
- smp
- utils
- vlm
- granite_vision
- hawk_vl
- hawk
- model
- language_model
- multimodal_projector
- vision_encoder
- qwen_vit
- internvl
- llava
- misc
- ola
- ola
- datasets
- model
- language_model
- multimodal_encoder
- multimodal_projector
- multimodal_resampler
- speech_encoder
- beats
- speech_projector
- ovis
- utils
- qtunevl
- qwen2_vl
- qwen3_vl
- thyme
- ursa
- ursa_model
- valley
- video_llm
- xcomposer
- scripts
- vlmeval
- dataset/utils
- Ocrbench_v2/spotting_eval/__pycache__
- vgrpbench/configs/formating-prompt
- aquarium
- battleships
- binairo
- coloredsudoku
- fieldexplore
- futoshiki
- hitori
- jigsawsudoku
- kakurasu
- kakuro
- killersudoku
- lightup
- nonogram
- oddevensudoku
- renzoku
- skyscraper
- starbattle
- sudoku
- thermometers
- treesandtents
- vlm/video_llm/configs
- llama_vid/processor/clip-patch14-224