Commit a270a98

Fix issue 3355 assertion error (#3356)
* update tests

* Add test to reproduce issue #3355 AssertionError

* update tests

* Fix issue #3355: Handle None values in parse_generations/parse_logprobs

  Fixes the AssertionError in Collator.get_original() that occurred when API
  responses contain None values. The issue was that the code skipped None
  values when building the result list, causing a mismatch between the
  expected number of items and actual items, triggering the assertion.

  Changes:
  - generate_until: Always append results, converting None to empty string
  - _loglikelihood_tokens: Always append results, converting None to (-inf, False)
  - Applied fix to both sequential and concurrent code paths

  The fix ensures:
  1. Result lists always have the correct number of items for reordering
  2. None values are converted to sensible defaults instead of being skipped
  3. Progress bars update correctly for all requests
  4. Only successful results are cached

  Resolves #3355

* remove irrelevant test

* Remove speculative loglikelihood fixes

  Only fix generate_until, which has confirmed evidence of the bug from
  issue #3355. The loglikelihood methods have no evidence of this issue:
  - No failures reported in issue #3355
  - Different code structure in parse_logprobs (no gap creation)
  - Stack traces only show generate_until failures

  Keep the fix minimal and evidence-based.

* Add warning logs when API returns null content

  Log concise warnings when parse_generations returns None to help users
  diagnose issues with API responses that have 'content': null.

  Warning message suggests checking:
  - reasoning_content field (for reasoning models)
  - generation limits (max_tokens, length)

  Includes first 50 chars of context in sequential path for debugging.

* remove context log
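The length mismatch described in the commit message can be reproduced in isolation. The sketch below is illustrative only: `reorder_and_restore` and `collect` are hypothetical stand-ins written for this example, not lm_eval's `Collator` or `generate_until`. It assumes requests are sorted by length before being sent and that results are mapped back to the original order afterwards.

```python
def reorder_and_restore(items, results, key):
    """Hypothetical stand-in for a sorting collator's restore step."""
    # Indices of the items in the order they were sent to the API
    order = sorted(range(len(items)), key=lambda i: key(items[i]))
    # Each reordered request needs exactly one result
    assert len(results) == len(order), "result count does not match request count"
    restored = [None] * len(items)
    for position, original_index in enumerate(order):
        restored[original_index] = results[position]
    return restored


def collect(outputs, skip_none):
    """Build the result list either the old way (skip) or the new way (placeholder)."""
    res = []
    for text in outputs:
        if text is None:
            if skip_none:
                continue  # pre-fix behaviour: leaves a gap in the list
            res.append("")  # post-fix behaviour: placeholder keeps alignment
        else:
            res.append(text)
    return res


requests = ["ccc", "a", "bb"]
# Outputs arrive in sorted order ("a", "bb", "ccc"); the API returned null for "bb"
api_outputs = ["out-a", None, "out-ccc"]

fixed = reorder_and_restore(requests, collect(api_outputs, skip_none=False), key=len)

try:
    reorder_and_restore(requests, collect(api_outputs, skip_none=True), key=len)
    skipping_failed = False
except AssertionError:
    skipping_failed = True
```

With the placeholder, `fixed` lines up with `requests` and the failed request maps to an empty string. With skipping, the restore step raises `AssertionError`, the same class of failure reported in issue #3355.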
1 parent dfe769b commit a270a98

1 file changed: +25 -10 lines changed

lm_eval/models/api_models.py

Lines changed: 25 additions & 10 deletions
@@ -769,17 +769,24 @@ def _collate_gen(_requests):
                         ),
                         contexts,
                     ):
-                        if generated_text is not None:
+                        # Always append to res to maintain the correct number of items
+                        # even if generation failed (generated_text is None)
+                        if generated_text is None:
+                            eval_logger.warning(
+                                "API returned null content. Check reasoning_content field or generation limits..."
+                            )
+                            res.append("")
+                        else:
                             res.append(generated_text)
 
-                            # partial caching
-                            if context is not None:
-                                self.cache_hook.add_partial(
-                                    "generate_until",
-                                    (context, all_gen_kwargs[0]),
-                                    generated_text,
-                                )
-                                pbar.update(1)
+                        # partial caching only for successful generations
+                        if generated_text is not None and context is not None:
+                            self.cache_hook.add_partial(
+                                "generate_until",
+                                (context, all_gen_kwargs[0]),
+                                generated_text,
+                            )
+                        pbar.update(1)
         else:
             for chunk in chunked:
                 contexts, all_gen_kwargs, encodings_list = zip(*chunk)
@@ -809,7 +816,15 @@ def _collate_gen(_requests):
                         )
                     )
                 )
-                res.extend(results)
+                # Convert None values to empty strings to maintain consistency
+                for r in results:
+                    if r is None:
+                        eval_logger.warning(
+                            "API returned null content. Check reasoning_content field or generation limits."
+                        )
+                        res.append("")
+                    else:
+                        res.append(r)
 
         return re_ord.get_original(res)
 
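
Both hunks apply the same None-to-empty-string normalization, once in the sequential path and once in the concurrent path. As a rough sketch of that shared logic, it could be expressed as a single helper. `normalize_generations` is a hypothetical name used only for illustration and is not part of this commit or of lm_eval; the logger name is likewise an assumption.

```python
import logging

# Assumed logger for the sketch; the real module uses its own eval_logger
eval_logger = logging.getLogger("null_content_sketch")


def normalize_generations(results):
    """Return one string per input, replacing None with an empty string.

    Hypothetical helper: keeps the output length equal to the input length
    so that a later reordering step sees one result per request.
    """
    normalized = []
    for r in results:
        if r is None:
            eval_logger.warning(
                "API returned null content. "
                "Check reasoning_content field or generation limits."
            )
            normalized.append("")
        else:
            normalized.append(r)
    return normalized


example = normalize_generations(iter(["first", None, "third"]))
```

The helper accepts any iterable, which matters for the concurrent path, where `results` is a lazy chain of batched outputs. It does not cover the caching rule from the first hunk, where only non-None generations are passed to `add_partial`.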