Fix issue 3355 assertion error (#3356)

marksverdhei · web-flow · commit a270a98bc10d · 2025-11-10T03:58:21.000+05:00
* update tests

* Add test to reproduce issue #3355 AssertionError

* update tests

* Fix issue #3355: Handle None values in parse_generations/parse_logprobs

Fixes the AssertionError in Collator.get_original() that occurred when API responses contain None values. The code skipped None values when building the result list, so the number of collected items no longer matched the expected number, which triggered the assertion.

Changes:
- generate_until: Always append results, converting None to empty string
- _loglikelihood_tokens: Always append results, converting None to (-inf, False)
- Applied fix to both sequential and concurrent code paths

The fix ensures:
1. Result lists always have the correct number of items for reordering
2. None values are converted to sensible defaults instead of being skipped
3. Progress bars update correctly for all requests
4. Only successful results are cached

Resolves #3355

* remove irrelevant test

* Remove speculative loglikelihood fixes

Only fix generate_until, which has confirmed evidence of the bug from issue #3355. The loglikelihood methods have no evidence of this issue:
- No failures reported in issue #3355
- Different code structure in parse_logprobs (no gap creation)
- Stack traces only show generate_until failures

Keep the fix minimal and evidence-based.

* Add warning logs when API returns null content

Log concise warnings when parse_generations returns None to help users diagnose issues with API responses that have 'content': null. Warning message suggests checking:
- reasoning_content field (for reasoning models)
- generation limits (max_tokens, length)

Includes first 50 chars of context in sequential path for debugging.

* remove context log
diff --git a/lm_eval/models/api_models.py b/lm_eval/models/api_models.py
@@ -769,17 +769,24 @@ def _collate_gen(_requests):
                     ),
                     contexts,
                 ):
-                    if generated_text is not None:
+                    # Always append to res to maintain the correct number of items
+                    # even if generation failed (generated_text is None)
+                    if generated_text is None:
+                        eval_logger.warning(
+                            "API returned null content. Check reasoning_content field or generation limits..."
+                        )
+                        res.append("")
+                    else:
                         res.append(generated_text)
 
-                        # partial caching
-                        if context is not None:
-                            self.cache_hook.add_partial(
-                                "generate_until",
-                                (context, all_gen_kwargs[0]),
-                                generated_text,
-                            )
-                            pbar.update(1)
+                    # partial caching only for successful generations
+                    if generated_text is not None and context is not None:
+                        self.cache_hook.add_partial(
+                            "generate_until",
+                            (context, all_gen_kwargs[0]),
+                            generated_text,
+                        )
+                    pbar.update(1)
         else:
             for chunk in chunked:
                 contexts, all_gen_kwargs, encodings_list = zip(*chunk)
@@ -809,7 +816,15 @@ def _collate_gen(_requests):
                         )
                     )
                 )
-                res.extend(results)
+                # Convert None values to empty strings to maintain consistency
+                for r in results:
+                    if r is None:
+                        eval_logger.warning(
+                            "API returned null content. Check reasoning_content field or generation limits."
+                        )
+                        res.append("")
+                    else:
+                        res.append(r)
 
         return re_ord.get_original(res)
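
The count mismatch described in the commit message can be reproduced in isolation. The following is a minimal sketch under stated assumptions, not the lm_eval implementation: `TinyReorderer` and `collect` are hypothetical names, and the length check in `get_original` only imitates the kind of assertion that the real `Collator.get_original()` is reported to raise.

```python
from typing import List, Optional


class TinyReorderer:
    """Hypothetical stand-in for a collator that sorts requests, then restores order."""

    def __init__(self, items: List[str]) -> None:
        # Process longest items first, remembering each original position.
        self.order = sorted(range(len(items)), key=lambda i: -len(items[i]))
        self.size = len(items)

    def get_original(self, results: List[str]) -> List[str]:
        # Assumed check: one result per request is required to undo the sort.
        assert len(results) == self.size, "result count does not match request count"
        restored = [""] * self.size
        for position, result in zip(self.order, results):
            restored[position] = result
        return restored


def collect(responses: List[Optional[str]], skip_none: bool) -> List[str]:
    """Build the result list, either dropping None (old behavior) or padding with ""."""
    res: List[str] = []
    for text in responses:
        if text is None:
            if skip_none:
                continue  # old behavior: leaves a gap in the result list
            res.append("")  # fixed behavior: keep a placeholder
        else:
            res.append(text)
    return res


prompts = ["a", "bbb", "cc"]
reorderer = TinyReorderer(prompts)

# Responses arrive in sorted order ("bbb", "cc", "a"); the second one is null.
responses = ["B", None, "A"]

fixed = reorderer.get_original(collect(responses, skip_none=False))

try:
    reorderer.get_original(collect(responses, skip_none=True))
    old_behavior_failed = False
except AssertionError:
    old_behavior_failed = True
```

With the placeholder in place, every request keeps its slot and the original order can be restored; with the gap, the reorder step has one result too few and the assertion fires.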