Conversation

@kcz358 (Collaborator) commented Jan 4, 2026

Before you open a pull-request, please check if a similar issue already exists or has been closed before.

When you open a pull-request, please be sure to include the following

  • A descriptive title: [xxx] XXXX
  • A detailed description

If you run into lint warnings, you can use the following commands to reformat the code.

pip install pre-commit
pre-commit install
pre-commit run --all-files

Ask for review

Once you are comfortable with your PR, feel free to @ one of the contributors to review it

@kcz358 @KemingWu

Thank you for your contributions!

@kcz358 (Collaborator, Author) commented Jan 4, 2026

Fixes the issue mentioned in #941.

@kcz358 kcz358 merged commit 69e5287 into main Jan 4, 2026
3 checks passed
@kcz358 kcz358 deleted the dev/imgedit branch January 4, 2026 03:38

@claude (bot) commented Jan 4, 2026

Code Review for PR #966

Summary

This PR fixes the OpenAI API call structure in the ImgEdit evaluation logic by reordering message content and adding a temperature parameter. The changes are valid and improve evaluation quality.


Changes Analysis

1. Message Content Reordering

Change: Images now appear before the text prompt in the message content array.

Rationale:

  • For vision-language models (like GPT-4o), placing images before instructions follows OpenAI's recommended best practices
  • The prompts in prompt.py explicitly say "Below are the images before and after editing:" at the end, implying images should be presented first
  • This ordering helps the model better understand the visual context before receiving scoring instructions

Impact: Should improve evaluation accuracy and consistency.


2. Temperature Parameter Addition

Change: Added temperature=0.1 to the API call.

Rationale:

  • For evaluation/scoring tasks, low temperature (0-0.2) ensures deterministic, consistent results
  • This aligns with other evaluation code in the codebase:
    • lmms_eval/tasks/mia_bench/utils.py:72 uses temperature: 0
    • lmms_eval/tasks/egothink/utils.py:149 uses temperature=0
  • Temperature 0.1 (vs. 0.0) introduces only minimal randomness, so scores remain largely reproducible

Impact: More consistent scoring across multiple runs of the same evaluation.
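
For reference, a minimal sketch of the request shape that both changes produce, using the official openai Python SDK. The model name and variable values below are illustrative placeholders, not values taken from the PR diff:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    original_b64 = "..."  # base64-encoded image before editing (placeholder)
    edited_b64 = "..."    # base64-encoded image after editing (placeholder)
    full_prompt = "..."   # scoring instructions built from prompt.py (placeholder)

    # Change 1: both images come before the text prompt.
    content = [
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{original_b64}"}},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{edited_b64}"}},
        {"type": "text", "text": full_prompt},
    ]

    # Change 2: low temperature for near-deterministic scoring.
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; the actual judge model is set by the task config
        messages=[{"role": "user", "content": content}],
        temperature=0.1,
    )
    print(response.choices[0].message.content)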


Code Quality Assessment

Strengths:

  • Minimal, focused change (2 additions, 1 deletion)
  • Follows existing patterns in the codebase
  • No breaking changes to API or functionality
  • Appropriate for bug fix classification

Considerations:

  1. Test Coverage: No tests exist for the imgedit module. Consider adding tests for:

    • _call_openai_for_evaluation() function (with mocked OpenAI client)
    • parse_gpt_scores() function with various response formats
    • Message content structure validation
  2. Documentation: The change is self-explanatory, but a code comment explaining why images come before text could help future maintainers:

    # Place images before text for better vision model comprehension
    content = [
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{original_b64}"}},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{edited_b64}"}},
        {"type": "text", "text": full_prompt},
    ]
  3. Potential Issue - Line 201: The temperature=0.1 parameter is appropriate, but ensure this value is tested. If evaluation variance is observed, consider:

    • Making the temperature configurable via an environment variable (see the sketch after this list)
    • Documenting the choice in code comments
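
For the configurability point above, a minimal sketch, assuming a hypothetical environment variable (IMGEDIT_JUDGE_TEMPERATURE is not something the codebase currently defines):

    import os

    # Hypothetical env var; falls back to the value introduced in this PR.
    DEFAULT_JUDGE_TEMPERATURE = 0.1
    judge_temperature = float(os.getenv("IMGEDIT_JUDGE_TEMPERATURE", str(DEFAULT_JUDGE_TEMPERATURE)))

    # Then pass it through to the API call:
    # client.chat.completions.create(..., temperature=judge_temperature)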

Security & Performance

  • Security: ✅ No security concerns. The change doesn't introduce new attack vectors.
  • Performance: ✅ Neutral impact. Message reordering and temperature parameter don't affect API latency.

Compliance with CLAUDE.md Guidelines

✅ Minimal Changes: Only modifies code related to the bug fix
✅ Simplicity: Straightforward, readable change
✅ Code Style: Follows existing formatting (88-char limit: ✓, indentation: ✓)
⚠️ Testing: No regression test added (though none existed before)
⚠️ Type Hints: Existing code already has proper type hints


Recommendations

  1. Merge as-is: The fix is correct and improves evaluation quality
  2. Future enhancement: Add integration tests for the ImgEdit evaluation pipeline (a unit-test sketch follows this list)
  3. Optional: Add a brief inline comment explaining the image-first ordering rationale
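
For the testing recommendation, a sketch of what a unit test could look like, mocking the OpenAI client so no network call is made. The module path and the signature of _call_openai_for_evaluation below are assumptions for illustration only; adjust them to the real imgedit utilities:

    from unittest.mock import MagicMock

    def test_images_precede_text_prompt():
        # Hypothetical import path and call signature; adapt to the actual module.
        from lmms_eval.tasks.imgedit import utils

        mock_client = MagicMock()
        mock_client.chat.completions.create.return_value = MagicMock(
            choices=[MagicMock(message=MagicMock(content="score: 4"))]
        )

        utils._call_openai_for_evaluation(
            client=mock_client,
            original_b64="aGVsbG8=",
            edited_b64="d29ybGQ=",
            full_prompt="Rate the edit from 1 to 5.",
        )

        # The judge request should list both images before the text prompt
        # and use the low temperature introduced in this PR.
        _, kwargs = mock_client.chat.completions.create.call_args
        content = kwargs["messages"][0]["content"]
        assert [part["type"] for part in content] == ["image_url", "image_url", "text"]
        assert kwargs["temperature"] == 0.1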

Final Verdict

APPROVED

This is a well-targeted bug fix that improves the correctness of the OpenAI API integration. The changes align with best practices and codebase conventions.

Suggested commit message enhancement:

[Fix] Fix imgedit eval logic for call openai client

Reorder message content to place images before text prompt, following 
OpenAI vision model best practices. Add temperature=0.1 for consistent 
evaluation scoring.

This ensures better visual context understanding by the evaluation model
and improves reproducibility of scores across multiple runs.
