refactor(multimodal): change score range from [0,1] to 1-5 scale #75
- Update ImageCoherenceGrader to use 1-5 scoring
- Update ImageHelpfulnessGrader to use 1-5 scoring
- Update TextToImageGrader to use 1-5 scoring
- Update documentation and tests accordingly
Summary of Changes
Hello @XiaoBoAI, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request refactors the scoring mechanism across several multimodal graders. The primary goal is to standardize the output scores on a 1-5 scale, moving away from the previous 0-1 or 0-10 ranges. This change simplifies score interpretation and keeps the evaluation metrics in the multimodal grading system consistent with one another.
Code Review
This pull request successfully refactors the multimodal graders (ImageCoherenceGrader, ImageHelpfulnessGrader, TextToImageGrader) to use a 1-5 scoring scale instead of the previous [0,1] normalized scale. The changes are consistently applied across the grader implementations, documentation, and tests. The core logic for score calculation has been updated, and prompts and docstrings now reflect the new 1-5 scale.
While the changes within the diff are correct, I've identified a few related areas that might need attention to fully complete this refactoring:
- In `ImageCoherenceGrader` and `ImageHelpfulnessGrader`, error cases (like no images found) result in a score of `0.0`, which is outside the new `1-5` range. This should likely be `1.0` for consistency with `TextToImageGrader`.
- In `TextToImageGrader`, the `aevaluate` method returns a score of `0.0` for invalid inputs, which is also out of range. Additionally, the generated `reason` string for this grader still refers to scores out of `/10`.
Since these issues are on lines not modified in this pull request, I haven't added specific comments for them. However, addressing them would make this refactoring more complete and robust. I've added one specific comment regarding a documentation change.
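The out-of-range fallbacks described above could be handled along the following lines. This is only a minimal sketch, not the project's actual code: `GradeResult`, `clamp_score`, `error_result`, and `format_reason` are hypothetical names introduced for illustration, and the choice of `1.0` as the error score simply follows the suggestion in this review.

```python
from dataclasses import dataclass

# Bounds of the new scale; assumed to be shared by all three graders.
MIN_SCORE = 1.0
MAX_SCORE = 5.0


@dataclass
class GradeResult:
    """Hypothetical result container (not the library's real class)."""
    score: float
    reason: str


def clamp_score(raw: float) -> float:
    """Force any raw value into the 1-5 range."""
    return max(MIN_SCORE, min(MAX_SCORE, float(raw)))


def error_result(message: str) -> GradeResult:
    """Fallback for error cases (e.g. no images found): lowest in-range score."""
    return GradeResult(score=MIN_SCORE, reason=f"Error: {message}")


def format_reason(score: float, detail: str) -> str:
    """Build a reason string that cites the current scale maximum, not /10."""
    return f"Score: {score:.1f}/{MAX_SCORE:.0f}. {detail}"
```

Deriving the reason string from `MAX_SCORE` rather than a hard-coded literal would keep it in sync if the scale changes again.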
| `ImageCoherenceGrader` | Evaluates image-text coherence | LLM-Based | {0, 1} |
| `ImageHelpfulnessGrader` | Assesses if images help understanding | LLM-Based | {0, 1} |
| `TextToImageGrader` | Evaluates text-to-image generation quality | LLM-Based | {0, 1} |
| `ImageEditingGrader` | Evaluates image editing quality | LLM-Based | {0, 1} |
OpenJudge Version
[The version of OpenJudge you are working on, e.g. `import openjudge; print(openjudge.__version__)`]
Description
[Please describe the background, purpose, changes made, and how to test this PR]
Checklist
Please check the following items before code is ready to be reviewed.
- `pre-commit run --all-files` command