feat: update auto-gen-metric #289
Conversation
Summary of Changes

Hello @e06084, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request focuses on a significant refactoring of metric definitions and their corresponding documentation. The primary goal is to standardize metric naming to reflect their underlying LLM-based nature and to improve the overall organization and clarity of the metric catalog. Additionally, new evaluation metrics specifically designed for Retrieval Augmented Generation (RAG) systems have been integrated, enhancing the system's capability to assess advanced language model applications. These changes are supported by updates to the automated documentation generation process, ensuring consistency and maintainability.
Code Review
This pull request introduces a significant refactoring of the metrics system, primarily renaming 'Prompt-based' metrics to 'LLM-based' metrics and reorganizing the metric categories for better clarity. The changes are consistently applied across the metric definition files, the auto-generation script, and the resulting `metrics.md` documentation. The script `scripts/generate_metrics.py` has been updated to reflect the new structure: it now uses an `llm_name_map` and standardizes on class names for metric names in the documentation.
My review focuses on the correctness of the generation script and the consistency of the generated documentation. I've identified a potential data issue in the documentation and have a suggestion to improve the maintainability of the generation script. Overall, the changes are well executed and improve the structure of the metrics.
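As a rough illustration of the renaming approach described above, here is a minimal sketch of how a name map could be consulted when emitting metric names into the documentation. The dictionary entries and the `doc_metric_name` helper are hypothetical and not taken from `scripts/generate_metrics.py`.

```python
# Hypothetical sketch only; the real llm_name_map in scripts/generate_metrics.py
# may have different contents and be applied differently.
llm_name_map = {
    "PromptTextQuality": "LLMTextQuality",      # illustrative entries
    "PromptDialogQuality": "LLMDialogQuality",
}

def doc_metric_name(metric_cls) -> str:
    """Use the class name as the documented metric name, applying the
    LLM-based rename when a mapping entry exists."""
    return llm_name_map.get(metric_cls.__name__, metric_cls.__name__)
```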
The first inline comment flags these two rows in the generated metrics table:

| `LLMMinerURecognizeQuality` | LLMMinerURecognizeQuality | Evaluate the quality of mineru recognize | Internal Implementation | [📊 See Results](error_category and error_label) |
| `VLMDocumentParsingOCRTrain` | VLMDocumentParsingOCRTrain | Evaluate the quality of mineru recognize | Internal Implementation | [📊 See Results](error_category and error_label) |
The metrics `LLMMinerURecognizeQuality` and `VLMDocumentParsingOCRTrain` appear to have identical descriptions ("Evaluate the quality of mineru recognize"). This might be a copy-paste error in the source `_metric_info` of one of these metrics. Please verify whether `VLMDocumentParsingOCRTrain` should have a more specific description to differentiate it from `LLMMinerURecognizeQuality`.
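One way to catch this kind of duplication automatically would be a small check over the metric descriptions. The sketch below assumes each metric class exposes a `_metric_info` dict with a `description` field, as referenced in the comment above; the helper name is illustrative and not part of the repository.

```python
from collections import defaultdict

def find_duplicate_descriptions(metric_classes):
    """Group metric classes by description and return any description shared
    by more than one class (a likely copy-paste error)."""
    by_description = defaultdict(list)
    for cls in metric_classes:
        # Assumption: each metric class carries a _metric_info dict with a
        # "description" entry.
        desc = str(cls._metric_info.get("description", "")).strip()
        if desc:
            by_description[desc].append(cls.__name__)
    return {d: names for d, names in by_description.items() if len(names) > 1}
```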
The second inline comment concerns the `category_order` list in the generation script:

category_order = [
    "RAG Evaluation Metrics",
    "Pretrain Text Quality Assessment Metrics",
    "SFT Data Assessment Metrics",
    "Classification Metrics",
    "Multimodality Assessment Metrics",
    "Rule-Based TEXT Quality Metrics",
    "Rule-Based IMG Quality Metrics"
]
The `category_order` list only defines the order for a subset of the metric categories. The remaining categories are appended alphabetically, which might lead to an unintended document structure if new categories are added in the future. To ensure a stable and explicit document structure, it's better to include all categories in this list in the desired order:
category_order = [
    "RAG Evaluation Metrics",
    "Pretrain Text Quality Assessment Metrics",
    "SFT Data Assessment Metrics",
    "Classification Metrics",
    "Multimodality Assessment Metrics",
    "Rule-Based TEXT Quality Metrics",
    "Rule-Based IMG Quality Metrics",
    "Audio Quality Metrics",
    "Meta Rater Evaluation Metrics",
    "OCR Eval Metric",
    "Resume Quality Assessment Metrics",
    "Rule-Based RESUME Quality Metrics",
    "Text Generation",
]

Commits: feat: update auto-gen-metric · 📚 Auto-update metrics documentation (Co-authored-by: GitHub Action <[email protected]>)
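Returning to the ordering suggestion: below is a minimal sketch of the behaviour the reviewer describes, assuming a hypothetical `order_categories` helper rather than the actual code in `scripts/generate_metrics.py`. Categories listed in `category_order` keep their position, and anything unlisted is appended alphabetically at the end.

```python
def order_categories(categories, category_order):
    """Return categories in the explicit order, then append any category
    missing from category_order in alphabetical order (the current
    behaviour flagged in the review)."""
    explicit = [c for c in category_order if c in categories]
    leftover = sorted(set(categories) - set(category_order))
    if leftover:
        # Turning this warning into an error would force category_order to
        # stay in sync with the metric catalog, as the review recommends.
        print(f"Warning: categories missing from category_order: {leftover}")
    return explicit + leftover
```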
No description provided.