
Conversation

@e06084 (Collaborator) commented Dec 15, 2025

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @e06084, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on a significant refactoring of metric definitions and their corresponding documentation. The primary goal is to standardize metric naming to reflect their underlying LLM-based nature and to improve the overall organization and clarity of the metric catalog. Additionally, new evaluation metrics specifically designed for Retrieval Augmented Generation (RAG) systems have been integrated, enhancing the system's capability to assess advanced language model applications. These changes are supported by updates to the automated documentation generation process, ensuring consistency and maintainability.

Highlights

  • Metric Naming Convention Update: Many existing metrics have been renamed from a Prompt* prefix to an LLM* prefix, standardizing the naming convention for Large Language Model-based metrics across various categories like Pretrain Text Quality Assessment, SFT Data Assessment, Classification, and Meta Rater Evaluation.
  • Refined Metric Categorization: Metric categories have been updated for clarity and better organization. For instance, 'Resume ATS Matching Metrics' and 'Resume ATS Optimization Metrics' are now consolidated under 'Resume Quality Assessment Metrics', and 'Xinghe Data Quality Metrics' are now 'Rule-Based TEXT Quality Metrics'.
  • Introduction of RAG Evaluation Metrics: A new dedicated section for 'RAG Evaluation Metrics' has been added to the documentation, introducing metrics such as LLMRAGAnswerRelevancy, LLMRAGContextPrecision, and LLMRAGFaithfulness to assess Retrieval Augmented Generation systems.
  • Documentation Generation Script Enhancement: The scripts/generate_metrics.py script has been refactored to align with the new metric naming and categorization, ensuring that the metrics.md documentation is automatically generated accurately and includes the new RAG metrics in the correct order.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a significant refactoring of the metrics system, primarily renaming 'Prompt-based' metrics to 'LLM-based' metrics and reorganizing the metric categories for better clarity. The changes are consistently applied across the metric definition files, the auto-generation script, and the resulting metrics.md documentation. The scripts/generate_metrics.py script has been updated to reflect the new structure; it now uses llm_name_map and standardizes on class names as the metric names in the documentation.

My review focuses on the correctness of the generation script and the consistency of the generated documentation. I've identified a potential data issue in the documentation and have a suggestion to improve the maintainability of the generation script. Overall, the changes are well executed and improve the structure of the metrics.
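
To make the renaming pattern concrete, here is a minimal sketch of a name map plus a class-name lookup, assuming a plain dict keyed by legacy class names; the actual contents, mapping direction, and helper used around llm_name_map in scripts/generate_metrics.py may differ, and the example pair below is hypothetical.

    # Minimal sketch, assuming a dict-based rename map; the real llm_name_map in
    # scripts/generate_metrics.py may have a different shape and contents.
    llm_name_map = {
        # Hypothetical legacy -> new pair illustrating the Prompt* -> LLM* rename.
        "PromptTextQualityAssessment": "LLMTextQualityAssessment",
    }

    def display_name(metric_cls: type) -> str:
        """Return the documentation name for a metric, standardizing on the class name."""
        name = metric_cls.__name__
        return llm_name_map.get(name, name)

Standardizing on class names this way keeps the generated metrics.md in lockstep with the code: renaming a class is enough to change the documented name on the next generation run.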

Comment on lines +95 to +96
| `LLMMinerURecognizeQuality` | LLMMinerURecognizeQuality | Evaluate the quality of mineru recognize | Internal Implementation | [📊 See Results](error_category and error_label) |
| `VLMDocumentParsingOCRTrain` | VLMDocumentParsingOCRTrain | Evaluate the quality of mineru recognize | Internal Implementation | [📊 See Results](error_category and error_label) |

Severity: medium

The metrics LLMMinerURecognizeQuality and VLMDocumentParsingOCRTrain appear to have identical descriptions ("Evaluate the quality of mineru recognize"). This might be a copy-paste error in the source _metric_info of one of these metrics. Please verify if VLMDocumentParsingOCRTrain should have a more specific description to differentiate it from LLMMinerURecognizeQuality.
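
One lightweight way to catch this kind of copy-paste issue across the whole catalog is to scan for repeated descriptions. The sketch below assumes each metric class carries a _metric_info dict with a description field, as referenced in this comment; the way metric classes are collected and the exact attribute layout are assumptions, not the project's actual API.

    from collections import defaultdict

    def find_duplicate_descriptions(metric_classes):
        """Group metric classes by description so copy-paste duplicates stand out."""
        by_description = defaultdict(list)
        for cls in metric_classes:
            # Assumes a _metric_info dict with a "description" entry on each class.
            info = getattr(cls, "_metric_info", None) or {}
            description = str(info.get("description", "")).strip()
            if description:
                by_description[description].append(cls.__name__)
        return {desc: names for desc, names in by_description.items() if len(names) > 1}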

Comment on lines +217 to +225
category_order = [
"RAG Evaluation Metrics",
"Pretrain Text Quality Assessment Metrics",
"SFT Data Assessment Metrics",
"Classification Metrics",
"Multimodality Assessment Metrics",
"Rule-Based TEXT Quality Metrics",
"Rule-Based IMG Quality Metrics"
]

Severity: medium

The category_order list only defines the order for a subset of the metric categories. The remaining categories are appended alphabetically, which might lead to an unintended document structure if new categories are added in the future. To ensure a stable and explicit document structure, it's better to include all categories in this list in the desired order.

    category_order = [
        "RAG Evaluation Metrics",
        "Pretrain Text Quality Assessment Metrics",
        "SFT Data Assessment Metrics",
        "Classification Metrics",
        "Multimodality Assessment Metrics",
        "Rule-Based TEXT Quality Metrics",
        "Rule-Based IMG Quality Metrics",
        "Audio Quality Metrics",
        "Meta Rater Evaluation Metrics",
        "OCR Eval Metric",
        "Resume Quality Assessment Metrics",
        "Rule-Based RESUME Quality Metrics",
        "Text Generation",
    ]
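
For context, the "appended alphabetically" behavior described above typically comes from a sort key like the one sketched here; this is an illustration of the pattern under that assumption, not the actual implementation in scripts/generate_metrics.py.

    def ordered_categories(categories, category_order):
        """Order categories by their index in category_order; anything not listed
        falls back to alphabetical order after the listed ones."""
        explicit = {name: i for i, name in enumerate(category_order)}
        return sorted(categories, key=lambda c: (c not in explicit, explicit.get(c, 0), c))

With the fully enumerated category_order suggested above, every category gets an explicit position and the alphabetical fallback never applies, so adding a new category forces a deliberate decision about where it appears in metrics.md.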

@e06084 merged commit f2ffe4f into MigoXLab:dev on Dec 15, 2025
tenwanft pushed a commit to tenwanft/dingo that referenced this pull request Dec 24, 2025
* feat: update auto-gen-metric

* 📚 Auto-update metrics documentation

---------

Co-authored-by: GitHub Action <[email protected]>