Conversation

@e06084 (Collaborator) commented Dec 17, 2025

No description provided.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @e06084, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the text quality assessment capabilities by introducing LLMTextQualityV5, a specialized model for evaluating pretraining data for large language models. It refines the criteria for assessing text quality across multiple critical dimensions and provides a robust mechanism for processing and interpreting the evaluation results. The changes also ensure seamless integration with existing workflows through updated examples and comprehensive testing.

Highlights

  • New LLM Text Quality Model: Introduced LLMTextQualityV5, a new model for evaluating text quality specifically for LLM pretraining data. This model comes with a comprehensive prompt defining four key quality dimensions: Completeness, Effectiveness, Similarity, and Security.
  • Enhanced Response Processing: The LLMTextQualityV4 class has been updated, and LLMTextQualityV5 includes a new process_response method. This method cleans markdown code blocks from LLM outputs and parses the JSON response into a structured EvalDetail object, standardizing the evaluation results (see the sketch after this list).
  • Updated Examples and Configuration: The example configurations in examples/dataset/s3.py and examples/llm_and_rule/llm_local.py have been updated. They now utilize environment variables for LLM configuration, specify max_workers and batch_size for the executor, and integrate the new LLMTextQualityV4 and LLMTextQualityV5 evaluators respectively.
  • Dedicated Test Suite: A new test file, test_text_quality_v5.py, has been added to thoroughly validate the process_response method of LLMTextQualityV5. It covers scenarios such as good-quality text and specific errors related to completeness, effectiveness, similarity, and security, ensuring robust parsing and evaluation.
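
As a rough illustration of this response-handling pattern, here is a minimal, self-contained sketch. The EvalDetail field names, the expected JSON keys (type, name, reason), and the fence-stripping details are assumptions for illustration; they are not copied from dingo's actual implementation.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvalDetail:
    # Stand-in for dingo's EvalDetail; the real field names may differ.
    status: bool = True          # True = text passes, False = a quality error was found
    type: str = "QUALITY_GOOD"
    name: str = "QUALITY_GOOD"
    reason: List[str] = field(default_factory=list)


def process_response(response: str) -> EvalDetail:
    """Strip a markdown code fence from an LLM reply and parse the JSON payload."""
    text = response.strip()
    # Drop a leading fence (optionally tagged as json) and a trailing fence, if present.
    text = re.sub(r"^`{3}(?:json)?\s*", "", text)
    text = re.sub(r"\s*`{3}$", "", text)

    data = json.loads(text)  # assumed reply shape: {"type": ..., "name": ..., "reason": ...}
    result = EvalDetail()
    if data.get("type", "QUALITY_GOOD") != "QUALITY_GOOD":
        result.status = False
        result.type = data["type"]
        result.name = data.get("name", result.type)
        result.reason = [data.get("reason", "")]
    return result


if __name__ == "__main__":
    fence = "`" * 3
    reply = fence + 'json\n{"type": "QUALITY_BAD_SECURITY", "name": "PII", "reason": "contains a phone number"}\n' + fence
    print(process_response(reply))
```

A test suite like test_text_quality_v5.py can then feed fenced and unfenced replies through process_response and assert on the resulting EvalDetail for each quality dimension.
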
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • /gemini review (Code Review): Performs a code review for the current pull request in its current state.
  • /gemini summary (Pull Request Summary): Provides a summary of the current pull request in its current state.
  • @gemini-code-assist (Comment): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • /gemini help (Help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request introduces LLMTextQualityV5, a new model for text quality assessment with a detailed prompt, and updates LLMTextQualityV4. My review focuses on improving code maintainability and the reusability of the provided examples.

Key feedback points:

  • There is significant code duplication between LLMTextQualityV4 and LLMTextQualityV5, which should be refactored into a common base class to follow the DRY principle.
  • The logic for cleaning markdown from LLM responses can be made more robust; see the sketch after this list.
  • The example scripts have been updated with hardcoded file paths that are likely user-specific, which hinders their reusability. I've suggested using placeholders or paths to generic sample files instead.
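
A minimal sketch of the suggested refactor, assuming the two evaluators differ only in their prompts and can share the same response handling. The base-class name, the clean_markdown helper, and the simplified dict return value are illustrative choices, not dingo's actual API.

```python
import json
import re


class BaseLLMTextQuality:
    """Shared response handling for the text-quality evaluators (illustrative base class)."""

    @staticmethod
    def clean_markdown(response: str) -> str:
        # Tolerate an optional json language tag, surrounding whitespace, and replies
        # that carry no fence at all, instead of slicing fixed string offsets.
        match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", response, re.DOTALL)
        return match.group(1) if match else response.strip()

    @classmethod
    def process_response(cls, response: str) -> dict:
        # Simplified: the real method would map this dict onto an EvalDetail.
        return json.loads(cls.clean_markdown(response))


class LLMTextQualityV4(BaseLLMTextQuality):
    prompt = "...V4 prompt..."


class LLMTextQualityV5(BaseLLMTextQuality):
    prompt = "...V5 prompt covering Completeness, Effectiveness, Similarity, Security..."
```

Keeping the fence handling in one regex-based helper addresses both the duplication and the robustness points at once.
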

Overall, the new V5 model and its detailed prompt are a great addition. Addressing the feedback will improve the long-term quality and maintainability of the codebase.

@e06084 merged commit 1375e14 into MigoXLab:dev on Dec 17, 2025
2 checks passed
tenwanft pushed a commit to tenwanft/dingo that referenced this pull request Dec 24, 2025
* feat: add LLMTextQualityV5

* x

* x