docs: update documentation and tests to use new evaluator config format #312

Kylie-dot-s · 2025-12-23T08:10:38Z

Update docs to use new evaluator array config structure instead of deprecated prompt_list/rule_list/llm_config
Fix relative path references in documentation (../../ -> proper paths)
Update test files to use new InputArgs format with evaluator and dataset configs
Update example links in eval documentation

gemini-code-assist · 2025-12-23T08:11:12Z

Summary of Changes

Hello @Kylie-dot-s, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant overhaul to the evaluator configuration system, moving from a fragmented approach to a more unified and flexible array-based structure. This change streamlines how evaluation pipelines are defined, allowing for clearer specification of data field mappings and the sequential application of multiple evaluators (both rule-based and LLM-based). The update also includes comprehensive documentation and test file adjustments to ensure consistency and ease of use with the new configuration paradigm.

Highlights

Unified Evaluator Configuration: The configuration for evaluators has been refactored to use a new array-based structure. This replaces the deprecated prompt_list, rule_list, and llm_config parameters with a flexible evaluator array, where each entry defines an EvalPipline with fields mapping and a list of evals.
Documentation Updates: All relevant documentation files, including config.md, document_ocr.md, hallucination_guide.md, and others, have been updated to reflect the new evaluator configuration format and corrected relative path references.
Test File Modernization: Existing test files, particularly those related to HuggingFace datasets and S3 data sources, have been updated to align with the new InputArgs format, incorporating the dataset and evaluator configurations.
Simplified LLM Evaluator Integration: LLM evaluators now embed their prompts directly, removing the need for a separate prompt/ directory and simplifying the overall model architecture and configuration.
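As a rough illustration of the array-based shape described in the highlights above, here is a minimal sketch written as a plain Python dict. Only the `evaluator`, `fields`, `evals`, `name`, `config`, `key`, and `api_url` keys come from this pull request's description and review. The field mapping, the evaluator names, and the helper `evaluator_names` are assumptions made for illustration, and the dataset section is left out entirely.

```python
# Sketch only: mirrors the structure described in this PR, not the project's schema.
input_data = {
    "evaluator": [
        {
            # One pipeline entry: a field mapping plus an ordered list of evals.
            "fields": {"content": "text"},  # assumed mapping, for illustration
            "evals": [
                {"name": "RuleColonEnd"},  # assumed rule-based evaluator name
                {
                    "name": "LLMTextQualityV5",  # name taken from a commit in this PR
                    "config": {
                        "key": "YOUR_API_KEY",  # placeholder
                        "api_url": "https://example.com/v1",  # placeholder
                    },
                },
            ],
        }
    ]
}


def evaluator_names(config):
    """Return evaluator names in the order they are listed (hypothetical helper)."""
    return [
        ev["name"]
        for pipeline in config.get("evaluator", [])
        for ev in pipeline.get("evals", [])
    ]


names = evaluator_names(input_data)
```

Listing the names this way makes it easy to confirm the order in which a pipeline applies its evaluators.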

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request significantly refactors the configuration and usage of evaluators within the system, primarily by deprecating the dataset.field, executor.eval_group, executor.rule_list, and executor.prompt_list parameters. The core change introduces a new evaluator configuration, which is now an array of EvalPipline objects. Each EvalPipline defines fields for data mapping and an evals array containing individual evaluator configurations (with name and config parameters for both rule-based and LLM-based evaluators).

Documentation files (docs/config.md, docs/document_ocr.md, docs/eval/*, docs/factcheck_guide.md, docs/hallucination_guide.md, docs/html_extract_compare_v2.md, docs/image_lable_check_guide.md, docs/image_quality_check_guide.md, docs/layout_quality_guide.md, docs/posts/zhihu.md, docs/technical/*) have been updated to reflect this new structure, including revised configuration examples, Python SDK usage, and explanations of the new evaluator array. Specifically, prompt-related concepts and parameters have been largely replaced by or integrated into the new LLM evaluator structure, and LLM evaluators now explicitly require key and api_url in their config.
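The requirement that LLM evaluators carry key and api_url in their config can be checked mechanically. The sketch below is self-contained and does not use the project's own API. The function name `missing_llm_settings` is hypothetical, and telling LLM evaluators apart by an `LLM` name prefix is an assumption, not something this review states.

```python
REQUIRED_LLM_KEYS = ("key", "api_url")


def missing_llm_settings(config, is_llm=lambda name: name.startswith("LLM")):
    """Report LLM evals whose config lacks a required setting.

    Returns (pipeline_index, eval_index, evaluator_name, missing_key) tuples.
    The default `is_llm` test (name prefix) is an assumption for this sketch.
    """
    problems = []
    for p_idx, pipeline in enumerate(config.get("evaluator", [])):
        for e_idx, ev in enumerate(pipeline.get("evals", [])):
            name = ev.get("name", "")
            if not is_llm(name):
                continue  # rule-based evals need no API settings here
            settings = ev.get("config") or {}
            for required in REQUIRED_LLM_KEYS:
                if not settings.get(required):
                    problems.append((p_idx, e_idx, name, required))
    return problems


# Example: the LLM eval below has a key but no api_url.
sample = {
    "evaluator": [
        {
            "fields": {"content": "text"},
            "evals": [
                {"name": "RuleColonEnd"},
                {"name": "LLMTextQualityV5", "config": {"key": "YOUR_API_KEY"}},
            ],
        }
    ]
}
problems = missing_llm_settings(sample)
```

A check like this could run over the documentation's example configs to catch an LLM evaluator that was migrated without its connection settings.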

Several review comments highlight the need to remove deprecated eval_group parameters from example configurations in various documentation files to prevent user confusion and ensure consistency with the new configuration schema. Additionally, one comment suggests completing a partial JSON example in docs/config.md for better user clarity.
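Leftover deprecated parameters such as eval_group can be found with a small scan over each example config. The sketch below takes the dotted parameter paths named in this review and reports which ones are still present in a nested dict. The function name `find_deprecated` and the sample values are hypothetical.

```python
# Dotted paths of the parameters this review describes as deprecated.
DEPRECATED_PATHS = (
    "dataset.field",
    "executor.eval_group",
    "executor.rule_list",
    "executor.prompt_list",
)


def find_deprecated(config, paths=DEPRECATED_PATHS):
    """Return the deprecated dotted paths that exist in `config` (hypothetical helper)."""
    found = []
    for path in paths:
        node = config
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                break  # path is absent; move on to the next one
            node = node[part]
        else:
            # The loop finished without a break, so every path segment was present.
            found.append(path)
    return found


# Example: an old-style config that still carries two deprecated parameters.
old_style = {
    "dataset": {"field": {"content": "text"}},
    "executor": {"eval_group": "default"},
}
leftovers = find_deprecated(old_style)
```

An empty result means the example no longer mentions any of the listed parameters.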

docs/eval/rule/slimpajama_data_evaluated_by_rule.md

docs/config.md

docs/eval/prompt/kaoti_data_evaluated_by_prompt.md

docs/eval/prompt/multi_language_data_evaluated_by_prompt.md

docs/eval/prompt/redpajama_data_evaluated_by_prompt.md

docs/eval/rule/slimpajama_data_evaluated_by_rule.md

test/scripts/data/dataset/test_hf_dataset.py

test/scripts/data/datasource/test_s3.py

test/scripts/data/datasource/test_hf_datasource.py

gemini-code-assist bot reviewed Dec 23, 2025

shijinpjlab reviewed Dec 23, 2025

Kylie-dot-s force-pushed the fix/docs-update-new-config-format branch from 2808691 to 7f37164 on December 23, 2025 08:27

Kylie-dot-s force-pushed the fix/docs-update-new-config-format branch from 7f37164 to 822e68f on December 23, 2025 08:44

Kylie-dot-s and others added 4 commits December 23, 2025 18:31

fix: update score.py to use LLMTextQualityV5 instead of deleted module

d7072cf

update: fix tests and examples

d842a6c

🎨 Auto-format code with pre-commit

00f9b39

fix: resolve flake8 unused variable error

160beee

shijinpjlab closed this Dec 25, 2025

3 participants
