
Conversation

@helloml0326 (Collaborator)

  • Update Train Judge Models card (remove WIP badge, add link)
  • Update Generate Rubrics card to cover both Zero-Shot and Data-Driven approaches
  • Add LangSmith and Langfuse integration cards (remove WIP badges)
  • Add Zero-Shot Evaluation to Quick Tutorials section
  • Reduce card header font size for better text fitting
  • Update Training Frameworks description

OpenJudge Version

[The version of OpenJudge you are working on, e.g. import openjudge; print(openjudge.__version__)]

Description

[Please describe the background, purpose, changes made, and how to test this PR]

Checklist

Please check the following items before the code is ready to be reviewed.

  • Code has been formatted with the `pre-commit run --all-files` command
  • All tests are passing
  • Docstrings are in Google style
  • Related documentation has been updated (e.g. links, examples, etc.)
  • Code is ready for review

@gemini-code-assist (Contributor)

Summary of Changes

Hello @helloml0326, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the project's documentation, specifically the main index page and README files, to reflect the current state of features and integrations. The changes give users up-to-date information on grader building methods, including the newly available judge model training and advanced rubric generation techniques, and highlight the readiness of key observability platform integrations, presenting a more current and comprehensive overview of the project's capabilities.

Highlights

  • Judge Model Training: The 'Train Judge Models' feature is now fully available, removing its 'Coming Soon' status and providing a direct link to its documentation.
  • Rubric Generation Enhancements: The 'Generate Rubrics' functionality has been expanded to explicitly cover both Zero-Shot and Data-Driven approaches, each with dedicated descriptions and links.
  • Observability Platform Integrations: Official integration cards for LangSmith and Langfuse have been added, indicating their availability and removing previous 'Work in Progress' badges.
  • New Quick Tutorial: A new 'Zero-Shot Evaluation' quick tutorial has been introduced to guide users on comparing models without test data.
  • Documentation UI/UX Improvements: Card header font sizes have been reduced and card minimum widths adjusted for better text fitting and overall presentation on the index page.
  • Training Frameworks Description Update: The description for training frameworks has been updated to clarify their use for fine-tuning models.


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This pull request provides a comprehensive update to the documentation, reflecting new features such as Zero-Shot Evaluation, LangSmith/Langfuse integrations, and refined grader building methods. The changes across README.md, README_zh.md, and docs/index.md are clear and align well with the stated goals. The accompanying CSS adjustments for the documentation site also improve the visual presentation. Overall, this is a solid update that enhances the project's documentation. I've included a few minor suggestions to further improve formatting and consistency.

* **Zero-shot Rubrics Generation:** Not sure what criteria to use, and no labeled data yet? Just provide a task description and optional sample queries—the LLM will automatically generate evaluation rubrics for you. Ideal for rapid prototyping when you want to get started immediately. 👉 [Zero-shot Rubrics Generation Guide](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/#simple-rubric-zero-shot-generation)
* **Data-driven Rubrics Generation:** Ambiguous requirements, but have a few examples? Use the GraderGenerator to automatically summarize evaluation rubrics from your annotated data and generate an LLM-based grader. 👉 [Data-driven Rubrics Generation Guide](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/#iterative-rubric-data-driven-generation)
* **Training Judge Models:** Massive data and need peak performance? Use our training pipeline to train a dedicated Judge Model. This is ideal for complex scenarios where prompt-based grading falls short.👉 [Train Judge Models](https://modelscope.github.io/OpenJudge/building_graders/training_judge_models/)
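The three approaches above differ mainly in where the rubrics come from (an LLM given a task description, annotated examples, or a trained judge model), but all reduce to scoring an answer against a set of rubrics. As a toy illustration only (the `RubricGrader` class and its methods here are hypothetical, and the LLM judge call is stubbed with a keyword match so the sketch stays runnable; the real OpenJudge API may look quite different):

```python
from dataclasses import dataclass, field


@dataclass
class RubricGrader:
    """Toy rubric-based grader: scores an answer against a list of rubrics.

    In a real setup each rubric check would be an LLM judge call
    ("does the answer satisfy this rubric?"); here it is stubbed
    with a case-insensitive keyword match.
    """
    rubrics: list[str] = field(default_factory=list)

    def check(self, rubric: str, answer: str) -> bool:
        # Stub for the LLM judge call.
        return rubric.lower() in answer.lower()

    def grade(self, answer: str) -> float:
        # Fraction of rubrics the answer satisfies, in [0, 1].
        if not self.rubrics:
            return 0.0
        passed = sum(self.check(r, answer) for r in self.rubrics)
        return passed / len(self.rubrics)


# Zero-shot: these rubrics would normally be generated by an LLM
# from a task description; here they are hard-coded for the sketch.
grader = RubricGrader(rubrics=["cites a source", "stays on topic"])
print(grader.grade("The answer stays on topic and cites a source."))  # 1.0
```

The data-driven and trained-model approaches change only how `rubrics` (or the `check` step) are obtained, not this scoring loop.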

Severity: medium

For better readability and consistency, it's good practice to add a space before the arrow emoji (👉).

Suggested change
* **Training Judge Models:** Massive data and need peak performance? Use our training pipeline to train a dedicated Judge Model. This is ideal for complex scenarios where prompt-based grading falls short.👉 [Train Judge Models](https://modelscope.github.io/OpenJudge/building_graders/training_judge_models/)
* **Training Judge Models:** Massive data and need peak performance? Use our training pipeline to train a dedicated Judge Model. This is ideal for complex scenarios where prompt-based grading falls short. 👉 [Train Judge Models](https://modelscope.github.io/OpenJudge/building_graders/training_judge_models/)

* **Custom:** Clear requirements but no ready-made grader? If you have explicit rules or logic, use our Python interface or prompt templates to quickly define your own grader. 👉 [Custom Grader Development Guide](https://modelscope.github.io/OpenJudge/building_graders/create_custom_graders/)
* **Zero-shot Rubrics Generation:** Not sure what criteria to use, and no labeled data yet? Just provide a task description and optional sample queries, and the LLM will automatically generate evaluation rubrics for you. Ideal for rapid prototyping. 👉 [Zero-shot Rubrics Generation Guide](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/#simple-rubric-zero-shot-generation)
* **Data-driven Rubrics Generation:** Ambiguous requirements, but have a few examples? Use the GraderGenerator to automatically summarize evaluation rubrics from your annotated data and generate an LLM-based grader. 👉 [Data-driven Rubrics Generation Guide](https://modelscope.github.io/OpenJudge/building_graders/generate_rubrics_as_graders/#iterative-rubric-data-driven-generation)
* **Training Judge Models:** Massive data and a need for peak performance? Use our training pipeline to train a dedicated judge model, for complex scenarios where prompt-based grading falls short.👉 [Train Judge Models](https://modelscope.github.io/OpenJudge/building_graders/training_judge_models/)

Severity: medium

For better readability and consistency, it's good practice to add a space before the arrow emoji (👉).

Suggested change
* **Training Judge Models:** Massive data and a need for peak performance? Use our training pipeline to train a dedicated judge model, for complex scenarios where prompt-based grading falls short.👉 [Train Judge Models](https://modelscope.github.io/OpenJudge/building_graders/training_judge_models/)
* **Training Judge Models:** Massive data and a need for peak performance? Use our training pipeline to train a dedicated judge model, for complex scenarios where prompt-based grading falls short. 👉 [Train Judge Models](https://modelscope.github.io/OpenJudge/building_graders/training_judge_models/)

### 🔌 Effortless Integration

We are actively building seamless connectors to mainstream observability platforms and training frameworks. Stay tuned! → See [Integrations](#-集成)
If you are using a mainstream observability platform such as **LangSmith** or **Langfuse**, we provide seamless integrations that enhance the platform's evaluator and automated evaluation capabilities. We are also building integrations with training frameworks such as **verl**.👉 See [Integrations](#-集成) for details

Severity: medium

For better readability and consistency, it's good practice to add a space before the arrow emoji (👉).

Suggested change
If you are using a mainstream observability platform such as **LangSmith** or **Langfuse**, we provide seamless integrations that enhance the platform's evaluator and automated evaluation capabilities. We are also building integrations with training frameworks such as **verl**.👉 See [Integrations](#-集成) for details
If you are using a mainstream observability platform such as **LangSmith** or **Langfuse**, we provide seamless integrations that enhance the platform's evaluator and automated evaluation capabilities. We are also building integrations with training frameworks such as **verl**. 👉 See [Integrations](#-集成) for details

Comment on lines +67 to +68



Severity: medium

These empty lines appear to be unnecessary and can be removed for better code cleanliness.

@XiaoBoAI merged commit 2f6beee into modelscope:main Jan 9, 2026
1 check passed
