
feat: Enable unit tests for dataset presets #194

Merged
attafosu merged 9 commits into feat/attafosu/sglang-openai-api-compatibility from feat/attafosu/dataset-unit-tests on Mar 20, 2026

@attafosu (Collaborator)

What does this PR do?

Adds unit tests for dataset presets

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

- Add test_dataset_presets.py with 20 test cases for 6 presets across 5 datasets
- Add comprehensive testing guide and schema reference documentation

Tests verify that transforms work correctly without end-to-end runs,
enabling fast regression detection when transform code changes.

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
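As a rough illustration of the kind of check described above (exercising a preset's transforms directly instead of through an end-to-end run), here is a minimal, self-contained sketch. The transform functions, `HYPOTHETICAL_PRESET`, and `apply_preset` are hypothetical stand-ins; the real preset classes and transform API used in test_dataset_presets.py are not shown on this page.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Transform = Callable[[Row], Row]


# Hypothetical transforms standing in for a real preset's transform chain.
def strip_fields(row: Row) -> Row:
    """Trim surrounding whitespace from every field."""
    return {key: value.strip() for key, value in row.items()}


def add_prompt(row: Row) -> Row:
    """Build a summarization prompt from the (hypothetical) 'article' column."""
    out = dict(row)
    out["prompt"] = f"Summarize the following article:\n{row['article']}"
    return out


HYPOTHETICAL_PRESET: List[Transform] = [strip_fields, add_prompt]


def apply_preset(rows: List[Row], preset: List[Transform]) -> List[Row]:
    """Apply each transform in order to every row; no model or server needed."""
    result = []
    for row in rows:
        for transform in preset:
            row = transform(row)
        result.append(row)
    return result


def test_preset_builds_prompt_from_article():
    rows = [{"article": "  Some news text.  ", "highlights": "News."}]
    out = apply_preset(rows, HYPOTHETICAL_PRESET)
    # The transformed row gains a prompt built from the cleaned article text...
    assert out[0]["prompt"] == "Summarize the following article:\nSome news text."
    # ...and keeps the other columns (stripped, otherwise unchanged).
    assert out[0]["highlights"] == "News."
```

Because the test only calls plain functions on a tiny in-memory row, it runs in milliseconds, which is what makes this style useful for catching regressions when transform code changes.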
attafosu requested a review from a team as a code owner on March 20, 2026 05:38
github-actions bot commented on Mar 20, 2026:

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

github-actions bot requested reviews from arekay-nv and nvzhihanj on March 20, 2026 05:39
gemini-code-assist bot commented:

Summary of Changes


This pull request introduces a dedicated unit-testing framework for dataset presets. Its goal is to verify the data transformations these presets apply without the overhead of full end-to-end benchmark runs, giving immediate feedback on preset changes and lowering the risk of regressions in data-preparation logic.

Highlights

  • New Documentation: Added a new documentation file, DATASET_PRESET_TESTING.md, which provides a comprehensive guide on how to run and contribute unit tests for dataset presets.
  • Unit Test Implementation: Implemented extensive unit tests for several dataset presets, including CNNDailyMail, AIME25, GPQA, LiveCodeBench, and OpenOrca, to validate their correct instantiation, data transformation logic, and expected output formats.
  • Slow Test Marking: Applied the @pytest.mark.slow marker to tests that depend on heavier libraries such as transformers, so these tests can be deselected for faster test runs.
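The summary does not show how the `slow` marker is registered. One common approach, sketched here under the assumption that the project uses a `conftest.py` (it could equally declare markers in `pytest.ini` or `pyproject.toml`), is to register it from the `pytest_configure` hook. The marker description text is illustrative only.

```python
# conftest.py (sketch): register the custom "slow" marker.


def pytest_configure(config):
    """pytest calls this hook once at startup with its Config object."""
    # Declaring the marker avoids PytestUnknownMarkWarning for @pytest.mark.slow
    # and lists it under `pytest --markers`. Description text is illustrative.
    config.addinivalue_line(
        "markers",
        "slow: tests that import heavy libraries such as transformers",
    )
```

Tests that need transformers are then decorated with `@pytest.mark.slow`, and `pytest -m "not slow"` deselects them for a quick local run. Registering the marker also lets `--strict-markers` turn misspelled marker names into errors instead of silently creating new markers.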


gemini-code-assist bot left a comment:

Code Review

This pull request is a great addition, enabling unit tests for dataset presets and providing clear documentation. The tests cover several presets and verify key aspects like instantiation and transform application. My review includes a few suggestions to improve the test suite's efficiency by reducing redundant computations, ensure consistency in marking slow tests, and enhance test coverage for one of the presets. Overall, this is a valuable contribution to the project's test infrastructure.

attafosu merged commit 89ea457 into feat/attafosu/sglang-openai-api-compatibility on Mar 20, 2026
1 check passed
github-actions bot locked and limited conversation to collaborators on Mar 20, 2026

Labels

None yet

Projects

None yet
