
DatasetSplitter now autoselects suitable KFold splitter based on input… #124

Merged
AKuederle merged 4 commits into main from better_data_splitter
Jul 24, 2025

@AKuederle
Member

…s. Closes #121

@AKuederle AKuederle requested a review from Copilot July 24, 2025 10:14

This comment was marked as outdated.

@codecov-commenter
codecov-commenter commented Jul 24, 2025

Codecov Report

Attention: Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.60%. Comparing base (0ed4cd5) to head (0366751).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| tpcp/validate/_cross_val_helper.py | 95.83% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #124      +/-   ##
==========================================
+ Coverage   92.52%   92.60%   +0.07%     
==========================================
  Files          28       28              
  Lines        2435     2473      +38     
==========================================
+ Hits         2253     2290      +37     
- Misses        182      183       +1     

@AKuederle AKuederle requested a review from Copilot July 24, 2025 12:25
@AKuederle AKuederle linked an issue Jul 24, 2025 that may be closed by this pull request
Copilot AI left a comment

Pull Request Overview

This PR enhances the DatasetSplitter class to automatically select appropriate cross-validation splitters based on provided parameters and adds validation to ensure proper parameter handling across the codebase. The changes close issue #121 by implementing auto-selection functionality for K-fold splitters.

  • Auto-selection of appropriate splitters (KFold, StratifiedKFold, GroupKFold, StratifiedGroupKFold) based on groupby and stratify parameters
  • Added validation warnings when incompatible splitters are used with grouping/stratification features
  • Enhanced parameter validation in base classes to ensure proper implementation patterns
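
The auto-selection described in the first bullet can be pictured as a small decision table over the two parameters. The sketch below is an illustration only, not the code merged in this PR: the function names `choose_kfold_name` and `make_kfold` are hypothetical, and it assumes that `groupby` and `stratify` are simply "set" or `None`. The four splitter classes themselves are the scikit-learn ones named in the overview.

```python
from typing import Optional


def choose_kfold_name(groupby: Optional[str], stratify: Optional[str]) -> str:
    """Return the name of the scikit-learn K-fold class that fits the inputs.

    Hypothetical helper: it only mirrors the mapping listed in the overview.
    """
    if groupby is not None and stratify is not None:
        return "StratifiedGroupKFold"
    if groupby is not None:
        return "GroupKFold"
    if stratify is not None:
        return "StratifiedKFold"
    return "KFold"


def make_kfold(n_splits: int, groupby: Optional[str] = None, stratify: Optional[str] = None):
    """Instantiate the chosen splitter (requires scikit-learn to be installed)."""
    # Imported lazily so the decision table above works without scikit-learn.
    from sklearn import model_selection

    splitter_cls = getattr(model_selection, choose_kfold_name(groupby, stratify))
    return splitter_cls(n_splits=n_splits)
```

Keeping the decision separate from the instantiation makes the mapping easy to test on its own; for example, `make_kfold(5, groupby="participant")` would yield a `GroupKFold` with five splits under these assumptions.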

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
|---|---|
| tpcp/validate/_cross_val_helper.py | Enhanced DatasetSplitter with auto-selection logic and validation warnings |
| tpcp/_base.py | Added strict parameter validation and improved factory handling |
| tests/test_pipelines/test_validate.py | Added comprehensive tests for auto-selection and validation warnings |
| tests/test_base.py | Updated tests for new parameter validation behavior |
| CHANGELOG.md | Documented breaking changes and new features |
Comments suppressed due to low confidence (1)

tpcp/validate/_cross_val_helper.py:43

  • [nitpick] The parameter name 'ignore_potentially_invalid_splitter_warning' is very long and could be shortened to something like 'ignore_validation_warning' or 'suppress_warnings' for better readability.
    ignore_potentially_invalid_splitter_warning
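
To make the role of this flag concrete, here is a rough sketch of how a warning with an opt-out might behave. It is not taken from `tpcp/validate/_cross_val_helper.py`: the function `check_splitter`, the two compatibility sets, and the warning text are all assumptions made for illustration. Only the flag name comes from the review comment above.

```python
import warnings
from typing import Optional

# Assumed compatibility tables, keyed by scikit-learn class name.
SUPPORTS_GROUPS = {"GroupKFold", "StratifiedGroupKFold"}
SUPPORTS_STRATIFY = {"StratifiedKFold", "StratifiedGroupKFold"}


def check_splitter(
    splitter_name: str,
    groupby: Optional[str] = None,
    stratify: Optional[str] = None,
    ignore_potentially_invalid_splitter_warning: bool = False,
) -> bool:
    """Warn if the splitter would likely ignore groupby or stratify.

    Hypothetical helper. Returns True if a warning was issued.
    """
    if ignore_potentially_invalid_splitter_warning:
        return False
    ignored = []
    if groupby is not None and splitter_name not in SUPPORTS_GROUPS:
        ignored.append("groupby")
    if stratify is not None and splitter_name not in SUPPORTS_STRATIFY:
        ignored.append("stratify")
    if ignored:
        warnings.warn(
            f"{splitter_name} may ignore: {', '.join(ignored)}",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

A long flag name like this one has the advantage that it states exactly which warning is being silenced, which is a reasonable counterpoint to the nitpick.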

@AKuederle AKuederle merged commit 3037404 into main Jul 24, 2025
1 of 5 checks passed
@AKuederle AKuederle deleted the better_data_splitter branch July 24, 2025 12:32

Development

Successfully merging this pull request may close these issues.

Parameter validation seems broken?
DataSplitter should recommend compatible CV splitter by default

3 participants