Add data validation gate and reporting #61

Merged
AKKI0511 merged 2 commits into main from conduct-deep-analysis-of-quanttradeai
Dec 17, 2025
@AKKI0511
Owner

Summary

  • add reusable validation/reporting helper and enforce data checks across training, evaluation, and model backtests
  • extend CLI commands with --skip-validation flag and emit validation reports in experiment/backtest folders
  • add coverage for validation gating behavior and refresh documentation for the new workflow

Testing

  • poetry run pytest
  • poetry run pre-commit run --all-files

Codex Task

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 371 to 375
nan_ratio_by_column = {
    col: float(df[col].isnull().mean()) for col in df.columns
}
max_nan_ratio = max(nan_ratio_by_column.values(), default=0.0)

P1: Allow sparse optional columns in validation

Validation now computes nan_ratio_by_column over every column and fails when the maximum exceeds 1%. That maximum includes optional columns such as the news text column added in _attach_news. With news ingestion enabled, that column is expected to be mostly null (there is no headline on every bar), so max_nan_ratio will routinely exceed the 1% threshold and _validate_or_raise will halt training and backtesting even though the required OHLCV data is present. Validation should ignore optional or sparse fields, or limit the NaN check to the required price columns, so that configurations with news ingestion enabled are not blocked.
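
One possible shape for that change is sketched below. It is not the project's code: the `REQUIRED_COLUMNS` tuple, the `check_required_nans` helper, the report keys, and the `text` column name are all hypothetical, chosen only to illustrate gating on required price columns while still reporting NaN ratios for optional ones. The 1% threshold is taken from the description above.

```python
import pandas as pd

# Assumption: the required price columns use these names; the real project may differ.
REQUIRED_COLUMNS = ("Open", "High", "Low", "Close", "Volume")
MAX_NAN_RATIO = 0.01  # the 1% threshold described in the review


def nan_ratios(df: pd.DataFrame, columns) -> dict:
    """Return the fraction of missing values for each listed column present in df."""
    return {
        col: float(df[col].isnull().mean()) for col in columns if col in df.columns
    }


def check_required_nans(df, required=REQUIRED_COLUMNS, threshold=MAX_NAN_RATIO):
    """Gate on required columns only; optional columns are reported, not enforced.

    Note: a required column that is absent from df is skipped here; a real
    validator would likely treat that as a separate failure.
    """
    required_ratios = nan_ratios(df, required)
    optional_ratios = nan_ratios(df, [c for c in df.columns if c not in required])
    max_required = max(required_ratios.values(), default=0.0)
    return {
        "passed": max_required <= threshold,
        "max_nan_ratio": max_required,
        "required": required_ratios,
        "optional": optional_ratios,
    }


# Example frame: complete OHLCV data plus a sparse, hypothetical news column.
frame = pd.DataFrame(
    {
        "Open": [1.0, 2.0, 3.0, 4.0],
        "High": [1.5, 2.5, 3.5, 4.5],
        "Low": [0.5, 1.5, 2.5, 3.5],
        "Close": [1.2, 2.2, 3.2, 4.2],
        "Volume": [100, 200, 300, 400],
        "text": [None, "headline", None, None],
    }
)
report = check_required_nans(frame)
```

With this split, the sparse news column still appears in the report under `optional`, so its NaN ratio stays visible without tripping the gate.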

Useful? React with 👍 / 👎.

@AKKI0511 AKKI0511 merged commit 793fec7 into main Dec 17, 2025
1 check passed
@AKKI0511 AKKI0511 deleted the conduct-deep-analysis-of-quanttradeai branch December 17, 2025 22:27