@jakechirsch

This PR adds a threshold argument to to_datetime that sets the minimum fraction of valid datetime components a value must have for parsing to succeed. Here, "successful" parsing means the function returns either a valid Timestamp or NaT rather than raising an exception. For a partially invalid value, the threshold decides between the two outcomes: the value becomes NaT if its fraction of valid components meets the threshold, and raises an error otherwise. This makes parsing more tolerant of partially invalid dates while keeping strict behavior by default (threshold=1.0).
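
To make the intended semantics concrete, here is a minimal standalone sketch of the decision rule described above. It does not call pandas and is not the PR's implementation. The function names, the choice of year/month/day as the "components", the use of None as a stand-in for NaT, and the >= comparison are all assumptions made for illustration.

```python
import calendar
from datetime import datetime
from typing import List, Optional


def component_validity(year: int, month: int, day: int) -> List[bool]:
    """Return one flag per component (hypothetical helper, not pandas API)."""
    year_ok = 1 <= year <= 9999
    month_ok = 1 <= month <= 12
    # Use the real month length only when year and month are themselves valid.
    max_day = calendar.monthrange(year, month)[1] if year_ok and month_ok else 31
    day_ok = 1 <= day <= max_day
    return [year_ok, month_ok, day_ok]


def parse_components(
    year: int, month: int, day: int, threshold: float = 1.0
) -> Optional[datetime]:
    """Sketch of the rule: Timestamp if fully valid, NaT or an error otherwise.

    None stands in for NaT. The >= comparison is an assumption.
    """
    checks = component_validity(year, month, day)
    if all(checks):
        return datetime(year, month, day)
    fraction_valid = sum(checks) / len(checks)
    if fraction_valid >= threshold:
        return None  # enough valid components: degrade to "NaT"
    raise ValueError(
        f"only {fraction_valid:.2f} of components valid, below threshold {threshold}"
    )


# Month 13 is invalid, so 2 of 3 components are valid.
lenient = parse_components(2025, 13, 1, threshold=0.5)  # None ("NaT")
```

Under these assumptions, the default threshold=1.0 raises for any value with an invalid component, which matches the strict default described above.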

Summary of changes

  • Added threshold argument to to_datetime.
  • Implemented validation logic and clamping of threshold values to [0.0, 1.0].
  • Updated parsing internals to compute the fraction of valid components.
  • Added tests for valid, invalid, and boundary threshold behavior.
  • Added documentation: explanation, parameter description, and example.
  • Added type annotations to every signature that takes the new argument.
  • Ensured all code checks and pre-commit hooks pass.
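
The validation and clamping step from the list above could look roughly like the following. This is a hedged sketch, not the code in this PR: normalize_threshold is a hypothetical name, and rejecting non-numeric input and NaN before clamping is an assumed policy.

```python
import math
from numbers import Real


def normalize_threshold(threshold: object) -> float:
    """Validate a threshold and clamp it to [0.0, 1.0] (hypothetical helper)."""
    # bool is a subclass of int, so exclude it explicitly (assumed policy).
    if isinstance(threshold, bool) or not isinstance(threshold, Real):
        raise TypeError(
            f"threshold must be a real number, got {type(threshold).__name__}"
        )
    value = float(threshold)
    if math.isnan(value):
        raise ValueError("threshold must not be NaN")
    # Clamp out-of-range values instead of raising.
    return min(1.0, max(0.0, value))


clamped_high = normalize_threshold(1.5)   # 1.0
clamped_low = normalize_threshold(-0.2)   # 0.0
```

Clamping silently accepts out-of-range input, so a stricter alternative would be to raise for values outside [0.0, 1.0]. Which one fits better depends on how the rest of to_datetime treats invalid arguments.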

@jakechirsch jakechirsch marked this pull request as draft November 21, 2025 23:47

Development

Successfully merging this pull request may close these issues.

ENH: Make pd.to_datetime with format parameter more robust to dirty data