Hello,
I'm currently building a local SQLite database for MIMIC-IV v3.1 using the official import.py script from the buildmimic/sqlite folder.
The script fails with an OutOfBoundsDatetime error when processing the pharmacy.csv file. The traceback points to a specific timestamp value of 5117-07-25, which is outside the range that pandas' default nanosecond-resolution timestamps can represent (roughly 1677-09-21 to 2262-04-11), so to_datetime rejects it.
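To check where that limit comes from, here is a minimal standard-library sketch. It assumes the default datetime64[ns] storage, i.e. a signed 64-bit count of nanoseconds since 1970-01-01; newer pandas versions also support coarser resolutions, which I have not tried with import.py. Only the 5117-07-25 date is taken from my traceback; the variable names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: timestamps are stored as a signed 64-bit nanosecond count
# since the Unix epoch (the datetime64[ns] layout).
MAX_NS = 2**63 - 1

# Python's timedelta has microsecond precision, so drop the last three digits.
upper_bound = datetime(1970, 1, 1) + timedelta(microseconds=MAX_NS // 1000)
print(upper_bound.date())

# The date from the traceback parses fine with the standard library
# (datetime supports years up to 9999), but lies far beyond the bound.
offending = datetime.fromisoformat("5117-07-25")
print(offending > upper_bound)
```

So the value itself is a well-formed date; it simply does not fit in a 64-bit nanosecond counter.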
I've read the official documentation on "Date shifting," which states that dates are "randomly distributed in the future" for anonymization purposes. A date roughly 3,000 years in the future seems to be an extreme outlier, and I'm wondering if it's a known data artifact or a placeholder value (e.g., for an "indefinite" event).
I have a few questions for the community:
Has anyone else encountered this specific 5117 year value in pharmacy.csv?
Does a value like this fall within the expected range for the standard date-shifting process, or is it indeed a placeholder?
What is the recommended best practice for handling this? Is modifying the import.py script to use errors='coerce' (converting the value to NaT/NULL) the standard approach, or is there a better way to preserve the original information?
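As one possible alternative to coercing the value to NULL, here is a rough sketch of keeping the timestamp as TEXT in SQLite, so the original string survives and can still be queried and flagged. This is not how import.py currently works; the table name pharmacy_demo, the column name some_time, and the sample rows are hypothetical, and only the 5117-07-25 date comes from my traceback.

```python
import sqlite3

# Hypothetical two-row sample; only the 5117-07-25 date is from the traceback.
rows = [
    (1, "2150-03-01 08:00:00"),
    (2, "5117-07-25 00:00:00"),
]

conn = sqlite3.connect(":memory:")
# Store the timestamp as TEXT so the original string is kept verbatim.
conn.execute("CREATE TABLE pharmacy_demo (row_id INTEGER, some_time TEXT)")
conn.executemany("INSERT INTO pharmacy_demo VALUES (?, ?)", rows)

# SQLite's date functions accept years 0000 through 9999, so the value
# stays queryable. This flags years after 2262, the last (partial) year
# that a nanosecond-resolution timestamp can represent.
flagged = conn.execute(
    "SELECT row_id, some_time FROM pharmacy_demo "
    "WHERE CAST(strftime('%Y', some_time) AS INTEGER) > 2262"
).fetchall()
print(flagged)
conn.close()
```

With this approach the outliers could be reviewed or filtered at query time instead of being silently dropped during import, but I'm not sure whether it fits the intended schema.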
Thank you for any insights or suggestions!