TensorFlow Data Validation 0.25.0
Version 0.25.0
Major Features and Improvements
- Add support for detecting drift and distribution skew in numeric features.
- tfdv.validate_statistics now also reports the raw measurements of
  distribution skew/drift (if any comparison is done), regardless of whether
  skew/drift is detected. The report is in the drift_skew_info field of the
  Anomalies proto (the return value of validate_statistics).
- From this release TFDV will also be hosting nightly packages on
  https://pypi-nightly.tensorflow.org. To install the nightly package use the
  following command:

      pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-data-validation

  Note: These nightly packages are unstable and breakages are likely to
  happen. A fix can often take a week or more, depending on the complexity
  involved, before the wheels are available on the PyPI cloud service. You can
  always use the stable version of TFDV available on PyPI by running the
  command pip install tensorflow-data-validation.
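
To illustrate what "reporting the raw measurement regardless of whether drift is detected" means, here is a minimal standard-library sketch. It assumes the measurement is a Jensen-Shannon divergence between two histograms over the same buckets, and that an anomaly is flagged when the value exceeds a threshold. The function names `jensen_shannon_divergence` and `measure_drift` are hypothetical and are not part of TFDV; this is not TFDV's implementation.

```python
import math


def jensen_shannon_divergence(p_counts, q_counts):
    """Base-2 Jensen-Shannon divergence of two histograms over the same buckets."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same buckets")
    p_total, q_total = sum(p_counts), sum(q_counts)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each histogram needs at least one observation")

    # Normalize counts to probabilities and build the mixture distribution.
    p = [c / p_total for c in p_counts]
    q = [c / q_total for c in q_counts]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Buckets with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0
        # because y is the mixture of x and the other distribution.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def measure_drift(previous_counts, current_counts, threshold):
    """Hypothetical helper: always return the raw value, plus the anomaly flag."""
    value = jensen_shannon_divergence(previous_counts, current_counts)
    return {"value": value, "threshold": threshold, "is_anomaly": value > threshold}
```

With this sketch, `measure_drift([10, 10], [12, 8], threshold=0.5)` returns a small positive `value` with `is_anomaly` set to `False`, so the measurement stays visible even though no drift is flagged.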
Bug Fixes and Other Changes
- Added tfdv.load_stats_binary to load stats that were written using
  tfdv.WriteStatisticsToText (now tfdv.WriteStatisticsToBinaryFile).
- Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more specific
  anomaly types: DOMAIN_INVALID_FOR_TYPE, UNEXPECTED_DATA_TYPE,
  FEATURE_MISSING_NAME, FEATURE_MISSING_TYPE, INVALID_SCHEMA_SPECIFICATION.
- Fixed a bug where import tensorflow_data_validation would fail if IPython
  is not installed. IPython is an optional dependency of TFDV.
- Depends on apache-beam[gcp]>=2.25,<3.
- Depends on tensorflow-metadata>=0.25,<0.26.
- Depends on tensorflow-transform>=0.25,<0.26.
- Depends on tfx-bsl>=0.25,<0.26.
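
The IPython fix above concerns an optional dependency. A common way to keep a package importable when such a dependency is absent is to guard the import, as in the sketch below. This is a generic pattern and not TFDV's actual code; `try_import` and `HAS_IPYTHON` are hypothetical names.

```python
import importlib


def try_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of breaking the caller's own import.
        return None


# Notebook display helpers are only enabled when IPython is available.
ipython_display = try_import("IPython.display")
HAS_IPYTHON = ipython_display is not None
```

Code that needs IPython can then check `HAS_IPYTHON` at call time and raise a clear error, rather than failing at package import.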
Known Issues
- N/A
Breaking Changes
- tfdv.WriteStatisticsToText is renamed as
  tfdv.WriteStatisticsToBinaryFile. The former is still available but will
  be removed in a future release.
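
Callers that must run against releases both before and after this rename can look up the new name first and fall back to the old one. The sketch below shows one way to do that. `resolve_renamed` is a hypothetical helper, and `old_release` and `new_release` are stand-in objects used only for illustration, not real TFDV modules.

```python
import warnings
from types import SimpleNamespace


def resolve_renamed(module, new_name, old_name):
    """Return module.<new_name> if present, else module.<old_name> with a warning."""
    if hasattr(module, new_name):
        return getattr(module, new_name)
    if hasattr(module, old_name):
        warnings.warn(
            f"{old_name} is deprecated; upgrade to use {new_name}",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(module, old_name)
    raise AttributeError(f"module has neither {new_name} nor {old_name}")


# Stand-ins for a release before the rename and a release after it.
old_release = SimpleNamespace(WriteStatisticsToText=lambda: "old")
new_release = SimpleNamespace(
    WriteStatisticsToText=lambda: "old",
    WriteStatisticsToBinaryFile=lambda: "new",
)

# With the real package, the call would look like:
#   writer = resolve_renamed(tfdv, "WriteStatisticsToBinaryFile", "WriteStatisticsToText")
```

Preferring the new name means that code using this lookup should keep working once the old name is removed.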
Deprecations
- N/A