TensorFlow Data Validation 0.30.0
Major Features and Improvements
- This version is the last version before TFDV 1.0. Starting with 1.0, all
  TFDV public APIs (i.e. symbols in the root __init__.py) will be subject to
  semantic versioning. We are deprecating some public APIs in this version
  and they will be removed in 1.0.
- The sketch-based top-k/unique stats generator can now detect invalid
  UTF-8 sequences and large texts and replace them with a placeholder.
  It does not suffer from the memory issues usually caused by image or large
  text features in the data. Note that this generator is not yet used by
  default.
- Added StatsOptions.experimental_use_sketch_based_topk_uniques, which
  enables the sketch-based top-k/unique stats generator.
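
As a minimal sketch of opting in to the sketch-based generator, the snippet below passes the new option to StatsOptions. It assumes TFDV 0.30.x, where the option is accepted as a constructor keyword. The names SKETCH_KWARGS and make_sketch_stats_options are illustrative and not part of TFDV.

```python
# Sketch: opt in to the sketch-based top-k/unique stats generator.
try:
    import tensorflow_data_validation as tfdv
except ImportError:
    tfdv = None  # TFDV is not installed; only the keyword dict below is usable.

# Option name taken from the release note; it is off by default in 0.30.0.
SKETCH_KWARGS = {"experimental_use_sketch_based_topk_uniques": True}


def make_sketch_stats_options():
    """Build StatsOptions with the sketch-based generator enabled.

    Assumption: TFDV 0.30.x accepts the option as a constructor keyword.
    This helper is hypothetical and only wraps the constructor call.
    """
    if tfdv is None:
        raise RuntimeError("tensorflow_data_validation is not installed")
    return tfdv.StatsOptions(**SKETCH_KWARGS)
```

The resulting options object would then be supplied as the stats_options argument of a statistics entry point such as tfdv.generate_statistics_from_tfrecord.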
Bug Fixes and Other Changes
- Fixed bug in display_schema that caused domains not to be displayed.
- Modified how get_schema_dataframe outputs numeric domains.
- Anomalies previously (un)classified as UNKNOWN_TYPE now trigger more specific
  anomaly types: INVALID_DOMAIN_SPECIFICATION and MULTIPLE_REASONS.
- Depends on tensorflow-metadata>=0.30,<0.31.
- Depends on tfx-bsl>=0.30,<0.31.
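
The two dependency ranges above can be read as a half-open interval: 0.30 and any 0.30.x patch release are accepted, 0.31 is not. The following sketch checks a version string against that range using only tuple comparison. The helper names are hypothetical, and only plain numeric versions (no pre-release or local suffixes) are handled.

```python
def parse_version(text):
    """Turn '0.30.1' into (0, 30, 1); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower="0.30", upper="0.31"):
    """Return True when lower <= version < upper, mirroring '>=0.30,<0.31'."""
    parsed = parse_version(version)
    return parse_version(lower) <= parsed < parse_version(upper)


# Example: patch releases of 0.30 fall inside the range, 0.31.0 does not.
inside = satisfies("0.30.2")
outside = satisfies("0.31.0")
```

For real installations, pip's own resolver applies these constraints; the sketch is only meant to make the range explicit.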
Known Issues
- N/A
Breaking Changes
- N/A
Deprecations
- tfdv.LiftStatsGenerator is going to be removed in the next version from
  the public API. To enable that generator, supply
  StatsOptions.label_feature.
- tfdv.NonStreamingCustomStatsGenerator is going to be removed in the next
  version from the public API. You may continue to import it from TFDV
  but it will not be subject to compatibility guarantees.
- tfdv.validate_instance is going to be removed in the next
  version from the public API. You may continue to import it from TFDV
  but it will not be subject to compatibility guarantees.
- Removed tfdv.DecodeCSV, tfdv.DecodeTFExample (deprecated in 0.27).
- Removed feature_whitelist in tfdv.StatsOptions (deprecated in 0.28).
  Use feature_allowlist instead.
- tfdv.get_feature_value_slicer is deprecated.
  tfdv.experimental_get_feature_value_slicer is introduced as a replacement.
  TFDV is likely to have a different slicing functionality post 1.0, which
  may not be compatible with the current slicers.
- StatsOptions.slicing_functions is deprecated.
  StatsOptions.experimental_slicing_functions is introduced as a
  replacement.
- tfdv.WriteStatisticsToText is removed (deprecated in 0.25.0).
- Parameter compression_type in tfdv.generate_statistics_from_tfrecord
  is deprecated. The compression type is currently automatically determined.
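
For code that builds StatsOptions from a dictionary of keyword arguments, the two renames listed above (feature_whitelist to feature_allowlist, slicing_functions to experimental_slicing_functions) could be applied mechanically. The helper below is a hypothetical migration aid, not something TFDV provides. It only rewrites key names and does not import or call TFDV.

```python
# Old keyword -> replacement named in the 0.30.0 deprecation notes.
RENAMED_STATS_OPTIONS = {
    "feature_whitelist": "feature_allowlist",
    "slicing_functions": "experimental_slicing_functions",
}


def migrate_stats_options_kwargs(kwargs):
    """Return a copy of kwargs with deprecated names replaced.

    Hypothetical helper: raises ValueError when both the old and the new
    name are supplied, since it is unclear which value should win.
    """
    migrated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_STATS_OPTIONS.get(name, name)
        if new_name != name and new_name in kwargs:
            raise ValueError(f"both {name!r} and {new_name!r} were supplied")
        migrated[new_name] = value
    return migrated


# Example: the old allowlist keyword is rewritten, the input is left unchanged.
old_kwargs = {"feature_whitelist": ["age", "country"]}
new_kwargs = migrate_stats_options_kwargs(old_kwargs)
```

The migrated dictionary could then be unpacked into tfdv.StatsOptions. Calls that still pass compression_type to tfdv.generate_statistics_from_tfrecord need a separate manual cleanup.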