## Deprecations

* Deprecating Py2 support.

# Release 0.21.5

## Major Features and Improvements

* Add `label_feature` to `StatsOptions` and enable `LiftStatsGenerator` when
  `label_feature` and `schema` are provided.
* Add JSON serialization support for StatsOptions.

## Bug Fixes and Other Changes

* Requires `avro-python3>=1.8.1,!=1.9.2.*,<2.0.0` only on Python 3.5 + macOS.

## Breaking Changes

## Deprecations

# Release 0.21.4

## Major Features and Improvements

* Support visualizing feature value lift in Facets visualization.

## Bug Fixes and Other Changes

* Fix an issue with writing out string feature values in LiftStatsGenerator.
* Requires `apache-beam[gcp]>=2.17,<3`.
* Requires `tensorflow-transform>=0.21.1,<0.22`.
* Requires `tfx-bsl>=0.21.3,<0.22`.

## Breaking Changes

## Deprecations

# Release 0.21.2

## Major Features and Improvements

## Bug Fixes and Other Changes

* Fix Facets visualization.
* Optimize LiftStatsGenerator for string features.
* Make `_WeightedCounter` serializable.
* Add support for weighted examples in LiftStatsGenerator.

## Breaking Changes

## Deprecations

* `tfdv.TFExampleDecoder` has been removed. This legacy decoder converted
  serialized `tf.Example` protos to a dict of numpy arrays, the legacy input
  format used before the switch to Apache Arrow. TFDV stopped accepting that
  format in 0.14. Use `tfdv.DecodeTFExample` instead.

# Release 0.21.1

## Major Features and Improvements

## Bug Fixes and Other Changes

* Validate weighted feature stats.
* During schema inference, skip features that are missing common stats. This
  makes schema inference work when the input stats were generated using some
  pre-existing, unknown schema.
* Fix Facets visualization in Chrome >= M80.

## Known Issues

* Running TFDV with Apache Beam 2.18 or 2.19 does not work on Windows. If you
  are using TFDV on Windows, use Apache Beam 2.17.

## Breaking Changes

## Deprecations

# Release 0.21.0

## Major Features and Improvements

* Started depending on the CSV parsing / type inference utilities provided by
  `tfx-bsl` (since tfx-bsl 0.15.2). This also speeds up the CSV decoder
  (decoding is ~2x faster; type inference performance is unaffected).
* Compute bytes statistics for features of BYTES type, and skip computing
  top-k and uniques for such features.
* Added LiftStatsGenerator, which computes lift between one feature (typically
  a label) and all other categorical features.
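
As a rough illustration of the quantity involved, the sketch below computes lift between a label and one categorical feature using the textbook definition `lift(y, x) = P(y | x) / P(y)`, optionally with per-example weights. It is not TFDV's `LiftStatsGenerator`: the function name `compute_lift` is hypothetical, and the sketch assumes exactly one feature value per example and positive weights.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


def compute_lift(
    labels: List[Hashable],
    values: List[Hashable],
    weights: Optional[List[float]] = None,
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Returns lift(y, x) = P(y | x) / P(y) for every observed (label, value) pair.

    Hypothetical helper: assumes one feature value per example and positive weights.
    """
    if weights is None:
        weights = [1.0] * len(labels)  # unweighted: each example counts once
    if len(values) != len(labels) or len(weights) != len(labels):
        raise ValueError("labels, values and weights must have the same length")
    total = 0.0
    label_weight: Dict[Hashable, float] = defaultdict(float)  # weight per label y
    value_weight: Dict[Hashable, float] = defaultdict(float)  # weight per value x
    joint_weight: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    for y, x, w in zip(labels, values, weights):
        total += w
        label_weight[y] += w
        value_weight[x] += w
        joint_weight[(y, x)] += w
    lift = {}
    for (y, x), w_yx in joint_weight.items():
        p_y_given_x = w_yx / value_weight[x]  # P(y | x)
        p_y = label_weight[y] / total  # P(y)
        lift[(y, x)] = p_y_given_x / p_y
    return lift


lift = compute_lift([1, 1, 0, 0], ["a", "a", "a", "b"])
print(lift[(0, "b")])  # 2.0
```

A lift above 1 means that label value is more common among examples with that feature value than in the data overall.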

## Bug Fixes and Other Changes

* Exclude examples in which the entire sparse feature is missing when
  calculating sparse feature statistics.
* Validate the `min_examples_count` dataset constraint.
* Document the schema fields, statistics fields, and detection condition for
  each anomaly type that TFDV detects.
* Handle null arrays in the cross feature stats generator, top-k & uniques
  combiner stats generator, and sklearn mutual information generator.
* Handle infinity in the basic stats generator.
* Set `num_missing` and `num_examples` correctly in the presence of sparse
  features.
* Compute weighted feature stats for all weighted features declared in the
  schema.
* Enforce that mutual information is non-negative.
* Depends on `tensorflow-metadata>=0.21.0,<0.22`.
* Depends on `pyarrow>=0.15` (removed the upper bound as it is determined by
  `tfx-bsl`).
* Depends on `tfx-bsl>=0.21.0,<0.22`.
* Depends on `apache-beam>=2.17,<3`.
* Validate that float features do not contain NaNs (if `disallow_nan` is
  True).
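
The NaN check amounts to a simple rule over a float feature's values. A per-value sketch follows, assuming `None` marks a missing value (which is not the same as NaN); `find_nan_positions` is a hypothetical helper, and TFDV itself reports this condition as a schema anomaly rather than through a function like this.

```python
import math
from typing import Iterable, List, Optional


def find_nan_positions(values: Iterable[Optional[float]], disallow_nan: bool) -> List[int]:
    """Returns the indices of NaN values, or [] when NaNs are allowed (hypothetical)."""
    if not disallow_nan:
        return []
    # None is a missing value, not a NaN, so it is skipped. Infinity is not NaN either.
    return [i for i, v in enumerate(values) if v is not None and math.isnan(v)]


print(find_nan_positions([1.0, float("nan"), None, 2.5], disallow_nan=True))  # [1]
```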

## Breaking Changes

* Changed the behavior of statistics computed over CSV data:

  - Previously, if a CSV column contained a mix of integers and empty strings,
    FLOAT statistics were collected for that column. INT statistics are now
    collected instead.

* Removed `csv_decoder.DecodeCSVToDict`, as `Dict[str, np.ndarray]` has not
  been the internal data representation since 0.14.
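
The CSV change follows from treating an empty cell as a missing value rather than as evidence for a wider type. A minimal sketch of such a type-inference rule, assuming types only widen in the order INT, then FLOAT, then STRING; `infer_csv_column_type` is hypothetical and is not the `tfx-bsl` implementation.

```python
from typing import List, Optional

# Wider types come later; a column takes the widest type seen in any cell.
_TYPE_ORDER = {"INT": 0, "FLOAT": 1, "STRING": 2}


def _cell_type(cell: str) -> str:
    """Classifies a single non-empty CSV cell."""
    try:
        int(cell)
        return "INT"
    except ValueError:
        pass
    try:
        float(cell)
        return "FLOAT"
    except ValueError:
        return "STRING"


def infer_csv_column_type(cells: List[str]) -> Optional[str]:
    """Returns INT, FLOAT or STRING, or None if every cell is empty (hypothetical)."""
    inferred: Optional[str] = None
    for cell in cells:
        if cell == "":
            continue  # empty cell = missing value; it does not widen the type
        cell_type = _cell_type(cell)
        if inferred is None or _TYPE_ORDER[cell_type] > _TYPE_ORDER[inferred]:
            inferred = cell_type
    return inferred


print(infer_csv_column_type(["1", "", "3"]))  # INT
```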

## Deprecations

# Release 0.15.0

## Major Features and Improvements

* Generate statistics for sparse features.
* Directly convert a batch of tf.Examples to Arrow tables, avoiding conversion
  of tf.Example to an intermediate Dict representation.

## Bug Fixes and Other Changes

* Generate statistics for the weight feature.
* Support validation and schema inference from sliced statistics that include
  the default slice (validation/inference will be done using the default slice
  statistics).
* Avoid flattening null arrays.
* Set the `weighted_num_examples` field in the statistics proto if a weight
  feature is specified.
* Replace DecodedExamplesToTable with a Python implementation.
* Building TFDV from source no longer requires pyarrow.
* Depends on `apache-beam[gcp]>=2.16,<3`.
* Depends on `six>=1.12,<2`.
* Depends on `scikit-learn>=0.18,<0.22`.
* Depends on `tfx-bsl>=0.15,<0.16`.
* Depends on `tensorflow-metadata>=0.15,<0.16`.
* Depends on `tensorflow-transform>=0.15,<0.16`.
* Depends on `tensorflow>=1.15,<3`.
  * Starting from 1.15, the `tensorflow` package comes with GPU support, so
    users no longer need to choose between `tensorflow` and `tensorflow-gpu`.
  * Caveat: `tensorflow` 2.0.0 is an exception and does not have GPU
    support. If `tensorflow-gpu` 2.0.0 is installed before installing
    `tensorflow-data-validation`, it will be replaced with `tensorflow` 2.0.0.
    Re-install `tensorflow-gpu` 2.0.0 if needed.

## Breaking Changes

## Deprecations

# Release 0.14.1

## Major Features and Improvements

* Add support for custom schema transformations when inferring schema.

## Bug Fixes and Other Changes

* Fix incorrect file hashes in the TFDV wheel.
* Fix DOMException when embedding visualization in an iframe.

## Breaking Changes

## Deprecations

# Release 0.14.0

## Major Features and Improvements

* Performance improvement due to optimizing inner loops.
* Add support for time semantic domain related statistics.
* Performance improvement due to batching accumulators before merging.
* Add utility method `validate_examples_in_tfrecord`, which identifies anomalous
  examples in TFRecord files containing TFExamples and generates statistics for
  those anomalous examples.
* Add utility method `validate_examples_in_csv`, which identifies anomalous
  examples in CSV files and generates statistics for those anomalous examples.
* Add fast TF example decoder written in C++.
* Make `BasicStatsGenerator` take an Arrow table as input. Example batches are
  converted to Apache Arrow tables internally, which lets us use vectorized
  numpy functions. This improved the performance of BasicStatsGenerator by
  ~40x.
* Make `TopKUniquesStatsGenerator` and `TopKUniquesCombinerStatsGenerator`
  take Arrow tables as input.
* Add `update_schema` API, which updates the schema to conform to statistics.
* Add support for validating changes in the number of examples between the
  current and previous spans of data (using the existing `validate_statistics`
  function).
* Support building a manylinux2010 compliant wheel in Docker.
* Add support for cross feature statistics.

## Bug Fixes and Other Changes

* Expand unit test coverage.
* Update the natural language stats generator to generate stats if the actual
  ratio equals `match_ratio`.
* Use `__slots__` in accumulators.
* Fix overflow warning when generating numeric stats for large integers.
* Set max value count in the schema when the feature has the same valency in
  all examples, thereby inferring shape for multivalent required features.
* Fix divide by zero error in the natural language stats generator.
* Add `load_anomalies_text` and `write_anomalies_text` utility functions.
* Define ReasonFeatureNeeded proto.
* Add support for Windows OS.
* Make semantic domain stats generators take an Arrow column as input.
* Fix error in the computation of the number of missing examples and the total
  number of examples.
* Make FeaturesNeeded serializable.
* Fix memory leak in fast example decoder.
* Add `semantic_domain_stats_sample_rate` option to compute semantic domain
  statistics over a sample.
* Increment refcount of None in fast example decoder.
* Add `compression_type` option to `generate_statistics_from_*` methods.
* Add link to SysML paper describing some technical details behind TFDV.
* Add Python types to the source code.
* Make `GenerateStatistics` generate a DatasetFeatureStatisticsList containing
  a dataset with num_examples == 0 instead of an empty proto if there are no
  examples in the input.
* Depends on `absl-py>=0.7,<1`.
* Depends on `apache-beam[gcp]>=2.14,<3`.
* Depends on `numpy>=1.16,<2`.
* Depends on `pandas>=0.24,<1`.
* Depends on `pyarrow>=0.14.0,<0.15.0`.
* Depends on `scikit-learn>=0.18,<0.21`.
* Depends on `tensorflow-metadata>=0.14,<0.15`.
* Depends on `tensorflow-transform>=0.14,<0.15`.
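
The two natural language fixes in this release (accepting a ratio exactly equal to `match_ratio`, and avoiding a divide by zero) come down to a comparison like the one sketched here. The function `looks_like_natural_language` and its `num_matching`/`num_values` inputs are hypothetical; what counts as a "matching" value is left abstract, and `values_threshold` is assumed to be a minimum number of values.

```python
def looks_like_natural_language(num_matching: int, num_values: int,
                                match_ratio: float, values_threshold: int) -> bool:
    """Hypothetical decision rule for tagging a feature as natural language."""
    # Too few values to decide; this also avoids dividing by zero when
    # num_values == 0.
    if num_values == 0 or num_values < values_threshold:
        return False
    # Use ">=" so that a ratio exactly equal to match_ratio still qualifies.
    return num_matching / num_values >= match_ratio


print(looks_like_natural_language(8, 10, match_ratio=0.8, values_threshold=5))  # True
```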

## Breaking Changes

* Change `examples_threshold` to `values_threshold` and update documentation to
  clarify that counts are of values in semantic domain stats generators.
* Refactor IdentifyAnomalousExamples to remove sampling and to output
  (anomaly reason, example) tuples.
* Rename the `anomaly_proto` parameter in anomalies utilities to `anomalies`
  to make it more consistent with proto and schema utilities.
* `FeatureNameStatistics` produced by `GenerateStatistics` is now identified
  by its `.path` field instead of the `.name` field. For example:

  ```
  feature {
    name: "my_feature"
  }
  ```
  becomes:

  ```
  feature {
    path {
      step: "my_feature"
    }
  }
  ```
* Change `validate_instance` API to accept an Arrow table instead of a Dict.
* Change `GenerateStatistics` API to accept Arrow tables as input.

## Deprecations

# Release 0.13.1

## Major Features and Improvements

## Bug Fixes and Other Changes

* Modify validation logic to raise `SCHEMA_MISSING_COLUMN` anomaly when
  observing a feature with no stats (this was still broken in 0.13.0 and is now
  fixed).

## Breaking Changes

## Deprecations

# Release 0.13.0

## Major Features and Improvements

* Use joblib to exploit multiprocessing when computing statistics over a pandas
  dataframe.
* Add support for semantic domain related statistics (natural language, image),
  enabled by `StatsOptions.enable_semantic_domain_stats`.
* Python 3.5 is supported.

## Bug Fixes and Other Changes

* Expand unit test coverage.
* Modify validation logic to raise `SCHEMA_MISSING_COLUMN` anomaly when
  observing a feature with no stats.
* Add utility functions `write_stats_text` and `load_stats_text` to write and
  load DatasetFeatureStatisticsList protos.
* Avoid using multiprocessing by default when generating statistics over a
  dataframe.
* Depends on `joblib>=0.12,<1`.
* Depends on `tensorflow-transform>=0.13,<0.14`.
* Depends on `tensorflow-metadata>=0.12.1,<0.14`.
* Requires pre-installed `tensorflow>=1.13.1,<2`.
* Depends on `apache-beam[gcp]>=2.11,<3`.
* Depends on `absl>=0.1.6,<1`.

## Breaking Changes

## Deprecations

# Release 0.12.0

## Major Features and Improvements

* Add support for computing statistics over slices of data.
* Performance improvement due to optimizing inner loops.
* Add support for generating statistics from a pandas dataframe.
* Performance improvement due to pre-allocating tf.Example in
  TFExampleDecoder.
* Performance improvement due to merging the common stats generator, numeric
  stats generator and string stats generator into a single basic stats
  generator.
* Performance improvement due to merging the top-k and uniques generators.
* Add a `validate_instance` function, which checks a single example for
  anomalies.
* Add a utility method `get_statistics_html`, which returns HTML that can be
  used for Facets visualization outside of a notebook.
* Add support for schema inference of semantic domains.
* Performance improvement on statistics computation over a pandas dataframe.

## Bug Fixes and Other Changes

* Use constant `__BYTES_VALUE__` in the statistics proto to represent a bytes
  value which cannot be decoded as a UTF-8 string.
* Introduce CombinerFeatureStatsGenerator, a specialized interface for
  combiners that do not require cross-feature computations.
* Expand unit test coverage.
* Add optional frequency threshold that allows keeping only the most frequent
  values that are present in a minimum number of examples.
* Add optional desired batch size that allows specification of the number of
  examples to include in each batch.
* Depends on `numpy>=1.14.5,<2`.
* Depends on `protobuf>=3.6.1,<4`.
* Depends on `apache-beam[gcp]>=2.10,<3`.
* Depends on `tensorflow-metadata>=0.12.1,<0.13`.
* Depends on `scikit-learn>=0.18,<1`.
* Depends on `IPython>=5.0`.
* Requires pre-installed `tensorflow>=1.12,<2`.
* Revise example notebook and update it to be able to run in Colab and Jupyter.
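
The `__BYTES_VALUE__` placeholder and the frequency threshold can be illustrated together in a small top-k sketch. It assumes each feature value arrives as raw bytes, that a value is counted at most once per example, and that ties are broken by value; `top_k_values` is a hypothetical helper, not TFDV's top-k generator.

```python
from collections import Counter
from typing import List, Tuple

BYTES_PLACEHOLDER = "__BYTES_VALUE__"


def _to_text(value: bytes) -> str:
    # Bytes that are not valid UTF-8 are reported under a single placeholder.
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        return BYTES_PLACEHOLDER


def top_k_values(examples: List[List[bytes]], k: int,
                 frequency_threshold: int = 1) -> List[Tuple[str, int]]:
    """Returns up to k (value, example_count) pairs, most frequent first (hypothetical)."""
    counts: Counter = Counter()
    for values in examples:
        # Count each distinct value at most once per example.
        counts.update({_to_text(v) for v in values})
    # Keep only values present in at least frequency_threshold examples.
    kept = [(v, c) for v, c in counts.items() if c >= frequency_threshold]
    # Sort by descending count, then by value so ties are deterministic.
    kept.sort(key=lambda vc: (-vc[1], vc[0]))
    return kept[:k]


print(top_k_values([[b"a", b"b"], [b"a"], [b"\xff"]], k=2))
# [('a', 2), ('__BYTES_VALUE__', 1)]
```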

## Breaking Changes

* Represent a batch as a list of ndarrays instead of an ndarray of ndarrays.
* Modify decoders to return ndarrays of type `numpy.float32` for FLOAT
  features.

## Deprecations

# Release 0.11.0

## Major Features and Improvements

* Add option to infer feature types from schema when generating statistics over
  CSV data.
* Add utility method `set_domain` to set the domain of a feature in the schema.
* Add option to compute weighted statistics by providing a weight feature.
* Add a PTransform for decoding TF examples.
* Add utility methods `write_schema_text` and `load_schema_text` to write and
  load the schema protocol buffer.
* Add option to compute statistics over a sample.
* Optimize performance of statistics computation (~2x improvement on benchmark
  datasets).
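
With a weight feature, each example contributes its weight instead of a count of one. The sketch below shows that idea for a weighted example count and a weighted mean, assuming a single numeric value per example and a non-zero total weight; `weighted_summary` is a hypothetical helper, not a TFDV API.

```python
from typing import List, Tuple


def weighted_summary(values: List[float], weights: List[float]) -> Tuple[float, float]:
    """Returns (weighted example count, weighted mean) for one value per example."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # Each example contributes its weight rather than 1.
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    weighted_mean = sum(v * w for v, w in zip(values, weights)) / total_weight
    return total_weight, weighted_mean


print(weighted_summary([1.0, 3.0], [1.0, 3.0]))  # (4.0, 2.5)
```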

## Bug Fixes and Other Changes

* Depends on `apache-beam[gcp]>=2.8,<3`.
* Depends on `tensorflow-transform>=0.11,<0.12`.
* Depends on `tensorflow-metadata>=0.9,<0.10`.
* Fix bug in clearing the oneof `domain_info` field in the Feature proto.
* Fix overflow error for large integers by casting them to STRING type.
* Added API docs.

## Breaking Changes

* Requires pre-installed `tensorflow>=1.11,<2`.
* Make the tf.Example decoder represent a feature with no value list as a
  missing value (None).
* Make StatsOptions a class.

## Deprecations

# Release 0.9.0

* Initial release of TensorFlow Data Validation.