v0.7.0

benfred released this 24 Sep 03:45

b55c57c

NVTabular v0.7.0

Improvements

Add column tagging API #943
Export dataset schema when writing out datasets #948
Make dataloaders aware of schema #947
Standardize a Workflow's representation of its output columns #372
Add multi-gpu training example using PyTorch Distributed #775
Speed up reading Parquet files from remote storage like GCS or S3 #1119
Add utility to convert TFRecord datasets to Parquet #1085
Add multihot support for PyTorch inference #719
Add options to reserve categorical indices in the Categorify() op #1074
Update notebooks to work with CPU only systems #960
Save output from Categorify op in a single table for HugeCTR #946
Add a keyset file for HugeCTR integration #1049

Bug Fixes

Fix category counts written out by the Categorify op #1128
Fix HugeCTR inference example #1130
Fix make_feature_column_workflow bug in Categorify when features have vocabularies of varying sizes #1062
Fix TargetEncoding op on CPU only systems #976
Fix writing empty partitions to Parquet files #1097
