Commit 4aa203e

tomvdw authored and The TensorFlow Datasets Authors committed
Log the number of examples assigned to each shard in BeamWriter
This helps debug whether the data is distributed correctly.

PiperOrigin-RevId: 704190477
1 parent c1e2f9e commit 4aa203e

1 file changed, +1 -0 lines changed

tensorflow_datasets/core/writer.py

Lines changed: 1 addition & 0 deletions
@@ -612,6 +612,7 @@ def _assign_shard(
         hkey=key, num_buckets=num_shards, max_hkey=largest_key
     )
     self._get_distribution(name="ShardDistribution").update(shard_number)
+    self.inc_counter(f"{self._filename_template.split}.shard_{shard_number}")
     return (shard_number, key_serialized_example)
 
   def _store_split_info(
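
Note: the added line increments one counter per shard, named `<split>.shard_<n>`. The following is a minimal standalone sketch of that idea, not the TFDS implementation. It uses `collections.Counter` in place of the writer's `inc_counter`, and the `assign_shard` helper (hash modulo the shard count) is a hypothetical stand-in for the real key-to-shard mapping, which is not shown in this diff.

```python
import collections
import hashlib


def assign_shard(key: str, num_shards: int) -> int:
    """Hypothetical stand-in for shard assignment: stable hash modulo N."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def count_examples_per_shard(keys, split: str, num_shards: int):
    """Counts examples per shard under names shaped like '<split>.shard_<n>'."""
    counters = collections.Counter()
    for key in keys:
        shard_number = assign_shard(key, num_shards)
        # Mirrors the counter naming in the diff; Counter replaces inc_counter.
        counters[f"{split}.shard_{shard_number}"] += 1
    return counters


counts = count_examples_per_shard(
    (f"example_{i}" for i in range(1000)), split="train", num_shards=4
)
for name in sorted(counts):
    print(name, counts[name])
```

Comparing the printed counts shows whether one shard receives far more examples than the others, which is the kind of skew the new counter is meant to surface.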
