Commit 2e87765
Use the max size of serialized examples to find a safe number of shards
If we know the maximum size of the serialized examples, we can account for the worst-case scenario in which one shard receives only examples of the maximum size. This should help prevent users from running into problems with shards that are too big.
PiperOrigin-RevId: 7263777781
Parent: 281ce2d, commit: 2e87765
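The worst-case reasoning in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: the function name `safe_num_shards`, its parameters, and the 1 GiB default shard limit are all assumptions made for illustration, and the sketch assumes examples are spread evenly across shards by count.

```python
import math
from typing import Optional


def safe_num_shards(
    num_examples: int,
    total_size: int,
    max_example_size: Optional[int] = None,
    max_shard_size: int = 1 << 30,  # hypothetical 1 GiB limit per shard
) -> int:
    """Hypothetical helper: pick a shard count that stays under max_shard_size."""
    if num_examples <= 0:
        return 1

    # Average-case estimate: enough shards to hold the total size.
    avg_based = math.ceil(total_size / max_shard_size)

    if not max_example_size or max_example_size <= 0:
        # Max example size unknown: only the average-case estimate is available.
        return max(1, avg_based)

    # Worst case: a shard receives only examples of the maximum size, so it
    # can safely hold at most this many examples (at least one per shard).
    examples_per_shard = max(1, max_shard_size // max_example_size)
    worst_based = math.ceil(num_examples / examples_per_shard)

    return max(1, avg_based, worst_based)


if __name__ == "__main__":
    # 1000 examples totalling 10,000 bytes, largest example 100 bytes,
    # shard limit 1000 bytes.
    print(safe_num_shards(1000, 10_000, max_example_size=100, max_shard_size=1000))
```

When the maximum example size is much larger than the average, the worst-case bound dominates and yields more shards than the average-based estimate alone would.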
File tree
6 files changed, +157 -24 lines changed
- tensorflow_datasets/core
- dataset_builders
- utils