Commit e820b8b
* fc
Signed-off-by: Praateek <praateekm@gmail.com>
* review comments
Signed-off-by: Praateek <praateekm@gmail.com>
* make blocksize work with parquet
Signed-off-by: Praateek <praateekm@gmail.com>
* filetype
Signed-off-by: Praateek <praateekm@gmail.com>
* fix merge
Signed-off-by: Praateek <praateekm@gmail.com>
* add test cases
Signed-off-by: Praateek <praateekm@gmail.com>
* add test file
Signed-off-by: Praateek <praateekm@gmail.com>
* failing test for select_columns
Signed-off-by: Praateek <praateekm@gmail.com>
* rename func name
Signed-off-by: Praateek <praateekm@gmail.com>
* add test case for different columns
Signed-off-by: Praateek <praateekm@gmail.com>
* improve test for different_cols
Signed-off-by: Praateek <praateekm@gmail.com>
* ..
Signed-off-by: Praateek <praateekm@gmail.com>
* review comments + add warnings for inconsistent schemas
Signed-off-by: Praateek <praateekm@gmail.com>
* Update nemo_curator/utils/distributed_utils.py
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
* Update nemo_curator/utils/distributed_utils.py
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
* Update nemo_curator/utils/distributed_utils.py
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
* Update nemo_curator/utils/distributed_utils.py
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
* Update nemo_curator/utils/distributed_utils.py
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
* Update nemo_curator/utils/distributed_utils.py
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
* Update nemo_curator/utils/distributed_utils.py
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
* fix tests
Signed-off-by: Praateek <praateekm@gmail.com>
---------
Signed-off-by: Praateek <praateekm@gmail.com>
Signed-off-by: Praateek Mahajan <praateekmahajan@users.noreply.github.com>
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
1 parent c54826a commit e820b8b
File tree
7 files changed, +814 -56 lines changed
- docs/user-guide
- nemo_curator
- datasets
- utils
- tests