Commit dc93a5a
text curation refactor updates (#1082)
* text curation updates
Signed-off-by: Lawrence Lane <[email protected]>
* concepts
Signed-off-by: Lawrence Lane <[email protected]>
* remove synthetic docs not for this release
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* text concepts and getting started changes
Signed-off-by: Lawrence Lane <[email protected]>
* links, concepts
Signed-off-by: Lawrence Lane <[email protected]>
* crosslinks
Signed-off-by: Lawrence Lane <[email protected]>
* quality assessment updates
Signed-off-by: Lawrence Lane <[email protected]>
* more cleanup
Signed-off-by: Lawrence Lane <[email protected]>
* semdedup
Signed-off-by: Lawrence Lane <[email protected]>
* example import cleanup
Signed-off-by: Lawrence Lane <[email protected]>
* concepts
Signed-off-by: Lawrence Lane <[email protected]>
* Update docs/about/concepts/text/data-acquisition-concepts.md
Co-authored-by: Praateek Mahajan <[email protected]>
Signed-off-by: L.B. <[email protected]>
* feedback batch 1
Signed-off-by: Lawrence Lane <[email protected]>
* feedback batch 2
Signed-off-by: Lawrence Lane <[email protected]>
* file_paths="/path/to/jsonl_directory",
Signed-off-by: Lawrence Lane <[email protected]>
* revert removal of xenna for common crawl executors
Signed-off-by: Lawrence Lane <[email protected]>
* quickstart installation steps
Signed-off-by: Lawrence Lane <[email protected]>
* Update docs/about/concepts/text/data-acquisition-concepts.md
Co-authored-by: Sarah Yurick <[email protected]>
Signed-off-by: L.B. <[email protected]>
* data loading concepts updates / simplification
Signed-off-by: Lawrence Lane <[email protected]>
* data processing feedback
Signed-off-by: Lawrence Lane <[email protected]>
* read-existing pg updates
Signed-off-by: Lawrence Lane <[email protected]>
* add-id updates
Signed-off-by: Lawrence Lane <[email protected]>
* dedup updates
Signed-off-by: Lawrence Lane <[email protected]>
* feedback
Signed-off-by: Lawrence Lane <[email protected]>
* Skip IV2 Unit Test if package not installed (#1111)
* Skip IV2 Unit Test if package not installed
Signed-off-by: Ao Tang <[email protected]>
* syntax
Signed-off-by: Ao Tang <[email protected]>
---------
Signed-off-by: Ao Tang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
* Llane/docs audio modality staging (#1028)
* docs: audio modality staging
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* concept updates
Signed-off-by: Lawrence Lane <[email protected]>
* Update docs/reference/infrastructure/resumable-processing.md
Co-authored-by: Copilot <[email protected]>
Signed-off-by: L.B. <[email protected]>
* Update docs/reference/infrastructure/distributed-computing.md
Co-authored-by: Copilot <[email protected]>
Signed-off-by: L.B. <[email protected]>
* Update docs/curate-audio/process-data/audio-analysis/format-validation.md
Co-authored-by: Copilot <[email protected]>
Signed-off-by: L.B. <[email protected]>
* Update docs/curate-audio/process-data/audio-analysis/duration-calculation.md
Co-authored-by: Copilot <[email protected]>
Signed-off-by: L.B. <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* remove
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* mermaid fixes
Signed-off-by: Lawrence Lane <[email protected]>
* minor things lost from cutting out other commits
Signed-off-by: Lawrence Lane <[email protected]>
* removed any suggestive metric language
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* remove
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* pip and uv install stuff
Signed-off-by: Lawrence Lane <[email protected]>
* simplifying some pages
Signed-off-by: Lawrence Lane <[email protected]>
* link fixes
Signed-off-by: Lawrence Lane <[email protected]>
* update
Signed-off-by: Lawrence Lane <[email protected]>
* missed feedback from sarah
Signed-off-by: Lawrence Lane <[email protected]>
* missed feedback continued
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* updates
Signed-off-by: Lawrence Lane <[email protected]>
* nemo models page feedback
Signed-off-by: Lawrence Lane <[email protected]>
* installation update
Signed-off-by: Lawrence Lane <[email protected]>
* feedback follow up
Signed-off-by: Lawrence Lane <[email protected]>
* duration filtering, wer filtering example removals
Signed-off-by: Lawrence Lane <[email protected]>
---------
Signed-off-by: Lawrence Lane <[email protected]>
Signed-off-by: L.B. <[email protected]>
Co-authored-by: Copilot <[email protected]>
---------
Signed-off-by: Lawrence Lane <[email protected]>
Signed-off-by: L.B. <[email protected]>
Signed-off-by: Ao Tang <[email protected]>
Co-authored-by: Praateek Mahajan <[email protected]>
Co-authored-by: Sarah Yurick <[email protected]>
Co-authored-by: Ao Tang <[email protected]>
Co-authored-by: Dong Hyuk Chang <[email protected]>
Co-authored-by: Copilot <[email protected]>
1 parent 62bc37f commit dc93a5a
File tree
69 files changed: +3002 -6838 lines changed
- docs
  - about
    - concepts/text
  - admin
  - curate-text
    - generate-data
      - connect-service
      - pipelines
    - load-data
    - process-data
      - content-processing
      - deduplication
      - language-management
      - quality-assessment
      - specialized-processing
    - tutorials
  - curate-video
    - load-data
    - tutorials
  - get-started
  - reference
    - infrastructure