Update user-facing variable names and add override checks for name, resources, and batch_size (#1223)
* Add override checks for `name`, `resources`, and `batch_size`
Signed-off-by: Sarah Yurick <[email protected]>
* add pytests
Signed-off-by: Sarah Yurick <[email protected]>
* rename for name, resources, and batch_size to be user facing
Signed-off-by: Sarah Yurick <[email protected]>
* fix some tests
Signed-off-by: Sarah Yurick <[email protected]>
* small update
Signed-off-by: Sarah Yurick <[email protected]>
* revert DocumentFilter changes
Signed-off-by: Sarah Yurick <[email protected]>
* revert DocumentModifier changes
Signed-off-by: Sarah Yurick <[email protected]>
* update gpu test
Signed-off-by: Sarah Yurick <[email protected]>
* add abhinav's suggestion
Signed-off-by: Sarah Yurick <[email protected]>
---------
Signed-off-by: Sarah Yurick <[email protected]>
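The commit message does not show what the override checks look like. As a rough illustration only, the sketch below shows one way a base class could validate subclass overrides of `name`, `resources`, and `batch_size` at class-definition time. The `Stage` and `Resources` classes, the `_CHECKED` table, and the specific rules (type check, positive batch size) are all assumptions made for this example, not the code added in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource spec; not the project's real class."""
    cpus: float = 1.0
    gpus: float = 0.0


class Stage:
    """Hypothetical base class exposing user-facing name, resources, batch_size."""

    name: str = "stage"
    resources: Resources = Resources()
    batch_size: int = 1

    # Assumed rule: each overridable attribute must keep this type.
    _CHECKED = {"name": str, "resources": Resources, "batch_size": int}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for attr, expected in cls._CHECKED.items():
            if attr not in cls.__dict__:
                continue  # not overridden here; the inherited value applies
            value = cls.__dict__[attr]
            if not isinstance(value, expected):
                raise TypeError(
                    f"{cls.__name__}.{attr} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
            if attr == "batch_size" and value < 1:
                raise ValueError(f"{cls.__name__}.batch_size must be >= 1")


class Good(Stage):
    # Valid overrides pass the check; resources falls back to the default.
    name = "good"
    batch_size = 8
```

Under these assumed rules, a subclass that sets `batch_size = "8"` or redefines `name` as a property fails at class-definition time with a `TypeError`, before any stage is constructed.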
 - **GPU Acceleration**: Use a GPU-enabled environment for optimal performance. The stage automatically detects CUDA availability and uses GPU decoding when possible.
 - **Parallelism Control**: Adjust `files_per_partition` to control how many tar files are processed together. Lower values increase parallelism but may increase overhead.
-- **Batch Size Tuning**: Increase `task_batch_size` for better throughput, but ensure sufficient memory is available.
+- **Batch Size Tuning**: Increase `batch_size` for better throughput, but ensure sufficient memory is available.
 - **Thread Configuration**: Adjust `num_threads` for I/O operations based on your storage system's characteristics.
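To make the batch-size tip concrete, here is a generic sketch of what a batch size controls: how many tasks are grouped into one call. The helper `iter_batches` is hypothetical and written for this illustration; it is not part of the project's API.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


tasks = list(range(10))
small = list(iter_batches(tasks, 1))  # 10 batches, one task held at a time
large = list(iter_batches(tasks, 4))  # 3 batches, up to four tasks held at a time
```

Larger batches mean fewer calls, which is where the throughput gain usually comes from, but every item in a batch is held in memory at once.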