@jrbourbeau jrbourbeau commented Oct 17, 2025

xref #1184 (comment)

EDIT: The issue was that the files returned by Path.rglob aren't guaranteed to be in sorted order, so the test could see them in a different order from run to run. This PR adds an explicit sorted(...) call.
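
For illustration only, here is a minimal sketch of the pattern described above. It is not the actual diff from this PR: the helper name `list_files_sorted` and the file names are hypothetical, and it only assumes that the code under test collects files with `Path.rglob`.

```python
import tempfile
from pathlib import Path


def list_files_sorted(root: Path, pattern: str = "*.parquet") -> list[Path]:
    """Return files under root matching pattern in a deterministic order.

    Hypothetical helper, not taken from the PR.
    """
    # Path.rglob yields entries in whatever order the filesystem reports them,
    # so sort explicitly before relying on position in the result.
    return sorted(p for p in root.rglob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create the files in a deliberately unsorted order (names are made up).
    for name in ["part_2.parquet", "part_0.parquet", "nested/part_1.parquet"]:
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    relative_names = [p.relative_to(root).as_posix() for p in list_files_sorted(root)]
    print(relative_names)
```

Sorting at the point where the files are collected keeps any later index-based assertions stable regardless of the platform or filesystem the tests run on.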

Closes #1184

Signed-off-by: James Bourbeau <[email protected]>
@ayushdg ayushdg merged commit 8c3c9e4 into NVIDIA-NeMo:main Oct 20, 2025
33 of 34 checks passed
@jrbourbeau jrbourbeau deleted the verbose-fail-file-size-test branch October 20, 2025 15:52
lbliii pushed a commit to lbliii/NeMo-Curator that referenced this pull request Oct 22, 2025
* Log more info on test_split_parquet_file_by_size failure

Signed-off-by: James Bourbeau <[email protected]>

* Sort files

Signed-off-by: James Bourbeau <[email protected]>

---------

Signed-off-by: James Bourbeau <[email protected]>
Co-authored-by: Ayush Dattagupta <[email protected]>
Signed-off-by: Lawrence Lane <[email protected]>
jnke2016 pushed a commit to jnke2016/Curator that referenced this pull request Nov 12, 2025

Development

Successfully merging this pull request may close these issues.

test_split_parquet_file_by_size[20] is non deterministic probably

3 participants