@sarahyurick (Contributor) commented Oct 20, 2025

Closes #906.

The only part of our codebase that uses ftfy is the UnicodeReformatter, which is tested here: https://github.com/NVIDIA-NeMo/Curator/blob/main/tests/stages/text/modules/test_modifiers.py. Assuming the test still passes, we should be safe to merge.
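Some context on what UnicodeReformatter relies on ftfy for: ftfy's best-known job is repairing mojibake, which is text that was encoded as UTF-8 but decoded with the wrong codec. The stdlib-only sketch below reproduces and reverses the simplest case, UTF-8 bytes misread as Windows-1252. It is only an illustration under that assumption. It is not ftfy's algorithm, since ftfy uses heuristics to decide whether and how to repair text, and it is not the UnicodeReformatter API. The helper name `naive_unmojibake` is hypothetical.

```python
# Simulate mojibake: UTF-8 bytes mistakenly decoded as Windows-1252 (cp1252).
original = "✔ No problems"
mojibake = original.encode("utf-8").decode("cp1252")


def naive_unmojibake(text: str) -> str:
    """Undo one round of UTF-8-read-as-cp1252 damage, if the round trip works.

    Hypothetical helper for illustration only; ftfy handles many more
    encodings and uses heuristics to avoid "fixing" text that was fine.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not round-trip this way, so leave it unchanged.
        return text


print(repr(mojibake))
print(repr(naive_unmojibake(mojibake)))
```

Correctly decoded text such as "café" fails the round trip and is returned unchanged. Deciding when a round trip that does succeed is actually a repair is the hard part that ftfy's heuristics handle.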

Greptile Overview

Updated On: 2025-10-20 21:51:00 UTC

Greptile Summary

This PR removes the version pin on the ftfy dependency, changing it from ftfy==6.1.1 to ftfy in pyproject.toml. The change addresses issue #906 and lets the package manager install any available version of ftfy (subject to other packages' constraints) instead of locking it to 6.1.1. Within the NeMo Curator codebase, ftfy is used only by the UnicodeReformatter module for text cleaning, which has existing test coverage in tests/stages/text/modules/test_modifiers.py. Dropping the hard pin gives users more flexibility when resolving dependencies. Because the requirement is now unbounded, compatibility with future ftfy releases rests on ftfy keeping its API stable, and the existing tests are what would catch a breaking change.

Important Files Changed

Filename: pyproject.toml
Score: 4/5
Overview: Removed the version pin from the ftfy dependency (changed from ftfy==6.1.1 to ftfy) in the text_cpu optional dependencies

Confidence score: 4/5

  • This PR is safe to merge with low risk, provided existing tests pass
  • Score reflects the minimal scope of the change (removing a single dependency pin) and existing test coverage for the affected module (UnicodeReformatter). It is slightly lowered because compatibility with newer ftfy versions is confirmed only by running the tests, not guaranteed by a version constraint
  • No files require special attention; the change is straightforward and limited to dependency management

Signed-off-by: Sarah Yurick <[email protected]>
@greptile-apps bot (Contributor) left a comment


1 file reviewed, no comments


@sarahyurick requested a review from VibhuJawa October 20, 2025 22:08
@jrbourbeau (Member) left a comment


LGTM

@VibhuJawa (Contributor) left a comment


LGTM

@sarahyurick merged commit 82536bb into NVIDIA-NeMo:main Oct 21, 2025
36 checks passed
lbliii pushed a commit to lbliii/NeMo-Curator that referenced this pull request Oct 22, 2025
Signed-off-by: Sarah Yurick <[email protected]>
Signed-off-by: Lawrence Lane <[email protected]>
jnke2016 pushed a commit to jnke2016/Curator that referenced this pull request Nov 12, 2025
@sarahyurick deleted the remove_ftfy_pin branch February 9, 2026 18:14

Development

Successfully merging this pull request may close these issues.

[RAY] Remove pinning ftfy to "6.1.1"

3 participants