refactor: modernize codebase for current Python & PyTorch stack #47

Open
Omdeepb69 wants to merge 1 commit into RedHenLab:master from Omdeepb69:master

@Omdeepb69

Summary

This PR modernizes the legacy NMT pipeline to make it compatible with the current Python and PyTorch ecosystem while preserving the original architecture and functionality.

Key Changes

  • Updated deprecated PyTorch APIs to the current stable equivalents
  • Refactored training and evaluation modules for compatibility with modern tensor operations
  • Cleaned up legacy code patterns and improved overall readability
  • Updated requirements.txt with working dependency versions
  • Added .gitignore to keep cache artifacts out of version control
  • Ensured preprocessing, training, and inference scripts run in a modern environment
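
For reviewers unfamiliar with this kind of migration, here is a minimal sketch of what replacing deprecated PyTorch APIs typically looks like. It is illustrative only: the model, tensor shapes, and function names (`evaluate_step`, `train_step`) are hypothetical stand-ins, and the specific calls this PR touched may differ. The legacy forms appear only in comments.

```python
import torch
import torch.nn as nn


def evaluate_step(model, criterion, inputs, targets):
    """Run one evaluation step without tracking gradients.

    Legacy (pre-0.4) code often looked like this:
        from torch.autograd import Variable
        x = Variable(inputs, volatile=True)   # `volatile` was removed
        loss_value = loss.data[0]             # fails on 0-dim tensors today
    """
    model.eval()
    with torch.no_grad():  # replaces Variable(..., volatile=True)
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    return outputs, loss.item()  # .item() replaces loss.data[0]


def train_step(model, criterion, inputs, targets, max_norm=1.0):
    """Run one forward/backward pass and clip gradients in place."""
    model.train()
    model.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    # clip_grad_norm_ (trailing underscore) replaces the deprecated clip_grad_norm
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    return loss.item(), float(total_norm)


# Hypothetical stand-in for the NMT model, used only to exercise the functions.
toy_model = nn.Linear(4, 2)
toy_criterion = nn.MSELoss()
toy_inputs = torch.randn(3, 4)
toy_targets = torch.randn(3, 2)

eval_outputs, eval_loss = evaluate_step(toy_model, toy_criterion, toy_inputs, toy_targets)
train_loss, grad_norm = train_step(toy_model, toy_criterion, toy_inputs, toy_targets)
```

Plain tensors have carried autograd state since PyTorch 0.4, so the `Variable` wrapper is no longer needed, and `torch.no_grad()` is the usual replacement for `volatile=True` during evaluation.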

Why this is needed

The existing implementation depends on older Python/PyTorch versions and cannot run in a current environment without significant fixes.
This PR enables:

  • Reproducibility on modern systems
  • Easier onboarding for new contributors
  • Future research and experimentation on top of this codebase

Backward compatibility

The core model logic and workflow remain unchanged.
Only compatibility and maintainability improvements were introduced.

Testing

  • Verified preprocessing pipeline
  • Verified training script execution
  • Verified translation/inference flow
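
The checks above could be made repeatable with a small smoke-test runner along these lines. This is a sketch under stated assumptions, not part of the PR: `run_smoke_steps` is a hypothetical helper, and the script names in the usage comment are placeholders rather than the repository's actual entry points.

```python
import subprocess
import sys


def run_smoke_steps(steps):
    """Run each (name, command) pair in order and return the names that failed.

    A step counts as failed when its process exits with a non-zero status.
    """
    failed = []
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


# Hypothetical usage; replace the placeholder script names with the real ones:
#
#   steps = [
#       ("preprocess", [sys.executable, "preprocess.py"]),
#       ("train", [sys.executable, "train.py"]),
#       ("translate", [sys.executable, "translate.py"]),
#   ]
#   failed = run_smoke_steps(steps)
#   if failed:
#       raise SystemExit("smoke test failed: " + ", ".join(failed))
```

Using `sys.executable` keeps each step on the same interpreter, and therefore the same installed dependencies, as the runner itself.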

Future Work (optional)

  • Add Docker / reproducible environment
  • Add CI with a training smoke test
  • Integrate Hugging Face datasets/tokenizers
