All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fix scipy version constraints for Python 3.13 compatibility (use scipy 1.14+ for Python 3.10+)
- Fix GitHub Actions badge URL in README
- Fix test imports to skip sklearn tests when extra not installed
- Add pre-commit hooks for ruff lint and format
- Add RELEASING.md with comprehensive release guide
- Add `clean_texts()` function for batch cleaning with multiprocessing support (#20)
- Add code snippet and file path filters (#23)
- Add option to remove IP addresses (#34)
- Add language support for Danish, Spanish, Faroese, French, Icelandic, Italian, Norwegian, Scandinavian, and Swedish (#36)
- Add regex exceptions to `clean()` (#19)
- Improve scikit-learn compatibility for `CleanTransformer` (#31)
- Use emoji module's recommended APIs for emoji 2.x
- Update emoji and pandas dependency constraints (#37, #38)
- Bump scikit-learn minimum to >=1.5.0 (security fix)
- Modernize project tooling: replace black + pylint with ruff, update CI to Python 3.9–3.13
- Drop Python 3.6–3.8, require Python >=3.9
- Add pipeline for scikit-learn by sadra-barikbin (#21)
- Add utility function to remove substrings from text
- Drop Python 3.6, support Python 3.10
- Improve documentation
- Rename default branch from `master` to `main`
- Change Unicode handling to avoid unexpected character changes (#17)
- Fix emoji & whitespace handling
- Various minor improvements
- Minor fixes
- Support Python 3.8 and 3.9
- Various minor improvements
- Minor fixes
- Initial release