Skip to content

chore(deps): update dependency chardet to v7 - autoclosed#1004

Closed
renovate[bot] wants to merge 1 commit intomasterfrom
renovate/chardet-7.x
Closed

chore(deps): update dependency chardet to v7 - autoclosed#1004
renovate[bot] wants to merge 1 commit intomasterfrom
renovate/chardet-7.x

Conversation

@renovate
Copy link
Copy Markdown
Contributor

@renovate renovate bot commented Mar 4, 2026

This PR contains the following updates:

Package Change Age Adoption Passing Confidence
chardet (changelog) >= 2.0, < 7.0>=2.0, <7.1 age adoption passing confidence

Release Notes

chardet/chardet (chardet)

v7.0.1

Compare Source

Fixes

  • Fixed false UTF-7 detection of SHA-1 git hashes (#​324, fixing #​323) — requirements files with VCS pins (e.g., +4bafdea3...) were misdetected as UTF-7, breaking tools like tox
  • Fixed _SINGLE_LANG_MAP missing aliases for single-language encoding lookup (e.g., big5big5hkscs)
  • Fixed PyPy TypeError in UTF-7 codec handling

Improvements

  • Retrained bigram models — 24 previously failing test cases now pass
  • Updated language equivalences for mutual intelligibility (Slovak/Czech, East Slavic + Bulgarian, Malay/Indonesian, Scandinavian languages)

New Contributors

  • @​rembish made their first contribution — both reporting the UTF-7 false detection issue and submitting the fix! (#​323, #​324)

v7.0.0

Compare Source

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate!

Highlights:

  • MIT license (previous versions were LGPL)
  • 96.8% accuracy on 2,179 test files (+2.3pp vs chardet 6.0.0, +7.7pp vs charset-normalizer)
  • 41x faster than chardet 6.0.0 with mypyc (28x pure Python), 7.5x faster than charset-normalizer
  • Language detection for every result (90.5% accuracy across 49 languages)
  • 99 encodings across six eras (MODERN_WEB, LEGACY_ISO, LEGACY_MAC, LEGACY_REGIONAL, DOS, MAINFRAME)
  • 12-stage detection pipeline — BOM, UTF-16/32 patterns, escape sequences, binary detection, markup charset, ASCII, UTF-8 validation, byte validity, CJK gating, structural probing, statistical scoring, post-processing
  • Bigram frequency models trained on CulturaX multilingual corpus data for all supported language/encoding pairs
  • Optional mypyc compilation — 1.49x additional speedup on CPython
  • Thread-safe detect() and detect_all() with no measurable overhead; scales on free-threaded Python 3.13t+
  • Negligible import memory (96 B)
  • Zero runtime dependencies

Breaking changes vs 6.0.0:

  • detect() and detect_all() now default to encoding_era=EncodingEra.ALL (6.0.0 defaulted to MODERN_WEB)
  • Internal architecture is completely different (probers replaced by pipeline stages). Only the public API is preserved.
  • LanguageFilter is accepted but ignored (deprecation warning emitted)
  • chunk_size is accepted but ignored (deprecation warning emitted)

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot requested review from Kelly Ma (eddiedialpad) and Shaun Sawyer (shaundialpad) and removed request for a team March 4, 2026 01:07
@renovate renovate bot force-pushed the renovate/chardet-7.x branch from 5d22e57 to c9d45c2 Compare March 5, 2026 00:54
@renovate renovate bot changed the title chore(deps): update dependency chardet to v7 chore(deps): update dependency chardet to v7 - autoclosed Mar 11, 2026
@renovate renovate bot closed this Mar 11, 2026
@renovate renovate bot deleted the renovate/chardet-7.x branch March 11, 2026 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant