[Feature] add korean business registration number recognizer #1822
Pull request overview
This PR adds a new recognizer KrBrnRecognizer to detect and validate South Korean Business Registration Numbers (BRN), a 10-digit identifier used for business taxation purposes in South Korea.
Key Changes:
- Implements pattern-based recognition with checksum validation using a magic key algorithm
- Adds comprehensive test coverage with valid and invalid BRN test cases
- Integrates the new recognizer into Presidio's configuration system with disabled-by-default setting
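The PR's implementation is not reproduced on this page, so the following is only a minimal sketch of the checksum commonly documented for Korean BRNs, and it is assumed (not confirmed) to be what the "magic key" refers to. The weights `1 3 7 1 3 7 1 3 5`, the function name `is_valid_brn`, and the dash handling are illustrative assumptions, not the recognizer's actual code.

```python
import re

# Assumed "magic key": one weight for each of the first nine digits.
BRN_WEIGHTS = (1, 3, 7, 1, 3, 7, 1, 3, 5)


def is_valid_brn(text: str) -> bool:
    """Return True if `text` is a 10-digit BRN with a matching check digit.

    Hypothetical helper for illustration; dashes are simply ignored.
    """
    digits = text.replace("-", "")
    if not re.fullmatch(r"[0-9]{10}", digits):
        return False

    nums = [int(ch) for ch in digits]

    # Weighted sum over the first nine digits.
    total = sum(d * w for d, w in zip(nums[:9], BRN_WEIGHTS))
    # The ninth digit also contributes the tens part of (digit * 5).
    total += (nums[8] * 5) // 10

    # The tenth digit must bring the total up to a multiple of 10.
    return (10 - total % 10) % 10 == nums[9]


if __name__ == "__main__":
    print(is_valid_brn("123-45-67891"))  # True
    print(is_valid_brn("1234567890"))    # False (check digit mismatch)
```

Under these assumptions, `123-45-67891` passes because the weighted sum is 169, which requires a check digit of 1. A regex match alone would accept any 10-digit string, so a checksum step like this is what lets the recognizer reject random numbers.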
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/korea/kr_brn_recognizer.py | New recognizer implementation with regex patterns, context words, and checksum validation algorithm for Korean Business Registration Numbers |
| presidio-analyzer/tests/test_kr_brn_recognizer.py | Comprehensive unit tests covering valid BRNs (with and without dashes), invalid BRNs (checksum failures, format errors), and edge cases |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/country_specific/korea/__init__.py | Adds KrBrnRecognizer import and export, maintaining alphabetical order in the `__all__` list |
| presidio-analyzer/presidio_analyzer/predefined_recognizers/__init__.py | Adds KrBrnRecognizer import and export to main predefined_recognizers module |
| presidio-analyzer/presidio_analyzer/conf/default_recognizers.yaml | Configures KrBrnRecognizer with Korean language support (ko/kr) and disabled-by-default setting (following country-specific recognizer convention) |
| docs/supported_entities.md | Documents the new KR_BRN entity type with description and detection method in the Korea section |
...nalyzer/presidio_analyzer/predefined_recognizers/country_specific/korea/kr_brn_recognizer.py
I’ve addressed the issues raised by Copilot. Could you please take a look at the changes now, @omri374?
Change Description
This PR adds a new recognizer, `KrBrnRecognizer`, to detect and validate South Korean Business Registration Numbers.
Issue reference
Fixes #XX
Checklist