OpenMetadata provides basic automatic classification through its Auto PII Tagging feature, relying on predefined regex patterns and language models. While this is useful for generic use cases, it lacks the flexibility required by organizations needing to classify domain-specific data using local formats, internal business rules, or complex logic.
This feature request proposes a pluggable and extensible data classification engine that supports multiple types of user-defined rules — including regex, custom scripts (e.g., Java or Python), and external service calls — to infer data classes during profiling or ingestion.
Proposed Capabilities
1. Custom Data Classes
Allow users to define custom data classes (e.g., CPF, LicensePlate, BankAgencyCode) and associate them with classification logic.
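As an illustration of what a custom data class could look like, here is a minimal Python sketch for CPF. The `DataClass` type and its fields are hypothetical and not part of OpenMetadata; the check-digit logic is the standard CPF validation, used here only to show why a plain regex is not always enough.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Accepts both "529.982.247-25" and "52998224725".
CPF_FORMAT = re.compile(r"\d{3}\.?\d{3}\.?\d{3}-?\d{2}")


def is_valid_cpf(value: str) -> bool:
    """Check the format and both check digits of a Brazilian CPF number."""
    if CPF_FORMAT.fullmatch(value) is None:
        return False
    digits = [int(ch) for ch in value if ch.isdigit()]
    # Sequences such as 111.111.111-11 pass the checksum but are not valid.
    if len(set(digits)) == 1:
        return False
    for n in (9, 10):
        # Weights run from n + 1 down to 2 over the first n digits.
        total = sum(d * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        if (total * 10) % 11 % 10 != digits[n]:
            return False
    return True


@dataclass(frozen=True)
class DataClass:
    """Hypothetical container tying a class name to its classification logic."""
    name: str
    description: str
    matcher: Callable[[str], bool]


CPF = DataClass("CPF", "Brazilian individual taxpayer number", is_valid_cpf)
```

With this shape, `CPF.matcher("529.982.247-25")` accepts a well-formed number, while a value with a wrong check digit is rejected even though it matches the format.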
2. Flexible Rule Types
Support different rule types for inference, including:
Regex patterns
Custom scripts (e.g., Java or Python)
External function calls or webhooks (e.g., REST APIs for validation)
Recognizer plugins using libraries like spaCy or Microsoft Presidio
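One way the rule types above could share a single interface is sketched below. All class names (`Rule`, `RegexRule`, `ScriptRule`, `ExternalRule`) are assumptions made for illustration, and the external call is injected as a plain callable so the sketch runs without a network; a real implementation would wrap an HTTP client or a recognizer library there.

```python
import re
from dataclasses import dataclass
from typing import Callable, Protocol


class Rule(Protocol):
    """Hypothetical common interface: every rule type answers matches(value)."""
    name: str

    def matches(self, value: str) -> bool: ...


@dataclass
class RegexRule:
    name: str
    pattern: str

    def matches(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value) is not None


@dataclass
class ScriptRule:
    name: str
    func: Callable[[str], bool]  # user-supplied Python logic

    def matches(self, value: str) -> bool:
        return bool(self.func(value))


@dataclass
class ExternalRule:
    name: str
    call_service: Callable[[str], bool]  # stand-in for a REST or webhook client

    def matches(self, value: str) -> bool:
        try:
            return bool(self.call_service(value))
        except Exception:
            # Assumed policy: a failing service counts as "no match".
            return False


def failing_service(value: str) -> bool:
    raise ConnectionError("service unreachable")


rules: list[Rule] = [
    RegexRule("license_plate", r"[A-Z]{3}-?\d{4}"),
    ScriptRule("bank_agency_code", lambda v: v.isdigit() and len(v) == 4),
    ExternalRule("agency_lookup", lambda v: v in {"0001", "1234"}),
    ExternalRule("offline_lookup", failing_service),
]
results = {rule.name: rule.matches("1234") for rule in rules}
```

Because every rule exposes the same `matches` method, the profiler would not need to know which kind of rule it is evaluating.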
3. Rule Management
Provide a centralized interface (UI or config) to register, organize, and enable/disable rules
Allow prioritization of rules and fallback mechanisms (e.g., try A, else B)
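The prioritization and fallback behavior could work roughly as in the following sketch. `RuleRegistry`, `RegisteredRule`, and the convention that a lower priority number runs first are all assumptions for this example, not an existing OpenMetadata API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RegisteredRule:
    name: str
    data_class: str
    matches: Callable[[str], bool]
    priority: int = 100  # assumed convention: lower number is tried first
    enabled: bool = True


@dataclass
class RuleRegistry:
    rules: dict[str, RegisteredRule] = field(default_factory=dict)

    def register(self, rule: RegisteredRule) -> None:
        self.rules[rule.name] = rule

    def set_enabled(self, name: str, enabled: bool) -> None:
        self.rules[name].enabled = enabled

    def classify(self, value: str) -> Optional[str]:
        """Try enabled rules in priority order; the first match wins."""
        for rule in sorted(self.rules.values(), key=lambda r: r.priority):
            if rule.enabled and rule.matches(value):
                return rule.data_class
        return None  # no rule matched, so no class is inferred


registry = RuleRegistry()
registry.register(RegisteredRule(
    "plate_strict", "LicensePlate",
    lambda v: re.fullmatch(r"[A-Z]{3}-\d{4}", v) is not None,
    priority=10,
))
registry.register(RegisteredRule(
    "any_code", "GenericCode", lambda v: v.isalnum(), priority=50,
))

first = registry.classify("ABC-1234")     # matched by plate_strict
fallback = registry.classify("XYZ99")     # plate_strict fails, any_code matches
registry.set_enabled("plate_strict", False)
disabled = registry.classify("ABC-1234")  # any_code rejects the hyphen
```

Here "try A, else B" is simply the iteration order: a disabled or non-matching rule is skipped and the next one in priority order gets a chance.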
4. Integration with Data Profiler
Evaluate custom rules during profiling and show:
Match result
Confidence score
Suggested tag or classification
Link to glossary term (optional)
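A possible shape for the per-column result is sketched below. The confidence definition (share of non-empty sampled values that match) and the 0.75 threshold are assumptions chosen for illustration; `evaluate_column`, `ColumnResult`, and the tag and glossary names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ColumnResult:
    rule_name: str
    matched: bool
    confidence: float
    suggested_tag: Optional[str] = None
    glossary_term: Optional[str] = None


def evaluate_column(
    values: Iterable[object],
    rule_name: str,
    matches: Callable[[str], bool],
    tag: str,
    threshold: float = 0.75,
    glossary_term: Optional[str] = None,
) -> ColumnResult:
    """Score a rule against a sample of column values."""
    # Nulls and blank strings are ignored so sparse columns are not penalized.
    sample = [str(v) for v in values if v is not None and str(v).strip()]
    if not sample:
        return ColumnResult(rule_name, False, 0.0)
    confidence = sum(1 for v in sample if matches(v)) / len(sample)
    matched = confidence >= threshold
    return ColumnResult(
        rule_name,
        matched,
        confidence,
        tag if matched else None,
        glossary_term if matched else None,
    )


column_sample = ["0001", "1234", None, "9876", "n/a", "4321"]
result = evaluate_column(
    column_sample,
    rule_name="bank_agency_code",
    matches=lambda v: v.isdigit() and len(v) == 4,
    tag="BankAgencyCode",          # hypothetical tag name
    glossary_term="Bank Agency",   # hypothetical glossary term
)
```

In this sample four of the five non-null values match, so the rule clears the threshold and the tag and glossary term are suggested.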
5. Governance Integration
Inferred classifications should be applied as tags automatically
Tags should activate existing governance workflows (e.g., access policies)
6. Auditable Results
Record rule evaluations per column with:
Rule name and logic
Match status
Confidence
Execution details or logs
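The audit record for each evaluation could be as simple as the following sketch. The `RuleAuditRecord` name, its fields, and the JSON serialization are assumptions; the actual storage format would depend on how OpenMetadata chooses to persist profiler output.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RuleAuditRecord:
    column: str
    rule_name: str
    rule_logic: str   # e.g. the regex source or a script reference
    matched: bool
    confidence: float
    details: str = ""  # free-form execution notes or log excerpt
    evaluated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize to one JSON line, suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


record = RuleAuditRecord(
    column="customers.plate",  # hypothetical column name
    rule_name="license_plate",
    rule_logic=r"[A-Z]{3}-?\d{4}",
    matched=True,
    confidence=0.92,
    details="sampled 500 rows",
)
line = record.to_json()
```

Storing the rule logic alongside the outcome lets a reviewer see which version of a rule produced a given classification.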
Benefits
Enables detection of localized or domain-specific data patterns
Reduces effort on manual tagging and classification
Encourages knowledge reuse across domains through portable rule sets
Improves governance automation and data discoverability
Note: This feature would be especially useful for data teams working in countries or industries with specific identifiers (e.g., national IDs, license codes, legal fields), and for companies with strong internal data standards.