A comprehensive metadata standard for the modern data and AI ecosystem
OpenMetadata Standards provide a unified, open-source metadata model that describes every aspect of your data and AI ecosystem - from traditional data assets to modern AI systems, covering both structured and unstructured data across your entire organization.
Traditional Data Assets:
- Databases, tables, schemas, and stored procedures
- Data pipelines, workflows, and DAGs
- Dashboards, reports, and visualizations
- Message queues, topics, and event streams
- APIs, endpoints, and service contracts
Unstructured Data & Documents:
- Drive services (Google Drive, OneDrive, SharePoint)
- Spreadsheets, worksheets, and collaborative documents
- File systems, containers, and object storage
- Directories, files, and document repositories
AI Governance & LLM Systems:
- Large Language Models (LLMs) and foundation models
- AI Agents and autonomous systems
- Model Context Protocol (MCP) servers and tools
- Prompts, templates, and prompt engineering
- Vector databases and embeddings
- AI applications and integrations
Data Governance & Quality:
- Data quality tests, suites, and profiles
- Classification, tags, and glossaries
- Data contracts and SLAs
- Lineage from source to consumption
- Teams, users, roles, and ownership
- Domains and data products
!!! info "AI Governance Initiative"
    OpenMetadata is pioneering AI Governance by extending metadata standards to cover the entire AI lifecycle - from LLMs and agents to prompts and vector databases. This enables organizations to govern AI systems with the same rigor as traditional data assets.

    **Learn more**: [AI Governance Roadmap](https://github.com/open-metadata/OpenMetadata/issues/23853)
- :material-connection:{ .lg .middle } **Universal Interoperability**: Seamlessly connect and integrate across data platforms, document systems, and AI tools using standardized metadata schemas.
- :material-graph:{ .lg .middle } **Semantic Understanding**: Enable rich semantic queries and reasoning through RDF ontologies and knowledge graphs built on W3C standards.
- :material-robot:{ .lg .middle } **AI Governance**: Govern AI systems with the same rigor as data - track LLMs, agents, prompts, and model lineage end-to-end.
- :material-shield-check:{ .lg .middle } **Unified Data Governance**: Apply consistent governance policies across structured databases, unstructured documents, and AI systems.
- :material-test-tube:{ .lg .middle } **Data Quality**: Comprehensive testing, profiling, and validation frameworks ensuring data reliability across all asset types.
- :material-source-branch:{ .lg .middle } **Complete Lineage**: Track data flow from raw sources through transformations, ML pipelines, to AI applications and dashboards.
- :material-account-group:{ .lg .middle } **Clear Ownership**: Define organizational structure, teams, roles, and responsibilities across all data and AI assets.
- :material-api:{ .lg .middle } **API-First Design**: RESTful APIs enable real-time metadata updates and integrations without heavyweight infrastructure.
OpenMetadata Standards are expressed in multiple complementary formats:
Human-readable, machine-validatable schemas
- JSON Schema Draft-07 specification
- 700+ schemas covering all metadata entities
- Strongly typed with validation rules
- IDE autocomplete support
- Used by OpenMetadata APIs
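To make "strongly typed with validation rules" concrete, here is a minimal sketch of checking a table-like document against a Draft-07-style schema. The schema fragment and the `validate` helper are illustrative assumptions, not the published OpenMetadata schemas, and the helper only understands `required`, `type`, and `enum`; a real integration would use a full JSON Schema validator.

```python
# Hypothetical, heavily trimmed schema fragment; the real schemas are far richer.
TABLE_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["name", "columns"],
    "properties": {
        "name": {"type": "string"},
        "tableType": {"enum": ["Regular", "View", "MaterializedView", "External"]},
        "columns": {"type": "array"},
    },
}

# Maps JSON Schema type names to Python types (only the ones this sketch needs).
_TYPES = {"string": str, "array": list, "object": dict}


def validate(doc, schema):
    """Return a list of error strings; supports only required/type/enum."""
    errors = []
    for field in schema.get("required", []):
        if field not in doc:
            errors.append(f"missing required field: {field}")
    for field, rule in schema.get("properties", {}).items():
        if field not in doc:
            continue
        value = doc[field]
        expected = rule.get("type")
        if expected and not isinstance(value, _TYPES[expected]):
            errors.append(f"{field}: expected {expected}")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{field}: {value!r} not in enum")
    return errors


good = {"name": "customers", "tableType": "Regular", "columns": []}
bad = {"tableType": "Temp"}  # missing name and columns, invalid tableType
print(validate(good, TABLE_SCHEMA))
print(validate(bad, TABLE_SCHEMA))
```

The first call reports no errors; the second reports the two missing required fields and the unknown `tableType` value.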
Semantic web standards for knowledge graphs
- W3C OWL ontology for formal semantics
- RDFS classes and properties
- Reasoning and inference capabilities
- SPARQL queryable
- Integration with semantic web tools
Linked data for interoperability
- JSON-LD 1.1 contexts
- Maps JSON to RDF
- Enables semantic annotations
- Web-scale data integration
- Compatible with schema.org
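The "maps JSON to RDF" step can be pictured as a context that turns JSON keys into property IRIs. The sketch below is a simplified illustration, not a JSON-LD 1.1 processor; the `https://example.org/om/` namespace, the context, and the `to_triples` helper are all hypothetical.

```python
# Hypothetical context: maps short JSON keys to full property IRIs.
CONTEXT = {
    "name": "https://example.org/om/name",
    "tableType": "https://example.org/om/tableType",
}


def to_triples(doc, context):
    """Expand a flat JSON object into (subject, predicate, object) triples.

    Keys that the context does not define are dropped, mirroring how
    JSON-LD expansion ignores unmapped terms.
    """
    subject = doc["@id"]
    return [
        (subject, context[key], value)
        for key, value in doc.items()
        if key in context
    ]


doc = {
    "@id": "https://example.org/om/table/customers",
    "name": "customers",
    "tableType": "Regular",
    "internalNote": "not mapped, so not emitted",
}
for triple in to_triples(doc, CONTEXT):
    print(triple)
```

Only `name` and `tableType` produce triples here, because they are the only keys the assumed context defines.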
Validation constraints for RDF graphs
- SHACL shapes for validation
- Constraint checking
- Data quality rules
- Graph validation
- Compliance verification
OpenMetadata organizes entities in hierarchical service-based structures:
graph TD
DS[Database Service<br/>MySQL, PostgreSQL, Snowflake] --> DB[Database]
DB --> SCHEMA[Schema]
SCHEMA --> TABLE[Table]
SCHEMA --> SP[Stored Procedure]
TABLE --> COL[Column]
style DS fill:#667eea,color:#fff
style DB fill:#4facfe,color:#fff
style SCHEMA fill:#00f2fe,color:#333
style TABLE fill:#43e97b,color:#333
style SP fill:#43e97b,color:#333
style COL fill:#e0f2fe,color:#333
graph TD
PS[Pipeline Service<br/>Airflow, Dagster, Prefect, dbt] --> P[Pipeline]
P --> T[Task]
style PS fill:#667eea,color:#fff
style P fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style T fill:#00f2fe,color:#333
graph TD
MS[Messaging Service<br/>Kafka, Pulsar, Kinesis] --> TOP[Topic]
TOP --> SCH[Message Schema]
style MS fill:#667eea,color:#fff
style TOP fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style SCH fill:#00f2fe,color:#333
graph TD
DBS[Dashboard Service<br/>Tableau, Looker, PowerBI] --> DM[Data Model]
DBS --> DASH[Dashboard]
DBS --> CH[Chart]
style DBS fill:#667eea,color:#fff
style DM fill:#4facfe,color:#fff
style DASH fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style CH fill:#00f2fe,color:#333
graph TD
MLS[ML Model Service<br/>MLflow, SageMaker] --> ML[ML Model]
ML --> F[Features]
ML --> H[Hyperparameters]
ML --> M[Metrics]
style MLS fill:#667eea,color:#fff
style ML fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style F fill:#f093fb,color:#333
style H fill:#f093fb,color:#333
style M fill:#f093fb,color:#333
graph TD
SS[Storage Service<br/>S3, GCS, Azure Blob] --> C[Container]
C --> F[Files]
style SS fill:#667eea,color:#fff
style C fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
style F fill:#00f2fe,color:#333
Beyond data assets, OpenMetadata Standards model:
Complete data flow tracking
Track transformations from source to dashboard to ML model using:
- Column-level lineage
- Asset-level lineage
- W3C PROV-O provenance ontology
- Pipeline execution lineage
Example: API Service → ETL Pipeline → Table → Dashboard
Explore Lineage Specification →
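As a rough sketch of asset-level lineage, the example flow above can be treated as a directed graph and walked to find everything downstream of an asset. The edge list and the `downstream` helper are illustrative assumptions, not the OpenMetadata lineage API.

```python
from collections import deque

# Asset-level lineage edges (upstream -> downstream), taken from the example flow.
EDGES = [
    ("API Service", "ETL Pipeline"),
    ("ETL Pipeline", "Table"),
    ("Table", "Dashboard"),
]


def downstream(start, edges):
    """Return every asset reachable from `start`, in breadth-first order."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(downstream("ETL Pipeline", EDGES))
```

A traversal like this is what makes impact analysis possible: a change to the ETL pipeline affects the table and, through it, the dashboard.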
Business context and classification
Model business knowledge and data sensitivity:
- Glossaries: Business terminology
- Glossary Terms: Definitions with relationships
- Classifications: Hierarchical taxonomies (PII, PHI, Tier)
- Tags: Labels for categorization
Example: Link "Customer" glossary term to customer table, tag email column as PII.Sensitive.Email
Explore Governance Specification →
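A tag name such as `PII.Sensitive.Email` suggests a dot-separated path through a classification hierarchy. Assuming that convention, the sketch below finds columns tagged anywhere under a given classification. The `is_under` helper, the `column_tags` mapping, and the `Tier.Tier2` tag are hypothetical.

```python
def is_under(tag_fqn, classification):
    """True if a dot-separated tag name sits at or below the classification."""
    return tag_fqn == classification or tag_fqn.startswith(classification + ".")


# Hypothetical column -> tags mapping for the customer table.
column_tags = {
    "customers.email": ["PII.Sensitive.Email"],
    "customers.name": ["Tier.Tier2"],
}

pii_columns = [
    column
    for column, tags in column_tags.items()
    if any(is_under(tag, "PII") for tag in tags)
]
print(pii_columns)
```

Matching on the prefix plus a trailing dot avoids false positives from unrelated classifications that merely start with the same letters.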
Testing and profiling framework
Define and track data quality:
- Test Definitions: Reusable test templates
- Test Cases: Applied to tables/columns
- Test Suites: Organized test execution
- Profiling: Statistical analysis
Example: Define uniqueness test for customer_id, run daily, track results
Explore Data Quality Specification →
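The uniqueness test in the example can be sketched in a few lines. This is an illustration of the idea over in-memory rows, not the OpenMetadata test framework; `unique_test` and the sample rows are assumptions.

```python
from collections import Counter


def unique_test(rows, column):
    """Return (passed, duplicates) for a column-uniqueness check."""
    counts = Counter(row[column] for row in rows)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return (not duplicates, duplicates)


rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 2, "email": "c@example.com"},  # duplicate id
]
passed, duplicates = unique_test(rows, "customer_id")
print(passed, duplicates)
```

Returning the offending values alongside the pass/fail flag gives a tracked result something actionable to show when the test fails.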
Organizational structure and ownership
Model your organization:
- Users: Individual people
- Teams: Groups with hierarchies
- Roles: Permission sets
- Ownership: Asset assignments
Example: Data Engineering team owns the customer_etl pipeline; Jane Doe is the owner
Explore Teams & Users Specification →
Formal agreements across all assets
Define expectations for any data asset:
- Schema requirements
- Quality SLAs
- Freshness guarantees
- Ownership commitments
Not just tables - contracts apply to Topics, Dashboards, ML Models, APIs, and more
Explore Data Contract Specification →
Business domain organization
Organize data assets by business area or function:
- Domain Hierarchy: Top-level and sub-domains
- Asset Assignment: Assign tables, dashboards, pipelines to domains
- Domain Ownership: Domain-specific owners and experts
- Cross-Domain Dependencies: Track data flows across domains
Example: Sales domain contains customer tables, revenue dashboards, and sales pipelines
Explore Domain Specification →
Packaged data for consumption
Define curated data products for specific use cases:
- Product Definition: Packaged collection of data assets
- Assets: Tables, dashboards, ML models working together
- SLAs: Quality, freshness, and availability guarantees
- Consumers: Teams and applications using the product
Example: "Customer 360" data product includes customer tables, enrichment pipelines, and analytics dashboards
Explore Data Product Specification →
Each metadata entity has comprehensive documentation explaining:
- Overview: What it models and why
- JSON Schema: Complete field reference
- RDF Representation: Ontology classes and properties
- JSON-LD: Semantic annotations
- Examples: Real-world use cases
- Relationships: How it connects to other entities
Table is the core entity representing database tables and views.
Key Fields:
- `name`, `fullyQualifiedName`, `description`
- `columns[]`: Array of column definitions with types, constraints
- `tableType`: Regular, View, MaterializedView, External
- `owner`, `domain`, `tags`, `glossaryTerms`
- `dataModel`: SQL query for views
- `tableConstraints`: Primary/foreign keys
- `tableProfilerConfig`: Profiling settings
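For orientation, a Table document using some of these fields might look roughly like the sketch below. The top-level field names come from the list above; the column and constraint sub-fields (`dataType`, `constraintType`), the service name `postgres_crm`, and the dot-separated `service.database.schema.table` pattern for `fullyQualifiedName` are assumptions made for illustration.

```python
import json


def build_fqn(*parts):
    """Join hierarchy levels into a dot-separated fully qualified name."""
    return ".".join(parts)


table = {
    "name": "customers",
    # Assumed pattern: service.database.schema.table; "postgres_crm" is made up.
    "fullyQualifiedName": build_fqn("postgres_crm", "crm_database", "public", "customers"),
    "description": "One row per customer.",
    "tableType": "Regular",
    "columns": [
        {"name": "customer_id", "dataType": "BIGINT"},
        {"name": "email", "dataType": "VARCHAR"},
    ],
    "tableConstraints": [
        {"constraintType": "PRIMARY_KEY", "columns": ["customer_id"]},
    ],
}
print(json.dumps(table, indent=2))
```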
Relationships:
- Belongs to `databaseSchema`
- Contains `columns`
- Referenced by `dashboards`, `mlModels`
- Has `testCases` for quality
- Participates in `lineage`
Assets Modeled:
PostgreSQL Database Service
└── crm_database
└── public schema
└── customers table
├── customer_id (PK)
├── email
├── name
└── created_date
Airflow Pipeline Service
└── customer_etl pipeline
├── extract_customers task
├── transform_customers task
└── load_customers task
Tableau Dashboard Service
└── Customer Analytics dashboard
├── Customer Growth chart
└── Customer Segments chart
Lineage:
customers table
→ customer_etl pipeline
→ warehouse.customers_dim table
→ Customer Analytics dashboard
Governance:
- `customers.email` tagged as `PII.Sensitive.Email`
- `customers` table linked to "Customer" glossary term
- GDPR compliance tag applied
Data Quality:
- Test: `customer_id` is unique
- Test: `email` matches regex pattern
- Test: `created_date` <= today
- Profile: Track row count daily
Ownership:
- Data Engineering team owns `customer_etl`
- Analytics team owns `Customer Analytics`
- Jane Doe is data steward
Data Contract:
- `customers` table must update within 1 hour
- Email completeness >= 99%
- Row count between 10,000 - 10,000,000
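The three contract terms above translate directly into checks. The sketch below evaluates them against observed values; `check_contract`, its parameters, and the sample numbers are hypothetical and only illustrate how such terms could be enforced.

```python
from datetime import datetime, timedelta, timezone

# Thresholds copied from the contract above.
MAX_STALENESS = timedelta(hours=1)
MIN_EMAIL_COMPLETENESS = 0.99
ROW_COUNT_RANGE = (10_000, 10_000_000)


def check_contract(last_updated, row_count, non_null_emails, now):
    """Return a dict of check name -> bool for the three contract terms."""
    low, high = ROW_COUNT_RANGE
    completeness = non_null_emails / row_count if row_count else 0.0
    return {
        "freshness": now - last_updated <= MAX_STALENESS,
        "email_completeness": completeness >= MIN_EMAIL_COMPLETENESS,
        "row_count": low <= row_count <= high,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
result = check_contract(
    last_updated=now - timedelta(minutes=30),
    row_count=50_000,
    non_null_emails=49_000,  # 98% complete, below the 99% threshold
    now=now,
)
print(result)
```

In this sample the table is fresh and within the row-count range, but email completeness falls short, so the contract as a whole would be reported as violated.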
All modeled in:
- ✅ JSON Schema with full validation
- ✅ RDF ontology for semantic queries
- ✅ JSON-LD for linked data
- ✅ SHACL for constraint validation
Start with the JSON Schema overview to understand the core structures.
Browse the hierarchical data assets organized by service type.
Understand lineage, governance, and data quality.
Read detailed specifications for entities like Table, Pipeline, or Dashboard.
Integrate OpenMetadata Standards into your tools using the API reference.
Freely available, community-driven, transparent development
Covers databases, pipelines, dashboards, ML, governance, quality, and more
RDF and ontologies enable reasoning and knowledge graphs
JSON-LD enables integration with any semantic web tool
Custom properties and types for your specific needs
Used in production by organizations managing petabytes of data
- GitHub: open-metadata/OpenMetadataStandards
- Slack: #openmetadata-standards
- Contribute: See Contributing Guide