OpenMetadata Standards

A comprehensive metadata standard for the modern data and AI ecosystem

What Are We Modeling?

OpenMetadata Standards provide a unified, open-source metadata model that describes every aspect of your data and AI ecosystem - from traditional data assets to modern AI systems, covering both structured and unstructured data across your entire organization.

Comprehensive Coverage

Traditional Data Assets:

Databases, tables, schemas, and stored procedures
Data pipelines, workflows, and DAGs
Dashboards, reports, and visualizations
Message queues, topics, and event streams
APIs, endpoints, and service contracts

Unstructured Data & Documents:

Drive services (Google Drive, OneDrive, SharePoint)
Spreadsheets, worksheets, and collaborative documents
File systems, containers, and object storage
Directories, files, and document repositories

AI Governance & LLM Systems:

Large Language Models (LLMs) and foundation models
AI Agents and autonomous systems
Model Context Protocol (MCP) servers and tools
Prompts, templates, and prompt engineering
Vector databases and embeddings
AI applications and integrations

Data Governance & Quality:

Data quality tests, suites, and profiles
Classification, tags, and glossaries
Data contracts and SLAs
Lineage from source to consumption
Teams, users, roles, and ownership
Domains and data products

!!! info "AI Governance Initiative"
    OpenMetadata is pioneering AI Governance by extending metadata standards to cover the entire AI lifecycle - from LLMs and agents to prompts and vector databases. This enables organizations to govern AI systems with the same rigor as traditional data assets.

**Learn more**: [AI Governance Roadmap](https://github.com/open-metadata/OpenMetadata/issues/23853)

What This Enables

:material-connection:{ .lg .middle } Universal Interoperability

Seamlessly connect and integrate across data platforms, document systems, and AI tools using standardized metadata schemas.

:material-graph:{ .lg .middle } Semantic Understanding

Enable rich semantic queries and reasoning through RDF ontologies and knowledge graphs built on W3C standards.

:material-robot:{ .lg .middle } AI Governance

Govern AI systems with the same rigor as data - track LLMs, agents, prompts, and model lineage end-to-end.

:material-shield-check:{ .lg .middle } Unified Data Governance

Apply consistent governance policies across structured databases, unstructured documents, and AI systems.

:material-test-tube:{ .lg .middle } Data Quality

Comprehensive testing, profiling, and validation frameworks ensuring data reliability across all asset types.

:material-source-branch:{ .lg .middle } Complete Lineage

Track data flow from raw sources through transformations, ML pipelines, to AI applications and dashboards.

:material-account-group:{ .lg .middle } Clear Ownership

Define organizational structure, teams, roles, and responsibilities across all data and AI assets.

:material-api:{ .lg .middle } API-First Design

RESTful APIs enable real-time metadata updates and integrations without heavyweight infrastructure.

The Metadata Stack

OpenMetadata Standards are expressed in multiple complementary formats:

📋 JSON Schema

Human-readable, machine-validatable schemas

JSON Schema Draft-07 specification
700+ schemas covering all metadata entities
Strongly typed with validation rules
IDE autocomplete support
Used by OpenMetadata APIs
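
To make "strongly typed with validation rules" concrete, here is a minimal sketch of schema-driven validation. The schema fragment and the `validate_minimal` helper are illustrative assumptions, not the published OpenMetadata schemas. The helper handles only the `type`, `required`, `enum`, and `properties` keywords, so a real integration would rely on a complete Draft-07 validator instead.

```python
from typing import Any

# Hypothetical, heavily simplified schema fragment (not the published Table schema).
TABLE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["name", "columns"],
    "properties": {
        "name": {"type": "string"},
        "tableType": {"enum": ["Regular", "View", "MaterializedView", "External"]},
        "columns": {"type": "array"},
    },
}

# Maps JSON Schema type names to the Python types they correspond to.
JSON_TYPES = {"object": dict, "array": list, "string": str}


def validate_minimal(doc: Any, schema: dict[str, Any]) -> list[str]:
    """Return a list of violations; supports only type, required, enum, properties."""
    errors: list[str] = []
    expected = schema.get("type")
    if expected and not isinstance(doc, JSON_TYPES[expected]):
        return [f"expected {expected}"]
    if "enum" in schema and doc not in schema["enum"]:
        errors.append(f"{doc!r} not in enum")
    if isinstance(doc, dict):
        for field in schema.get("required", []):
            if field not in doc:
                errors.append(f"missing required field: {field}")
        for field, sub in schema.get("properties", {}).items():
            if field in doc:
                errors += [f"{field}: {e}" for e in validate_minimal(doc[field], sub)]
    return errors


good = {"name": "customers", "tableType": "Regular", "columns": []}
bad = {"name": 42, "tableType": "Temp"}
print(validate_minimal(good, TABLE_SCHEMA))
print(validate_minimal(bad, TABLE_SCHEMA))
```

The first document produces no violations. The second is reported for a missing `columns` field, a non-string `name`, and a `tableType` outside the enumerated values.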

Explore JSON Schemas →

🔗 RDF & OWL Ontology

Semantic web standards for knowledge graphs

W3C OWL ontology for formal semantics
RDFS classes and properties
Reasoning and inference capabilities
SPARQL queryable
Integration with semantic web tools

Explore RDF Ontology →

🌐 JSON-LD Contexts

Linked data for interoperability

JSON-LD 1.1 contexts
Maps JSON to RDF
Enables semantic annotations
Web-scale data integration
Compatible with schema.org
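
As a rough illustration of how a context "maps JSON to RDF", the sketch below turns a flat JSON document into subject-predicate-object triples. The context IRIs under `https://example.org/` and the `to_triples` helper are placeholders invented for this example, not the published OpenMetadata contexts. A real pipeline would use a JSON-LD processor rather than this hand-rolled mapping.

```python
# Hypothetical context: the IRIs below are placeholders, not the published ones.
CONTEXT = {
    "name": "https://example.org/om#name",
    "description": "https://example.org/om#description",
    "tableType": "https://example.org/om#tableType",
}


def to_triples(doc: dict, context: dict) -> list[tuple[str, str, str]]:
    """Turn a flat JSON document into (subject, predicate, object) triples.

    Keys without a mapping in the context are dropped, which mirrors how
    JSON-LD expansion ignores terms that the context does not define.
    """
    subject = doc["@id"]
    return [
        (subject, context[key], str(value))
        for key, value in doc.items()
        if key in context
    ]


doc = {
    "@id": "https://example.org/tables/customers",
    "name": "customers",
    "tableType": "Regular",
    "internalNote": "not mapped, so it yields no triple",
}
triples = to_triples(doc, CONTEXT)
for triple in triples:
    print(triple)
```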

Explore JSON-LD →

✅ SHACL Shapes

Validation constraints for RDF graphs

SHACL shapes for validation
Constraint checking
Data quality rules
Graph validation
Compliance verification

Explore SHACL →

The Hierarchical Model

OpenMetadata organizes entities in hierarchical service-based structures:

Database Stack

graph TD
    DS[Database Service<br/>MySQL, PostgreSQL, Snowflake] --> DB[Database]
    DB --> SCHEMA[Schema]
    SCHEMA --> TABLE[Table]
    SCHEMA --> SP[Stored Procedure]
    TABLE --> COL[Column]

    style DS fill:#667eea,color:#fff
    style DB fill:#4facfe,color:#fff
    style SCHEMA fill:#00f2fe,color:#333
    style TABLE fill:#43e97b,color:#333
    style SP fill:#43e97b,color:#333
    style COL fill:#e0f2fe,color:#333

Pipeline Stack

graph TD
    PS[Pipeline Service<br/>Airflow, Dagster, Prefect, dbt] --> P[Pipeline]
    P --> T[Task]

    style PS fill:#667eea,color:#fff
    style P fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style T fill:#00f2fe,color:#333

Messaging Stack

graph TD
    MS[Messaging Service<br/>Kafka, Pulsar, Kinesis] --> TOP[Topic]
    TOP --> SCH[Message Schema]

    style MS fill:#667eea,color:#fff
    style TOP fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style SCH fill:#00f2fe,color:#333

Dashboard Stack

graph TD
    DBS[Dashboard Service<br/>Tableau, Looker, PowerBI] --> DM[Data Model]
    DBS --> DASH[Dashboard]
    DBS --> CH[Chart]

    style DBS fill:#667eea,color:#fff
    style DM fill:#4facfe,color:#fff
    style DASH fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style CH fill:#00f2fe,color:#333

ML Stack

graph TD
    MLS[ML Model Service<br/>MLflow, SageMaker] --> ML[ML Model]
    ML --> F[Features]
    ML --> H[Hyperparameters]
    ML --> M[Metrics]

    style MLS fill:#667eea,color:#fff
    style ML fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style F fill:#f093fb,color:#333
    style H fill:#f093fb,color:#333
    style M fill:#f093fb,color:#333

Storage Stack

graph TD
    SS[Storage Service<br/>S3, GCS, Azure Blob] --> C[Container]
    C --> F[Files]

    style SS fill:#667eea,color:#fff
    style C fill:#4facfe,color:#fff,stroke:#4c51bf,stroke-width:3px
    style F fill:#00f2fe,color:#333

Explore All Data Assets →

Cross-Cutting Concepts

Beyond data assets, OpenMetadata Standards model:

🔄 Lineage

Complete data flow tracking

Track transformations from source to dashboard to ML model using:

Column-level lineage
Asset-level lineage
W3C PROV-O provenance ontology
Pipeline execution lineage

Example: API Service → ETL Pipeline → Table → Dashboard
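
The example chain above can be treated as a directed graph. Below is a minimal sketch of asset-level impact analysis under that assumption. The `EDGES` list and the `downstream` helper are illustrative names, not part of the lineage specification.

```python
from collections import deque

# Asset-level lineage edges (upstream -> downstream), taken from the example above.
EDGES = [
    ("API Service", "ETL Pipeline"),
    ("ETL Pipeline", "Table"),
    ("Table", "Dashboard"),
]


def downstream(start: str, edges: list[tuple[str, str]]) -> list[str]:
    """Breadth-first walk returning every asset reachable from `start`."""
    children: dict[str, list[str]] = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(downstream("ETL Pipeline", EDGES))  # ['Table', 'Dashboard']
```

Reversing each edge before the walk would answer the opposite question: which upstream assets feed a given dashboard.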

Explore Lineage Specification →

📚 Governance

Business context and classification

Model business knowledge and data sensitivity:

Glossaries: Business terminology
Glossary Terms: Definitions with relationships
Classifications: Hierarchical taxonomies (PII, PHI, Tier)
Tags: Labels for categorization

Example: Link the "Customer" glossary term to the customers table and tag its email column as PII.Sensitive.Email

Explore Governance Specification →

✓ Data Quality

Testing and profiling framework

Define and track data quality:

Test Definitions: Reusable test templates
Test Cases: Applied to tables/columns
Test Suites: Organized test execution
Profiling: Statistical analysis

Example: Define a uniqueness test for customer_id, run it daily, and track the results
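
A uniqueness check like the one in this example could be sketched as follows. The `CheckResult` shape and the `column_values_unique` function are assumptions made for illustration. They do not reproduce the Test Definition or Test Case schemas.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CheckResult:
    test_name: str
    passed: bool
    failed_values: list


def column_values_unique(rows: list[dict], column: str) -> CheckResult:
    """Pass only if no value appears more than once in the column."""
    counts = Counter(row[column] for row in rows)
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return CheckResult(f"{column}_unique", not duplicates, duplicates)


rows = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
result = column_values_unique(rows, "customer_id")
print(result.passed, result.failed_values)  # False [2]
```

Storing one such result per scheduled run is what allows results to be tracked over time.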

Explore Data Quality Specification →

👥 Teams & Users

Organizational structure and ownership

Model your organization:

Users: Individual people
Teams: Groups with hierarchies
Roles: Permission sets
Ownership: Asset assignments

Example: The Data Engineering team owns the customer_etl pipeline, and Jane Doe is its individual owner

Explore Teams & Users Specification →

📜 Data Contracts

Formal agreements across all assets

Define expectations for any data asset:

Schema requirements
Quality SLAs
Freshness guarantees
Ownership commitments

Not just tables - contracts apply to Topics, Dashboards, ML Models, APIs, and more

Explore Data Contract Specification →

🏢 Domains

Business domain organization

Organize data assets by business area or function:

Domain Hierarchy: Top-level and sub-domains
Asset Assignment: Assign tables, dashboards, pipelines to domains
Domain Ownership: Domain-specific owners and experts
Cross-Domain Dependencies: Track data flows across domains

Example: Sales domain contains customer tables, revenue dashboards, and sales pipelines

Explore Domain Specification →

📦 Data Products

Packaged data for consumption

Define curated data products for specific use cases:

Product Definition: Packaged collection of data assets
Assets: Tables, dashboards, ML models working together
SLAs: Quality, freshness, and availability guarantees
Consumers: Teams and applications using the product

Example: "Customer 360" data product includes customer tables, enrichment pipelines, and analytics dashboards

Explore Data Product Specification →

Deep Dive Documentation

Each metadata entity has comprehensive documentation explaining:

Overview: What it models and why
JSON Schema: Complete field reference
RDF Representation: Ontology classes and properties
JSON-LD: Semantic annotations
Examples: Real-world use cases
Relationships: How it connects to other entities

Example: Table Entity

Table is the core entity representing database tables and views.

Key Fields:

name, fullyQualifiedName, description
columns[]: Array of column definitions with types, constraints
tableType: Regular, View, MaterializedView, External
owner, domain, tags, glossaryTerms
dataModel: SQL query for views
tableConstraints: Primary/foreign keys
tableProfilerConfig: Profiling settings

Relationships:

Belongs to databaseSchema
Contains columns
Referenced by dashboards, mlModels
Has testCases for quality
Participates in lineage
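
The sketch below shows how a few of these fields and relationships might look in code. It assumes that a child's fully qualified name is the parent's name plus a dot plus the child's own name. The class layout, the snake_case attribute names, and the `postgres_prod` service name are invented for this example. The Table specification remains the authoritative field reference.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    tags: list[str] = field(default_factory=list)


@dataclass
class Table:
    name: str
    schema_fqn: str  # fully qualified name of the parent databaseSchema
    table_type: str = "Regular"
    columns: list[Column] = field(default_factory=list)

    @property
    def fully_qualified_name(self) -> str:
        # Assumption: child names are appended to the parent's name with a dot.
        return f"{self.schema_fqn}.{self.name}"

    def column_fqn(self, column_name: str) -> str:
        """Return the dotted name of a column that exists on this table."""
        if column_name not in {c.name for c in self.columns}:
            raise KeyError(column_name)
        return f"{self.fully_qualified_name}.{column_name}"


customers = Table(
    name="customers",
    schema_fqn="postgres_prod.crm_database.public",  # "postgres_prod" is a made-up service name
    columns=[
        Column("customer_id", "BIGINT"),
        Column("email", "VARCHAR", ["PII.Sensitive.Email"]),
    ],
)
print(customers.fully_qualified_name)
print(customers.column_fqn("email"))
```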

View Complete Table Specification →

Standards in Action

Use Case: Customer Data Pipeline

Assets Modeled:

PostgreSQL Database Service
  └── crm_database
        └── public schema
              └── customers table
                    ├── customer_id (PK)
                    ├── email
                    ├── name
                    └── created_date

Airflow Pipeline Service
  └── customer_etl pipeline
        ├── extract_customers task
        ├── transform_customers task
        └── load_customers task

Tableau Dashboard Service
  └── Customer Analytics dashboard
        ├── Customer Growth chart
        └── Customer Segments chart

Lineage:

customers table
  → customer_etl pipeline
    → warehouse.customers_dim table
      → Customer Analytics dashboard

Governance:

customers.email tagged as PII.Sensitive.Email
customers table linked to "Customer" glossary term
GDPR compliance tag applied

Data Quality:

Test: customer_id is unique
Test: email matches regex pattern
Test: created_date <= today
Profile: Track row count daily

Ownership:

Data Engineering team owns customer_etl
Analytics team owns Customer Analytics
Jane Doe is the data steward

Data Contract:

customers table must update within 1 hour
Email completeness >= 99%
Row count between 10,000 and 10,000,000
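
The three contract terms above can be evaluated against observed statistics, as in this minimal sketch. It reads "must update within 1 hour" as "the last update is at most one hour old", which is an interpretation rather than something the contract states. The `check_contract` function and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def check_contract(last_updated: datetime, now: datetime,
                   email_non_null: int, row_count: int) -> dict[str, bool]:
    """Evaluate the three contract terms listed above against observed stats."""
    completeness = email_non_null / row_count if row_count else 0.0
    return {
        "freshness": now - last_updated <= timedelta(hours=1),
        "email_completeness": completeness >= 0.99,
        "row_count": 10_000 <= row_count <= 10_000_000,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
report = check_contract(
    last_updated=now - timedelta(minutes=30),
    now=now,
    email_non_null=49_000,  # 98% of rows have an email, below the 99% threshold
    row_count=50_000,
)
print(report)
```

In this run the freshness and row-count terms hold, while email completeness fails at 98%.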

All modeled in:

✅ JSON Schema with full validation
✅ RDF ontology for semantic queries
✅ JSON-LD for linked data
✅ SHACL for constraint validation

Getting Started

1. Understand the Standards

Start with the JSON Schema overview to understand the core structures.

2. Explore Data Assets

Browse the hierarchical data assets organized by service type.

3. Learn Cross-Cutting Concepts

Understand lineage, governance, and data quality.

4. Deep Dive

Read detailed specifications for entities like Table, Pipeline, or Dashboard.

5. Use the Standards

Integrate OpenMetadata Standards into your tools using the API reference.
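
As a starting point, the sketch below prepares a REST call that looks up a table by its fully qualified name. It does not send the request. The host, the token, the `postgres_prod` service name, and the `/api/v1/tables/name/` path are all assumptions, so confirm the actual routes and authentication scheme in the API reference.

```python
from urllib.parse import quote
from urllib.request import Request


def build_table_request(base_url: str, fqn: str, token: str) -> Request:
    """Prepare (but do not send) a GET request for a table by fully qualified name."""
    # Assumed path layout; confirm it against the API reference before use.
    url = f"{base_url.rstrip('/')}/api/v1/tables/name/{quote(fqn, safe='')}"
    return Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


req = build_table_request(
    "http://localhost:8585",  # placeholder host
    "postgres_prod.crm_database.public.customers",
    "<token>",  # placeholder credential
)
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the call against a running server.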

Why OpenMetadata Standards?

Open Source

Freely available, community-driven, transparent development

Comprehensive

Covers databases, pipelines, dashboards, ML, governance, quality, and more

Semantic

RDF and ontologies enable reasoning and knowledge graphs

Interoperable

JSON-LD enables integration with any semantic web tool

Extensible

Custom properties and types for your specific needs

Battle-Tested

Used in production by organizations managing petabytes of data

Community & Contribution

GitHub: open-metadata/OpenMetadataStandards
Slack: #openmetadata-standards
Contribute: See Contributing Guide

Next Steps

📋 JSON Schemas

Explore the complete JSON Schema reference

Go to JSON Schemas →

🗂️ Data Assets

Browse all data asset types by service

Go to Data Assets →

🔗 RDF Ontology

Understand the semantic web representation

Go to RDF →

📖 Examples

See real-world use cases and examples

Go to Examples →
