From 482697aa56510d9ea32f813204e85311ad5dd447 Mon Sep 17 00:00:00 2001 From: Hyejin Yoon <0327jane@gmail.com> Date: Wed, 29 Oct 2025 19:54:07 +0900 Subject: [PATCH 1/3] update outdated links --- README.md | 133 +++++++++++------------------------------------------- 1 file changed, 26 insertions(+), 107 deletions(-) diff --git a/README.md b/README.md index 30c121266d21ca..3cfe96b2d5438d 100644 --- a/README.md +++ b/README.md @@ -52,51 +52,51 @@ HOSTED_DOCS_ONLY--> [Quickstart](https://docs.datahub.com/docs/quickstart) | [Features](https://docs.datahub.com/docs/features) | -[Roadmap](https://feature-requests.datahubproject.io/roadmap) | -[Adoption](#adoption) | +[Adoption](https://datahub.com/resources/?2004611554=dh-stories) | [Demo](https://demo.datahub.com/) | [Town Hall](https://docs.datahub.com/docs/townhalls) ---- - -> 📣 DataHub Town Hall is the 4th Thursday at 9am US PT of every month - [add it to your calendar!](https://lu.ma/datahubevents/) -> -> - Town-hall Zoom link: [zoom.datahubproject.io](https://zoom.datahubproject.io) -> - [Meeting details](docs/townhalls.md) & [past recordings](docs/townhall-history.md) - -> ✨ DataHub Community Highlights: -> -> - Read our Monthly Project Updates [here](https://medium.com/datahub-project/tagged/project-updates). -> - Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At DataHub: [Data Engineering Podcast](https://www.dataengineeringpodcast.com/acryl-data-datahub-metadata-graph-episode-230/) -> - Check out our most-read blog post, [DataHub: Popular Metadata Architectures Explained](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained) @ LinkedIn Engineering Blog. -> - Join us on [Slack](docs/slack.md)! Ask questions and keep up with the latest announcements. ## Introduction DataHub is an open-source data catalog for the modern data stack. 
Read about the architectures of different metadata systems and why DataHub excels [here](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained). Also read our [LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2019/data-hub), check out our [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019) and watch our [Crunch Conference Talk](https://www.youtube.com/watch?v=OB-O0Y6OYDE). You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented. -## Features & Roadmap - -Check out DataHub's [Features](docs/features.md) & [Roadmap](https://feature-requests.datahubproject.io/roadmap). - -## Demo and Screenshots There's a [hosted demo environment](https://demo.datahub.com/) courtesy of DataHub where you can explore DataHub without installing it locally. ## Quickstart +``` +python3 -m pip install --upgrade acryl-datahub +datahub docker quickstart +``` + Please follow the [DataHub Quickstart Guide](https://docs.datahub.com/docs/quickstart) to run DataHub locally using [Docker](https://docker.com). -## Development +## Community + +Join our [Slack workspace](https://datahub.com/slack?utm_source=github&utm_medium=readme&utm_campaign=github_readme) for discussions and important announcements. You can also find out more about our upcoming [town hall meetings](docs/townhalls.md) and view past recordings. + + +## Contributing + +We welcome contributions from the community. Please refer to our [Contributing Guidelines](docs/CONTRIBUTING.md) for more details. We also have a [contrib](contrib) directory for incubating experimental features. If you're looking to build & modify datahub please take a look at our [Development Guide](https://docs.datahub.com/docs/developers). -

- - - -

+ +## Adoption + +[Here are the companies](https://datahub.com/resources/?2004611554=dh-stories) that have officially adopted DataHub. Please feel free to add yours to the list if we missed it. + + +## DataHub Cloud + +* [Why DataHub Cloud](https://datahub.com/products/why-datahub-cloud/) +* [DataHub Cloud vs DataHub Core](https://datahub.com/products/cloud-vs-core/) + + ## Source Code and Repositories @@ -109,88 +109,7 @@ If you're looking to build & modify datahub please take a look at our [Developme - [business-glossary-sync-action](https://github.com/acryldata/business-glossary-sync-action): A github action that opens PRs to update your business glossary yaml file. - [mcp-server-datahub](https://github.com/acryldata/mcp-server-datahub): A [Model Context Protocol](https://modelcontextprotocol.io/) server implementation for DataHub. -## Releases - -See [Releases](https://github.com/datahub-project/datahub/releases) page for more details. We follow the [SemVer Specification](https://semver.org) when versioning the releases and adopt the [Keep a Changelog convention](https://keepachangelog.com/) for the changelog format. - -## Contributing - -We welcome contributions from the community. Please refer to our [Contributing Guidelines](docs/CONTRIBUTING.md) for more details. We also have a [contrib](contrib) directory for incubating experimental features. - -## Community - -Join our [Slack workspace](https://datahub.com/slack?utm_source=github&utm_medium=readme&utm_campaign=github_readme) for discussions and important announcements. You can also find out more about our upcoming [town hall meetings](docs/townhalls.md) and view past recordings. - -## Security - -See [Security Stance](docs/SECURITY_STANCE.md) for information on DataHub's Security. - -## Adoption -Here are the companies that have officially adopted DataHub. Please feel free to add yours to the list if we missed it. 
- -- [ABLY](https://ably.team/) -- [Adevinta](https://www.adevinta.com/) -- [Banksalad](https://www.banksalad.com) -- [Cabify](https://cabify.tech/) -- [ClassDojo](https://www.classdojo.com/) -- [Coursera](https://www.coursera.org/) -- [CVS Health](https://www.cvshealth.com/) -- [DefinedCrowd](http://www.definedcrowd.com) -- [DFDS](https://www.dfds.com/) -- [Digital Turbine](https://www.digitalturbine.com/) -- [Expedia Group](http://expedia.com) -- [Experius](https://www.experius.nl) -- [Geotab](https://www.geotab.com) -- [Grofers](https://grofers.com) -- [Haibo Technology](https://www.botech.com.cn) -- [hipages](https://hipages.com.au/) -- [inovex](https://www.inovex.de/) -- [Inter&Co](https://inter.co/) -- [IOMED](https://iomed.health) -- [Klarna](https://www.klarna.com) -- [LinkedIn](http://linkedin.com) -- [Moloco](https://www.moloco.com/en) -- [N26](https://n26brasil.com/) -- [Optum](https://www.optum.com/) -- [Peloton](https://www.onepeloton.com) -- [PITS Global Data Recovery Services](https://www.pitsdatarecovery.net/) -- [Razer](https://www.razer.com) -- [Rippling](https://www.rippling.com/) -- [Showroomprive](https://www.showroomprive.com/) -- [SpotHero](https://spothero.com) -- [Stash](https://www.stash.com) -- [Shanghai HuaRui Bank](https://www.shrbank.com) -- [s7 Airlines](https://www.s7.ru/) -- [ThoughtWorks](https://www.thoughtworks.com) -- [TypeForm](http://typeform.com) -- [Udemy](https://www.udemy.com/) -- [Uphold](https://uphold.com) -- [Viasat](https://viasat.com) -- [Wealthsimple](https://www.wealthsimple.com) -- [Wikimedia](https://www.wikimedia.org) -- [Wolt](https://wolt.com) -- [Zynga](https://www.zynga.com) - -## Select Articles & Talks - -- [DataHub Blog](https://medium.com/datahub-project/) -- [DataHub YouTube Channel](https://www.youtube.com/channel/UC3qFQC5IiwR5fvWEqi_tJ5w) -- [Optum: Data Mesh via DataHub](https://opensource.optum.com/blog/2022/03/23/data-mesh-via-datahub) -- [Saxo Bank: Enabling Data Discovery in Data 
Mesh](https://medium.com/datahub-project/enabling-data-discovery-in-a-data-mesh-the-saxo-journey-451b06969c8f) -- [Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At DataHub](https://www.dataengineeringpodcast.com/acryl-data-datahub-metadata-graph-episode-230/) -- [DataHub: Popular Metadata Architectures Explained](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained) -- [Driving DataOps Culture with LinkedIn DataHub](https://www.youtube.com/watch?v=ccsIKK9nVxk) @ [DataOps Unleashed 2021](https://dataopsunleashed.com/#shirshanka-session) -- [The evolution of metadata: LinkedIn’s story](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019) @ [Strata Data Conference 2019](https://conferences.oreilly.com/strata/strata-ny-2019.html) -- [Journey of metadata at LinkedIn](https://www.youtube.com/watch?v=OB-O0Y6OYDE) @ [Crunch Data Conference 2019](https://crunchconf.com/2019) -- [DataHub Journey with Expedia Group](https://www.youtube.com/watch?v=ajcRdB22s5o) -- [Data Discoverability at SpotHero](https://www.slideshare.net/MaggieHays/data-discoverability-at-spothero) -- [Data Catalogue — Knowing your data](https://medium.com/albert-franzi/data-catalogue-knowing-your-data-15f7d0724900) -- [DataHub: A Generalized Metadata Search & Discovery Tool](https://engineering.linkedin.com/blog/2019/data-hub) -- [Open sourcing DataHub: LinkedIn’s metadata search and discovery platform](https://engineering.linkedin.com/blog/2020/open-sourcing-datahub--linkedins-metadata-search-and-discovery-p) -- [Emerging Architectures for Modern Data Infrastructure](https://future.com/emerging-architectures-for-modern-data-infrastructure-2020/) - -See the full list [here](docs/links.md). 
## License From 97515cabf4a474a617192c3e2a64e7d40a5dbfb8 Mon Sep 17 00:00:00 2001 From: Hyejin Yoon <0327jane@gmail.com> Date: Wed, 29 Oct 2025 20:12:41 +0900 Subject: [PATCH 2/3] update sections --- README.md | 66 +++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 49 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index 3cfe96b2d5438d..e4d970dab06ffc 100644 --- a/README.md +++ b/README.md @@ -57,22 +57,64 @@ HOSTED_DOCS_ONLY--> [Town Hall](https://docs.datahub.com/docs/townhalls) -## Introduction +## What is DataHub? -DataHub is an open-source data catalog for the modern data stack. Read about the architectures of different metadata systems and why DataHub excels [here](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained). Also read our -[LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2019/data-hub), check out our [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019) and watch our [Crunch Conference Talk](https://www.youtube.com/watch?v=OB-O0Y6OYDE). You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented. +**DataHub is an enterprise-grade, real-time metadata platform** that enables data discovery, observability, and governance across your entire data ecosystem. Built by LinkedIn and proven at massive scale (100,000+ datasets), DataHub provides a unified catalog where data engineers, analysts, and scientists can find, understand, and trust their data. -There's a [hosted demo environment](https://demo.datahub.com/) courtesy of DataHub where you can explore DataHub without installing it locally. +**The Challenge:** Modern data stacks are fragmented across dozens of tools—warehouses, lakes, BI platforms, ML systems, orchestration engines. 
Finding the right data, understanding its lineage, and ensuring governance is like searching through a maze blindfolded. + +**The DataHub Solution:** DataHub acts as a real-time metadata graph that continuously streams metadata from all your data tools, creating a single source of truth. Unlike batch-based catalogs that are always outdated, DataHub keeps your metadata fresh and actionable. + +## Why DataHub? + +- **Built for Scale**: Proven at LinkedIn managing 100,000+ datasets, 10M+ daily queries +- **Real-Time Streaming**: Metadata updates in seconds, not hours or days +- **Universal Connectors**: [100+ integrations](https://docs.datahub.com/integrations) for warehouses, databases, BI, ML, orchestration +- **Developer-First**: Rich APIs (GraphQL, REST), Python SDK, CLI tools +- Enterprise Ready: Battle-tested security, authentication, authorization, and audit trails +- **Open Source**: [Apache 2.0 licensed](./LICENSE), vendor-neutral, community-driven + +## Common Use Cases + + +| Use Case | Description | Learn More | +|----------|-------------|------------| +| 🔍 **Data Discovery** | Help users find the right data for analytics and ML | [Guide](https://docs.datahub.com/docs/features) | +| 📊 **Impact Analysis** | Understand downstream impact before making changes | [Lineage Docs](https://docs.datahub.com/docs/lineage) | +| 🏛️ **Data Governance** | Enforce policies, classify PII, manage access | [Governance Guide](https://docs.datahub.com/docs/governance) | +| 🔔 **Data Quality** | Monitor freshness, volumes, schema changes | [Quality Checks](https://docs.datahub.com/docs/tests) | +| 📚 **Documentation** | Centralize data documentation and knowledge | [Docs Features](https://docs.datahub.com/docs/documentation) | +| 👥 **Collaboration** | Foster data culture with discussions and ownership | [Collaboration](https://docs.datahub.com/docs/features) | + ## Quickstart +Please follow the [DataHub Quickstart Guide](https://docs.datahub.com/docs/quickstart) to run DataHub 
locally using [Docker](https://docker.com). + ``` python3 -m pip install --upgrade acryl-datahub datahub docker quickstart ``` -Please follow the [DataHub Quickstart Guide](https://docs.datahub.com/docs/quickstart) to run DataHub locally using [Docker](https://docker.com). +What you get: +- ✅ DataHub GMS (backend metadata service) +- ✅ DataHub Frontend (React UI) +- ✅ Elasticsearch (search & analytics) +- ✅ MySQL (metadata storage) +- ✅ Kafka + Schema Registry (streaming) +- ✅ Sample data + + + > You can always try our [hosted demo](https://demo.datahub.com/) - Explore DataHub with sample data, no installation needed! + + +## Trusted by Industry Leaders +DataHub powers data discovery and governance at some of the world's most data-driven organizations. + +[Here are the companies](https://datahub.com/resources/?2004611554=dh-stories) that have officially adopted DataHub. Please feel free to add yours to the list if we missed it. + + ## Community @@ -86,16 +128,11 @@ We welcome contributions from the community. Please refer to our [Contributing G If you're looking to build & modify datahub please take a look at our [Development Guide](https://docs.datahub.com/docs/developers). -## Adoption - -[Here are the companies](https://datahub.com/resources/?2004611554=dh-stories) that have officially adopted DataHub. Please feel free to add yours to the list if we missed it. - - ## DataHub Cloud -* [Why DataHub Cloud](https://datahub.com/products/why-datahub-cloud/) -* [DataHub Cloud vs DataHub Core](https://datahub.com/products/cloud-vs-core/) +Looking for a fully managed solution? **DataHub Cloud** provides enterprise-grade data catalog with zero infrastructure management. 
+**☁️ [Request Demo](https://datahub.com/demo/)** | **[Why Cloud?](https://datahub.com/products/why-datahub-cloud/)** | **[Cloud vs Core](https://datahub.com/products/cloud-vs-core/)** | **[Pricing](https://www.acryldata.io/pricing)** ## Source Code and Repositories @@ -109,8 +146,3 @@ If you're looking to build & modify datahub please take a look at our [Developme - [business-glossary-sync-action](https://github.com/acryldata/business-glossary-sync-action): A github action that opens PRs to update your business glossary yaml file. - [mcp-server-datahub](https://github.com/acryldata/mcp-server-datahub): A [Model Context Protocol](https://modelcontextprotocol.io/) server implementation for DataHub. - - -## License - -[Apache License 2.0](./LICENSE). From 511140a4fc2dcf0b320a3eb38df43083988e174d Mon Sep 17 00:00:00 2001 From: Hyejin Yoon <0327jane@gmail.com> Date: Wed, 29 Oct 2025 20:25:38 +0900 Subject: [PATCH 3/3] add imgs --- README.md | 39 +++++++++++++++++++++++---------------- 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/README.md b/README.md index e4d970dab06ffc..cd46248dc22d78 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ HOSTED_DOCS_ONLY--> ### 🏠 Docs: [docs.datahub.com](https://docs.datahub.com/) [Quickstart](https://docs.datahub.com/docs/quickstart) | -[Features](https://docs.datahub.com/docs/features) | +[Features](https://datahub.com/products/) | [Adoption](https://datahub.com/resources/?2004611554=dh-stories) | [Demo](https://demo.datahub.com/) | [Town Hall](https://docs.datahub.com/docs/townhalls) @@ -59,12 +59,10 @@ HOSTED_DOCS_ONLY--> ## What is DataHub? -**DataHub is an enterprise-grade, real-time metadata platform** that enables data discovery, observability, and governance across your entire data ecosystem. Built by LinkedIn and proven at massive scale (100,000+ datasets), DataHub provides a unified catalog where data engineers, analysts, and scientists can find, understand, and trust their data. 
+**DataHub is an open-source metadata platform** that enables data discovery, observability, and governance across your entire data stack. Built by LinkedIn and proven at scale (100,000+ datasets), DataHub provides a unified catalog where teams can find, understand, and trust their data. -**The Challenge:** Modern data stacks are fragmented across dozens of tools—warehouses, lakes, BI platforms, ML systems, orchestration engines. Finding the right data, understanding its lineage, and ensuring governance is like searching through a maze blindfolded. - -**The DataHub Solution:** DataHub acts as a real-time metadata graph that continuously streams metadata from all your data tools, creating a single source of truth. Unlike batch-based catalogs that are always outdated, DataHub keeps your metadata fresh and actionable. +Modern data stacks are fragmented across dozens of tools. DataHub solves this by acting as a real-time metadata graph that continuously streams metadata from all your data sources, creating a single source of truth. ## Why DataHub? @@ -75,17 +73,27 @@ HOSTED_DOCS_ONLY--> - Enterprise Ready: Battle-tested security, authentication, authorization, and audit trails - **Open Source**: [Apache 2.0 licensed](./LICENSE), vendor-neutral, community-driven -## Common Use Cases +## Core Features + +

+ +DataHub + + +DataHub + + +DataHub + +

+ -| Use Case | Description | Learn More | -|----------|-------------|------------| -| 🔍 **Data Discovery** | Help users find the right data for analytics and ML | [Guide](https://docs.datahub.com/docs/features) | -| 📊 **Impact Analysis** | Understand downstream impact before making changes | [Lineage Docs](https://docs.datahub.com/docs/lineage) | -| 🏛️ **Data Governance** | Enforce policies, classify PII, manage access | [Governance Guide](https://docs.datahub.com/docs/governance) | -| 🔔 **Data Quality** | Monitor freshness, volumes, schema changes | [Quality Checks](https://docs.datahub.com/docs/tests) | -| 📚 **Documentation** | Centralize data documentation and knowledge | [Docs Features](https://docs.datahub.com/docs/documentation) | -| 👥 **Collaboration** | Foster data culture with discussions and ownership | [Collaboration](https://docs.datahub.com/docs/features) | +| Features | Description | +|----------|-------------| +| 🔍 [**Data Discovery**](https://datahub.com/products/data-discovery/) | Effortlessly discover and get context on trustworthy data | +| 👁️ [**Data Observability**](https://datahub.com/products/data-observability) | Detect, resolve, and prevent data quality issues before they impact your business | +| 🏛️ [**Data Governance**](https://datahub.com/products/data-governance) | Ensure every data asset is accounted for by continuously fulfilling governance standards | +| 📊 [**Impact Analysis**](https://docs.datahub.com/docs/act-on-metadata/impact-analysis) | Understand downstream impact before making changes ([Lineage Docs](https://docs.datahub.com/docs/lineage)) | ## Quickstart @@ -132,8 +140,7 @@ If you're looking to build & modify datahub please take a look at our [Developme Looking for a fully managed solution? **DataHub Cloud** provides enterprise-grade data catalog with zero infrastructure management. 
-**☁️ [Request Demo](https://datahub.com/demo/)** | **[Why Cloud?](https://datahub.com/products/why-datahub-cloud/)** | **[Cloud vs Core](https://datahub.com/products/cloud-vs-core/)** | **[Pricing](https://www.acryldata.io/pricing)** - +**☁️ [Request Demo](https://datahub.com/demo/)** | **[Why Cloud?](https://datahub.com/products/why-datahub-cloud/)** | **[Cloud vs Core](https://datahub.com/products/cloud-vs-core/)** ## Source Code and Repositories