
Datahub

Datahub is an open-source, extensible metadata platform for data discovery, data observability, data lineage tracing, and data governance, built to tame the complexity of rapidly evolving data ecosystems. It helps teams act on changes in metadata in real time.
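At its core, data lineage tracing is a walk over a dependency graph. Here is a minimal sketch in plain Python. It does not use DataHub's implementation or API. It assumes lineage is stored as a hypothetical mapping from each dataset to its direct upstream datasets; the dataset names are invented for illustration. Under those assumptions, finding everything a dataset depends on is a breadth-first traversal:

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to its direct upstream datasets.
LINEAGE: dict[str, list[str]] = {
    "dashboard.revenue": ["warehouse.orders_clean"],
    "warehouse.orders_clean": ["raw.orders", "raw.customers"],
    "raw.orders": [],
    "raw.customers": [],
}


def upstream_of(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every transitive upstream dataset, in breadth-first order."""
    seen: set[str] = {dataset}
    order: list[str] = []
    queue = deque(lineage.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; this also stops cycles from looping forever
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


print(upstream_of("dashboard.revenue", LINEAGE))
# ['warehouse.orders_clean', 'raw.orders', 'raw.customers']
```

Inverting the edges turns the same traversal into downstream impact analysis: which dashboards are affected if a raw table changes.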


Why use Datahub on Plural?

While managing metadata with Datahub is quick and easy, deploying and setting up the application itself is complex and requires specialized knowledge of cloud infrastructure, networking, and Kubernetes.

Plural helps you deploy and manage the lifecycle of open-source applications on Kubernetes. Our platform combines the scalability and observability of managed SaaS offerings with the data security, governance, and compliance benefits of self-hosting Datahub.

If you need more than just Datahub, look for other open-source data engineering tools in our marketplace of curated applications to leapfrog complex deployments and get started quickly.

Datahub’s website | GitHub | License | Installing Datahub docs

Deploying Datahub (here using the AWS bundle, datahub-aws) is a matter of running these three commands:

plural bundle install datahub datahub-aws
plural build
plural deploy --commit "deploying datahub"
Read the install documentation

DataHub

DataHub: The Metadata Platform for the Modern Data Stack

Built with ❤️ by Acryl Data and LinkedIn


🏠 Hosted DataHub Docs (Courtesy of Acryl Data): datahubproject.io




📣 The DataHub Town Hall takes place on the 4th Thursday of every month at 9am US PT. Add it to your calendar!


Introduction

DataHub is an open-source metadata platform for the modern data stack. Read about the architectures of different metadata systems and why DataHub excels here. Also read our LinkedIn Engineering blog post, check out our Strata presentation and watch our Crunch Conference Talk. You should also visit DataHub Architecture to get a better understanding of how DataHub is implemented.

Features & Roadmap

Check out DataHub's Features & Roadmap.

Demo and Screenshots

There's a hosted demo environment, courtesy of Acryl Data, where you can explore DataHub without installing it locally.

Quickstart

Please follow the DataHub Quickstart Guide to get a copy of DataHub up and running locally using Docker. The guide assumes basic knowledge of Docker, so if Docker is completely new to you, we recommend first working through the "Hello World" example of A Docker Tutorial for Beginners.

Development

If you're looking to build and modify DataHub, please take a look at our Development Guide.


Source Code and Repositories

  • datahub-project/datahub: This repository contains the complete source code for DataHub's metadata model, metadata services, integration connectors and the web application.
  • acryldata/datahub-actions: DataHub Actions is a framework for responding to changes to your DataHub Metadata Graph in real time.
  • acryldata/datahub-helm: Repository of Helm charts for deploying DataHub on a Kubernetes cluster.
  • acryldata/meta-world: A repository to store recipes, custom sources, transformations, and other things to make your DataHub experience magical.
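DataHub Actions lets you respond to changes in the metadata graph in real time. The following is a hypothetical, stdlib-only illustration of that event-driven pattern: an event, a filter that selects the events of interest, and an action that reacts to them. It does not use or mirror the real datahub-actions API. The event fields (`entity`, `aspect`, `new_value`) and the `EventBus` class are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MetadataChangeEvent:
    # Hypothetical event shape: which entity changed, which aspect, and its new value.
    entity: str
    aspect: str
    new_value: str


@dataclass
class ActionPipeline:
    # The filter decides which events matter; the action reacts to accepted events.
    accept: Callable[[MetadataChangeEvent], bool]
    action: Callable[[MetadataChangeEvent], None]


@dataclass
class EventBus:
    # Hypothetical in-process dispatcher standing in for a real event stream.
    pipelines: list[ActionPipeline] = field(default_factory=list)

    def register(self, pipeline: ActionPipeline) -> None:
        self.pipelines.append(pipeline)

    def publish(self, event: MetadataChangeEvent) -> None:
        # Deliver the event to every pipeline whose filter accepts it.
        for pipeline in self.pipelines:
            if pipeline.accept(event):
                pipeline.action(event)


notifications: list[str] = []
bus = EventBus()
bus.register(
    ActionPipeline(
        # React only when a "pii" tag appears in a comma-separated tag list.
        accept=lambda e: e.aspect == "tags" and "pii" in e.new_value.split(","),
        action=lambda e: notifications.append(f"PII tag added to {e.entity}"),
    )
)
bus.publish(MetadataChangeEvent("raw.customers", "tags", "pii,marketing"))
bus.publish(MetadataChangeEvent("raw.orders", "description", "Order facts"))
print(notifications)  # ['PII tag added to raw.customers']
```

In a real deployment, events would arrive from a stream rather than from direct `publish` calls. The filter/action split is what keeps each reaction small and independently testable.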

Releases

See Releases page for more details. We follow the SemVer Specification when versioning the releases and adopt the Keep a Changelog convention for the changelog format.
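Because releases follow SemVer, release versions can be ordered mechanically. Below is a minimal sketch of SemVer 2.0.0 precedence in plain Python. It assumes release tags may carry a leading "v" (the tags below are hypothetical example values), which SemVer itself does not define. Build metadata after "+" is ignored, as the specification requires.

```python
from functools import cmp_to_key


def _parse(version: str) -> tuple[list[int], list[str]]:
    """Split 'v1.2.3-rc.1+build.5' into core numbers and pre-release identifiers."""
    version = version.removeprefix("v").split("+", 1)[0]  # drop build metadata
    core, _, pre = version.partition("-")
    return [int(part) for part in core.split(".")], (pre.split(".") if pre else [])


def _compare_identifiers(a: str, b: str) -> int:
    # Numeric identifiers compare numerically and rank below alphanumeric ones.
    if a.isdigit() and b.isdigit():
        return (int(a) > int(b)) - (int(a) < int(b))
    if a.isdigit() != b.isdigit():
        return -1 if a.isdigit() else 1
    return (a > b) - (a < b)  # alphanumeric identifiers compare lexically


def compare(v1: str, v2: str) -> int:
    """Return -1, 0, or 1 following SemVer 2.0.0 precedence rules."""
    core1, pre1 = _parse(v1)
    core2, pre2 = _parse(v2)
    if core1 != core2:
        return -1 if core1 < core2 else 1
    # A version without a pre-release ranks above one with a pre-release.
    if not pre1 or not pre2:
        return (not pre1) - (not pre2)
    for a, b in zip(pre1, pre2):
        result = _compare_identifiers(a, b)
        if result:
            return result
    # All shared identifiers are equal: the longer pre-release list ranks higher.
    return (len(pre1) > len(pre2)) - (len(pre1) < len(pre2))


tags = ["v0.10.0", "v0.9.6", "v0.10.0-rc.2", "v0.10.0-rc.10"]
print(sorted(tags, key=cmp_to_key(compare)))
# ['v0.9.6', 'v0.10.0-rc.2', 'v0.10.0-rc.10', 'v0.10.0']
```

Plain string sorting would get both v0.9.6 and rc.10 wrong. That is why numeric parts are compared as integers.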

Contributing

We welcome contributions from the community. Please refer to our Contributing Guidelines for more details. We also have a contrib directory for incubating experimental features.

Community

Join our Slack workspace for discussions and important announcements. You can also find out more about our upcoming town hall meetings and view past recordings.

Adoption

Here are the companies that have officially adopted DataHub. Please feel free to add yours to the list if we missed it.

Select Articles & Talks

See the full list here.

License

Apache License 2.0.

How Plural works

We make it easy to securely deploy and manage open-source applications in your cloud.

Select from 90+ open-source applications

Get any stack you want running in minutes, and never think about upgrades again.

Securely deployed on your cloud with your git

You control everything. No need to share your cloud account, keys, or data.

Designed to be fully customizable

Built on Kubernetes and using standard infrastructure as code with Terraform and Helm.

Maintain & Scale with Plural Console

Interactive runbooks, dashboards, and Kubernetes API visualizers give you an easy-to-use toolset for managing application operations.

Learn more

Build your custom stack with Plural

Build your custom stack with 90+ apps in the Plural Marketplace.

Data stack

  • Airbyte
  • Clickhouse
  • Dagster
  • Datahub
  • Growthbook
  • Jitsu
  • Lightdash
  • Posthog

Explore the Marketplace

Used by fast-moving teams at

  • CoachHub
  • Digitas
  • Fnatic
  • FSN Capital
  • Justos
  • Mott Mac

Developers love us

We no longer needed a dedicated DevOps team; instead, we actively participated in the industrialization and deployment of our applications through Plural. Additionally, it allowed us to quickly gain proficiency in Terraform and Helm.

Walid El Bouchikhi
Data Engineer at Beamy

I have neither the patience nor the talent for DevOps/SysAdmin work, and yet I've deployed four enterprise-caliber open-source apps on Kubernetes... since 9am today. Bonkers.

Sawyer Waugh
Head of Engineering at Justifi

This is awesome. You saved me hours of further DevOps work for our v1 release. Just to say, I really love Plural.

Ismael Goulani
CTO & Data Engineer at Modeo

Wow! First of all I want to say thank you for creating Plural! It solves a lot of problems coming from a non-DevOps background. You guys are amazing!

Joey Taleño
Head of Data at Poplar Homes

We have been using Plural for complex Kubernetes deployments of Kubeflow and are excited with the possibilities it provides in making our workflows simpler and more efficient.

Jürgen Stary
Engineering Manager @ Alexander Thamm

Plural has been awesome, it’s super fast and intuitive to get going and there is zero-to-no overhead of the app management.

Richard Freling
CTO and Co-Founder at Commandbar

Case Study: How Fnatic Deploys Their Data Stack with Plural

Fnatic is a leading global esports performance brand headquartered in London, focused on leveling up gamers. At the core of Fnatic’s success is its best-in-class data team. The Fnatic data team relies on third-party applications to serve different business functions, and every member of the organization uses data daily. While having access to an abundance of data is great, it adds complexity when it comes to answering critical business questions and providing in-game analytics for gaming members.

To answer these questions, the data team began building a data stack to address these use cases. Since the team at Fnatic are big fans of open source, they chose to build their stack with popular open-source technologies.

Fnatic’s Data Stack

  • Airbyte
  • Airflow
  • Clickhouse
  • Grafana
  • Metabase
  • PostgreSQL

FAQ

Plural is open-source and self-hosted. You retain full control over your deployments in your cloud. We perform automated testing and upgrades and provide out-of-the-box Day 2 operational workflows. Monitor, manage, and scale your configuration with ease to meet the changing demands of your business. Read more.

We support deploying on all major cloud providers, including AWS, Azure, and GCP. We also support all on-prem Kubernetes clusters, including OpenShift, Tanzu, Rancher, and others.

No, Plural does not have access to any cloud environments when deployed through the CLI. We generate deployment manifests in the Plural Git repository and then use your configured cloud provider's CLI on your behalf. We cannot perform anything outside of deploying and managing the manifests that are created in your Plural Git repository. However, Plural does have access to your cloud credentials when deployed through the Cloud Shell. Read more.