Deploying Airflow on Kubernetes
A step-by-step tutorial on how to deploy Apache Airflow on Kubernetes using Plural's cloud shell.
Apache Airflow is a workflow management system that allows you to orchestrate sequenceable tasks. It is commonly used for ETL workflows, triggering machine learning jobs, and running DevOps operations such as backup and restore.
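To make the idea of orchestrating sequenceable tasks concrete, here is a minimal sketch of dependency-ordered execution in plain Python. It uses the standard library's `graphlib` module rather than Airflow's own API, and the task names (`extract`, `transform`, `load`, `notify`) are hypothetical. A real Airflow DAG layers scheduling, retries, and distributed execution on top of this basic idea.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL pipeline: each key maps a task to the set of tasks
# that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because each task here depends on exactly one predecessor, only one valid order exists. With branching dependencies, several orders can be valid, and an orchestrator is free to run independent tasks in parallel.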
Engineering teams often choose to run Apache Airflow from within a Kubernetes cluster to take advantage of the increased stability and autoscaling options that Kubernetes provides.
Currently, there are three popular options for deploying Airflow in production environments.
- Use a managed Airflow service from your cloud provider. On AWS, for example, pricing runs upwards of $0.99 an hour, or about $8,672 per year per instance. With at least one non-prod and one prod instance, that comes to roughly $17,500 per year.
- Build it yourself on an EC2 instance, which would cost $3,363 per year. Doubled for two environments, that is about $6,700, or around $4,000 if you prepay for the instances. While this number doesn’t sound as bad as the one above, it doesn’t factor in the engineering and operational support your team will have to provide themselves.
- Deploy your own Airflow instance on Kubernetes.
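The cost figures in the first two options can be reproduced with a few lines of arithmetic. The sketch below assumes an always-on instance billed for 8,760 hours a year and reuses the hourly rate and the per-instance EC2 figure quoted above. The function and variable names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_cost(hourly_rate, instances=1):
    """Annual cost in dollars of running `instances` always-on instances."""
    return hourly_rate * HOURS_PER_YEAR * instances


managed_one = annual_cost(0.99)     # one managed Airflow instance
managed_two = annual_cost(0.99, 2)  # non-prod plus prod
ec2_two = 3363 * 2                  # quoted yearly EC2 cost, two environments

print(f"Managed, one instance:  ${managed_one:,.0f}")
print(f"Managed, two instances: ${managed_two:,.0f}")
print(f"Self-managed EC2, two:  ${ec2_two:,.0f}")
```

At exactly $0.99 an hour, two managed instances come to a little under the $17,500 quoted above, which is consistent with a rate of "upwards of" $0.99. Neither total includes the engineering time needed to operate the deployment.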
The third option has achieved some consensus as the best way to deploy Airflow in production. However, Airflow is a fairly complex stateful application, with a SQL database and a Redis cache, which makes for a tricky setup. Previously, we discussed deploying Apache Airflow on Kubernetes using the Plural CLI.
This article will go over setting up a fresh Kubernetes cluster and installing Airflow onto that cluster using Plural, a free, open-source, Kubernetes DevOps platform that allows you to deploy Kubernetes clusters and open-source applications with little to no management experience necessary.
Before you install Airflow onto a Kubernetes cluster, you will need to create an account with Plural. While it may feel weird to create an account for an open-source tool, by doing so we can provide the following benefits:
- A cloud shell that bypasses the need for a local terminal
- A source of truth for your installations
- Auth tracking to make sure that a user has permission to edit and configure the Plural installation
- An audit log viewer
- … and much more.
To do so, head over to app.plural.sh and follow the on-screen instructions. If you prefer to use Plural in your command line, follow our quickstart guide here.
Note: For this tutorial, we will be using our own GCP cloud credentials since we already have a service account set up with GCP. Check out our documentation to quickly set up your cloud provider to integrate with our platform (and make sure to note the download location of the created service account.)
Airflow on Kubernetes Installation
1. After signing up for Plural, you will be taken through our onboarding experience.
2. Click on use your own cloud.
3. Create a GitHub or GitLab repository to store the state of the deployment. Plural manages all cluster configurations via Git and will provision a GitHub repository on your behalf. This repository is set up using scoped deploy keys to store the state of your workspace, and no OAuth credentials are persisted.
4. Choose your cloud provider. You’ll be prompted to enter your service account cloud credentials. Because Plural deploys and manages infrastructure in your own cloud environment, it needs relatively high levels of access to that environment. As a result, you need to provide a service account that Plural can use to authenticate against your cloud provider.
5. Enter a unique cluster name to be created for the deployment. Then, provide a unique bucket prefix and a subdomain for DNS creation.
6. Review that the information you entered is correct, and if so click create. Note: This step can take a few minutes.
7. Plural will spin up a cloud shell environment for you. You’ll then be prompted to select the applications you want to install on a fresh Kubernetes cluster. For this demo, we’ll be installing Airflow and the Plural console, a web-based dashboard that allows you to manage all your Plural applications and clusters in one place.
8. Set up some basic workspace configuration for the cluster being added. You’ll first be prompted to enter a VPC name for the Airflow deployment to reside in.
9. Enter the storage bucket information specific to your cloud environment if prompted. This is for the Plural console, which provides application health data and automated application upgrades.
10. If you chose earlier to install the Plural console alongside Airflow, you’ll be prompted to configure your Plural console environment.
11. Enter a hostname for accessing your Airflow deployment, as well as the DAG repository and branch for Plural to access. You’ll also be prompted to enter a username and a first and last name for your Airflow account. Note: It's recommended to name the hostname after the application.
12. Click Install and Plural will begin deploying the Plural console and Airflow automatically. Plural will run ‘plural build’ and ‘plural deploy’ on your behalf. You can follow the progress through the log output on the right-hand side. This process normally takes 10-15 minutes.
13. Once everything is up and running your screen should look similar to this:
Accessing your Airflow Deployment
Once you are in your Plural console, click on the launch button next to Airflow to access your Airflow installation. If you set up Plural OIDC earlier, you don’t need to worry about managing authentication for logging into the console or the Airflow UI.
Once you allow access you'll enter the Airflow UI where you can go ahead and schedule your DAGs.
Accessing the Plural Console
The Plural console acts as the command center for your Plural applications. It comes with a lot of out-of-the-box functionality, such as:
- Runbooks: Recommended settings and optimal operating procedures for running your application.
- Components: Statuses for each individual component in your Airflow deployment and Kubernetes cluster. Inside this view, you can look into Pod logs and events and use them to drill down into the root cause of any problems with your Kubernetes cluster.
- Nodes: Graphs and detailed information about the utilization of resources and deployments on each node.
- Incidents: View incidents that Plural automatically creates on component failure. Access a direct connection with our support team to troubleshoot any issues.
- Dashboards: View charts that have been tailored for your application. Every Plural application will ship with its own custom console dashboard.
Next Steps with Plural
Through this article, you have learned how to:
- Create a Plural Git repository to store your infrastructure configuration
- Provision a fully configured Kubernetes cluster with no management experience necessary
- Install an instance of Airflow on your fresh Kubernetes cluster
Are you looking to get your Airflow instance up and running on Kubernetes with minimal effort?
Reach out to me and the rest of the team over at Plural to learn more about how Plural works and how we are helping engineering teams worldwide deploy open-source applications in a cloud production environment!
Join our Discord community for deployment help, discussion, and meeting other Plural users.
Ready to effortlessly deploy and operate open-source applications in minutes? Get started with Plural today.