
Install a Kubernetes Cluster: A Step-by-Step Guide

Learn how to install a Kubernetes cluster with this step-by-step guide, covering everything from setup to deployment for a seamless Kubernetes experience.

Michael Guarino

Installing Kubernetes isn’t just about running a few commands—it’s about building a reliable distributed system. Misconfigured networking, incompatible component versions, or subtle hardware issues can cause persistent, hard-to-trace problems. A successful setup requires more than automation; it demands a methodical approach that covers system prerequisites, configuration best practices, and verification at every step. This guide focuses on installing a production-ready Kubernetes cluster from the ground up—going beyond the defaults to ensure your environment is stable, secure, and workload-ready.

Key takeaways:

  • A stable cluster is built on a solid foundation: The initial setup process—from environment preparation and control plane initialization to CNI configuration and node registration—is critical. Getting these steps right prevents complex troubleshooting and ensures your cluster is ready for workloads.
  • Move from operational to production-ready with security and observability: A running cluster is not secure by default. You must implement RBAC, network policies, and robust monitoring to protect your workloads and gain the visibility needed for effective, day-to-day operations.
  • Automate fleet management to scale effectively: The manual processes used to build one cluster do not scale to a fleet of clusters. Adopt a centralized platform like Plural to automate deployments, upgrades, and monitoring, ensuring consistency and control across all your environments.

What Is Kubernetes, and Why Build Your Own Cluster?

Before jumping into setup, it’s worth understanding what Kubernetes is and why you might choose to build your own cluster rather than rely on a managed service. While offerings like EKS, GKE, and AKS simplify operations, running Kubernetes yourself gives you complete control and a deeper understanding of how the system works. This is invaluable whether you're maintaining a single environment or scaling across multiple clusters.

What Kubernetes Does

Kubernetes is an open-source container orchestration system that automates deployment, scaling, and operations of containerized applications. It abstracts away infrastructure concerns, letting you declare the desired state of your workloads—then works continuously to enforce that state. Behind the scenes, it handles container scheduling, service discovery, health checks, and self-healing. Kubernetes runs consistently across on-prem, edge, and cloud, making it a foundational layer for hybrid and multi-cloud platforms.

Why Install Kubernetes Yourself

Using a managed service offloads cluster lifecycle management, but self-hosting unlocks full visibility and control. It’s the most direct way to learn how the control plane, networking, storage, and workload orchestration actually work. That knowledge pays off when diagnosing issues, optimizing performance, or enforcing strict security standards.

Self-managed clusters also let you customize components—from the container runtime to the CNI plugin and API server flags—to match your specific use case. While setting up a single cluster is a solid exercise, managing multiple clusters highlights the need for fleet-level tools and centralized control planes like Rancher or Karmada.

Prepare Your Environment for Kubernetes

Before initializing your cluster, you need a properly configured environment—hardware, OS, and networking included. Many cluster failures stem not from kubeadm itself, but from skipping these foundational steps. Issues like nodes failing to join, unreliable pod communication, or degraded performance often trace back to system misconfiguration. Taking the time to get this right will save hours of debugging later.

Tools like kubeadm simplify cluster bootstrapping, but they assume your infrastructure meets certain prerequisites. That means ensuring your machines run a supported OS, have adequate resources, and are prepped for Kubernetes networking. This section walks through the baseline requirements to avoid common pitfalls and lay the groundwork for a stable cluster.

Hardware and OS Requirements

At minimum, Kubernetes requires Linux nodes (e.g., Ubuntu, CentOS, or Debian) with proper CPU and memory allocations. Control plane nodes need at least 2 CPUs and 2 GB of RAM, but realistically, for any meaningful workload or learning environment, you should aim higher. A machine with 16 GB RAM, a modern quad-core CPU, and SSD storage offers much smoother operation.

You can find the full system requirements in the official kubeadm installation documentation.

Networking Setup

Reliable networking is essential for pod communication and cluster health. All nodes must be able to reach each other over the network with no NAT or firewalls in the way. If your nodes have multiple interfaces or complex routes, you may need to override the default interface detection using the --apiserver-advertise-address flag during kubeadm init.
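
For example, if the node's primary interface is not the one Kubernetes should use, you can pin the API server to a specific address (the IP below is a placeholder for your control-plane node's address):

kubeadm init --apiserver-advertise-address=192.168.1.10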

You’ll also need to prepare your system for the CNI plugin you plan to use. This includes enabling IP forwarding and loading the br_netfilter kernel module:

# Load the kernel module that lets iptables see bridged container traffic
modprobe br_netfilter

# Allow bridged traffic through iptables and enable packet forwarding
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.ipv4.ip_forward=1

These settings are crucial for routing traffic across the overlay network that links pods across nodes. Without them, your cluster might initialize, but pods won’t communicate reliably.
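
These runtime changes do not survive a reboot. On a systemd-based distribution, a common approach is to persist them through modules-load.d and sysctl.d (the k8s.conf file names below are arbitrary):

# Load br_netfilter automatically at boot
echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf

# Persist the sysctl settings and apply them
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system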

Initialize the Kubernetes Control Plane

The control plane is the core of your Kubernetes cluster, responsible for maintaining cluster state, scheduling workloads, and orchestrating node behavior. It includes components like the API server, controller manager, scheduler, and etcd. Bringing up the control plane is the first major step in making your cluster operational.

We’ll use kubeadm to bootstrap the control plane—it's the de facto tool for setting up Kubernetes clusters in a standard, modular way.

Bootstrapping with kubeadm init

Run kubeadm init on your designated control-plane node. This command sets up the cluster by:

  • Validating system prerequisites
  • Downloading and running control plane components as containers
  • Generating TLS certificates
  • Configuring etcd and the API server

For production environments or when planning high availability, pass the --control-plane-endpoint flag to define a stable, load-balanced endpoint for the API server. This becomes especially useful when adding additional control plane nodes or setting up external access.

kubeadm init --control-plane-endpoint "k8s-api.example.com:6443"

Once complete, kubeadm will output join tokens and the steps needed to configure your local kubectl client.

Set Up kubectl Access

After initialization, configure your kubectl context by copying the admin kubeconfig:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

This kubeconfig file allows kubectl to authenticate and communicate with the API server. It contains admin credentials, so handle it securely.

Once set up, verify the control plane is live:

kubectl get nodes

While this manual setup is essential for learning and control, platforms like Plural provide a secure, SSO-integrated interface for managing cluster access without manually handling kubeconfig files—ideal for teams and production environments.

Set Up Cluster Networking

After the control plane is up, the cluster still lacks one critical component: networking. Without a Container Network Interface (CNI) plugin, pods can’t communicate across nodes. Kubernetes leaves this responsibility to the CNI layer, which establishes a unified network across the cluster. Installing and configuring a CNI plugin is essential—your cluster won’t function without it.

Choose a CNI Plugin: Calico, Flannel, or Weave Net

CNI plugins handle IP address management and pod-to-pod communication. The right choice depends on your environment:

  • Calico is ideal for production use, offering high performance and advanced network policies.
  • Flannel is lightweight and easy to deploy. It's well-suited for dev clusters or simple networking needs.
  • Weave Net offers encrypted traffic and automatic peer discovery, striking a balance between simplicity and features.

Each plugin has its own system requirements, so review their documentation before installation.

Install the CNI Plugin

Install your chosen CNI by applying its official manifest:

kubectl apply -f <manifest-url>

This typically creates the required DaemonSets and RBAC roles. It’s critical that the pod network CIDR passed to kubeadm init (via --pod-network-cidr) matches the expectations of your CNI plugin. A mismatch will prevent pods from receiving IPs, breaking inter-pod communication.
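
As an illustration, Flannel defaults to the 10.244.0.0/16 pod network, so the init flag and the manifest have to agree (check the Flannel documentation for the current manifest URL; other plugins use different defaults):

# Initialize the control plane with Flannel's default pod CIDR
kubeadm init --pod-network-cidr=10.244.0.0/16

# Then apply the Flannel manifest
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml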

If you’re managing multiple clusters, tools like Plural help enforce consistent network configurations using global templates and centralized lifecycle management.

Verify Networking Is Functional

Successful installation doesn’t guarantee that your cluster networking is operational. Confirm everything is healthy:

Check node readiness:

kubectl get nodes

Nodes stuck in NotReady often point to CNI problems.

Check CNI pod health:

kubectl get pods -n kube-system

Make sure pods like calico-node, kube-flannel-ds, or weave-net are running.

Test pod connectivity:
Deploy two pods on different nodes and verify they can ping each other. For example:

kubectl run pod-a --image=busybox --restart=Never -- sh -c "sleep 3600"
kubectl run pod-b --image=busybox --restart=Never -- sh -c "sleep 3600"

Then exec into one pod and ping the other.
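
Assuming the two busybox pods above are running, grab pod-b's IP and ping it from pod-a (replace the IP with the one reported for pod-b):

# Show pod-b's IP address and the node it landed on
kubectl get pod pod-b -o wide

# Ping pod-b from pod-a
kubectl exec pod-a -- ping -c 3 10.244.1.5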

Verifying the overlay network early ensures your cluster is ready to schedule real workloads without hidden networking issues.

Add Worker Nodes to Your Cluster

With the control plane running and networking configured, the next step is to bring in worker nodes—the compute layer of your Kubernetes cluster. Worker nodes run the kubelet agent and a container runtime like containerd, allowing them to receive and execute workloads assigned by the control plane.

Each node joins the cluster via a secure handshake, registering itself with the control plane and enabling it to schedule pods. While doing this manually is useful for understanding the architecture, scaling across environments eventually requires automation. Tools like Plural or cloud-native auto-scaling integrations can handle lifecycle operations across clusters. But for now, we’ll walk through the manual join process.

Join the Cluster with kubeadm

During kubeadm init, a kubeadm join command was printed to the terminal. It includes:

  • The API server address
  • A bootstrap token
  • A discovery token CA hash (for verifying the control plane)

To add a worker node, SSH into the target machine and run the saved kubeadm join command:

kubeadm join <api-server>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>

If you lost the original command or the token expired (defaults to 24 hours), regenerate it on the control plane with:

kubeadm token create --print-join-command

This will output a fresh, ready-to-run join command.

Verify Node Registration

Once the join process completes, check the node's status from the control plane:

kubectl get nodes

Your new worker node should appear in the list with a Ready status. If it's marked NotReady, investigate the networking setup—CNI misconfigurations are a common cause. Use:

kubectl describe node <node-name>

to dig into the node's conditions, taints, and runtime state.

Once the node is Ready, it’s eligible to run workloads—and your cluster is officially functional.

Secure and Optimize Your Cluster

Getting Kubernetes up and running is only the beginning. To make it production-ready, you need to secure access, monitor behavior, and plan for growth. Without these foundations, you risk security breaches, downtime, and resource contention. For example, missing RBAC rules can expose sensitive operations, while lack of observability leaves you blind to service failures. This section covers the key steps to harden, observe, and scale your cluster effectively.

Enforce Access Control with RBAC and Network Policies

Start by configuring Role-Based Access Control (RBAC), which lets you define precise permissions across users, groups, and service accounts. Apply the principle of least privilege—developers might get read-only access to their namespace, while SREs get admin privileges where needed.
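
As a minimal sketch, the built-in view ClusterRole can be bound to a single user within one namespace; the user and namespace names here are placeholders:

# Grant read-only access to the dev-team namespace
kubectl create rolebinding jane-view \
    --clusterrole=view \
    --user=jane@example.com \
    --namespace=dev-team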

Add NetworkPolicies to restrict pod-to-pod traffic and isolate workloads by default. Without these in place, all pods can communicate freely, increasing the risk of lateral movement in the event of a compromise.
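
A common starting point is a default-deny ingress policy per namespace, sketched below with a placeholder namespace. Note that enforcement requires a CNI plugin that supports network policies, such as Calico:

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: dev-team
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF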

In multi-cluster or multi-team environments, managing RBAC and network policies declaratively and at scale is essential. Platforms like Plural support policy-as-code workflows that sync security rules consistently across clusters.

Set Up Observability: Monitoring and Logging

Monitoring and logging should be foundational, not an afterthought. Install Prometheus for metrics and Grafana for visualization. This gives you real-time visibility into resource usage, application health, and cluster state.
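
One common route, assuming Helm is available on your workstation, is the community kube-prometheus-stack chart, which bundles Prometheus, Grafana, and Alertmanager; the release and namespace names below are placeholders:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install Prometheus, Grafana, and Alertmanager into a dedicated namespace
helm install monitoring prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace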

For logs, integrate a centralized logging stack such as Fluent Bit, Elasticsearch, or Loki. Avoid relying on kubectl logs or node-level access in production.

Plural simplifies observability with a pre-integrated, SSO-enabled Kubernetes dashboard that proxies all traffic securely, eliminating the need to manage kubeconfig files or expose API servers publicly.

Plan for Scaling and High Availability

Production clusters require more than a single control-plane node. To avoid a single point of failure, deploy multiple control-plane nodes behind a load balancer and regularly back up etcd.

For workload stability, enforce resource requests and limits on all pods. This prevents noisy neighbor issues and helps the scheduler make informed decisions.
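
For example, requests and limits can be added to an existing Deployment imperatively; the deployment name and values below are placeholders, and in practice they belong in your manifests:

kubectl set resources deployment web \
    --requests=cpu=100m,memory=128Mi \
    --limits=cpu=500m,memory=256Mi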

As usage grows, you’ll need to scale your cluster by adding nodes, rebalancing workloads, and upgrading components. Plural’s GitOps-driven control plane simplifies lifecycle management—from rotating node groups to automating add-on updates—while keeping infrastructure changes consistent and auditable.

Troubleshoot and Verify the Cluster

After installation, don’t assume your Kubernetes cluster is ready. Even with a clean setup, version mismatches, configuration errors, or environmental issues can surface. This final phase is about validating cluster health, diagnosing failures, and confirming that all nodes are ready to schedule workloads. Skipping it can result in hard-to-trace deployment failures later.

Fix Common Installation Errors

Manual setups with kubeadm often expose version drift or configuration mismatches. Kubernetes enforces a version skew policy: the kubelet must never be newer than the API server and can only trail it by a limited number of minor versions, so check the current policy before mixing versions. Skew beyond that range often results in node registration issues or control plane instability.
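
A quick way to spot skew is to compare the versions reported by each component:

# On the control plane
kubeadm version
kubectl version

# On each node
kubelet --version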

Another frequent point of failure is the admin.conf file, which contains credentials and API access info for kubectl. If it's missing, misplaced, or misconfigured, kubectl will fail. Make sure it's located at $HOME/.kube/config on the system you’re operating from, and that file permissions are correct.

For in-depth help, refer to the official troubleshooting guide.

Run Post-Install Health Checks

To check if core components are running:

kubectl get pods -n kube-system

Look for pods like etcd, kube-apiserver, kube-scheduler, and kube-controller-manager. All should be in the Running state. Pods in Pending or CrashLoopBackOff require immediate attention—common culprits include resource limits, CNI issues, or image pull failures.
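
If a system pod is unhealthy, describe it and pull its logs to narrow down the cause (substitute the actual pod name):

kubectl describe pod <pod-name> -n kube-system
kubectl logs <pod-name> -n kube-system

# For a CrashLoopBackOff, the previous container's logs are often more useful
kubectl logs <pod-name> -n kube-system --previous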

While these manual checks are sufficient for a single cluster, platforms like Plural centralize observability and alerting across environments, so you’re not SSHing into nodes or repeating kubectl commands on every deployment.

Verify Node Readiness

Finally, confirm that all nodes have joined the cluster and are ready to run workloads:

kubectl get nodes

You should see your control plane and worker nodes with a Ready status. If a node appears as NotReady, it's often due to networking issues, missing CNI components, or problems with the kubelet.

To investigate further:

kubectl describe node <node-name>

This provides details on taints, conditions, resource pressure, and component status.

For teams managing multiple clusters, Plural offers a secure, role-based Kubernetes dashboard that aggregates node health and pod status across your fleet, removing the need to rotate kubeconfigs or rely solely on CLI access.

Deploy Your First Application

With your Kubernetes cluster initialized and nodes connected, the next step is to deploy a test workload. This isn’t just a milestone—it’s a practical end-to-end check that verifies your control plane, worker nodes, and CNI plugin are functioning together correctly. Deploying a simple web app like NGINX confirms that pods can be scheduled, images pulled, and containers started—validating the core setup.

Use kubectl to Deploy NGINX

Kubernetes workloads are typically managed using kubectl, the CLI for interacting with the Kubernetes API server. For your first deployment, use the following command to launch NGINX:

kubectl create deployment nginx --image=nginx

This creates a Deployment—a controller that ensures a defined number of pod replicas are running at all times. You can verify the deployment and running pods with:

kubectl get deployments
kubectl get pods

You should see the NGINX pod in the Running state. This confirms the node was able to pull the container image and start the workload.

To test Kubernetes' native scaling capabilities:

kubectl scale deployment nginx --replicas=3

This tells the scheduler to maintain three running NGINX pods. Behind the scenes, Kubernetes will automatically balance them across available worker nodes.
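
You can confirm where the replicas landed with the wide output, which includes the node assigned to each pod:

kubectl get pods -o wide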

Expose Your Application

By default, the NGINX deployment is reachable only from inside the cluster. To expose it externally, you can create a Service or set up an Ingress controller, which handles routing of HTTP(S) traffic from outside the cluster to services within.
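
For a quick test without an Ingress controller, a NodePort Service exposes NGINX on every node's IP at a port from the 30000-32767 range:

kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get service nginx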

Installing an Ingress controller like NGINX Ingress Controller or Traefik is often the next step in enabling external access.

Beyond the Basics: Visibility and Fleet Management

As you scale beyond a single cluster or application, kubectl alone becomes insufficient for managing complexity. You’ll need a GitOps workflow, continuous deployment tooling, and cluster-wide visibility.

While the open-source Kubernetes Dashboard exists, configuring secure access can be error-prone. Plural addresses this with an embedded, secure, SSO-integrated dashboard. It uses an egress-only agent to proxy requests, removing the need to expose API servers or manage kubeconfig files. RBAC rules for dashboard access can be defined via YAML and synchronized across clusters—simplifying governance at scale.

Simplify Cluster Management with Plural

Standing up your first Kubernetes cluster is a milestone, but operating fleets of clusters across environments introduces an entirely new layer of complexity. Manual processes don’t scale, and stitching together tooling for deployment, monitoring, and security quickly becomes brittle. What you need is a unified control plane.

Fleet Management with Plural

Plural offers a centralized platform for managing Kubernetes clusters across cloud, on-prem, and edge environments. It connects securely to any cluster using an egress-only, agent-based architecture—no inbound ports, no VPNs, and no exposed API servers.

With Plural, your team gets a consistent, SSO-integrated interface to monitor infrastructure, manage RBAC, and deploy applications—without context switching or custom scripts. It abstracts the operational overhead so engineers can focus on building and shipping software, not babysitting clusters.

GitOps-Driven Operations at Scale

Plural’s fleet-scale GitOps engine automates deployments and keeps infrastructure state in sync with version-controlled config. This enables reproducible environments and safe rollouts across dozens or hundreds of clusters.

One enterprise cybersecurity provider used Plural to reduce Kubernetes upgrade cycles from three months to one day. That kind of acceleration frees up senior engineers and enables mid-level teams to operate infrastructure with confidence. For day-to-day troubleshooting, the built-in Kubernetes dashboard provides secure, read-only visibility into all your clusters, without violating GitOps discipline or introducing operational risk.

Frequently Asked Questions

Why should I build a cluster manually instead of using a managed service like EKS or GKE? Building a cluster yourself provides a deep, practical understanding of how Kubernetes components interact, which is invaluable for effective troubleshooting later on. While managed services are great for production, the hands-on experience of configuring the control plane, networking, and nodes from scratch demystifies the architecture. This knowledge gives you complete control over your environment, allowing you to fine-tune performance and security settings in ways that managed services often restrict.

I ran kubeadm init but didn't save the join command. How can I add a new worker node? This is a common situation. The initial bootstrap token generated by kubeadm init expires after 24 hours for security reasons. You can generate a new, valid join command at any time by running kubeadm token create --print-join-command on your control plane node. This will output a fresh command with a new token that you can use to securely connect additional worker nodes to your cluster.

After installing a CNI plugin, my worker nodes are stuck in a NotReady state. What are the most common causes? A NotReady status after a CNI installation almost always points to a networking problem. The first step is to check the logs of the CNI pods themselves, which typically run in the kube-system namespace. Also, verify that the pod network CIDR you specified during the kubeadm init command matches the network range the CNI plugin is configured to use. A mismatch here is a frequent cause of failure, as it prevents the CNI from assigning IP addresses to pods on the node.

This process works for one cluster, but how do you manage configurations like networking and RBAC across an entire fleet? Managing configurations consistently across many clusters is a significant operational challenge that manual methods don't solve. This is where a fleet management platform becomes essential. For example, Plural's Global Services feature allows you to define a single resource, such as a standard set of RBAC policies or a specific CNI configuration, and automatically apply it to hundreds of clusters. This ensures uniformity and eliminates configuration drift without requiring you to manually apply YAML to each cluster.

How can I give my team visibility into the cluster without sharing administrative kubeconfig files? Sharing administrative kubeconfig files is a security risk and doesn't scale for a team. A better approach is to use a centralized dashboard with proper access controls. Plural provides an embedded Kubernetes dashboard that integrates with your company's SSO provider. It uses a secure, egress-only agent architecture, so you don't need to expose your cluster's API server. Access is managed through standard Kubernetes RBAC, allowing you to create roles and bind them to user or group identities from your SSO, providing secure, read-only access for troubleshooting.
