The `kubectl drain node` Command: A Complete Guide
Running kubectl drain on a single cluster is manageable but operationally fragile. For platform teams responsible for dozens or hundreds of clusters, manual drains do not scale. Coordinating rolling node maintenance across environments quickly becomes slow, error-prone, and disruptive, tying up engineers with repetitive work and increasing risk during upgrades or incident response. While understanding the mechanics of kubectl drain is foundational, real efficiency comes from automating the entire node lifecycle.
This article starts with a precise breakdown of how the command works, then moves into designing automated, repeatable workflows that let teams manage Kubernetes node maintenance safely and consistently at fleet scale, with Plural as the control plane for enforcing those workflows.
Unified Cloud Orchestration for Kubernetes
Manage Kubernetes at scale through a single, enterprise-ready platform.
Key takeaways:
- Drain nodes safely before maintenance: The kubectl drain command is the standard procedure for preparing a node for updates by first marking it unschedulable (cordon) and then gracefully evicting workloads while respecting Pod Disruption Budgets to prevent service interruptions.
- Use command flags for precise control: A successful drain depends on using the right options. Use --dry-run=server to simulate the eviction, --ignore-daemonsets to protect critical node-level services, and --grace-period to give stateful applications enough time to shut down cleanly.
- Automate node lifecycle management at scale: Manually draining nodes across a large fleet is inefficient and error-prone. Use a platform like Plural to codify maintenance into automated, GitOps-driven workflows that provide centralized visibility and control over your entire infrastructure.
What Is kubectl drain and When Should You Use It?
kubectl drain is a core operational command used to safely prepare a Kubernetes node for maintenance or removal. Its purpose is to evict all schedulable pods from a node while preserving application availability. The command first cordons the node—marking it unschedulable—then evicts pods in a controlled manner, respecting Pod Disruption Budgets (PDBs) and pod termination grace periods. This ensures workloads are rescheduled elsewhere without triggering unnecessary downtime or replica loss.
From an operational standpoint, kubectl drain is not a forceful shutdown mechanism. It is a coordination tool between the scheduler, controllers, and your availability constraints. Used correctly, it allows you to take nodes offline without violating SLOs or disrupting user traffic.
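As a concrete starting point, a typical invocation looks like this (the node name node-1 is illustrative):

```shell
# Mark node-1 unschedulable, then evict its pods while skipping
# DaemonSet-managed pods (which would otherwise block the drain).
kubectl drain node-1 --ignore-daemonsets
```

Once maintenance is complete, kubectl uncordon node-1 returns the node to service.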
The Role of kubectl drain in Node Maintenance
In production clusters, node maintenance is unavoidable: OS patching, kernel upgrades, runtime updates, or hardware replacement. kubectl drain is the standard mechanism for making a node safe to touch. By evicting pods gracefully, it gives applications time to shut down cleanly, flush state, and terminate connections. This avoids abrupt pod termination, which can cause data loss, failed requests, or cascading failures in dependent systems. As a result, kubectl drain is a foundational primitive in any reliable node lifecycle or maintenance workflow.
Common Scenarios for Draining a Node
The most common use case is planned maintenance, but kubectl drain is also essential in several other scenarios. When decommissioning nodes—such as scaling down a cluster or retiring aging infrastructure—draining ensures all workloads are migrated before the node is removed. During rolling upgrades, nodes are typically drained one at a time, upgraded, and returned to service to maintain overall cluster availability. Draining is also useful for isolating node-specific issues, such as hardware degradation or networking faults, by removing active workloads before deeper investigation.
How Does kubectl drain Work?
kubectl drain coordinates a controlled workflow to remove workloads from a node without disrupting service availability. It is not a blunt termination tool; it combines scheduler isolation with policy-aware pod eviction to prepare a node for maintenance, upgrades, or decommissioning. The process ensures no new workloads land on the node and that existing pods are terminated in a way that respects your availability and shutdown guarantees.
The Pod Eviction Process Explained
When you run kubectl drain, the first step is to cordon the node, marking it unschedulable so the scheduler will not place new pods on it. Once isolated, Kubernetes begins evicting existing pods. Evictions are API-driven and PDB-aware: if removing a pod would violate a PodDisruptionBudget, the eviction is delayed or blocked. This makes kubectl drain safe for production workloads, as it enforces the availability constraints you have explicitly defined.
How Kubernetes Terminates Pods Gracefully
Each evicted pod follows Kubernetes’ graceful termination flow. The kubelet sends SIGTERM to container processes and waits for the configured terminationGracePeriodSeconds. This window allows applications to complete in-flight requests, flush buffers, and persist state. If the pod does not exit in time, Kubernetes escalates to SIGKILL. This mechanism minimizes data loss and request failures while pods are rescheduled onto healthy nodes.
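The application side of this contract is a signal handler. A minimal sketch in shell, standing in for whatever your container's entrypoint process does: register a handler for SIGTERM so the main loop can drain in-flight work instead of dying mid-request.

```shell
# Flag flipped by the trap when the kubelet delivers SIGTERM.
shutdown_requested=0

# Register a handler: instead of exiting mid-request, record that
# shutdown was requested so the main loop can finish in-flight work
# and exit before terminationGracePeriodSeconds elapses -- after
# which the kubelet escalates to SIGKILL.
trap 'shutdown_requested=1' TERM
```

The same pattern applies in any language: handle SIGTERM, stop accepting new work, flush state, exit.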
Preventing New Pods with cordon
Cordoning is the prerequisite that makes draining reliable. By marking the node as unschedulable, Kubernetes ensures no new pods are assigned during the eviction process. Without this step, the scheduler could place fresh workloads on the node while you are trying to empty it, undermining the operation. kubectl drain enforces this isolation first so that, once evictions complete, the node remains empty and safe to take offline.
Key kubectl drain Command Options
kubectl drain provides a set of flags that control how pod eviction is performed. In production environments, these options are not optional details—they determine whether a drain is safe, predictable, and aligned with your availability guarantees. Understanding and using them correctly is essential for reliable node maintenance and for avoiding accidental downtime or data loss.
Control Eviction with Grace Periods and Timeouts
The --grace-period flag defines how long Kubernetes waits for each pod to terminate gracefully during eviction. While pods have a default terminationGracePeriodSeconds, some workloads need more time to flush state, complete requests, or close long-lived connections. For example, kubectl drain node-1 --grace-period=120 allows two minutes for shutdown. Setting the value to -1 tells kubectl drain to fully respect each pod’s configured grace period instead of overriding it. Choosing the correct value is critical to preventing forced termination of stateful or latency-sensitive applications.
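Both forms side by side, again with a hypothetical node-1:

```shell
# Override every pod's grace period with a fixed 120-second window.
kubectl drain node-1 --grace-period=120

# Defer to each pod's own terminationGracePeriodSeconds instead.
kubectl drain node-1 --grace-period=-1
```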
Manage DaemonSets with --ignore-daemonsets
By default, kubectl drain refuses to proceed if the node is running pods managed by a DaemonSet. This is intentional: DaemonSets typically provide node-level services such as logging, metrics, or networking, and evicting them is rarely desirable. The --ignore-daemonsets flag instructs kubectl drain to skip these pods and continue evicting all others. This allows node maintenance to proceed while preserving essential cluster infrastructure that must remain present on every node.
Use --force and --dry-run Safely
The --force flag enables eviction of pods that are not managed by a higher-level controller. These bare pods will not be recreated elsewhere, so using this flag effectively deletes them. In most production setups, this is a last resort and should only be used with full awareness of the consequences.
Before running any drain, --dry-run=server should be part of your workflow. It executes the drain logic on the API server without making changes, showing exactly which pods would be evicted and which constraints would apply. This makes it possible to validate the impact of a drain operation in advance and avoid surprises during live maintenance.
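A typical pre-flight invocation (node name illustrative):

```shell
# Evaluate the drain server-side without evicting anything; the output
# lists the pods that would be evicted and flags any blockers, such as
# emptyDir-backed pods or unmanaged bare pods.
kubectl drain node-1 --ignore-daemonsets --dry-run=server
```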
What Happens to Your Workloads When You Drain a Node?
Running kubectl drain triggers a controlled workload migration, not an abrupt shutdown. The Kubernetes control plane orchestrates pod eviction and rescheduling to maintain availability while a node is taken out of service. This process relies on built-in scheduling constraints and availability guarantees to ensure workloads continue running on healthy nodes with minimal disruption.
How Pods Are Rescheduled to Healthy Nodes
The drain operation starts by marking the node unschedulable, preventing new pods from landing on it. Existing pods are then evicted via the Kubernetes eviction API. Once evicted, the scheduler immediately attempts to place replacement pods on other nodes, honoring all scheduling constraints such as node affinity, taints, tolerations, and resource requests. This rescheduling is fully automated and requires no manual intervention, assuming the cluster has sufficient capacity.
Minimizing Application Downtime
During eviction, pods undergo Kubernetes’ standard graceful termination flow. Containers receive SIGTERM and are given time to shut down cleanly before being forcibly terminated. In parallel, controllers like Deployments and StatefulSets create replacement pods to maintain the desired replica count. In well-provisioned clusters, new pods often become ready before the old ones fully exit, keeping effective service capacity stable throughout the drain.
Protecting Applications with Pod Disruption Budgets (PDBs)
Pod Disruption Budgets define how many replicas of an application must remain available during voluntary disruptions such as node drains. kubectl drain strictly enforces these budgets. If evicting a pod would violate a PDB, the drain pauses until doing so is safe. This mechanism prevents accidental mass eviction of critical workloads and makes PDBs a key prerequisite for safe, automated node maintenance at scale.
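A minimal PDB sketch for a hypothetical workload labeled app: api, guaranteeing that at least two replicas survive any voluntary disruption such as a drain:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2        # evictions that would drop below 2 ready replicas are blocked
  selector:
    matchLabels:
      app: api           # must match the pods the Deployment/StatefulSet creates
```

Note that minAvailable must be lower than the workload's replica count; a PDB that requires all replicas to stay up will block every drain indefinitely.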
How to Safely Drain a Node: A Pre-Flight Checklist
Draining a node is a routine but high-impact operation in Kubernetes lifecycle management. While kubectl drain is mechanically simple, a safe drain depends on preparation. Skipping checks can leave pods stuck, violate availability guarantees, or complicate recovery. Treat node drains as a planned operation with explicit preconditions, not an ad hoc command.
Perform Pre-Drain Health Checks
Start by validating cluster capacity. Ensure the remaining nodes have sufficient CPU, memory, and allocatable resources to absorb evicted workloads. If the cluster is already near saturation, drains will push pods into Pending due to scheduling failures. Also check for existing issues: pods in CrashLoopBackOff, failed PersistentVolumeClaims, or degraded controllers. Draining does not resolve these problems and often makes root-cause analysis harder. With Plural, you can assess cluster-wide resource utilization and workload health from a single control plane before initiating maintenance.
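These checks can be scripted with standard kubectl commands (kubectl top requires metrics-server to be installed):

```shell
# Capacity: do the remaining nodes have headroom for the evicted pods?
kubectl top nodes

# Existing problems: surface pods that are already unhealthy.
kubectl get pods --all-namespaces --field-selector=status.phase!=Running

# Disruption budgets: check ALLOWED DISRUPTIONS before starting.
kubectl get pdb --all-namespaces
```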
Monitor the Eviction Process in Real Time
Drains are asynchronous and bounded by pod termination grace periods. Actively monitor progress to confirm pods are terminating and rescheduling as expected. Commands like kubectl get pods -o wide or watching pods on the draining node help identify stalled evictions, especially pods stuck in Terminating. These usually indicate finalizer issues, blocked PDBs, or storage constraints. Plural simplifies this by providing a live, unified view of pod movement across clusters, reducing the need to manually correlate state across multiple terminals.
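For a single node, a field selector narrows the watch to only the pods still scheduled there (node-1 is illustrative):

```shell
# Watch the pods remaining on the draining node; entries stuck in
# Terminating here are the ones blocking the drain.
kubectl get pods --all-namespaces -o wide \
  --field-selector spec.nodeName=node-1 --watch
```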
Account for Storage and Network Dependencies
Successful drains depend on workload portability. Pod Disruption Budgets must be defined for critical services; otherwise, drains may either fail or unintentionally reduce availability. Storage is a common blocker: pods backed by node-local volumes cannot be rescheduled elsewhere. Similarly, strict node affinity, taints, or network policies can leave the scheduler with no valid placement targets. Identifying these constraints ahead of time is essential. A drain that stalls mid-operation is usually a sign that one of these dependencies was overlooked.
How to Solve Common kubectl drain Problems
Even with proper preparation, kubectl drain can fail or stall during node maintenance. Most issues fall into predictable categories: pods that cannot be evicted, stateful workloads with stricter guarantees, and storage-related constraints. Knowing how to diagnose and resolve these problems is essential for keeping maintenance operations moving without compromising cluster stability.
These manual fixes are part of every Kubernetes operator’s toolkit, but they also underscore why node lifecycle management becomes difficult at scale. Platforms like Plural reduce this risk by enforcing pre-flight checks and providing centralized visibility across clusters. When problems do occur, the following approaches help unblock a drain safely.
Fix Pods Stuck in a Terminating State
Pods stuck in Terminating are a common reason drains fail to complete. The usual causes are overly restrictive Pod Disruption Budgets or blocking finalizers. Start by listing PDBs with kubectl get pdb to see whether evicting the pod would violate availability constraints. If so, the drain must wait or the PDB must be adjusted.
If PDBs are not the issue, inspect the pod manifest for finalizers. Finalizers are cleanup hooks that must complete before deletion. If the responsible controller is unhealthy or unreachable, the pod can remain stuck indefinitely. In these cases, identifying and resolving the underlying controller issue is preferable to forcibly removing the finalizer.
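A quick way to inspect a stuck pod's finalizers (pod and namespace names are hypothetical):

```shell
# List any finalizers still pending on the stuck pod.
kubectl get pod stuck-pod -n my-namespace \
  -o jsonpath='{.metadata.finalizers}'

# Last resort, only after the owning controller is confirmed gone and
# its cleanup will never run -- this skips that cleanup entirely:
# kubectl patch pod stuck-pod -n my-namespace --type=merge \
#   -p '{"metadata":{"finalizers":null}}'
```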
Drain Nodes Running Stateful Applications
Stateful workloads require additional care during drains. Pods managed by StatefulSets have stable identities and persistent storage, which makes eviction more constrained than for stateless Deployments. Drains may stall if evicting a pod would violate availability or ordering guarantees.
The correct approach is to define appropriate Pod Disruption Budgets for StatefulSets so Kubernetes can move replicas safely. In some cases, operators may need to delete StatefulSet pods sequentially, allowing the controller to recreate them on healthy nodes before continuing the drain. This preserves ordering and minimizes risk to stateful data.
Manage Persistent Volumes During a Drain
Storage configuration directly affects drain behavior. Pods using emptyDir volumes store data on the node itself, which will be lost when the pod is evicted. For this reason, kubectl drain refuses to evict such pods by default. If the data is truly ephemeral, the --delete-emptydir-data flag allows the drain to proceed.
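Acknowledging the data loss explicitly (node name illustrative):

```shell
# Proceed despite emptyDir-backed pods, accepting that their
# node-local data is discarded on eviction.
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data
```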
For pods backed by PersistentVolumes, the eviction process removes the pod but leaves the volume intact. When the pod is rescheduled, it reattaches to the same volume, assuming the storage supports multi-node access or dynamic reattachment. The eviction API still enforces PDBs, ensuring that stateful workloads backed by persistent storage are migrated in a controlled, availability-safe manner.
How to Bring a Drained Node Back into Service
Once maintenance is complete, the node must be safely reintegrated into the cluster. This is a deliberate process: you re-enable scheduling, confirm the node’s readiness, and validate overall cluster health. Skipping verification risks reintroducing an unhealthy node or leaving workloads imbalanced.
Uncordon the Node to Resume Scheduling
Reversing a drain starts by making the node schedulable again. This is done with kubectl uncordon, which removes the unschedulable state set during the drain. After this step, the scheduler can place new pods on the node.
kubectl uncordon NODE_NAME
This command is the functional inverse of kubectl cordon and formally ends the node’s maintenance window.
Verify the Node Is Ready
After uncordoning, confirm that the control plane recognizes the node as schedulable and healthy.
kubectl get nodes
The node should report a status of Ready without SchedulingDisabled. If it does not, investigate kubelet health, node conditions, or underlying infrastructure issues before proceeding.
Validate Cluster Health After Maintenance
Reintroducing the node is only part of the process. You must also ensure the cluster has fully stabilized. Pods that were rescheduled during the drain may rebalance, and any previously blocked workloads may now schedule.
Key checks include:
kubectl get nodes
kubectl get pods --all-namespaces
All nodes should be Ready, and pods should be in Running or Completed states. This final validation confirms that maintenance is complete, workloads are healthy, and the cluster has returned to its expected steady state.
How to Troubleshoot kubectl drain Failures
Even with careful preparation, kubectl drain can fail in ways that are not immediately obvious. Most failures stem from three areas: insufficient permissions, blocked pod evictions, or unsafe flag usage. Effective troubleshooting starts by identifying which category you are in and addressing the root cause rather than forcing the operation through.
Resolve Permission and RBAC Errors
kubectl drain requires more than node-level access. It creates pod eviction requests, which means the caller must have permission to create the pods/eviction subresource. If these permissions are missing, the command fails with a forbidden error referencing pod evictions.
The fix is to grant the appropriate RBAC permissions to the user or service account performing maintenance. In large environments, RBAC drift across clusters is a common source of failure. Plural helps reduce this risk by managing RBAC policies as a global, declarative service, ensuring consistent permissions are enforced across your entire Kubernetes fleet.
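A minimal ClusterRole sketch for a maintenance identity, assuming a role name of node-drainer; depending on the workloads present, read access to their owning controllers (Deployments, StatefulSets, Jobs, ReplicaSets) may also be needed so drain can verify that evicted pods will be recreated:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-drainer
rules:
- apiGroups: [""]
  resources: ["pods/eviction"]
  verbs: ["create"]          # the eviction requests drain issues
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]     # enumerate the node's pods
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "patch"]    # cordon/uncordon set spec.unschedulable
- apiGroups: ["apps"]
  resources: ["daemonsets"]
  verbs: ["get", "list"]     # identify DaemonSet-managed pods to skip
```

Bind this to the maintenance user or service account with a ClusterRoleBinding.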
Address Pod Eviction Failures and Timeouts
When permissions are correct but drains still stall, application-level protections are usually responsible. Pod Disruption Budgets are the most common blocker. If evicting a pod would violate a PDB, Kubernetes intentionally blocks the eviction to preserve availability. Resolving this typically requires temporarily scaling the workload, adjusting the PDB, or scheduling maintenance during a lower-traffic window.
Another frequent cause is pods stuck in Terminating due to finalizers. Finalizers prevent deletion until cleanup logic completes. If the controller responsible for that cleanup is unhealthy, the pod will never exit, and the drain will eventually time out. Inspect the pod manifest for finalizers and verify the health of the associated controller before taking corrective action.
Know When to Use --force vs. a Graceful Shutdown
Using --force is almost never the correct first response. It forcibly deletes pods that are not managed by a controller and bypasses graceful shutdown, which can result in data loss or corrupted application state. In contrast, increasing the shutdown window with --grace-period allows applications to terminate cleanly and is usually sufficient.
If a drain is blocked, the safest path is to understand why eviction is failing and resolve that condition directly. Resorting to --force should be limited to exceptional cases where the implications are fully understood and acceptable.
Automate Node Lifecycle Management with Plural
While kubectl drain is an essential command for managing individual nodes, its manual nature presents significant challenges when operating at scale. Performing maintenance across a fleet of Kubernetes clusters—draining nodes, applying updates, and bringing them back online—can quickly become a complex and error-prone task. This process requires careful coordination to avoid application downtime and ensure that workloads are gracefully rescheduled. Manually running commands and monitoring their output across dozens or hundreds of nodes simply doesn't scale.
To address this, platform teams need a way to automate node lifecycle management and maintain a clear view of their entire infrastructure from a single control plane. Plural provides the tools to build robust, automated maintenance workflows and offers centralized visibility to monitor these operations across your entire fleet. By shifting from manual commands to a declarative, GitOps-driven approach, you can manage node maintenance with the same consistency and reliability as your application deployments. This ensures that routine tasks like OS patching, kernel upgrades, or hardware decommissioning are handled efficiently without disrupting service availability.
Create Automated Node Maintenance Workflows
The process of using kubectl drain involves safely evicting pods to prepare a node for maintenance. While effective, executing this manually for each node is tedious and introduces risk. A typo or a missed step can lead to application downtime or data loss. Plural allows you to codify and automate these procedures using its Infrastructure as Code management capabilities.
By defining your node maintenance sequences in Terraform or other IaC tools, you can create repeatable workflows that handle draining, updating, and uncordoning nodes automatically. These workflows are managed through Plural’s Continuous Deployment engine, which uses GitOps principles to ensure every change is version-controlled and auditable. A simple commit to a Git repository can trigger a fleet-wide rolling update, with Plural orchestrating the drain and validation steps for each node in a controlled manner.
Gain Centralized Visibility Across Your Fleet
After initiating a drain, you need to verify that pods have been successfully moved and the node is empty. Checking pod status with kubectl is straightforward for a single node but becomes impractical across a large fleet. Without a centralized view, engineers are left switching between terminals and contexts, making it difficult to track the progress of maintenance operations or quickly identify issues.
Plural solves this with its embedded Kubernetes dashboard, which provides a single pane of glass for your entire Kubernetes environment. From the Plural console, you can monitor the pod eviction process in real-time across all clusters, see where workloads are being rescheduled, and confirm that nodes are ready for maintenance. This unified view eliminates the need to juggle multiple kubeconfigs and provides a clear, consolidated picture of your fleet's health before, during, and after any maintenance activity.
Frequently Asked Questions
What's the difference between kubectl cordon and kubectl drain? Think of cordon as the first step in the drain process. The kubectl cordon command simply marks a node as unschedulable, which tells the Kubernetes scheduler not to place any new pods on it. The node's existing workloads continue to run. The kubectl drain command does this first and then proceeds to safely evict all the existing pods from that node, ensuring they are rescheduled elsewhere. You would use cordon by itself only if you wanted to prevent new deployments on a node without yet disturbing the pods already running there.
What happens if a drain fails midway through the process? If a drain fails, the node is left in a partially drained state. It will remain cordoned, meaning no new pods can be scheduled on it. However, any pods that were not successfully evicted before the failure will continue running on that node. To resolve this, you must first diagnose the cause of the failure, which is often a restrictive Pod Disruption Budget or a pod that is stuck terminating. After fixing the underlying issue, you can safely re-run the drain command to complete the eviction process for the remaining pods.
Will draining a node cause me to lose data from my Persistent Volumes? No, the drain command does not delete data stored in your Persistent Volumes (PVs). The process only detaches the pod from its volume. When the pod is rescheduled onto a healthy node, it will re-attach to the same PV, assuming your storage is network-accessible. The command is designed to fail by default if it encounters pods using emptyDir volumes, as that data is stored directly on the node and would be lost upon eviction. This behavior acts as a safeguard against accidental data loss.
Why won't kubectl drain work on nodes with DaemonSet pods by default? This is an intentional safety mechanism. DaemonSets are responsible for running essential, node-level services like logging, monitoring, or networking agents on every node in the cluster. Evicting these pods could disrupt critical cluster functions. Therefore, the drain command requires you to explicitly acknowledge this by using the --ignore-daemonsets flag. This confirms you intend to proceed with the drain while leaving these specific pods untouched on the node.
How can I prevent my team from making mistakes when draining nodes? Relying on manual kubectl commands for routine maintenance introduces a high risk of human error, especially across a large fleet of clusters. The most effective way to ensure safety and consistency is to automate the entire node lifecycle workflow. With Plural, you can define node maintenance procedures as code and manage them through a GitOps-driven process. This creates a repeatable, auditable workflow with built-in checks and centralized visibility from a single dashboard, which significantly reduces the risks associated with manual operations.