How to Deploy to Kubernetes: A Step-by-Step Guide
Deploying an application to Kubernetes is relatively simple in a small, isolated environment. But for platform and DevOps teams managing multiple applications across a large fleet of clusters, that simplicity quickly gives way to complexity. Ensuring that every deployment adheres to security policies, resource limits, and configuration standards—across teams and environments—is a significant challenge.
What starts as a basic deployment task often becomes entangled with the broader need for consistency, visibility, and control at scale. To do it right, you need more than just a working YAML file—you need a reliable, auditable, and automated workflow.
This guide breaks down the core building blocks of a Kubernetes deployment and highlights the real-world considerations that come into play when managing deployments across an organization. Whether you're launching your first service or scaling enterprise-wide infrastructure, the practices outlined here will help you build a scalable, secure, and efficient deployment pipeline that works across teams and clusters.
Key takeaways:
- Understand the Building Blocks: A Deployment is a controller that manages ReplicaSets, which in turn manage your Pods, while Services expose those Pods to traffic. Mastering how these components interact is critical for effective troubleshooting and managing the entire application lifecycle, from initial rollout to network exposure.
- Choose the Right Lifecycle Strategy: Go beyond default settings by selecting a deployment strategy (like rolling updates for zero downtime or blue/green for high-stakes releases) that fits your application's needs. Combine this with autoscaling and rollback plans to build a resilient system.
- Scale Beyond kubectl with Automation: Manual kubectl commands don't scale across a fleet of clusters. Adopt a GitOps workflow on a unified platform like Plural to automate deployments, gain centralized visibility, and enforce consistent configurations, turning manual tasks into a repeatable, auditable process.
What Is Kubernetes and Why Use It?
Kubernetes is an open-source platform that automates the deployment, scaling, and management of containerized applications. It groups related containers into logical units called Pods, making them easier to discover, manage, and maintain. For platform and DevOps teams, Kubernetes offers a robust and extensible framework for operating distributed systems at scale.
The core value of Kubernetes lies in its ability to manage complexity. Tasks that would otherwise require significant manual effort—such as distributing network traffic across multiple instances, rescheduling failed containers, or scaling services during traffic spikes—are handled automatically. This operational automation allows engineering teams to focus on building and shipping features, rather than managing infrastructure.
A key feature of Kubernetes is its use of declarative configuration. You define the desired state of your application—including container images, environment variables, replica counts, and resource limits—in a YAML manifest. Kubernetes then continuously reconciles the actual state with the desired state, ensuring your application behaves as expected. This model reduces the risk of configuration drift and provides a consistent, auditable deployment workflow.
However, as powerful as Kubernetes is, it introduces its own complexities, especially in environments with multiple teams, services, and clusters. Enforcing consistency, applying security policies, and troubleshooting across a fleet can quickly become a challenge. This is where a unified control plane becomes essential. Tools like Plural help platform teams manage Kubernetes at scale by providing a single pane of glass for deployments, policies, and observability across the entire infrastructure.
What Makes Up a Kubernetes Deployment?
To effectively deploy an application on Kubernetes, you need to understand its core building blocks. A Kubernetes Deployment isn't a single object—it’s a combination of several resources that work together to run and maintain your application. Think of them as the essential ingredients in a recipe. When you define a Deployment, you’re orchestrating a set of lower-level components that handle everything from scaling to service discovery. Understanding how these pieces fit together is fundamental for troubleshooting issues and managing applications at scale.
The four key components involved in a Deployment are:
Pods
A Pod is the smallest and most basic deployable unit in Kubernetes. It represents a single instance of a running process in your cluster and can contain one or more containers (usually one). A Pod also bundles shared storage volumes, configuration such as environment variables, and its own network identity (a unique IP address).
Because Pods are ephemeral—created and destroyed frequently—they're not typically created directly. Instead, they’re managed by controllers like ReplicaSets or Deployments that ensure resilience and consistency.
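For reference, a standalone Pod manifest looks like this (a minimal sketch using the same hypothetical image as the examples later in this guide; in practice you'd let a Deployment create Pods for you):

apiVersion: v1
kind: Pod
metadata:
  name: my-app-pod
  labels:
    app: my-app
spec:
  containers:
    - name: my-app-container
      image: my-registry/my-app:v1.2.0
      ports:
        - containerPort: 80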
ReplicaSets
A ReplicaSet ensures that a specified number of Pod replicas are running at all times. If a Pod crashes or is deleted, the ReplicaSet automatically creates a new one to replace it. This self-healing behavior makes your application resilient to failure.
Although you can define a ReplicaSet manually, it’s more common (and recommended) to manage them through a Deployment. The Deployment takes care of creating and updating ReplicaSets behind the scenes.
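You rarely interact with ReplicaSets directly, but you can list the ones a Deployment has created on your behalf (assuming Pods labeled app: my-app, as in the examples in this guide):

kubectl get replicasets -l app=my-app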
Deployments
A Deployment is the declarative resource that defines the desired state of your application—for example, "run 3 replicas of version 1.2 of my app." The Deployment controller ensures this state is achieved and maintained by creating and managing ReplicaSets.
What makes Deployments powerful is their ability to orchestrate rolling updates and rollbacks with zero downtime. This ensures your applications can evolve safely in production. At scale, a platform like Plural can automate syncing and deploying these manifests across all your clusters through GitOps-based continuous deployment.
Services
Because Pods are short-lived and get new IP addresses each time they're recreated, there needs to be a stable way to access them. Enter the Service, which provides a consistent virtual IP and DNS name that routes traffic to a dynamic set of Pods.
Services use label selectors to match the right Pods and load-balance requests between them. This abstraction decouples application components and enables dynamic scaling and upgrades without affecting clients or users, which is essential for building robust, microservice-based architectures.
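For a quick illustration, kubectl can generate a Service for an existing Deployment imperatively (assuming a Deployment named my-app whose containers listen on port 8080):

kubectl expose deployment my-app --port=80 --target-port=8080

Declarative Service manifests, covered later in this guide, are preferable for production because they can be version-controlled.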
How to Create a Kubernetes Deployment
Creating a Kubernetes Deployment is the foundational step in getting your application running in a cluster. It involves specifying which container image to run, how many instances (replicas) to maintain, and how Kubernetes should manage the application lifecycle. While these steps can be executed manually using kubectl and YAML files, most modern teams adopt automation platforms to manage deployments at scale. Still, understanding the manual workflow is crucial for debugging issues and building a strong foundation in Kubernetes.
The following steps walk through the core deployment workflow—providing the context needed to appreciate the value of automation tools like Plural CD, which apply GitOps principles to manage deployments across a fleet of clusters.
1. Prepare Your Application
Before deploying to Kubernetes, your application must be containerized. Kubernetes doesn't run raw source code; it orchestrates containers—typically Docker images—that bundle your application logic with dependencies and configuration.
Once built, push your image to a container registry. This could be Docker Hub, Amazon ECR, Google Artifact Registry, or any private registry supported by your environment. Be sure to tag your image (e.g., my-app:v1.2.0) to enable predictable, repeatable deployments.
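With Docker, building and pushing typically looks like this (the image name and registry are placeholders):

docker build -t my-registry/my-app:v1.2.0 .
docker push my-registry/my-app:v1.2.0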
2. Write the Deployment YAML
The next step is to define your application’s desired state using a Deployment manifest written in YAML. This file specifies critical configuration, including:
- The number of replicas to run
- Which container image to use
- Pod labels and selectors
- Container ports, environment variables, and resource limits
Here’s a minimal example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app-container
          image: my-registry/my-app:v1.2.0
          ports:
            - containerPort: 80
YAML is powerful but error-prone—a missing space or bad indentation can break your deployment. Tools like Plural’s self-service catalog help developers avoid mistakes by generating validated manifests tailored to each environment.
3. Apply the Deployment
To create the Deployment in your cluster, use kubectl:
kubectl apply -f my-deployment.yaml
This command submits the manifest to the Kubernetes API server, which ensures the cluster’s actual state matches your desired configuration. Kubernetes then schedules the Pods, pulls your image from the registry, and launches the containers.
This workflow is idempotent—running it repeatedly only makes changes if the manifest has been modified. While this is ideal for simple deployments, managing a large application fleet with kubectl alone quickly becomes unsustainable. That’s where GitOps tools like Plural CD shine: you define deployments in Git, and they’re automatically applied to every environment.
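Before re-applying a modified manifest, you can preview exactly what would change in the live cluster:

kubectl diff -f my-deployment.yaml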
4. Verify the Deployment Status
After applying your manifest, verify everything is working as expected:
kubectl get deployments
This shows the current state of your Deployment, including how many replicas are ready. Next, inspect your Pods:
kubectl get pods
If all is well, Pods will show a Running status. If not, debug with:
kubectl describe deployment my-app
This gives detailed events and condition updates, helping you identify problems like image pull errors, resource limit violations, or misconfigurations.
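To dig into an individual Pod, describe it and check its container logs (substitute a Pod name from kubectl get pods):

kubectl describe pod <pod-name>
kubectl logs <pod-name>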
While these tools are essential for small teams or dev clusters, monitoring deployments across multiple clusters is much harder. Platforms like Plural solve this with a centralized Kubernetes dashboard that gives you visibility into all deployments—making it easier to monitor health and debug issues without hopping between terminals or switching kubectl contexts.
Choose a Deployment Strategy
Once your application is containerized and your manifests are defined, the next major decision is how to roll out updates. This is where deployment strategies come into play. Your chosen strategy determines how Kubernetes replaces old versions of your application with new ones—impacting availability, risk, resource usage, and the end-user experience.
This decision isn’t purely technical—it often reflects business priorities. A zero-downtime strategy may be non-negotiable for a customer-facing application, while brief interruptions might be acceptable for internal tools or batch processes. As your platform matures, you may apply different strategies to different services based on their criticality.
A Continuous Deployment system like Plural CD can codify and automate these strategies, ensuring safe, repeatable deployments across environments. By embedding your strategy into the Deployment manifest, you ensure consistency across your fleet and enable infrastructure-as-code workflows that are easy to review and audit.
Rolling Updates (Default)
Rolling updates are the default and most widely used deployment strategy in Kubernetes. They allow you to gradually replace old Pods with new ones, ensuring that a minimum number of Pods are always available to serve traffic. This reduces downtime and minimizes user impact.
You can control the rollout behavior using two key parameters:
- maxSurge: How many additional Pods can be created during the update (above the desired count).
- maxUnavailable: How many Pods can be unavailable at any time during the update.
These settings give you flexibility to balance speed and stability. For most production services, rolling updates offer a safe, seamless way to release new versions while maintaining service continuity.
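For example, a conservative rollout that never reduces serving capacity might be configured like this in the Deployment spec (a minimal sketch; tune the values to your service):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0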
Recreate
The Recreate strategy is the simplest method: Kubernetes terminates all existing Pods before spinning up the new ones. This guarantees that only one version of the application is ever running at a time.
This approach is useful when:
- The application cannot tolerate version skew between replicas.
- You’re introducing breaking database changes or incompatible schema migrations.
- You want to avoid side effects from concurrent versions.
However, it comes with a tradeoff: downtime. During the window when the old version is shut down and the new version is still coming up, your application will be unavailable. This makes the Recreate strategy more appropriate for:
- Development or staging environments
- Internal tools with minimal availability requirements
- Batch jobs or one-off tasks
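Enabling the Recreate strategy is a one-line change in the Deployment spec:

spec:
  strategy:
    type: Recreate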
Blue/Green Deployments
Blue/Green deployments provide high availability and safe rollback paths by running two identical environments: one live (Blue) and one staged (Green).
Here’s how it works:
- The Blue environment serves current production traffic.
- A new version is deployed to the Green environment.
- Tests and checks are run on Green before exposing it to users.
- If the Green environment passes validation, you switch traffic from Blue to Green—usually via a Service switch, Ingress update, or load balancer.
- If something breaks, simply route traffic back to Blue.
This strategy offers near-zero downtime, fast rollback, and production-level testing before releasing. The tradeoff is increased infrastructure cost, as both environments need to run in parallel—at least temporarily.
For critical services or applications with high traffic volumes and low tolerance for errors, the added reliability of Blue/Green deployments is often worth the overhead.
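Kubernetes has no dedicated Blue/Green primitive; a common pattern is to run two Deployments distinguished by a version label and flip a Service selector between them. A minimal sketch, assuming Deployments whose Pods are labeled version: blue and version: green:

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: green   # switch between "blue" and "green" to cut traffic over
  ports:
    - port: 80
      targetPort: 8080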
How to Scale and Update Deployments
Deploying your application is just the beginning. To maintain performance, resilience, and user satisfaction, you must continuously manage the application lifecycle—scaling workloads to meet demand and deploying updates with minimal disruption. Kubernetes provides powerful primitives to handle these tasks natively, enabling dynamic scaling, zero-downtime updates, and safe rollbacks. These features are foundational to building modern, cloud-native applications.
However, applying these techniques consistently across a large fleet of clusters can be operationally challenging. Without a centralized approach, teams risk configuration drift, inconsistent scaling policies, and slow remediation during incidents. A unified management layer—like Plural—simplifies this complexity by abstracting the operational details and giving platform teams a consistent interface to scale and update applications across all environments.
Use Horizontal Pod Autoscaling
Horizontal Pod Autoscaling (HPA) allows your application to dynamically scale based on load. HPA monitors metrics such as CPU or memory usage and adjusts the number of running Pods in a Deployment accordingly. For example, if average CPU usage exceeds 75%, HPA can automatically scale out to ensure responsiveness. When demand drops, it scales back in to reduce resource usage and cost.
To use HPA, your cluster must be running the Metrics Server, which collects resource usage data. You can also configure custom metrics through Prometheus and KEDA for event-based autoscaling (e.g., queue length or HTTP requests per second).
Example HPA YAML:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
Plural helps enforce consistent autoscaling policies across all your clusters, ensuring uniform elasticity and observability everywhere.
Perform a Rolling Update
The default and most commonly used update strategy in Kubernetes is the rolling update. When you change a Deployment—such as updating the container image—Kubernetes gradually replaces old Pods with new ones. This ensures zero downtime, as a minimum number of Pods remain available throughout the update.
Rolling updates are highly configurable:
- maxUnavailable: the maximum number of Pods that can be unavailable during the update.
- maxSurge: the maximum number of additional Pods created above the desired count during the update.
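For example, changing the container image triggers a rolling update (the Deployment and container names match the manifest from earlier in this guide; v1.3.0 is an illustrative new tag):

kubectl set image deployment/my-app my-app-container=my-registry/my-app:v1.3.0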
You can check the status of a rolling update using:
kubectl rollout status deployment my-app
Plural CD automates rolling updates within a GitOps pipeline, tracking every rollout in Git. This provides a single source of truth, enforces policy compliance, and eliminates manual intervention—all while giving your team real-time visibility into progress and potential rollout failures across your fleet.
Roll Back a Deployment
Even with thorough testing, some updates will fail. Kubernetes makes recovery easy with built-in rollback support. Every time you apply a new Deployment, Kubernetes stores a revision history, allowing you to roll back to a previous version:
kubectl rollout undo deployment my-app
This instantly reverts the Deployment to the last working configuration. You can also use:
kubectl rollout history deployment my-app
to view previous versions.
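To return to a specific revision rather than the immediately previous one:

kubectl rollout undo deployment my-app --to-revision=2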
While this works well for single clusters, GitOps offers a more controlled and auditable way to manage rollbacks at scale. With Plural, a rollback is as simple as reverting a Git commit. Plural’s CD engine detects the change, updates the Deployment, and syncs the new state across all clusters, ensuring consistency without manual action.
How to Expose Your Deployment
Once your application is successfully running inside your Kubernetes cluster, the next step is making it accessible—whether that means allowing internal services to communicate or exposing your application to the outside world. Kubernetes provides flexible, built-in networking primitives to manage traffic and route requests to the appropriate destinations. The two most commonly used resources for this are Services and Ingress.
While Services give your workloads stable internal addresses for communication within the cluster, Ingress exposes your application externally, routing traffic based on hostnames or paths. Configuring these resources correctly is essential for ensuring availability, security, and scalability—without them, even a fully functional application remains isolated and unreachable. Below, we explore the key approaches to exposing your application in a way that is production-ready and manageable across your entire fleet.
Create Services for Internal Traffic
In Kubernetes, Pods are ephemeral—their IP addresses change as they are rescheduled or restarted. To provide a stable interface to your workloads, Kubernetes introduces the Service abstraction. A Service selects a set of Pods using labels and gives them a persistent DNS name and a virtual IP address (VIP), enabling reliable internal communication between components.
The default and most common type of Service is ClusterIP, which is only accessible from within the cluster. This is ideal for service-to-service communication in a microservices architecture, such as connecting your frontend to your backend, or your API gateway to your internal services.
Example ClusterIP Service:
apiVersion: v1
kind: Service
metadata:
  name: my-backend
spec:
  type: ClusterIP
  selector:
    app: my-backend
  ports:
    - port: 80
      targetPort: 8080
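Inside the cluster, this Service is reachable at a predictable DNS name; with the default cluster domain and namespace, that would be my-backend.default.svc.cluster.local (or simply my-backend from Pods in the same namespace).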
By abstracting away the lifecycle of Pods, Services become the backbone of internal service discovery in Kubernetes.
Use Ingress for External Access
For user-facing applications, you need to expose your workloads to the outside world. This is where an Ingress comes in. An Ingress is a Kubernetes API object that defines routing rules for HTTP and HTTPS traffic, typically based on domain names and URL paths. These rules are interpreted and enforced by an Ingress Controller—a component like NGINX, Traefik, or Contour.
An Ingress allows you to:
- Route traffic to different Services based on hostnames (e.g., api.myapp.com, admin.myapp.com)
- Route based on URL paths (e.g., /api, /dashboard)
- Terminate TLS (HTTPS) traffic and manage certificates
Example Ingress YAML:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-frontend
                port:
                  number: 80
You can learn more in the Ingress documentation.
With Plural, your Ingress rules and controller configurations can be version-controlled and automatically deployed across environments using GitOps workflows, providing consistency and visibility across your fleet.
Configure Load Balancing
In production scenarios, you’ll likely need to distribute traffic efficiently and reliably across multiple nodes. The LoadBalancer Service type is designed for this. When you create a LoadBalancer Service, Kubernetes requests a cloud-managed load balancer from your infrastructure provider (e.g., AWS, GCP, Azure).
Example:
apiVersion: v1
kind: Service
metadata:
  name: my-app-lb
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
Once provisioned, you’ll receive an external IP or hostname that routes traffic to your Service. This is the most straightforward way to expose a service externally without setting up an Ingress controller.
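You can watch for the address to appear in the EXTERNAL-IP column once the cloud load balancer is ready:

kubectl get service my-app-lb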
While NodePort is another option—exposing services on each node's IP at a static port—it’s generally not suitable for production due to scalability and security limitations.
Follow Deployment Best Practices
Writing and applying a Kubernetes Deployment manifest is only the beginning. To ensure your applications run reliably, securely, and efficiently at scale, it's essential to adopt best practices that span the entire deployment lifecycle. These practices help reduce downtime, avoid resource contention, and improve the overall maintainability of your Kubernetes environment. By embedding resource management, security policies, and automation into your deployment workflows, you create a production-ready foundation that can scale with your infrastructure.
Manage Resources and Set Limits
In a multi-tenant Kubernetes cluster, CPU and memory are shared resources. Without controls in place, one runaway application can monopolize system resources, causing node pressure, evictions, or outages for other services. That’s why it’s critical to define resource requests and limits in your Pod specs.
- requests: the amount of CPU or memory a container is guaranteed
- limits: the maximum amount a container can consume
Example:
resources:
  requests:
    cpu: "250m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"
This ensures fair scheduling and prevents a single container from exhausting resources. It also improves Horizontal Pod Autoscaler accuracy, since the autoscaler uses requests to calculate scaling decisions.
Secure Your Deployments
Security should be built into every layer of your Kubernetes deployment.
- NetworkPolicies act like firewalls for Pods, enabling you to isolate workloads and reduce lateral movement in the event of a breach.
- RBAC enforces the principle of least privilege, limiting what users and service accounts can access or modify.
- Use ServiceAccounts for workload identity, and avoid using the default service account in production.
- Scan your container images for vulnerabilities using tools like Trivy or Grype.
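As a concrete starting point, here is a minimal default-deny NetworkPolicy for a namespace (a common baseline; it assumes your CNI plugin enforces NetworkPolicies, and you would then add explicit allow rules per workload):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress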
With Plural, you can manage RBAC policies across your entire fleet from a centralized Git repository. Plural also integrates with your identity provider via SSO, tying permissions to real user identities and simplifying access audits.
Integrate with Your CI/CD Pipeline
Manual deployment introduces risk, slows down iteration, and breaks reproducibility. To ship software faster and more safely, integrate your Kubernetes workflows into a CI/CD pipeline—automating everything from building container images to deploying new versions.
- Use a GitOps model: define your application state in Git and deploy using commits.
- Tools like Argo CD and Flux are commonly used for GitOps-based Kubernetes delivery.
Plural CD takes this one step further with an agent-based GitOps engine, syncing manifests from Git to your target clusters. It continuously monitors for drift and automatically reconciles your clusters with the desired state—so your deployments are consistent, auditable, and self-healing.
By adopting CI/CD integration, your deployments become version-controlled, traceable, and repeatable—exactly what’s needed for operating at scale.
How Plural Simplifies Deployments
While kubectl and YAML files are the fundamental building blocks for Kubernetes, managing deployments across a fleet of clusters introduces significant operational overhead. Manual processes, inconsistent configurations, and a lack of visibility can slow down development cycles and increase risk. Plural provides a unified platform to automate and standardize the entire deployment lifecycle, from code generation to fleet-wide monitoring.
Streamline Your Deployment Workflow
Kubernetes deployments act as a manager for your applications, automating how they are rolled out and maintained. Plural enhances this by implementing a fully automated, GitOps-based workflow. With Plural CD, every change to your application manifests—whether Helm, Kustomize, or raw YAML—is automatically synced to your target clusters from a Git repository. This ensures a single source of truth and auditable change history.
Beyond the application layer, Plural also manages the underlying infrastructure. Using Plural Stacks, you can manage Terraform configurations with the same API-driven, Git-based workflow, ensuring that your infrastructure and applications are always in sync. This eliminates the friction between application deployment and infrastructure provisioning.
Get Full Visibility and Control
With Kubernetes deployments, you can update your application, scale it, and roll back to previous versions. However, performing these actions confidently requires clear visibility into the state of your resources. Plural’s built-in multi-cluster dashboard provides deep, real-time visibility into every cluster in your fleet without requiring you to manage kubeconfig files or VPNs.
The dashboard leverages a secure reverse-tunneling proxy, allowing you to inspect pods, deployments, and ReplicaSets in any environment—even private or on-prem clusters—from a single interface. With SSO integration and granular RBAC, you can provide teams with secure, ad-hoc access for troubleshooting, ensuring they have the control they need without compromising security.
Manage Your Fleet from a Single Pane of Glass
Utilizing YAML configuration files is essential for collaboration and version control, but maintaining consistency across a large fleet is a major challenge. Plural is built on a scalable, agent-based architecture designed to solve this problem. A lightweight agent is installed in each workload cluster, which communicates with a central control plane using a secure, egress-only model.
This architecture allows you to manage deployments across your entire fleet from a single pane of glass. You can define a service once and use Plural to ensure it is deployed consistently across hundreds of clusters. This approach provides the centralized control of a unified platform while maintaining the security and isolation of a distributed system, giving you a scalable and secure foundation for managing Kubernetes at scale.
Related Articles
- Managing Kubernetes Deployments: A Comprehensive Guide
- Best Orchestration Tools for Kubernetes in 2024
- How to Monitor a Kubernetes Cluster: The Ultimate Guide
- Kubernetes Deployments: Explained
- Kubernetes Ingress and How Plural Makes It Safer
Frequently Asked Questions
Why can't I just use kubectl for everything? While kubectl is an essential tool for direct interaction with a single Kubernetes cluster, its effectiveness diminishes as you scale. Managing a fleet of clusters with kubectl requires juggling multiple configuration files and contexts, which is inefficient and prone to error. It lacks a centralized view for monitoring fleet-wide health or a standardized workflow for rolling out changes consistently. Platforms like Plural provide a necessary abstraction layer, offering a single pane of glass, GitOps-based automation, and unified RBAC to manage your entire fleet securely and efficiently.
What's the practical difference between a Service and an Ingress? A Service provides a stable internal IP address and DNS name for your pods, enabling reliable communication within the cluster. Think of it as the internal phone system for your microservices. An Ingress, on the other hand, manages external access to those services, typically for HTTP and HTTPS traffic. It acts as a smart router or entry point, directing incoming requests from the internet to the correct service based on rules you define, like hostnames or URL paths. You need both: Services for internal connectivity and Ingress for external exposure.
My deployment is failing. Where do I even start looking? When a deployment fails, start by checking the status of the Deployment and its Pods using kubectl get deployments and kubectl get pods. Look for statuses like ImagePullBackOff or CrashLoopBackOff. Next, inspect the logs of a failing pod with kubectl logs <pod-name> to find application-level errors. For a more systematic approach, Plural’s built-in Kubernetes dashboard provides a centralized view of all your resources, allowing you to quickly inspect events and logs across your entire fleet from a single UI without needing direct cluster access.
How do I manage configurations consistently across dozens or hundreds of clusters? Manually applying YAML files to each cluster is not a scalable or reliable strategy. The standard solution is to adopt a GitOps workflow, where a Git repository serves as the single source of truth for all your configurations. Plural CD automates this process by using an agent in each cluster to continuously sync its state with the manifests defined in your repository. This ensures every cluster is configured identically and provides a clear, auditable trail for every change, eliminating configuration drift.
The post mentions different deployment strategies. How do I choose the right one? The right strategy depends on your application's tolerance for downtime and risk. Rolling updates are the default and offer a good balance of safety and speed for most applications by updating pods incrementally. The Recreate strategy, which involves brief downtime, is suitable for development environments or applications that can't run two versions simultaneously. For critical, user-facing applications where zero downtime is non-negotiable, a Blue/Green strategy provides the highest level of safety by allowing you to test a new version in parallel before switching traffic over.