Deploying Redis Clusters on Kubernetes: A Step-by-Step Guide

Learn how to deploy and manage a Redis cluster on Kubernetes with step-by-step instructions, best practices, and tips for high availability and performance.

Michael Guarino

Managing stateful applications in Kubernetes requires different considerations than running stateless workloads. Databases and caches need persistent storage, stable network identities, and well-defined lifecycle management. Redis clusters add further complexity with requirements for sharding, replication, and failover.

This guide provides a practical walkthrough of deploying and managing a Redis cluster on Kubernetes. You’ll learn the setup process, configuration best practices, and techniques for maintenance and troubleshooting. By the end, you’ll be equipped to run a production-grade Redis cluster that can handle real-world workloads with resilience and reliability.

Unified Cloud Orchestration for Kubernetes

Manage Kubernetes at scale through a single, enterprise-ready platform.

GitOps Deployment
Secure Dashboards
Infrastructure-as-Code
Book a demo

Key takeaways:

  • Prioritize data persistence with StatefulSets: Use Kubernetes StatefulSets to manage your Redis pods. This provides the stable network identities and dedicated persistent storage necessary to protect your data during pod restarts and ensure the cluster remains consistent.
  • Adopt a unified monitoring strategy: Effective troubleshooting requires correlating Redis-specific metrics, like cache hits and latency, with Kubernetes resource data, like pod CPU and memory usage. A single-pane-of-glass platform simplifies this by providing a holistic view of both the application and its underlying infrastructure.
  • Manage configuration as code to prevent drift: Define your entire Redis deployment—including resource limits, security policies, and custom settings—in version-controlled manifests. A GitOps workflow automates the application of these configurations, ensuring consistency and eliminating manual errors across your fleet.

What Is a Redis Cluster?

A Redis Cluster is a distributed setup that splits data across multiple Redis nodes. This design improves both performance and fault tolerance, enabling storage to scale beyond the limits of a single instance. By distributing the dataset, clusters support near-linear scalability and high availability.

Redis Cluster Architecture

A cluster is composed of individual Redis nodes that work together to manage a shared dataset. Data is partitioned into shards and distributed across nodes. Each node owns a subset of the data and communicates with peers to maintain cluster state, coordinate client requests, and handle failover. This architecture allows Redis to efficiently process large workloads while providing resilience.

Key Features and Benefits

The core benefit of Redis Cluster is horizontal scalability. On Kubernetes, you can scale Redis simply by adding nodes, with the platform handling scheduling and recovery. If a node fails, Kubernetes restarts it automatically, while Redis ensures continuity through replication and failover. The combination of Redis’s distributed model with Kubernetes orchestration delivers a fault-tolerant, highly available in-memory datastore suited for production workloads.

Data Sharding in Redis

Redis Cluster divides the keyspace into 16,384 hash slots. A CRC16 hash maps each key to a slot, which is then assigned to a primary node. This spreads keys uniformly across slots, though skewed access patterns can still create hot spots. The system supports live resharding, allowing hash slots to move between nodes without downtime. As a result, you can scale capacity up or down while keeping applications online.
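The slot mapping above can be sketched in a few lines of Python. Redis uses the CRC16-XMODEM variant (polynomial 0x1021, zero initial value); this is a minimal illustration of the algorithm, not the server's actual implementation:

```python
def crc16(data: bytes) -> int:
    """CRC16-XMODEM, the variant Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of the 16,384 hash slots."""
    return crc16(key.encode()) % 16384

# "123456789" is the standard CRC16-XMODEM check string
print(hex(crc16(b"123456789")))  # → 0x31c3
print(hash_slot("123456789"))    # → 12739
```

Every key deterministically lands in exactly one slot, which is why moving slots between nodes is sufficient to rebalance the whole keyspace.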

Prerequisites for Deploying Redis on Kubernetes

Running Redis on Kubernetes requires upfront planning to ensure performance, resilience, and security. By addressing system requirements, networking, storage, and security before deployment, you avoid common pitfalls and set up a production-ready Redis cluster.

Review System Requirements

Even though Redis is lightweight, production clusters demand adequate CPU and memory. Define resource requests and limits in your manifests to avoid performance degradation under load. Ensure your Kubernetes version is compatible with the Redis Operator or Helm chart you plan to use. With Plural’s console, you can monitor utilization across clusters and proactively prevent bottlenecks.

Configure Networking

Redis nodes depend on low-latency communication for sharding, replication, and failover. Use a Headless Service to assign each pod a stable network identity. Enforce NetworkPolicies to restrict traffic—only allowing trusted application pods and Redis nodes to connect—reducing the attack surface. Plural standardizes these security practices across your managed environments.

Plan Storage

To safeguard data during pod rescheduling or node failures, pair StatefulSets with PersistentVolumeClaims. Choose a StorageClass that aligns with your workload; high-IOPS SSD-backed storage is typically best for Redis. Infrastructure as Code (IaC) workflows, such as Plural Stacks, can automate provisioning to ensure consistent, reliable storage across environments.

Address Security

Start with a verified Redis image and configure authentication using the requirepass directive. Store secrets in Kubernetes Secrets rather than hardcoding them. Apply RBAC to tightly control who can access Redis resources. With Plural’s identity-aware dashboard, you can enforce granular RBAC tied to your existing SSO provider, ensuring consistent security across your fleet.

How to Deploy Redis on Kubernetes: A Step-by-Step Guide

Running Redis on Kubernetes involves more than spinning up containers. Because Redis is stateful, you need stable pod identities, persistent storage, and a controlled initialization process. Below is a structured approach with manifest snippets and automation options to help you deploy Redis clusters in production.

Step 1: Create Kubernetes Manifests

Start by defining a basic manifest that specifies the Redis image, ports, and resource allocations. This example uses the official Redis image and exposes port 6379.

apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
    - port: 6379
      targetPort: 6379
  clusterIP: None  # Headless service for stable DNS
  selector:
    app: redis
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: "redis"
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7.2
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 5Gi

This manifest:

  • Creates a headless service to give each Redis pod a unique DNS identity.
  • Uses a StatefulSet with predictable pod names (redis-0, redis-1, redis-2).
  • Attaches a PersistentVolumeClaim (PVC) to each pod for durable storage.

Step 2: Set Up StatefulSets

Using a StatefulSet ensures Redis pods retain identity across restarts. For example:

  • Pod redis-0 always maps to the same PVC data-redis-0.
  • Pod ordering ensures Redis cluster discovery works reliably.
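Because both the pod names and the headless Service name are predictable, every cluster member's address can be derived programmatically. A small Python sketch, assuming the `redis` names and port from the manifest above:

```python
def cluster_node_addresses(statefulset: str, service: str,
                           replicas: int, port: int = 6379) -> list[str]:
    """Build the stable DNS addresses StatefulSet pods get via a headless Service.

    Within its namespace, each pod resolves as <statefulset>-<ordinal>.<service>.
    """
    return [f"{statefulset}-{i}.{service}:{port}" for i in range(replicas)]

print(cluster_node_addresses("redis", "redis", 3))
# → ['redis-0.redis:6379', 'redis-1.redis:6379', 'redis-2.redis:6379']
```

This is the same address list the cluster-creation step consumes later, which is exactly why stable identities matter.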

If you use Helm, you can achieve the same outcome with:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis-cluster \
  --set cluster.nodes=6 \
  --set persistence.storageClass=fast-ssd \
  --set persistence.size=5Gi

Helm abstracts away much of the StatefulSet and PVC boilerplate while giving you tuning options.

Step 3: Configure Persistent Storage

Redis is an in-memory database, but it still writes snapshots (RDB) or append-only logs (AOF) to disk for durability. Persistent storage prevents data loss during pod rescheduling.

For production workloads:

  • Use SSD-backed StorageClasses with high IOPS.
  • Configure AOF persistence in redis.conf:
appendonly yes
appendfsync everysec

You can inject this configuration into pods using a ConfigMap. This is also where cluster mode is enabled; without cluster-enabled yes, the cluster creation step later will fail:

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    appendonly yes
    appendfsync everysec
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  template:
    spec:
      containers:
        - name: redis
          image: redis:7.2
          command: ["redis-server", "/usr/local/etc/redis/redis.conf"]
          volumeMounts:
            - name: config
              mountPath: /usr/local/etc/redis
      volumes:
        - name: config
          configMap:
            name: redis-config

Step 4: Initialize the Cluster

Once the pods are running, form the cluster by running redis-cli inside one of them. Note that --cluster-replicas 1 requires at least six nodes (three masters plus one replica each); with the three pods from the example StatefulSet, use --cluster-replicas 0 to create three masters without replicas:

kubectl exec -it redis-0 -- redis-cli --cluster create \
  redis-0.redis:6379 \
  redis-1.redis:6379 \
  redis-2.redis:6379 \
  --cluster-replicas 0

This command:

  • Connects all Redis pods into a cluster.
  • Assigns hash slots across the masters automatically.
  • Would also configure replication (each primary gets one replica) if run with six pods and --cluster-replicas 1.

To automate this step, wrap it in a Kubernetes Job:

apiVersion: batch/v1
kind: Job
metadata:
  name: redis-init
spec:
  template:
    spec:
      containers:
        - name: redis-init
          image: redis:7.2
          command:
            - sh
            - -c
            - |
              # With three pods, create three masters (replicas require six nodes)
              redis-cli --cluster create \
                redis-0.redis:6379 \
                redis-1.redis:6379 \
                redis-2.redis:6379 \
                --cluster-replicas 0 --cluster-yes
      restartPolicy: OnFailure

How to Manage Your Redis Cluster

Once Redis is deployed on Kubernetes, the priority shifts to day-to-day operations. Proper management ensures scalability, observability, and resilience, while minimizing manual intervention. The main areas to focus on are scaling, monitoring, backups, and failover handling.

Scale and Load Balance

Redis clusters can be scaled horizontally by increasing the number of pods in the StatefulSet, but new pods join as empty nodes: they must be added to the cluster and hash slots resharded onto them before they serve data. Kubernetes maintains stable identities for the new pods, and the headless Service gives each one a resolvable DNS name. Cluster-aware clients route each command to the node that owns the key's slot, so slot assignment, not a load-balancing Service, is what spreads traffic.

Example: scaling from 3 to 6 Redis nodes

kubectl scale statefulset redis --replicas=6
# Join each new pod to the cluster (repeat for redis-4 and redis-5), then rebalance
kubectl exec -it redis-0 -- redis-cli --cluster add-node redis-3.redis:6379 redis-0.redis:6379
kubectl exec -it redis-0 -- redis-cli --cluster rebalance redis-0.redis:6379 --cluster-use-empty-masters

With Plural CD, this scaling action can be version-controlled and propagated across environments through GitOps, preventing drift and simplifying multi-cluster management.

Monitor with Prometheus and Grafana

Monitoring is essential to detect performance issues early. Use the Redis Exporter to expose metrics such as memory usage, CPU load, command latency, and cache hit ratio.
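The cache hit ratio itself is a derived metric: Redis exposes the keyspace_hits and keyspace_misses counters via INFO (and the exporter surfaces them for Prometheus), and the ratio is computed from the two. A minimal sketch of the calculation:

```python
def cache_hit_ratio(keyspace_hits: int, keyspace_misses: int) -> float:
    """Fraction of key lookups served from the cache.

    A persistently low value suggests an ineffective cache or keys being
    evicted before they are reused.
    """
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0

print(cache_hit_ratio(9_000, 1_000))  # → 0.9
```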

Example: Redis Exporter deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter
  template:
    metadata:
      labels:
        app: redis-exporter
    spec:
      containers:
        - name: redis-exporter
          image: oliver006/redis_exporter:v1.61.0
          ports:
            - containerPort: 9121

  • Prometheus scrapes these metrics and stores them as time-series data.
  • Grafana visualizes them in dashboards for capacity planning and troubleshooting.
  • Plural’s dashboard integrates Kubernetes state directly, consolidating observability in one place.

Implement Backup and Recovery

Even with Redis cluster’s built-in resilience, backups are required to protect against accidental data loss or catastrophic failure. Automate backups with Kubernetes CronJobs that trigger Redis snapshotting (RDB or AOF).

Example: CronJob for nightly RDB snapshots

apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: redis:7.2
              command: ["/bin/sh", "-c"]
              args:
                - |
                  redis-cli -h redis-0.redis save
                  cp /data/dump.rdb /backup/redis-$(date +%F-%H%M).rdb
              volumeMounts:
                - name: data
                  mountPath: /data
                - name: backup
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: data-redis-0
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc

Store these backups in durable object storage (S3, GCS, etc.) for long-term retention. Note that this Job reads the snapshot from the data-redis-0 volume; with ReadWriteOnce storage the backup pod must be scheduled on the same node as redis-0, so for production a sidecar container or volume-snapshot-based backups may be more robust.

Handle Failover Scenarios

Redis cluster uses a master–replica model. If a master fails, a replica is automatically promoted. Kubernetes complements this by restarting failed pods and reattaching them to the cluster.

To observe failover:

kubectl logs redis-0
kubectl describe pod redis-0

Plural’s console surfaces these events across your fleet, making it easier to verify automatic promotions and confirm that cluster health is restored after outages.

How to Optimize Redis Performance

Running Redis on Kubernetes gives you scalability and resilience, but performance tuning is what makes it production-ready. Without proper optimization, you risk latency spikes, resource waste, or instability. Key areas to address include resource management, networking, security, and high availability.

Manage Resources Effectively

Redis is memory-bound, so setting the right CPU/memory requests and limits is critical. Under-provisioning can cause evictions or crashes, while over-provisioning wastes cluster capacity. Profile your workload to establish realistic baselines.

Example StatefulSet resource configuration:

resources:
  requests:
    memory: "4Gi"
    cpu: "1"
  limits:
    memory: "6Gi"
    cpu: "2"

Kubernetes enforces these allocations, ensuring Redis pods get guaranteed capacity. With Plural, you can monitor usage across clusters and adjust configurations centrally via GitOps, avoiding both bottlenecks and over-allocation.
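One practical detail worth encoding: Redis's own maxmemory setting should sit below the container's memory limit, leaving headroom for copy-on-write during snapshotting and for client buffers, or the kernel may OOM-kill the pod before Redis ever starts evicting. A sketch of the calculation, where the 25% headroom is an assumption to tune per workload:

```python
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def parse_k8s_memory(quantity: str) -> int:
    """Parse a Kubernetes memory quantity like '512Mi' or '6Gi' into bytes
    (binary suffixes only, for brevity)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)

def suggested_maxmemory(limit: str, headroom: float = 0.25) -> int:
    """Redis maxmemory in bytes, leaving headroom below the container limit."""
    return int(parse_k8s_memory(limit) * (1 - headroom))

print(suggested_maxmemory("6Gi"))  # → 4831838208 (exactly 4.5Gi)
```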

Optimize Network Performance

Redis clusters require two ports per node:

  • 6379 → Client connections
  • 16379 → Cluster bus for gossip, health checks, and failover signals

Any latency in the cluster bus can lead to false node failures and unnecessary failovers.

Best practices:

  • Ensure your CNI is tuned for low-latency traffic.
  • Allow both ports in your firewall and NetworkPolicies.
  • Avoid noisy neighbors by using pod anti-affinity and dedicated node pools when possible.

Example NetworkPolicy for Redis traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-allow
spec:
  podSelector:
    matchLabels:
      app: redis
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: my-app
      ports:
        - protocol: TCP
          port: 6379
        - protocol: TCP
          port: 16379

Harden Your Security

Redis should never be exposed directly without protections. Secure the cluster using multiple Kubernetes-native controls:

  1. Pod anti-affinity → Prevents co-location of master and replica pods on the same node.
  2. NetworkPolicies → Restrict access so only trusted application pods can connect.
  3. Authentication → Enable Redis requirepass and store the password in a Kubernetes Secret.

Example Secret for Redis password:

apiVersion: v1
kind: Secret
metadata:
  name: redis-auth
type: Opaque
data:
  password: c3VwZXJzZWNyZXRwYXNz # base64 encoded

Mount this Secret into your StatefulSet and pass it to Redis via environment variables or config maps. Plural ensures these security settings are consistently applied across environments through GitOps.

Configure for High Availability

Redis provides HA through master–replica replication. In Kubernetes, pair this with scheduling strategies to eliminate single points of failure.

Key practices:

  • Deploy replicas for each master shard in your StatefulSet.
  • Use pod anti-affinity rules to spread replicas across different nodes.
  • Monitor failovers with logs and health checks to validate readiness.

Example anti-affinity rule:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - redis
        topologyKey: "kubernetes.io/hostname"

This ensures no two Redis pods from the same shard run on the same node. Combined with Kubernetes self-healing, you get automatic recovery from both pod-level and node-level failures.

How to Troubleshoot Your Redis Cluster

Deploying Redis on Kubernetes provides significant reliability and performance benefits, but like any distributed system, it can present unique troubleshooting challenges. When issues arise, a systematic approach is key to identifying the root cause, whether it lies within the Redis configuration, the Kubernetes environment, or the interaction between the two. Common problems range from initial deployment failures and performance degradation to unexpected behavior caused by Redis Cluster’s specific design trade-offs.

Solve Common Deployment Issues

Deployment failures often stem from misconfigurations in your Kubernetes manifests. Pods stuck in a Pending state may indicate resource shortages or issues with PersistentVolumeClaims (PVCs), while CrashLoopBackOff errors can point to incorrect container images, configuration errors, or failed readiness probes. Start by inspecting pod details with kubectl describe pod <pod-name> to check for events that reveal the underlying problem. Reviewing container logs with kubectl logs <pod-name> is also essential for diagnosing application-level failures. Network policies can sometimes block communication between Redis nodes, preventing the cluster from forming correctly. Plural’s embedded Kubernetes dashboard simplifies this process by providing a unified interface to view logs, events, and resource states across your entire fleet, eliminating the need to juggle multiple kubeconfig files and terminals.

Identify Performance Bottlenecks

Performance issues in a Redis cluster can manifest as high latency or slow command execution. Begin by monitoring key Redis metrics. High memory usage (used_memory) can lead to key eviction, while a low keyspace hit ratio (derived from keyspace_hits and keyspace_misses) suggests your cache is ineffective. High CPU utilization might point to inefficient commands or an overloaded node. Running a Redis cluster on Kubernetes allows you to use native tooling like kubectl to check pod resource consumption (kubectl top pod). By correlating Redis metrics with Kubernetes pod and node metrics, you can determine if the bottleneck is application-specific or caused by infrastructure constraints. Plural provides a single pane of glass for observability, helping you connect application performance data with underlying infrastructure health to quickly pinpoint the source of slowdowns.

Understand Command Limitations

Some issues aren't bugs but are inherent to Redis Cluster’s design. A primary limitation is the handling of multi-key operations. Commands like MSET or transactions involving keys that map to different hash slots will fail. Your application must be designed to handle these constraints, for instance, by using hash tags {...} in keys to ensure related data resides on the same node. Furthermore, Redis Cluster does not promise strong consistency. During a network partition or failover event, the system can lose a small number of writes that were acknowledged by the master but not yet replicated. Understanding this trade-off is critical for applications that require guaranteed data durability.
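The hash-tag rule can be made concrete: Redis hashes only the substring between the first { and the first } that follows it, provided that substring is non-empty; otherwise the whole key is hashed. A sketch of the extraction logic (an illustration, not the server implementation):

```python
def hash_tag_portion(key: str) -> str:
    """Return the part of the key Redis Cluster actually hashes.

    If the key contains a non-empty hash tag like {user123}, only the tag
    is hashed, so all keys sharing it land in the same slot and can be
    used together in multi-key operations.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end > start + 1:  # tag must be non-empty
            return key[start + 1 : end]
    return key

print(hash_tag_portion("{user123}:profile"))  # → user123
print(hash_tag_portion("{user123}:cart"))     # → user123  (same slot)
print(hash_tag_portion("plain-key"))          # → plain-key
```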

Fix Data Distribution Problems

A well-balanced cluster is essential for optimal performance. Redis automatically shards data across 16,384 hash slots, which are distributed among the master nodes. However, an imbalance can occur if some nodes manage significantly more slots or data than others, creating "hot spots." This can happen after scaling the cluster or due to keying patterns that concentrate traffic on specific slots. You can inspect the slot distribution using the redis-cli cluster nodes command. If you find a significant imbalance, you can use redis-cli --cluster rebalance to redistribute slots evenly. This operation should be performed carefully during off-peak hours, as it can temporarily increase cluster load while data is being moved between nodes.
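Parsing the cluster nodes output gives you each master's slot count, from which a quick imbalance check follows. A sketch of that check, where what counts as "significant" imbalance is a judgment call for your workload:

```python
def slot_imbalance(slots_per_master: dict[str, int]) -> float:
    """Ratio of the busiest master's slot count to the ideal even share.

    1.0 means perfectly balanced; values well above 1.0 suggest running
    redis-cli --cluster rebalance.
    """
    ideal = sum(slots_per_master.values()) / len(slots_per_master)
    return max(slots_per_master.values()) / ideal

# Example: one master holds half of the 16,384 slots
print(slot_imbalance({"node-a": 8192, "node-b": 4096, "node-c": 4096}))  # → 1.5
```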

Frequently Asked Questions

Why is a StatefulSet recommended over a Deployment for a Redis cluster? A StatefulSet is designed for stateful applications like Redis that require stable and unique network identifiers and persistent storage. Unlike a standard Deployment, a StatefulSet provides each pod with a predictable, persistent name (e.g., redis-0, redis-1). This stability is essential for the Redis cluster's discovery mechanism and for maintaining master-replica relationships, ensuring the cluster can correctly identify its members even after pods are restarted or rescheduled.

My application uses multi-key commands. Will they work with Redis Cluster? Generally, no. Multi-key commands like MSET or transactions will fail if the keys involved are stored on different nodes. This is a fundamental design trade-off of Redis Cluster's sharded architecture. The solution is to use "hash tags" by placing a common string within curly braces in your key names, such as {user123}:profile and {user123}:cart. This forces Redis to map all keys with the same tag to the same hash slot, ensuring they reside on the same node and can be used in atomic multi-key operations.

What's the first thing I should check if my Redis cluster performance is poor? Start by examining resource utilization and network latency. High latency is a common culprit, so ensure your Kubernetes network provides low-latency communication between pods, especially on the cluster bus port. Next, check for resource bottlenecks by monitoring CPU and memory usage for each Redis pod. Since Redis is an in-memory store, insufficient memory can lead to key eviction and performance degradation. Tools like Prometheus can help you track these metrics and identify if a specific node is overloaded.

How do I handle upgrades for my Redis cluster without causing downtime? The safest method is a rolling upgrade that leverages the master-replica architecture. For each master node you need to upgrade, you first perform a manual failover, promoting one of its replicas to become the new master. Once the replica has taken over, you can safely take the old master offline, perform the upgrade, and bring it back online. It will rejoin the cluster as a replica for the new master. This process ensures that a primary node is always available to serve requests for each data shard.

How does a platform like Plural simplify the Redis deployment and management process described here? Plural automates and unifies the entire lifecycle described in this post through a GitOps-based workflow. Instead of manually creating manifests, configuring storage, and initializing the cluster with kubectl, you can define your Redis setup as code. Plural's continuous deployment ensures this configuration is applied consistently across your entire fleet. For ongoing management, Plural provides a single-pane-of-glass console with an embedded Kubernetes dashboard, allowing you to monitor resources, manage RBAC with SSO, and troubleshoot issues without juggling multiple tools and contexts. This streamlines operations and reduces the risk of manual error.
