KEDA vs. HPA: Which Kubernetes Autoscaler to Use

In production Kubernetes environments, cloud cost control is tightly coupled to how efficiently workloads scale. The Horizontal Pod Autoscaler (HPA) adjusts replica counts based on resource metrics, but its hard minimum of one replica means services continue to consume resources even when idle. This makes HPA a poor fit for workloads with long idle periods and is a core reason teams evaluate alternatives. The Kubernetes Event-Driven Autoscaler (KEDA) targets this gap by supporting scale-to-zero for event-driven workloads, activating pods only when external signals indicate real demand.

This article breaks down the trade-offs between HPA and KEDA and maps each tool to the workload patterns where it delivers the most cost-efficient results.

Key takeaways:

  • Align autoscalers with workload triggers: Use HPA for applications that scale directly with resource consumption like CPU and memory. Employ KEDA for event-driven workloads that must respond to external signals, such as message queue depth or custom metrics.
  • Use KEDA's scale-to-zero for efficiency: KEDA complements HPA by enabling workloads to scale down to zero replicas when idle, eliminating resource waste for intermittent jobs. This hybrid approach optimizes costs without sacrificing responsiveness when new events arrive.
  • Secure and monitor autoscaling configurations: Production autoscaling is not "set and forget." Secure your ScaledObject and HPA configurations with RBAC, and continuously monitor scaling behavior to fine-tune thresholds and prevent performance bottlenecks or cost overruns.

What Is the HPA?

The HPA is a native feature of Kubernetes that automatically adjusts the number of pod replicas for a Deployment, ReplicaSet, or StatefulSet. It provides horizontal scaling by increasing replicas under load and reducing them during low utilization, removing the need for manual intervention. HPA is designed for stateless workloads with fluctuating traffic, where resource-based scaling is sufficient to maintain performance while avoiding overprovisioning.

HPA runs as a control loop that periodically evaluates observed metrics against configured targets. By default, it relies on CPU and memory metrics collected by the Metrics Server. When utilization drifts from the desired threshold, the controller computes a new replica count and updates the target resource accordingly. This reactive model makes HPA effective for demand-driven scaling, but it is not suitable for event-driven workloads that require scaling based on external signals such as queue depth or message rates.

How HPA Scales Using Resource Metrics

HPA determines scaling actions by comparing current resource usage with a target value. For example, if an HPA is configured to maintain 70% average CPU utilization and observed usage rises to 90%, the controller increases the replica count to redistribute load and bring utilization back toward the target. If utilization drops well below the target, HPA scales in by reducing replicas. This feedback loop continuously converges toward the desired utilization level.
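
Under the hood, the controller computes the new replica count from the ratio of observed to target values: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal autoscaling/v2 manifest expressing the 70% example looks like the sketch below; the Deployment name and replica bounds are placeholders.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api              # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # maintain ~70% average CPU across pods
```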

Scaling Behavior, Thresholds, and Limits

HPA scaling is inherently reactive, which introduces latency between a traffic spike and the availability of additional pods. To avoid oscillation and unnecessary churn, HPA applies stabilization windows and cooldown periods before scaling decisions take effect. You explicitly define minimum and maximum replica counts, along with utilization targets, to bound scaling behavior. These limits prevent scale-to-zero scenarios and uncontrolled scale-outs, but they also make HPA a suboptimal choice for workloads that sit idle for long periods or need to scale proactively based on external events.
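
These bounds and windows are expressed in the same autoscaling/v2 spec through minReplicas, maxReplicas, and the behavior field. A fragment with illustrative values rather than recommendations:

```yaml
spec:
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling in
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of replicas per period
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0     # react to spikes immediately
      policies:
        - type: Pods
          value: 4                      # add at most 4 pods per 15-second period
          periodSeconds: 15
```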

What Is KEDA?

KEDA extends Kubernetes with event-driven autoscaling for workloads that cannot be scaled effectively using internal resource metrics alone. While the HPA scales based on CPU and memory, many systems need to react to external signals such as queue depth or stream lag. KEDA addresses this by acting as a lightweight metrics adapter that exposes external event data to the Horizontal Pod Autoscaler, enabling scaling decisions driven by actual work, not just resource saturation.

As a Cloud Native Computing Foundation project, KEDA integrates cleanly into the Kubernetes control plane. It does not replace HPA; instead, it activates and deactivates HPA behavior based on event activity. This makes KEDA well suited for intermittent or bursty workloads such as background processors, data pipelines, and serverless-style services running on Kubernetes, where idle capacity translates directly into wasted spend.

How KEDA Extends HPA for Event-Driven Workloads

KEDA complements HPA by bridging the gap between Kubernetes and external event sources. HPA expects metrics from running pods, but event-driven systems often need to scale before pods exist. KEDA continuously monitors external systems and, when activity is detected, dynamically creates or drives an HPA to scale the target workload.

Its most important capability is scale-to-zero. When no events are present, KEDA can reduce a deployment to zero replicas, fully eliminating idle resource usage. When new events arrive, it scales the workload back up automatically. Native HPA cannot scale below one replica, which makes KEDA essential for cost-efficient event-driven architectures.

Scaling with Event Triggers and Sources

KEDA configuration is centered around the ScaledObject custom resource, which defines the target workload, the event source, and scaling thresholds. KEDA uses pluggable “scalers” to query external systems and translate their state into metrics consumable by HPA.

KEDA supports more than 60 built-in scalers, including queues and streams such as RabbitMQ and Apache Kafka, as well as managed cloud services like AWS SQS and Azure Event Hubs. For example, KEDA can poll a queue for message backlog and scale pods proportionally to the amount of pending work. This allows applications to scale proactively based on demand, rather than reactively after resource pressure has already built up.
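
As an illustration, the sketch below scales a hypothetical order-processor Deployment on RabbitMQ queue depth; the queue name, target value, and authentication reference are placeholders, and each scaler documents its own set of metadata fields.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    name: order-processor          # hypothetical Deployment to scale
  minReplicaCount: 0               # scale to zero when the queue is empty
  maxReplicaCount: 30
  triggers:
    - type: rabbitmq
      metadata:
        queueName: orders
        mode: QueueLength          # scale on the number of pending messages
        value: "20"                # target ~20 messages per replica
      authenticationRef:
        name: rabbitmq-auth        # TriggerAuthentication holding the connection string
```

With this in place, KEDA scales the Deployment to zero once the queue stays empty past its cooldown period and brings it back on the first new message.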

HPA vs. KEDA: Key Scaling Differences

Although both tools manage pod scaling in Kubernetes, they solve different problems. The Horizontal Pod Autoscaler is a core, resource-centric controller, while the Kubernetes Event-Driven Autoscaler builds on HPA to support event-driven patterns. The choice affects responsiveness, operational overhead, and cloud cost.

Resource-based vs. event-driven scaling

HPA is reactive. It polls pod-level metrics such as CPU and memory and adjusts replicas to maintain a target utilization. This works well for synchronous, request-driven services with relatively predictable load (for example, HTTP APIs).

KEDA is proactive. It monitors external systems—queues, streams, databases—and scales workloads based on signals that represent pending work. This makes it a better fit for asynchronous and batch workloads where resource usage lags demand and is a poor proxy for backlog.

Scale-to-zero behavior

KEDA can scale workloads down to zero replicas when no events are present and scale them back up when work arrives. This eliminates idle pods entirely and is a major cost lever for intermittent workloads.

HPA cannot scale below one replica. Even with no traffic, an HPA-managed workload always consumes resources for at least one running pod, which limits cost optimization for rarely used services.
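
The difference is visible directly in the two specs: minReplicas on an HPA has a floor of one, while minReplicaCount on a ScaledObject can be zero. A side-by-side fragment with illustrative values:

```yaml
# HPA spec fragment: the floor is one running pod
spec:
  minReplicas: 1
  maxReplicas: 10

# KEDA ScaledObject spec fragment: idle workloads can drop to zero
spec:
  minReplicaCount: 0
  maxReplicaCount: 10
```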

Metric sources and operational complexity

HPA supports custom and external metrics, but requires an additional adapter (commonly a Prometheus adapter) and careful cluster-level configuration. This adds operational surface area and can be restrictive, since Kubernetes allows only one Custom Metrics API per cluster.

KEDA simplifies this by acting as a metrics provider itself. It ships with dozens of built-in scalers for common event sources and avoids adapter conflicts. In platforms like Plural, these autoscaling configurations can be observed and audited centrally, making it easier to standardize scaling behavior across clusters while keeping costs predictable.
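
To illustrate the difference, a Prometheus-backed KEDA trigger needs only a server address, a query, and a threshold, with no adapter deployment involved. The address and query below are assumptions for illustration:

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090   # assumed in-cluster Prometheus endpoint
      query: sum(rate(http_requests_total[2m]))              # hypothetical query to scale on
      threshold: "100"                                       # target value of the query per replica
```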

When to Use HPA vs. KEDA

Choosing between the Horizontal Pod Autoscaler and the Kubernetes Event-Driven Autoscaler depends on what actually drives scaling in your application. Neither tool is universally better—they are optimized for different workload architectures. In most real-world Kubernetes environments, especially those running a mix of synchronous and asynchronous services, you will likely use both. The goal is to align scaling mechanics with demand signals to minimize cost while preserving responsiveness.

Use HPA for predictable, resource-bound workloads

HPA is the default choice when scaling correlates directly with pod-level resource consumption. Stateless web services, REST APIs, and request-driven backends typically scale in proportion to CPU or memory pressure caused by user traffic. In these cases, HPA’s reactive model is sufficient and operationally simple.

You configure a target utilization—such as 75% average CPU—and HPA continuously adjusts replica counts to maintain that target. Because it is a native Kubernetes component, HPA is easy to reason about, easy to debug, and integrates cleanly with standard deployment workflows. For synchronous workloads where performance maps cleanly to resource usage, HPA is the correct and lowest-friction solution.

Use KEDA for event-driven and asynchronous workloads

KEDA is designed for workloads where demand is signaled externally rather than through pod resource saturation. Background workers, stream processors, and queue consumers often appear idle at the CPU level even when large backlogs exist. In these cases, HPA reacts too late.

KEDA monitors external systems directly—such as Apache Kafka, RabbitMQ, or Prometheus-backed metrics—and scales workloads based on backlog, lag, or event rate. Its scale-to-zero capability is critical for jobs that run intermittently, eliminating idle pods entirely and materially reducing cloud spend.
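
A Kafka consumer-lag trigger, for instance, follows the shape sketched below; the broker address, topic, consumer group, and lag threshold are placeholders:

```yaml
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.messaging.svc:9092   # assumed broker address
      consumerGroup: invoice-workers               # hypothetical consumer group
      topic: invoices
      lagThreshold: "50"                           # target lag per replica before scaling out
```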

A practical decision framework

You can usually choose the right autoscaler by answering three questions:

First, what triggers demand? If scaling is driven by CPU or memory pressure from live traffic, use HPA. If scaling depends on external signals like queue depth or event volume, use KEDA.

Second, does the workload sit idle for long periods? If yes, KEDA’s ability to scale to zero is a decisive advantage.

Third, is this a mixed environment? Most production platforms are. A common pattern is HPA for synchronous services and KEDA for asynchronous workers. With Plural, you can observe and manage both autoscaling strategies from a single control plane, making hybrid deployments easier to operate consistently across clusters.

Using HPA and KEDA Together

Instead of viewing HPA and KEDA as competing tools, it’s more accurate to see them as complementary components of a comprehensive autoscaling strategy. KEDA is not a replacement for HPA; it is an extension. It acts as a metrics server that feeds custom and external metrics to the HPA controller, allowing it to make more intelligent scaling decisions. By design, KEDA builds upon HPA to enable fine-grained, event-driven scaling that HPA cannot achieve on its own.

This combination allows you to create a hybrid scaling approach. You can continue using HPA for resource-based scaling (CPU and memory) while leveraging KEDA to handle scaling based on external event sources like message queues or database queries. This dual strategy ensures your application is both efficient under predictable loads and responsive to sudden, event-driven demand. Managing these configurations consistently across a large number of clusters can be challenging, but a unified platform like Plural provides a single pane of glass to deploy and monitor autoscaling policies across your entire fleet.

How KEDA can feed external metrics to HPA

KEDA integrates directly with the Kubernetes Horizontal Pod Autoscaler by implementing the external metrics API. It acts as an adapter, polling external event sources—such as a RabbitMQ queue, an AWS SQS queue, or Prometheus—for relevant metrics. KEDA then translates this data into a format that HPA can understand and consume.

For example, KEDA can monitor the number of messages in a queue and expose that queue length as an external metric. Rather than configuring the HPA by hand, you define a ScaledObject, and KEDA creates and manages an HPA object that watches the metric. When the queue length exceeds the defined threshold, the HPA controller, acting on KEDA's data, scales up the number of pods. This mechanism effectively extends HPA's abilities, allowing it to react to application-specific signals rather than just system-level resource utilization.
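
The HPA that KEDA generates follows the standard external-metric shape of the autoscaling/v2 API; the fragment below is a rough sketch with an illustrative metric name and target:

```yaml
metrics:
  - type: External
    external:
      metric:
        name: s0-rabbitmq-orders     # KEDA-generated metric name (illustrative)
      target:
        type: AverageValue
        averageValue: "20"           # mirrors the trigger's target value
```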

Combining strategies for advanced autoscaling

A combined HPA and KEDA strategy allows you to build sophisticated autoscaling logic tailored to your application's specific needs. You can configure multiple triggers for a single workload. For instance, a data processing application might scale based on CPU utilization using a standard HPA metric, but also scale based on the number of unprocessed files in a storage bucket using a KEDA scaler.
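
The sketch below shows such a multi-trigger ScaledObject, pairing KEDA's cpu trigger with its Google Cloud Storage scaler; the bucket name and thresholds are placeholders, and the exact metadata keys should be confirmed against the documentation for the scaler you use.

```yaml
triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "70"                    # keep average CPU around 70%
  - type: gcp-storage
    metadata:
      bucketName: incoming-files     # hypothetical bucket holding unprocessed files
      targetObjectCount: "100"       # scale out for roughly every 100 pending objects
```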

This approach provides the best of both worlds. HPA manages the baseline scaling for general load, while KEDA handles the event-driven bursts. A key advantage KEDA introduces is the ability to scale to zero. If there are no events to process—for example, an empty message queue—KEDA can scale the corresponding deployment down to zero pods, a capability HPA lacks. This is particularly useful for reducing costs in environments with intermittent workloads. With Plural’s Continuous Deployment engine, you can automate the rollout of these complex, multi-trigger scaling configurations across all your clusters.

Common Autoscaling Challenges in Production

While both HPA and KEDA are powerful tools, implementing them in production environments introduces a unique set of challenges. Moving from a simple test case to a reliable, production-grade autoscaling strategy requires careful consideration of configuration, dependencies, and potential points of failure. Understanding these common pitfalls is the first step toward building a resilient system that scales exactly when you need it to.

HPA: The complexity of custom metrics and reactive scaling

The Horizontal Pod Autoscaler works seamlessly with standard CPU and memory metrics, but its limitations become apparent when you need more sophisticated scaling logic. To scale based on application-specific metrics, such as the number of active user sessions or items in a processing queue, you must implement the custom metrics API. This typically involves deploying and maintaining a separate metrics adapter, like the Prometheus Adapter, which adds significant operational overhead. You are now responsible for the entire metrics pipeline, from instrumenting your application to ensuring the adapter correctly queries and serves the data.
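
To make that overhead concrete, exposing even one application metric through the Prometheus Adapter means owning rule configuration along the lines of the sketch below; the series name and query are hypothetical:

```yaml
rules:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'   # hypothetical application metric
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```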

Furthermore, HPA is fundamentally reactive. It triggers scaling actions only after resource utilization has already crossed a defined threshold. For applications facing sudden, bursty traffic, this delay can lead to performance degradation or service unavailability while waiting for new pods to be provisioned and become ready.

KEDA: Network connectivity and event source configuration

KEDA’s strength lies in its extensive library of scalers, but this flexibility also introduces complexity. Each event source requires precise configuration, including connection strings, authentication credentials, and source-specific metadata. A simple typo in a ScaledObject definition can prevent KEDA from polling the event source, causing scaling to fail silently.
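
Credentials are typically supplied through a TriggerAuthentication resource referenced by the trigger, which keeps secrets out of the ScaledObject itself. A minimal sketch; the Secret and parameter names are placeholders, and the parameter must match what the specific scaler expects:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
spec:
  secretTargetRef:
    - parameter: host                # scaler parameter being supplied (e.g., a connection string)
      name: rabbitmq-credentials     # hypothetical Kubernetes Secret
      key: connection-string         # key within that Secret
```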

A more critical challenge is network connectivity. The KEDA operator must be able to communicate with external event sources like Kafka, RabbitMQ, or a cloud provider's queue service. In production environments with strict firewall rules or private networking, ensuring this connectivity can be difficult. For instance, a common failure mode occurs when the Kubernetes API server cannot reach the KEDA admission webhook due to a misconfigured network policy, as noted in the official KEDA troubleshooting guide. This dependency means your application's scalability is now tied to the availability and accessibility of an external system.

General Pitfalls: Resource conflicts and troubleshooting

Beyond the specifics of HPA and KEDA, several general challenges can complicate any autoscaling strategy. One significant architectural constraint is that a Kubernetes cluster can only have one custom metrics API server registered at a time. This can create conflicts if multiple tools or teams attempt to deploy their own metrics solutions, forcing you to standardize on a single provider.

Another common issue arises from the interaction between pod-level and node-level autoscaling. HPA or KEDA may correctly decide to scale up your application, but if the cluster has no available node capacity, the new pods will remain stuck in a Pending state. Troubleshooting this requires looking beyond the pod autoscaler to the underlying Cluster Autoscaler or Karpenter configuration. This is where Plural's embedded Kubernetes dashboard provides a unified view, allowing you to diagnose issues across the entire stack—from the ScaledObject down to the node level—without context switching.

Comparing Advantages and Limitations

Choosing between HPA and KEDA involves a trade-off between native simplicity and extended flexibility. HPA offers a reliable, built-in solution that is perfect for straightforward, resource-driven scaling. It’s part of the core Kubernetes feature set, meaning there are no additional components to install or manage, which simplifies the operational footprint.

However, this simplicity comes with limitations. KEDA, on the other hand, provides a powerful, event-driven scaling mechanism that can handle complex, asynchronous workloads that HPA cannot. It introduces more components and configuration requirements, which can increase management overhead. Understanding these trade-offs is key to selecting the right autoscaler for your specific application and operational model.

HPA: Built-in reliability vs. limited scope

The primary advantage of the Horizontal Pod Autoscaler is that it is a native Kubernetes feature. This means it’s stable, well-supported, and requires no extra installation. If your application’s scaling needs are directly tied to CPU or memory consumption, HPA is the most direct and reliable tool for the job. It’s easier to set up and manage for these common use cases, as it integrates seamlessly with the Kubernetes metrics server.

The limitation of HPA lies in its scope. While it can be configured to use custom metrics, the process of exposing those metrics via the custom metrics API can be complex. HPA is fundamentally a reactive tool that adjusts replica counts based on observed resource usage, which isn't always the best fit for workloads that need to scale proactively based on external events, like the length of a message queue.

KEDA: Unmatched flexibility vs. operational overhead

KEDA’s main strength is its incredible flexibility. By acting as a metrics server for HPA, it extends Kubernetes autoscaling to be event-driven. With a rich library of built-in scalers, KEDA can monitor event sources like Kafka, RabbitMQ, or Prometheus queries to scale deployments based on the actual event load. This allows applications to scale proactively—and even scale down to zero—ensuring efficient resource use for intermittent or unpredictable workloads.

This flexibility introduces some operational overhead. KEDA is an additional component that must be installed, configured, and maintained in your cluster. Each scaler requires specific configuration and secure access to its corresponding event source, which adds complexity. Managing these configurations consistently across a large fleet of clusters can be challenging, which is why teams often rely on a GitOps-based continuous deployment workflow to ensure every environment is configured correctly and to reduce manual effort.

Best Practices for Production Autoscaling

Deploying an autoscaler is just the first step. To ensure your applications run efficiently and reliably in production, you need to adopt a set of best practices for configuration, monitoring, and security. Effective autoscaling is not a "set it and forget it" process; it requires continuous refinement to adapt to changing workloads and application behavior. Misconfigurations can lead to either excessive costs from over-provisioning or poor performance from under-provisioning, directly impacting your bottom line and user experience. For example, setting a scaling threshold too low might cause rapid, unnecessary scaling events (known as "flapping"), while setting it too high can introduce latency as the system struggles to catch up to demand.

Properly managing your autoscaling strategy involves treating it as a critical component of your infrastructure. This means establishing clear visibility into its performance and locking down its configuration to prevent unauthorized or accidental changes. Without a solid monitoring foundation, you're flying blind, unable to tell if your scaling rules are effective or causing instability. Similarly, without proper security controls, a misconfigured autoscaler could become a vector for resource exhaustion or unauthorized workload modifications. By implementing robust monitoring and security controls, you can build a resilient system that scales predictably and securely, maintaining application availability without constant manual intervention.

Fine-tuning your configuration and monitoring

Effective autoscaling hinges on precise configuration and continuous observation. For KEDA, this means configuring it to scale based on the actual event load, which ensures resources are allocated efficiently without waste. You must verify that the ScaledObject is correctly defined and that KEDA can successfully connect to the specified event source. A broken connection or misconfigured trigger will render the autoscaler useless, leaving your application unable to respond to load changes.

Monitoring is equally critical. You need to observe how your autoscaler behaves under real-world conditions to fine-tune its thresholds and scaling increments. Track metrics like pod counts, resource utilization, and the length of your event queue over time. This data will reveal if your scaling is too aggressive, too slow, or oscillating. Plural’s unified dashboard provides a single pane of glass to monitor these metrics across your entire Kubernetes fleet, simplifying the process of identifying and resolving configuration issues.
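
The knobs that most often need tuning live directly on the ScaledObject. A sketch of the relevant fields, with illustrative starting points rather than recommendations:

```yaml
spec:
  pollingInterval: 30      # seconds between checks of the event source
  cooldownPeriod: 300      # seconds after the last event before scaling back to zero
  minReplicaCount: 0
  maxReplicaCount: 20
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 120   # dampen scale-in between one and N replicas
```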

Implementing security with RBAC

Autoscalers like HPA and KEDA require permissions to modify workloads by creating and deleting pods. These are powerful capabilities that must be secured using Kubernetes Role-Based Access Control (RBAC). KEDA integrates directly with the Kubernetes security model, meaning you can use standard RBAC policies to control who can create, modify, or delete ScaledObject resources. This is essential for preventing unauthorized changes that could disrupt your application's scaling behavior or lead to resource abuse.

Implementing RBAC is crucial for protecting your autoscaling configurations. You should define specific roles that grant only the necessary permissions to manage autoscaler objects and bind them to specific users or service accounts. For organizations managing many clusters, Plural simplifies this process by allowing you to define a Global Service that syncs a consistent set of RBAC policies across your entire fleet. This ensures that your security posture remains uniform and manageable, even as your environment grows in complexity.
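
As an example, a namespaced Role granting read-only access to scaling configuration might look like the sketch below; the names are placeholders, and write verbs would be bound separately to the service account that owns deployments:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keda-scaling-viewer
  namespace: payments                 # hypothetical namespace
rules:
  - apiGroups: ["keda.sh"]
    resources: ["scaledobjects", "triggerauthentications"]
    verbs: ["get", "list", "watch"]   # read-only access to KEDA scaling config
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["get", "list", "watch"]
```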

Build a Smarter Autoscaling Strategy

An effective autoscaling strategy doesn't have to be a choice between HPA and KEDA. In many production environments, the most robust solution involves using both tools together to address different scaling needs. This hybrid approach allows you to build a layered, intelligent system that is both resource-efficient and highly responsive to application demand. You can rely on HPA for predictable, resource-bound workloads while leveraging KEDA for specialized, event-driven scaling scenarios.

This strategy works by assigning each tool to what it does best. Use HPA to manage the baseline scaling of your core services based on standard metrics like CPU and memory utilization. This ensures that your primary applications remain stable and performant under typical traffic patterns. At the same time, you can deploy KEDA to manage workloads that are triggered by external events, such as items in a message queue or metrics from a Prometheus query. KEDA excels at interpreting a wide range of custom metrics, making it ideal for complex, asynchronous architectures.

The primary benefit of this combined approach is a significant improvement in both cost efficiency and resource utilization. KEDA’s ability to scale workloads down to zero when there are no events to process is a key advantage, as it eliminates the cost of maintaining idle pods for intermittent jobs. When a new event arrives, KEDA automatically scales the application up to meet the demand. By combining HPA and KEDA, you create a system that maintains stability for core services while dynamically and cost-effectively managing event-driven tasks.

Managing this hybrid setup across a large fleet of clusters can introduce complexity. Plural's unified dashboard provides a single pane of glass to monitor the behavior of both HPA and KEDA scalers, giving you clear visibility into how your applications are scaling in real-time. With Plural's GitOps-based continuous deployment, you can declaratively manage and sync your HPA and KEDA configurations across all your clusters, ensuring consistency and reducing administrative overhead.

Frequently Asked Questions

Can I use both HPA and KEDA to scale the same application? Yes, you can configure a single application to scale based on both resource metrics (via HPA) and event-driven metrics (via KEDA). KEDA is designed to work with HPA, not replace it. You can define multiple triggers within a single ScaledObject, including standard CPU or memory metrics alongside a KEDA scaler like queue length. The HPA controller will evaluate all triggers and scale the deployment to meet the replica count required by the most demanding trigger at any given time.

What is the "cold start" impact when KEDA scales an application from zero pods? When KEDA scales a deployment from zero, there is a delay before the first pod is ready to process events. This "cold start" time includes the interval for KEDA to poll the event source and detect a new event, the time for the HPA to react, the Kubernetes scheduler to assign the pod to a node, and the time for the container image to be pulled and the application to initialize. This latency makes the scale-to-zero feature best suited for asynchronous workloads where a delay of a few seconds to a minute is acceptable.

If KEDA is so flexible, is there any reason to use the standard HPA by itself? HPA remains the best choice for simple, resource-driven workloads. As a native Kubernetes component, it has no additional dependencies to install or manage. If your application's scaling needs are directly tied to CPU or memory usage, like a standard web server, HPA provides a stable and straightforward solution. KEDA introduces another layer of abstraction and operational overhead, so it's best reserved for when you specifically need its event-driven capabilities or scale-to-zero functionality.

What is the most common mistake teams make when implementing KEDA in production? A frequent issue is improperly configuring the polling interval and cooldown periods on the ScaledObject. Setting the polling interval too short can create unnecessary load on your event source, while setting it too long can delay scaling actions. Similarly, an incorrectly configured cooldown period can cause the scaler to react too quickly to brief fluctuations in load, leading to rapid and inefficient scaling up and down, often called "thrashing." Fine-tuning these settings requires monitoring your application's behavior under a real-world load.

How does a platform like Plural help manage autoscaling configurations? Managing HPA and KEDA configurations across dozens or hundreds of clusters can lead to inconsistencies and errors. Plural simplifies this by providing a GitOps-based continuous deployment engine to declaratively manage and sync your autoscaling policies across your entire fleet. This ensures every cluster has the correct configuration. Additionally, Plural's unified dashboard gives you a single view to monitor how all your applications are scaling, making it easier to troubleshoot issues where pod scaling might be limited by underlying node capacity.