Brief Background: What is KEDA?
Kubernetes Event-Driven Autoscaling (KEDA) is an autoscaler that can be deployed into any Kubernetes cluster. It works on top of the standard Horizontal Pod Autoscaler (HPA) to extend functionality. Autoscaling exists to dynamically change how many replicas are running depending on current demand. There are two main reasons why you’d want to use KEDA over Kubernetes Horizontal Pod Autoscaler:
- You can scale to 0: Cool. The standard HPA can't take a workload below one replica; KEDA can, so fully idle workloads cost nothing.
- You can scale off almost any metric: The list of built-in scalers is massive, and you can even build custom ones. If a metric exists, KEDA can probably scale on it.
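To make that concrete, here's a minimal sketch of a KEDA `ScaledObject` (all names are hypothetical, and I'm using the RabbitMQ scaler purely as an example of "a metric that exists"):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler          # hypothetical name
spec:
  scaleTargetRef:
    name: worker               # the Deployment to scale (hypothetical)
  minReplicaCount: 0           # the headline feature: scale to zero
  maxReplicaCount: 20
  triggers:
    - type: rabbitmq           # one of KEDA's many built-in scalers
      metadata:
        hostFromEnv: RABBITMQ_HOST   # connection string from an env var (hypothetical)
        queueName: jobs
        mode: QueueLength
        value: "10"            # target roughly 10 queued messages per replica
```

KEDA creates and manages the underlying HPA for you from this spec; you only ever touch the `ScaledObject`.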
The Silly Dream: Scaling to 0
I so badly just wanted to scale everything to 0. It's so cool, and it felt like an easy win for cost savings, with the only drawback being a slight delay during the startup and activation window.
But realistically, unless you’re being incredibly scrappy, few circumstances warrant scaling to 0. Most production workflows need to be “always-on,” and your staging environments should mirror production as closely as possible, so those stay up too. That leaves internal workloads, but those are often small enough that scaling to 0 doesn’t provide a meaningful win—especially if it isn’t enough to actually drop your node count.
However, I’ve seen some success with scaling to 0 in these spots:
- Infrequent, high-resource workloads: Rather than tearing down the infra every time, just let KEDA drop the replicas to 0 until the next job hits.
- Message Queues: Specifically ones that can afford a 10–30 second delay while the pod spins up after a message becomes visible.
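For that message-queue case, the knobs that matter most are `minReplicaCount`, `cooldownPeriod` (how long KEDA waits after the queue drains before dropping to zero), and `pollingInterval` (which bounds the wake-up delay). A sketch with hypothetical names, using AWS SQS as the example trigger:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler    # hypothetical
spec:
  scaleTargetRef:
    name: sqs-consumer         # hypothetical Deployment
  minReplicaCount: 0           # scale all the way down when the queue is empty
  cooldownPeriod: 120          # wait 2 minutes of quiet before going to zero
  pollingInterval: 10          # check the queue every 10s; bounds the wake-up delay
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs  # hypothetical
        queueLength: "5"       # target ~5 visible messages per replica
        awsRegion: us-east-1
```

The 10–30 second delay mentioned above is roughly the polling interval plus your pod's startup time, so tune `pollingInterval` against how much API chatter you can tolerate.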
The Shiny: Scaling Off Any Metric
This is where KEDA really stands out. Out of the box, the HPA limits you to scaling on CPU or memory, which is inherently reactive. By the time your CPU spikes, the load is already there, you're already falling behind, and your app is struggling. Depending on how long your "Time to Ready" is, scaling this way can simply be too slow.
Take the classic example of high-demand ticket sales: if you wait for the CPU to spike to add replicas, your app is likely already falling over. KEDA allows you to scale ahead of the resource pressure by looking at the source of the traffic (like the number of active sessions or messages in a queue).
Alternatively, your constraints might not be resource-based at all. If your workload can only process x threads at a time, you might have a massive backlog even while your CPU usage looks perfectly fine.
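One way to scale on the source of the traffic rather than resource usage is KEDA's Prometheus scaler. The query, address, and numbers here are made up for illustration; the point is that any queryable metric can drive replicas:

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090   # hypothetical address
      query: sum(app_active_sessions)                    # hypothetical metric
      threshold: "200"           # target ~200 active sessions per replica
      activationThreshold: "1"   # stay at 0 replicas until there's any traffic at all
```

The same shape works for the thread-limited case: point the query at your backlog depth instead of CPU, and you scale on the constraint you actually have.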
The “Devil in the Details”
I’m a huge fan of KEDA; honestly, it’s a no-brainer to use. But the “devil in the details” here is the configuration. Using this tool effectively means knowing how to configure your scaler well, and that’s not always easy.
You think it’s going to be simple:
- The Plan: “I have an app, I have a latency metric, I want to scale on latency. Let’s just set a threshold of 20ms.”
- The Reality: “Oh, this metric is spikier than I thought! The pods are flapping (constantly scaling up and down).”
- The Fix: “Okay, let me just increase the `stabilizationWindowSeconds`. That should ignore the spikes.”
- The Result: “Now it’s flapping every 30 minutes instead of every 5!”
- The Next Fix: “I’ll just add a rollup to smooth out the metric.”
- The Result: “The rollup didn’t help much, and now I’ve lost precision because the data is averaged over too long a window. I’ll just slow down the scale-up policy.”
- The Final Realization: “Great, now it’s still flapping, but over a 1-hour window. Maybe the `minReplicas` is just too low? If we scale too far down, the latency climbs back up instantly!”
And on, and on, and on… Between each of these steps are hours to days of monitoring to see how the changes actually behave in the wild.
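For reference, all of the knobs from that saga live on the `ScaledObject` itself: `minReplicaCount` at the top level, and the stabilization windows and scaling policies under `advanced`, which KEDA passes straight through to the underlying HPA's `behavior` field. A sketch (the values are illustrative, not recommendations):

```yaml
spec:
  minReplicaCount: 3                     # maybe the floor was just too low?
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 600   # ignore dips shorter than 10 minutes
          policies:
            - type: Percent
              value: 25                     # shed at most 25% of replicas...
              periodSeconds: 60             # ...per minute
        scaleUp:
          stabilizationWindowSeconds: 0     # react to spikes immediately
          policies:
            - type: Pods
              value: 4                      # add at most 4 pods per minute
              periodSeconds: 60
```

Note the asymmetry: slow, conservative scale-down and fast scale-up is usually the safer failure mode, but it's exactly this pile of interacting settings that makes each tuning iteration take days to validate.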
Final Thoughts
KEDA gives you incredible flexibility, but it’s easy to get lost in the weeds, especially with volatile workloads. Before you know it, you’re manually calculating every step of the HPA formula just to understand why your replica count is what it is.
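The formula in question is the HPA's core calculation, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), which you can sanity-check by hand:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """The core HPA scaling formula: scale the replica count in proportion
    to how far the observed metric is from its target."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 5 replicas observing 30ms latency against a 20ms target -> scale up to 8
print(desired_replicas(5, 30, 20))  # 8
```

The real controller also skips scaling when the ratio is within a tolerance of 1.0 (10% by default), which is one more reason the replica count you predict on paper doesn't always match what you see on the dashboard.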
Ah, metrics, the bane of my existence; our love-hate relationship will carry on forever. KEDA is still an awesome tool, and for the majority of workloads you can “set and forget” it (just kidding, always monitor; dashboards, alerts, something something). I just wanted to share my tale of the trials involved.