Improved Autoscaling with KEDA

KEDA (Kubernetes Event-Driven Autoscaling) lets you scale Kubernetes workloads automatically based on events and external metrics—not just CPU or memory. While the built-in Horizontal Pod Autoscaler (HPA) can adjust replicas using resource usage, KEDA makes it easy to scale your apps using metrics like queue length, message rates, database activity, or cloud events.

With KEDA in your Thalassa Cloud Kubernetes cluster, you can scale apps such as message queue consumers, API servers, or batch jobs based on real-time demand. KEDA works as an operator that creates and manages HPA resources using metrics from external systems, supporting many sources out of the box (like RabbitMQ, Kafka, PostgreSQL, MySQL, Cron, and more). You can also use custom scalers for other systems.

In short: KEDA combines its operator and built-in scalers to enable simple, flexible event-driven autoscaling for your Kubernetes workloads.

Prerequisites

Before installing KEDA, make sure you have the following in place:

  • A running Kubernetes cluster in Thalassa Cloud.
  • Cluster access configured for kubectl. Use tcloud kubernetes connect to configure access, or set up kubeconfig manually. You’ll need cluster administrator permissions to install KEDA, as it creates cluster-level resources.
  • The metrics server installed in your cluster. KEDA works with the standard Kubernetes metrics server, and Thalassa Cloud clusters include it by default. For more information, see the Metrics Server documentation. Verify it’s running:

kubectl get deployment metrics-server -n kube-system

Installing KEDA

The recommended way to install KEDA is using Helm, which provides a straightforward installation process and makes it easy to configure KEDA for your needs.

First, add the KEDA Helm repository:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update

Install KEDA using Helm:

helm install keda kedacore/keda --namespace keda --create-namespace

This installs KEDA in the keda namespace with default settings. The installation includes the KEDA operator and all necessary CRDs.
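
If you need a reproducible install, you can pin the chart version with Helm’s --version flag (the version number below is only a placeholder; use the release you have tested):

helm install keda kedacore/keda --namespace keda --create-namespace --version 2.14.0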

Verify that KEDA is running:

kubectl get pods -n keda

You should see pods for keda-operator and keda-operator-metrics-apiserver (recent KEDA versions also run a keda-admission-webhooks pod). All should be in the Running state. If anything looks off, check the operator logs:

kubectl logs -n keda -l app=keda-operator
The logs should show that KEDA is running and ready to manage ScaledObjects.
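
You can also confirm that the KEDA CRDs were installed:

kubectl get crds | grep keda.sh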

Understanding ScaledObjects

KEDA uses ScaledObject resources to define how applications should scale. A ScaledObject specifies the deployment to scale, which scaler to use, and the scaling parameters.

Here’s a simple example that scales a deployment based on CPU:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "70"

This ScaledObject scales the my-app deployment between 1 and 10 replicas, targeting 70% average CPU utilization: the underlying HPA adds replicas when average utilization rises above the target and removes them when it falls below.
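
Apply the manifest and confirm that KEDA created the underlying HPA for you; KEDA names it keda-hpa-<scaledobject-name> (the filename below is a placeholder):

kubectl apply -f cpu-scaled-app.yaml
kubectl get scaledobject cpu-scaled-app -n default
kubectl get hpa keda-hpa-cpu-scaled-app -n default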

Scaling Based on Message Queues

One of KEDA’s most common use cases is scaling message queue consumers based on queue depth. This ensures that consumers scale up when there are many messages to process and scale down when queues are empty.

For RabbitMQ, create a ScaledObject:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: message-consumer
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: rabbitmq
    metadata:
      queueName: my-queue
      host: amqp://guest:guest@rabbitmq.default.svc.cluster.local:5672/
      queueLength: "5"

This scales the message-consumer deployment based on the depth of my-queue. The queueLength value is a per-replica target: KEDA aims for roughly one replica per 5 pending messages, scaling up as the queue grows and, because minReplicaCount is 0, all the way down to 0 replicas when the queue is empty.
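
In production you will usually want to keep the connection string out of the ScaledObject. KEDA supports this through a TriggerAuthentication backed by a Secret; here is a minimal sketch (the Secret name, key, and credentials are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-conn
  namespace: default
type: Opaque
stringData:
  host: amqp://guest:guest@rabbitmq.default.svc.cluster.local:5672/
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth
  namespace: default
spec:
  secretTargetRef:
  - parameter: host       # maps the Secret key to the scaler's host parameter
    name: rabbitmq-conn
    key: host

Reference it from the trigger by adding authenticationRef with name rabbitmq-trigger-auth, and drop host from the trigger metadata.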

For Kafka, use the Kafka scaler:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.default.svc.cluster.local:9092
      consumerGroup: my-consumer-group
      topic: my-topic
      lagThreshold: "10"

This scales based on consumer lag for my-consumer-group on my-topic. The lagThreshold is a per-replica target: KEDA aims for roughly 10 messages of lag per replica, adding consumers as lag grows so they keep up with message production.
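
Once applied, you can watch the KEDA-managed HPA react as lag builds up:

kubectl get hpa keda-hpa-kafka-scaled-app -n default -w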

Scaling Based on HTTP Request Rate

KEDA can scale applications based on HTTP request rates from external monitoring systems. This is useful for API servers that need to scale based on incoming request volume.

For Prometheus metrics, create a ScaledObject:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      metricName: http_requests_per_second
      threshold: "100"
      query: sum(rate(http_requests_total[2m]))

This scales the api-server deployment based on the HTTP request rate measured by Prometheus. The threshold is a per-replica target: KEDA aims for roughly 100 requests per second per replica, adding replicas as the total rate reported by the query grows.
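
In practice, scope the query to the workload you are scaling, or the trigger will react to cluster-wide traffic. Assuming your application labels its metrics with a job label (an assumption about your instrumentation), the query might look like:

query: sum(rate(http_requests_total{job="api-server"}[2m]))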

Scaling Based on Schedule (Cron)

KEDA’s cron scaler allows you to scale applications based on time schedules, which is useful for scaling down workloads during off-hours to save resources or scaling up during expected peak times.

A common use case is scaling down non-critical workloads outside of office hours. For example, you might want to scale development or testing environments to zero during nights and weekends, or scale down batch processing jobs when they’re not needed.

Here’s an example of using a cron trigger to scale an application during business hours (9 AM to 5 PM, Monday to Friday), and scale down to zero at all other times:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: office-hours-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: batch-processor
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
  - type: cron
    metadata:
      timezone: UTC
      start: "0 9 * * 1-5"      # 9 AM Monday to Friday
      end: "0 17 * * 1-5"       # 5 PM Monday to Friday
      desiredReplicas: "5"

This configuration runs the application with 5 replicas during business hours. Because minReplicaCount is 0 and the cron trigger is inactive outside its window, KEDA scales the workload down to 0 at all other times; a second cron trigger with desiredReplicas "0" is not needed.
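
You can check whether the cron window is currently active; the ACTIVE column in the output reflects whether any trigger is currently firing:

kubectl get scaledobject office-hours-scaled-app -n default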

You can combine cron scaling with other scalers. For example, you might use cron to ensure a minimum number of replicas during business hours, while allowing event-based scaling to increase replicas when needed:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hybrid-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
  - type: cron
    metadata:
      timezone: UTC
      start: "0 0 * * 1"       # Midnight Monday
      end: "0 0 * * 6"         # Midnight Saturday
      desiredReplicas: "2"     # weekday baseline
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      metricName: http_requests_per_second
      threshold: "100"
      query: sum(rate(http_requests_total[2m]))

This configuration keeps a baseline of at least 2 replicas on weekdays via the cron trigger, lets the workload scale to 0 on weekends when there is no traffic, and allows the Prometheus trigger to raise replicas above the baseline whenever request rates are high.

Cron Scaling Use Cases

Cron-based scaling is particularly useful for development and testing environments that don’t need to run continuously, for batch jobs that run on fixed schedules, and for reducing resource costs during off-hours.

Integrating with Thalassa Cloud

KEDA works well with Thalassa Cloud’s autoscaling features. While Thalassa Cloud provides the Node Pool Autoscaler for scaling nodes, KEDA handles pod-level scaling based on events. You can use both together: KEDA scales pods based on events, and the Node Pool Autoscaler adds nodes when pods can’t be scheduled due to resource constraints.
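
For this handoff to work, your pods must declare resource requests; without them the scheduler never reports resource pressure, and the Node Pool Autoscaler has nothing to react to. A minimal sketch (the app name, image, and sizes are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: message-consumer
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: message-consumer
  template:
    metadata:
      labels:
        app: message-consumer
    spec:
      containers:
      - name: consumer
        image: registry.example.com/consumer:1.0   # placeholder image
        resources:
          requests:
            cpu: 250m        # what the scheduler reserves per pod
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi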

Best Practices

Following best practices helps you use KEDA effectively and maintain reliable autoscaling.

  • Start with conservative scaling parameters: Set reasonable minReplicaCount and maxReplicaCount values based on your application’s needs. Avoid setting maximums too high initially, as this can lead to resource exhaustion.

  • Use appropriate threshold values: Threshold values determine when scaling occurs. Set them based on your application’s capacity and performance characteristics.

  • Consider scaling to zero: For workloads that can start quickly, set minReplicaCount to 0. This allows KEDA to scale applications to zero when there are no events, saving resources. Ensure your applications can handle cold starts appropriately; see the sketch after this list.

  • Test scaling behavior: Before deploying to production, test how applications scale under various load conditions. This helps you verify that scaling parameters are appropriate.
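
As a starting point for scale-to-zero, the sketch below pulls these practices together (the values shown are KEDA defaults or placeholders, not recommendations; it reuses the rabbitmq-trigger-auth TriggerAuthentication sketched earlier):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scale-to-zero-app
  namespace: default
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 0        # lets KEDA remove all replicas when no trigger is active
  maxReplicaCount: 10
  pollingInterval: 30       # seconds between trigger checks (KEDA default)
  cooldownPeriod: 300       # seconds after the last active trigger before scaling to 0 (KEDA default)
  triggers:
  - type: rabbitmq
    metadata:
      queueName: my-queue
      queueLength: "5"
    authenticationRef:
      name: rabbitmq-trigger-auth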

Extra Information

To learn more, see the KEDA docs or the Thalassa Cloud guides on Horizontal Pod Autoscaling, Node Pool Autoscaler, Node Health, and Nodes.