Improved Autoscaling with KEDA in Thalassa Cloud Kubernetes
KEDA (Kubernetes Event-Driven Autoscaling) lets you scale Kubernetes workloads automatically based on events and external metrics—not just CPU or memory. While the built-in Horizontal Pod Autoscaler (HPA) can adjust replicas using resource usage, KEDA makes it easy to scale your apps using metrics like queue length, message rates, database activity, or cloud events.
With KEDA in your Thalassa Cloud Kubernetes cluster, you can scale apps such as message queue consumers, API servers, or batch jobs based on real-time demand. KEDA works as an operator that creates and manages HPA resources using metrics from external systems, supporting many sources out of the box (like RabbitMQ, Kafka, PostgreSQL, MySQL, Cron, and more). You can also use custom scalers for other systems.
In short: KEDA combines its operator and built-in scalers to enable simple, flexible event-driven autoscaling for your Kubernetes workloads.
Prerequisites
Before installing KEDA, ensure you have the following in place:

- A running Kubernetes cluster in Thalassa Cloud.
- Cluster access configured with kubectl. Use tcloud kubernetes connect to configure access, or set up a kubeconfig manually. You need cluster administrator permissions to install KEDA, as it creates cluster-level resources.
- The metrics server installed in your cluster. KEDA works with the standard Kubernetes metrics server, and Thalassa Cloud clusters include it by default. For more information, see the Metrics Server documentation.

Verify that the metrics server is running:

kubectl get deployment metrics-server -n kube-system

Installing KEDA
The recommended way to install KEDA is using Helm, which provides a straightforward installation process and makes it easy to configure KEDA for your needs.
First, add the KEDA Helm repository:
helm repo add kedacore https://kedacore.github.io/charts
helm repo update

Install KEDA using Helm:

helm install keda kedacore/keda --namespace keda --create-namespace

This installs KEDA in the keda namespace with default settings. The installation includes the KEDA operator and all necessary CRDs.
Verify that KEDA is running:
kubectl get pods -n keda

You should see pods for keda-operator and keda-operator-metrics-apiserver (recent KEDA versions also run keda-admission-webhooks), all in the Running state. Check their status and logs:

kubectl get pods -n keda
kubectl logs -n keda -l app=keda-operator

The logs should show that KEDA is running and ready to manage ScaledObjects.
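You can also confirm that the KEDA custom resource definitions were registered; these CRD names come from the upstream KEDA install:

kubectl get crd scaledobjects.keda.sh scaledjobs.keda.sh triggerauthentications.keda.sh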
Understanding ScaledObjects
KEDA uses ScaledObject resources to define how applications should scale. A ScaledObject specifies the deployment to scale, which scaler to use, and the scaling parameters.
Here’s a simple example that scales a deployment based on CPU:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"

This ScaledObject scales the my-app deployment based on CPU utilization, maintaining between 1 and 10 replicas and scaling out when average CPU utilization across the pods exceeds 70%.
Scaling Based on Message Queues
One of KEDA’s most common use cases is scaling message queue consumers based on queue depth. This ensures that consumers scale up when there are many messages to process and scale down when queues are empty.
For RabbitMQ, create a ScaledObject:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: rabbitmq-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: message-consumer
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: my-queue
        host: amqp://guest:[email protected]:5672/
        queueLength: "5"

This scales the message-consumer deployment based on the number of messages in the my-queue queue. The queueLength value is a target per replica: KEDA aims for roughly 5 messages per consumer, so deeper queues produce more replicas. When the queue is empty, the deployment scales down to 0 replicas (since minReplicaCount is 0).
For Kafka, use the Kafka scaler:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: kafka-consumer
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc.cluster.local:9092
        consumerGroup: my-consumer-group
        topic: my-topic
        lagThreshold: "10"

This scales based on consumer lag in a Kafka topic, ensuring that consumers keep up with message production.
Scaling Based on HTTP Request Rate
KEDA can scale applications based on HTTP request rates from external monitoring systems. This is useful for API servers that need to scale based on incoming request volume.
For Prometheus metrics, create a ScaledObject:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prometheus-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        metricName: http_requests_per_second
        threshold: "100"
        query: sum(rate(http_requests_total[2m]))

This scales the api-server deployment based on the HTTP request rate measured by Prometheus. The query returns the total request rate across all pods, and the threshold of 100 is a per-replica target, so KEDA adds replicas as traffic grows beyond roughly 100 requests per second per replica.
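Before relying on the trigger, it is worth confirming that the query returns the value you expect. From a pod inside the cluster (the serverAddress above is only reachable in-cluster), something like:

curl -s 'http://prometheus.monitoring.svc.cluster.local:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(http_requests_total[2m]))'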
Scaling Based on Schedule (Cron)
KEDA’s cron scaler allows you to scale applications based on time schedules, which is useful for scaling down workloads during off-hours to save resources or scaling up during expected peak times.
A common use case is scaling down non-critical workloads outside of office hours. For example, you might want to scale development or testing environments to zero during nights and weekends, or scale down batch processing jobs when they’re not needed.
Here’s an example of using a cron trigger to scale an application during business hours (9 AM to 5 PM, Monday to Friday), and scale down to zero at all other times:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: office-hours-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: batch-processor
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: cron
      metadata:
        timezone: UTC
        start: "0 9 * * 1-5"   # 9 AM, Monday to Friday
        end: "0 17 * * 1-5"    # 5 PM, Monday to Friday
        desiredReplicas: "5"

This configuration keeps the application running with 5 replicas during business hours. Outside the start/end window the cron trigger is inactive, so the workload falls back to minReplicaCount and scales down to 0; a second cron trigger with desiredReplicas set to "0" is not needed.
You can combine cron scaling with other scalers. For example, you might use cron to ensure a minimum number of replicas during business hours, while allowing event-based scaling to increase replicas when needed:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: hybrid-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: api-server
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: cron
      metadata:
        timezone: UTC
        start: "0 0 * * 1"   # Midnight Monday
        end: "0 0 * * 6"     # Midnight Saturday
        desiredReplicas: "2"
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        metricName: http_requests_per_second
        threshold: "100"
        query: sum(rate(http_requests_total[2m]))

This configuration keeps a baseline of 2 replicas while the weekday cron window is active, lets the workload scale to 0 on quiet weekends, and allows the Prometheus trigger to raise replicas above the baseline when request rates are high. With multiple triggers, the underlying HPA always follows whichever trigger demands the most replicas. (The cron scaler's replica knob is desiredReplicas; per-trigger minimums are not supported.)
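You can see both triggers reflected as external metrics on the generated HPA (again assuming KEDA's default keda-hpa-<scaledobject-name> naming):

kubectl describe hpa keda-hpa-hybrid-scaled-app
kubectl get scaledobject hybrid-scaled-app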
Cron Scaling Use Cases
Cron-based scaling is particularly useful for development and testing environments that don't need to run continuously, for batch jobs that run on fixed schedules, and for cutting costs by reducing resource usage during off-hours.
Integrating with Thalassa Cloud
KEDA works well with Thalassa Cloud’s autoscaling features. While Thalassa Cloud provides the Node Pool Autoscaler for scaling nodes, KEDA handles pod-level scaling based on events. You can use both together: KEDA scales pods based on events, and the Node Pool Autoscaler adds nodes when pods can’t be scheduled due to resource constraints.
Best Practices
Following best practices helps you use KEDA effectively and maintain reliable autoscaling.
- Start with conservative scaling parameters: Set reasonable minReplicaCount and maxReplicaCount values based on your application's needs. Avoid setting maximums too high initially, as this can lead to resource exhaustion.
- Use appropriate threshold values: Threshold values determine when scaling occurs. Set them based on your application's capacity and performance characteristics.
Scaling to Zero
For workloads that can start quickly, consider setting minReplicaCount to 0. This allows KEDA to scale applications to zero when there are no events, saving resources. Ensure your applications can handle cold starts appropriately.
- Test scaling behavior: Before deploying to production, test how applications scale under various load conditions. This helps you verify that scaling parameters are appropriate.
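While testing, you may find replicas flapping as metrics fluctuate. One way to dampen this is a scale-down stabilization window, passed through to the underlying HPA via the ScaledObject's advanced configuration. A sketch, assuming a KEDA 2.x release that supports the horizontalPodAutoscalerConfig passthrough and reusing the CPU trigger from earlier:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stable-scaled-app
  namespace: default
spec:
  scaleTargetRef:
    name: my-app
  minReplicaCount: 1
  maxReplicaCount: 10
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          # Require 5 minutes of consistently low load before removing replicas
          stabilizationWindowSeconds: 300
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"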
Extra Information
To learn more, see the KEDA docs or the Thalassa Cloud guides on Horizontal Pod Autoscaling, Node Pool Autoscaler, Node Health, and Nodes.