# Node Pool Autoscaler
Thalassa Cloud uses the upstream Kubernetes Cluster Autoscaler to automatically adjust the number of nodes in a node pool based on workload demand. Autoscaling is configured per node pool.
## What the autoscaler does
- Monitors unschedulable Pods and node utilization
- Scales node pools up when Pods are pending due to insufficient resources
- Scales node pools down when nodes are underutilized and workloads can be consolidated
- Respects the configured minimum and maximum replica bounds of the node pool
## Enabling autoscaling
Enable autoscaling on your node pool in the Thalassa Cloud Console, via the API/CLI, or through infrastructure-as-code tools such as Terraform.
Required settings:
- Autoscaling: Enabled
- Minimum replicas: The minimum number of nodes that will always be running. Can be set to `0` to allow the pool to scale down completely.
- Maximum replicas: The maximum number of nodes that the autoscaler can create for the node pool.

Note: When the minimum is set to `0`, the node pool may scale to zero if there are no schedulable workloads requiring nodes from that pool.
## How scaling decisions are made
The autoscaler reacts to pending Pods that cannot be scheduled due to resource constraints (CPU/memory, node selectors, taints/tolerations, topology). It also considers the node pool's configured instance type and constraints to determine whether adding a node would resolve the scheduling bottleneck.
Scale-down occurs when nodes are underutilized and Pods can be rescheduled onto other nodes within the pool without violating Pod disruption budgets or anti-affinity rules.
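Scale-up decisions are driven by Pod resource requests, not observed usage. As a hypothetical sketch (the Deployment name, image, and sizes below are illustrative, not part of any Thalassa Cloud default), a workload like this will leave Pods Pending and trigger a scale-up whenever the pool's existing nodes cannot fit all replicas' requests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # illustrative name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          resources:
            requests:        # the autoscaler reasons about requests, not actual usage
              cpu: "500m"
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```

If six replicas requesting 500m CPU each exceed the pool's current capacity, the autoscaler adds nodes until the Pods fit or the pool's maximum replicas is reached.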
## Per-node-pool autoscaler annotations
You can fine-tune the autoscaler behavior per node pool using annotations on the node pool resource (not to be confused with node annotations). The following annotations are supported:
| Annotation | Type | Description |
|---|---|---|
| `cluster-autoscaler.kubernetes.io/scale-down-utilization-threshold` | float | Target utilization threshold below which a node is considered unneeded during scale-down. |
| `cluster-autoscaler.kubernetes.io/scale-down-gpu-utilization-threshold` | float | GPU utilization threshold below which a GPU node is considered unneeded during scale-down. |
| `cluster-autoscaler.kubernetes.io/scale-down-unneeded-time` | duration | How long a node must be underutilized before it is considered for scale-down (e.g. `10m`, `30m`). |
| `cluster-autoscaler.kubernetes.io/scale-down-unready-time` | duration | How long a node must remain unready before it is considered for scale-down. |
| `cluster-autoscaler.kubernetes.io/zero-or-max-node-scaling` | bool | If `true`, scales the node pool either to zero or directly to its maximum size in certain scenarios. |
| `cluster-autoscaler.kubernetes.io/ignore-daemonsets-utilization` | bool | If `true`, excludes DaemonSet Pods from utilization calculations for scale-down. |
### Example
```yaml
apiVersion: k8s.thalassa.cloud/v1
kind: NodePool
metadata:
  name: general-pool
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-utilization-threshold: "0.5"
    cluster-autoscaler.kubernetes.io/scale-down-unneeded-time: "10m"
    cluster-autoscaler.kubernetes.io/ignore-daemonsets-utilization: "true"
spec:
  autoscaling:
    enabled: true
    minReplicas: 0
    maxReplicas: 10
```
## Best practices
- Set realistic min/max bounds that align with your workload patterns and budget
- Use multiple node pools to separate workloads with different requirements (e.g., general purpose vs. GPU)
- Ensure Pod requests/limits are set so the scheduler and autoscaler can make accurate decisions
- Configure PodDisruptionBudgets (PDBs) to protect critical workloads during scale-down
- Use taints and tolerations to control which Pods can land on which node pools
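A PodDisruptionBudget caps how many Pods of a workload can be evicted at once, so the autoscaler will not drain a node if doing so would violate the budget. A minimal sketch (the name and label selector are illustrative and assume a workload labeled `app: web`):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb        # illustrative name
spec:
  minAvailable: 2      # keep at least 2 matching Pods running during voluntary disruptions
  selector:
    matchLabels:
      app: web         # assumed workload label
```

Note that an overly strict PDB (e.g. `minAvailable` equal to the replica count) can block scale-down entirely, so leave headroom for at least one voluntary disruption.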
## Troubleshooting
- Pods remain Pending but no scale-up happens:
  - Verify Pod resource requests, node selectors, tolerations, and topology constraints match the node pool
  - Check that max replicas is high enough
  - Ensure autoscaling is enabled for the node pool
  - You might have reached your Organisation’s CPU or Memory quota; if so, request a quota increase
- Scale-down does not occur:
  - Nodes may be blocked by PDBs or anti-affinity rules
  - Workloads may fully utilize the nodes