# Node Pool Autoscaler
Thalassa Cloud uses the upstream Kubernetes Cluster Autoscaler to automatically adjust the number of nodes in a node pool based on workload demand. Autoscaling is configured per node pool.
## What the autoscaler does
- Monitors unschedulable Pods and node utilization
- Scales node pools up when Pods are pending due to insufficient resources
- Scales node pools down when nodes are underutilized and workloads can be consolidated
- Respects the configured minimum and maximum replica bounds of the node pool
## Enabling autoscaling
Enable autoscaling on your node pool in the Thalassa Cloud Console, via the API/CLI, or through infrastructure-as-code tools such as Terraform.
Required settings:
- Autoscaling: Enabled
- Minimum replicas: The minimum number of nodes that will always be running. Can be set to `0` to allow the pool to scale down completely.
- Maximum replicas: The maximum number of nodes that the autoscaler can create for the node pool.

Note: When the minimum is set to `0`, the node pool may scale to zero if there are no schedulable workloads requiring nodes from that pool.
## How scaling decisions are made
The autoscaler reacts to pending Pods that cannot be scheduled due to resource constraints (CPU/memory, node selectors, taints/tolerations, topology). It also considers the node pool's configured instance type and constraints to determine whether adding a node would resolve the scheduling bottleneck.
Scale-down occurs when nodes are underutilized and Pods can be rescheduled onto other nodes within the pool without violating Pod disruption budgets or anti-affinity rules.
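Scale-up decisions are driven by Pod resource requests, not observed usage. As a hypothetical sketch (the Deployment name, image, and sizes below are illustrative, not part of any Thalassa Cloud default), a workload like this will leave Pods Pending and trigger a scale-up whenever the pool's existing nodes cannot fit all replicas' requests:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # illustrative name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          resources:
            requests:        # the autoscaler reasons about requests, not actual usage
              cpu: "500m"
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```

If six replicas requesting 500m CPU each exceed the pool's current capacity, the autoscaler adds nodes until the Pods fit or the pool's maximum replicas is reached.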
## Per-node-pool autoscaler annotations
You can fine-tune the autoscaler behavior per node pool using annotations on the node pool resource (not to be confused with node annotations). The following annotations are supported:
| Annotation | Type | Description |
|---|---|---|
| `cluster-autoscaler.kubernetes.io/scale-down-utilization-threshold` | float | Target utilization threshold below which a node is considered unneeded during scale-down. |
| `cluster-autoscaler.kubernetes.io/scale-down-gpu-utilization-threshold` | float | GPU utilization threshold below which a GPU node is considered unneeded during scale-down. |
| `cluster-autoscaler.kubernetes.io/scale-down-unneeded-time` | duration | How long a node must be underutilized before it is considered for scale-down (e.g. `10m`, `30m`). |
| `cluster-autoscaler.kubernetes.io/scale-down-unready-time` | duration | How long a node must remain unready before it is considered for scale-down. |
| `cluster-autoscaler.kubernetes.io/zero-or-max-node-scaling` | bool | If `true`, scales the node pool either to zero or directly to its maximum size in certain scenarios. |
| `cluster-autoscaler.kubernetes.io/ignore-daemonsets-utilization` | bool | If `true`, excludes DaemonSet Pods from utilization calculations for scale-down. |
### Example
```yaml
apiVersion: k8s.thalassa.cloud/v1
kind: NodePool
metadata:
  name: general-pool
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-utilization-threshold: "0.5"
    cluster-autoscaler.kubernetes.io/scale-down-unneeded-time: "10m"
    cluster-autoscaler.kubernetes.io/ignore-daemonsets-utilization: "true"
spec:
  autoscaling:
    enabled: true
    minReplicas: 0
    maxReplicas: 10
```
## Best practices
- Set realistic min/max bounds that align with your workload patterns and budget
- Use multiple node pools to separate workloads with different requirements (e.g., general purpose vs. GPU)
- Ensure Pod requests/limits are set so the scheduler and autoscaler can make accurate decisions
- Configure PodDisruptionBudgets (PDBs) to protect critical workloads during scale-down
- Use taints and tolerations to control which Pods can land on which node pools
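A PodDisruptionBudget caps how many Pods of a workload can be evicted at once, so the autoscaler will not drain a node if doing so would violate the budget. A minimal sketch (the name and label selector are illustrative and assume a workload labeled `app: web`):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb        # illustrative name
spec:
  minAvailable: 2      # keep at least 2 matching Pods running during voluntary disruptions
  selector:
    matchLabels:
      app: web         # assumed workload label
```

Note that an overly strict PDB (e.g. `minAvailable` equal to the replica count) can block scale-down entirely, so leave headroom for at least one voluntary disruption.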
## Troubleshooting
- Pods remain Pending but no scale-up happens:
  - Verify Pod resource requests, node selectors, tolerations, and topology constraints match the node pool
  - Check that max replicas is high enough
  - Ensure autoscaling is enabled for the node pool
  - You might have reached your Organisation’s CPU or Memory quota; if so, request a quota increase
- Scale-down does not occur:
  - Nodes may be blocked by PDBs or anti-affinity rules
  - Workloads may fully utilize the nodes