Kubernetes Cost Optimization: Right‑Size Pods & Autoscaling
Understand Your Current Pod Utilization
Before you can trim anything, you need data.
# Install the metrics‑server if it isn’t already
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Show CPU & memory usage for every pod in the default namespace
kubectl top pods --namespace default
Export the output to CSV for deeper analysis:
kubectl top pods --no-headers | awk '{print $1","$2","$3}' > pod_usage.csv
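A single kubectl top run is only a point-in-time snapshot and does not show requests, so collect several samples and compare them with each container's configured requests. Pods that consistently run at <20 % of their requested CPU or memory are prime candidates for right‑sizing.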
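One way to apply the 20 % rule is a small filter like the sketch below. It assumes you have already joined usage and requests into lines of the form "pod, used millicores, requested millicores". That input format and the function name low_utilization are illustrative assumptions, not something kubectl produces directly.

```shell
#!/usr/bin/env bash
# Print pods whose CPU usage is below a given percentage of their CPU request.
# Input (stdin): "<pod> <used_millicores> <requested_millicores>", one pod per line.
# NOTE: this three-column format is an assumption made for illustration.
low_utilization() {
  # $1 = threshold in percent of the request (defaults to 20)
  awk -v t="${1:-20}" '
    $3 > 0 && ($2 * 100 / $3) < t {
      printf "%s %.0f%%\n", $1, $2 * 100 / $3
    }'
}
```

For example, feeding it the lines "api 40 250" and "worker 200 250" with a threshold of 20 reports only api, at 16 % of its request.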
Set Accurate Requests and Limits
Kubernetes schedules pods based on requests, not limits. Over‑requesting forces the scheduler to allocate more node capacity than needed.
- Identify the 95th‑percentile usage for each container. One snapshot is not enough for this: collect usage samples over a representative period (at least one full traffic cycle), or query a tool like Prometheus.
- Update the deployment manifest:
resources:
  requests:
    cpu: "250m" # 0.25 vCPU, based on observed 95th‑percentile
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
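Alternatively, apply the change from the command line. This triggers a rolling update, so it avoids downtime only if the deployment runs more than one replica: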
kubectl set resources deployment my-app \
--containers='*' \
--requests=cpu=250m,memory=256Mi \
--limits=cpu=500m,memory=512Mi
Repeat for each over‑provisioned deployment. Set limits high enough to absorb traffic spikes without OOM kills or heavy CPU throttling, but not so high that one runaway container can starve its neighbors.
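To estimate the 95th percentile from collected samples, a nearest‑rank calculation is usually enough. The sketch below assumes one numeric sample per line (for example CPU millicores with the unit already stripped). The function name p95 is hypothetical.

```shell
#!/usr/bin/env bash
# Nearest-rank 95th percentile of the numbers on stdin (one value per line).
# Assumes plain numeric input; strip units such as "m" or "Mi" beforehand.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1                 # no samples, nothing to report
      i = int((NR * 95 + 99) / 100)       # ceil(0.95 * NR) using integer math
      print v[i]
    }'
}
```

With 100 samples this picks the 95th smallest value. With only a handful of samples it returns the maximum, which is a reminder to collect more data before setting requests.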
Enable the Cluster Autoscaler
Right‑sizing pods reduces per‑node pressure, but you still need the cluster to add or remove nodes automatically.
Amazon EKS
eksctl utils associate-iam-oidc-provider --region us-east-1 --cluster my-eks-cluster --approve
eksctl create iamserviceaccount \
--cluster my-eks-cluster \
--namespace kube-system \
--name cluster-autoscaler \
--attach-policy-arn arn:aws:iam::aws:policy/AutoScalingFullAccess \
--override-existing-serviceaccounts \
--approve
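Add the autoscaler deployment. The example manifest discovers node groups by tag and ships with a <YOUR CLUSTER NAME> placeholder, so substitute your cluster name before applying it. The autoscaler takes its minimum and maximum node counts from each Auto Scaling group, so set those bounds (for example 3 and 15) on the node group itself rather than on the deployment:
curl -sLO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
sed -i 's/<YOUR CLUSTER NAME>/my-eks-cluster/' cluster-autoscaler-autodiscover.yaml
kubectl apply -f cluster-autoscaler-autodiscover.yaml
kubectl -n kube-system set env deployment/cluster-autoscaler \
--containers=cluster-autoscaler \
--env="AWS_REGION=us-east-1"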
Google GKE
gcloud container clusters update my-gke-cluster \
--enable-autoscaling \
--min-nodes=3 \
--max-nodes=15 \
--node-pool=default-pool
If you use multiple node pools (e.g., standard vs. spot), enable autoscaling per pool.
Azure AKS
az aks nodepool update \
--resource-group my-rg \
--cluster-name my-aks \
--name nodepool1 \
--enable-cluster-autoscaler \
--min-count 3 \
--max-count 15
The autoscaler watches pending pods. If a pod cannot be scheduled because of insufficient resources, it adds a node; when nodes become under‑utilized for a configurable period (default 10 minutes), it scales them down.
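Since pending pods are what trigger a scale‑up, it can help to list them when the autoscaler seems slow to react. The sketch below filters the standard column output of kubectl get pods. The function name pending_pods is hypothetical, and it assumes the default column order of NAMESPACE, NAME, READY, STATUS.

```shell
#!/usr/bin/env bash
# List pods stuck in Pending as "<namespace>/<name>".
# Input (stdin): output of `kubectl get pods --all-namespaces --no-headers`.
# Assumes the default column layout, where STATUS is the fourth column.
pending_pods() {
  awk '$4 == "Pending" { print $1 "/" $2 }'
}

# Typical use against a live cluster (not run here):
#   kubectl get pods --all-namespaces --no-headers | pending_pods
```

A pod that stays in this list after the autoscaler has had time to act usually points at a node‑pool maximum that has been reached or a request no node type can satisfy.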
Use Vertical Pod Autoscaler (VPA) for Fine‑Grained Adjustments
VPA continuously recommends new request values based on actual usage. It works well for stable workloads that don’t need rapid scaling.
git clone https://github.com/kubernetes/autoscaler.git
./autoscaler/vertical-pod-autoscaler/hack/vpa-up.sh
Create a VPA object for a deployment:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
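In Auto mode the VPA does not edit the deployment manifest. It evicts pods, and its admission controller sets the recommended resources.requests on the replacements, so expect pod restarts. You can pair VPA with the Horizontal Pod Autoscaler (HPA), with VPA adjusting size and HPA adjusting replica count, but do not let both act on the same CPU or memory metric; drive the HPA from a custom or external metric instead.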
Audit Node‑Pool Pricing and Use Spot/Preemptible Instances
Even with perfect pod sizing, the underlying VMs dominate cost.
- AWS: Use mixed‑instance node groups and aim for roughly a 70 % On‑Demand / 30 % Spot split. The eksctl example below creates an all‑Spot node group to run alongside your On‑Demand group:

  ```bash
  eksctl create nodegroup \
    --cluster my-eks-cluster \
    --name spot-group \
    --instance-types t3.large,m5.large \
    --spot
  ```

- GCP: Create a node pool that uses preemptible VMs:

  ```bash
  gcloud container node-pools create preemptible-pool \
    --cluster=my-gke-cluster \
    --preemptible \
    --machine-type=e2-standard-4 \
    --num-nodes=2
  ```

- Azure: Enable Spot VMs in a node pool (the min and max counts require the cluster autoscaler to be enabled on the pool):

  ```bash
  az aks nodepool add \
    --resource-group my-rg \
    --cluster-name my-aks \
    --name spotpool \
    --node-vm-size Standard_D2s_v3 \
    --priority Spot \
    --eviction-policy Delete \
    --enable-cluster-autoscaler \
    --min-count 2 \
    --max-count 10
  ```

Monitor eviction rates; if they exceed 10 % you may need to increase the On‑Demand share.
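The 10 % eviction check is simple arithmetic, sketched below. How you count evictions is left open: the function name eviction_check and its inputs (evicted nodes and total Spot nodes over the same time window) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# eviction_check <evicted_nodes> <total_spot_nodes> [threshold_percent]
# Prints the eviction rate and whether it is over the threshold (default 10).
# Both counts are assumed to cover the same time window.
eviction_check() {
  awk -v e="$1" -v n="$2" -v t="${3:-10}" 'BEGIN {
    if (n <= 0) { print "no spot nodes"; exit 1 }
    rate = e * 100 / n
    printf "%.1f%% %s\n", rate, (rate > t ? "over" : "ok")
  }'
}
```

For example, 3 evictions across 20 Spot nodes is a 15 % rate and is reported as over the threshold, while 1 eviction across 20 nodes is reported as ok.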
Continuous Monitoring and Alerts
Automation stops at detection; you need a feedback loop.
- Prometheus alerts for pods running >80 % of their CPU request for >15 minutes:

  ```yaml
  - alert: PodHighUtilization
    expr: |
      sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="default", container!=""}[5m]))
        /
      sum by (pod) (kube_pod_container_resource_requests{namespace="default", resource="cpu"})
        > 0.8
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.pod }} is using >80% of its CPU request"
  ```
- Kubernetes Event Exporter can forward autoscaler scale‑up/scale‑down events to Slack or email.
By regularly reviewing these alerts, you keep pod requests tight and the autoscaler responsive, preventing both over‑provisioned nodes and unnecessary scale‑up spikes.
How CloudBudgetMaster helps: Our platform continuously scans your clusters, flags pods with low usage‑to‑request ratios, shows the exact dollar impact, and auto‑generates the kubectl set resources commands needed to right‑size them. It also visualizes autoscaler activity across EKS, GKE, and AKS so you can see savings in real time.
CloudBudgetMaster