Kubernetes Cost Optimization: Right‑Size Pods & Autoscaling

June 21, 2026·4 min read·CloudBudgetMaster

Understand Your Current Pod Utilization

Before you can trim anything, you need data.

# Install the metrics‑server if it isn’t already
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Show CPU & memory usage for every pod in the default namespace
kubectl top pods --namespace default

Export the output to CSV for deeper analysis:

kubectl top pods --no-headers | awk '{print $1","$2","$3}' > pod_usage.csv

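kubectl top reports a single point-in-time sample, so repeat the export over several days and compare the figures with each pod's requests. Pods that consistently run below 20 % of their requested CPU or memory are prime candidates for right-sizing.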
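The 20 % rule of thumb can be turned into a repeatable check. The sketch below parses rows in the pod_usage.csv layout and compares them with each pod's requests. The pod names and the requests mapping are hypothetical; in practice the requests would come from something like kubectl get pods -o json. It assumes CPU is reported in millicores or whole cores and memory in Ki, Mi, or Gi.

```python
import csv
import io

MEM_FACTORS = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}  # convert to MiB


def parse_cpu(value):
    """CPU quantity to millicores: '12m' -> 12.0, '1' -> 1000.0."""
    return float(value[:-1]) if value.endswith("m") else float(value) * 1000


def parse_mem(value):
    """Memory quantity to MiB: '34Mi' -> 34.0, '1Gi' -> 1024.0."""
    for suffix, factor in MEM_FACTORS.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory unit: {value!r}")


def underused(csv_text, requests, threshold=0.20):
    """Return (pod, cpu_ratio, mem_ratio) for pods under threshold on CPU or memory."""
    flagged = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3 or row[0] not in requests:
            continue  # skip malformed rows and pods with unknown requests
        pod, cpu, mem = row
        req_cpu, req_mem = requests[pod]
        cpu_ratio = parse_cpu(cpu) / parse_cpu(req_cpu)
        mem_ratio = parse_mem(mem) / parse_mem(req_mem)
        if cpu_ratio < threshold or mem_ratio < threshold:
            flagged.append((pod, round(cpu_ratio, 2), round(mem_ratio, 2)))
    return flagged


# Hypothetical rows in the pod_usage.csv layout, plus hypothetical requests.
sample = "api-7d9f,12m,34Mi\nworker-5c2a,400m,300Mi\n"
requests = {"api-7d9f": ("500m", "512Mi"), "worker-5c2a": ("500m", "512Mi")}
print(underused(sample, requests))
```

A single export only shows one moment, so a pod is worth acting on only if it is flagged across many exports.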

Set Accurate Requests and Limits

Kubernetes schedules pods based on requests, not limits. Over‑requesting forces the scheduler to allocate more node capacity than needed.
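A little arithmetic shows how much this matters. The sketch below counts how many identical pods fit on a node when only requests are considered. The node size (4 vCPU, 16 GiB allocatable), the request values, and the replica count are illustrative assumptions, not measurements.

```python
def pods_per_node(node_cpu_m, node_mem_mi, req_cpu_m, req_mem_mi):
    """How many identical pods fit on one node, judged by requests alone."""
    return min(node_cpu_m // req_cpu_m, node_mem_mi // req_mem_mi)


def nodes_needed(replicas, per_node):
    """Ceiling division: nodes required to place all replicas."""
    return -(-replicas // per_node)


NODE_CPU_M, NODE_MEM_MI = 4000, 16384  # hypothetical allocatable capacity

over_requested = pods_per_node(NODE_CPU_M, NODE_MEM_MI, 1000, 1024)
right_sized = pods_per_node(NODE_CPU_M, NODE_MEM_MI, 250, 256)

print(nodes_needed(48, over_requested))  # nodes for 48 over-requested replicas
print(nodes_needed(48, right_sized))     # nodes for 48 right-sized replicas
```

With these assumed numbers, the same 48 replicas need 12 nodes when over-requested and 3 when right-sized, even though actual usage is identical.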

Identify the 95th-percentile usage for each container (collect repeated CSV exports from the previous step over a representative period, or query a tool like Prometheus).
Update the deployment manifest:

resources:
  requests:
    cpu: "250m"   # 0.25 vCPU, based on observed 95th‑percentile
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Apply the change in place. This triggers a rolling update, so pods are replaced gradually rather than all at once:

kubectl set resources deployment my-app \
  --containers='*' \
  --requests=cpu=250m,memory=256Mi \
  --limits=cpu=500m,memory=512Mi

Repeat for each over-provisioned deployment. Keep limits generous enough to absorb traffic spikes without OOM kills or CPU throttling, but tight enough that a runaway container cannot starve its neighbors.
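The percentile step can be sketched as follows. This is a minimal example using the nearest-rank method; the sample values, the 50m rounding step, and the 2x limit factor are assumptions chosen for illustration, not a recommendation for every workload.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def suggest(samples_m, limit_factor=2.0, step=50):
    """Return (request, limit) in millicores, each rounded up to a multiple of step."""
    p95 = percentile(samples_m, 95)
    request = math.ceil(p95 / step) * step
    limit = math.ceil(request * limit_factor / step) * step
    return request, limit


# Hypothetical CPU samples in millicores for one container.
cpu_samples = [100, 120, 130, 140, 150, 160, 170, 180, 190, 200,
               205, 210, 215, 220, 225, 230, 235, 240, 245, 400]
print(suggest(cpu_samples))
```

Using the 95th percentile rather than the maximum keeps a single outlier (the 400m sample here) from inflating the request.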

Enable the Cluster Autoscaler

Right‑sizing pods reduces per‑node pressure, but you still need the cluster to add or remove nodes automatically.

Amazon EKS

eksctl utils associate-iam-oidc-provider --region us-east-1 --cluster my-eks-cluster --approve
eksctl create iamserviceaccount \
  --cluster my-eks-cluster \
  --namespace kube-system \
  --name cluster-autoscaler \
  --attach-policy-arn arn:aws:iam::aws:policy/AutoScalingFullAccess \
  --override-existing-serviceaccounts \
  --approve

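Deploy the autoscaler, substituting your cluster name for the <YOUR CLUSTER NAME> placeholder in the example manifest's auto-discovery flag:

curl -fsSL https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml \
  | sed 's/<YOUR CLUSTER NAME>/my-eks-cluster/' \
  | kubectl apply -f -
kubectl -n kube-system set env deployment/cluster-autoscaler \
  --containers=cluster-autoscaler \
  AWS_REGION=us-east-1

In auto-discovery mode the autoscaler does not read node limits from environment variables; it uses each Auto Scaling group's own minimum and maximum size, so set those (for example 3 and 15) on the node group itself.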

Google GKE

gcloud container clusters update my-gke-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=15 \
  --node-pool=default-pool

If you use multiple node pools (e.g., standard vs. spot), enable autoscaling per pool.

Azure AKS

az aks nodepool update \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name nodepool1 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 15

The autoscaler watches pending pods. If a pod cannot be scheduled because of insufficient resources, it adds a node; when nodes become under‑utilized for a configurable period (default 10 minutes), it scales them down.
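The decision rule described above can be modeled in a few lines. This is a deliberately simplified sketch, not the autoscaler's actual algorithm: the real component also simulates whether evicted pods fit elsewhere and honors constraints such as PodDisruptionBudgets. The function name and input shape are hypothetical; the 0.5 utilization threshold and 10-minute window mirror the upstream defaults.

```python
def decide(pending_unschedulable, nodes, min_nodes=3, max_nodes=15,
           utilization_threshold=0.5, unneeded_minutes=10):
    """Toy model of one autoscaler loop.

    nodes is a list of (utilization, minutes_below_threshold) tuples,
    where utilization is requested / allocatable as a fraction.
    """
    # Pods that cannot be scheduled trigger a scale-up, up to the maximum.
    if pending_unschedulable > 0 and len(nodes) < max_nodes:
        return "scale-up"
    # A node that stayed under-utilized long enough can be removed,
    # as long as the pool stays at or above its minimum size.
    if pending_unschedulable == 0 and len(nodes) > min_nodes:
        for utilization, idle_minutes in nodes:
            if utilization < utilization_threshold and idle_minutes >= unneeded_minutes:
                return "scale-down"
    return "no-op"


print(decide(2, [(0.9, 0)] * 3))
print(decide(0, [(0.9, 0), (0.8, 0), (0.7, 0), (0.2, 12)]))
print(decide(0, [(0.2, 12)] * 3))
```

The third call returns no-op because the pool is already at its minimum size, which is why the min and max bounds matter as much as the thresholds.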

Use Vertical Pod Autoscaler (VPA) for Fine‑Grained Adjustments

VPA continuously recommends new request values based on actual usage. It works well for stable workloads that don’t need rapid scaling.

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Create a VPA object for a deployment:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       my-app
  updatePolicy:
    updateMode: "Auto"

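In Auto mode, the VPA updater evicts pods whose requests have drifted from the recommendation, and the admission controller sets the new resources.requests when the replacement pods start; the Deployment manifest itself is not modified. You can pair VPA with the Horizontal Pod Autoscaler (HPA), with VPA adjusting pod size and HPA adjusting replica count, but do not let both act on the same CPU or memory metric. Drive the HPA from a custom or external metric instead.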
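If several deployments need the same treatment, the VPA manifest above can be generated rather than copied by hand. The sketch below builds one VPA object per deployment and prints a List document as JSON, which kubectl apply -f - accepts. The deployment names are hypothetical, and updateMode defaults to "Off" (recommendation only) as a cautious assumption; pass "Auto" to match the manifest above.

```python
import json


def vpa_manifest(deployment, update_mode="Off"):
    """Build a VerticalPodAutoscaler object targeting one Deployment."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa"},
        "spec": {
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


def vpa_list(deployments, update_mode="Off"):
    """Wrap several VPA objects in a single List document."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [vpa_manifest(name, update_mode) for name in deployments],
    }


# Hypothetical deployment names.
print(json.dumps(vpa_list(["my-app", "my-worker"]), indent=2))
```

Starting in "Off" mode lets you read the recommendations with kubectl describe vpa before allowing any pod evictions.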

Audit Node‑Pool Pricing and Use Spot/Preemptible Instances

Even with perfect pod sizing, the underlying VMs dominate cost.

AWS: Run a Spot node group alongside your On-Demand group, aiming for roughly a 70 % On-Demand / 30 % Spot split of capacity. The eksctl command below creates the Spot group; the split itself comes from how you size the two groups:

eksctl create nodegroup \
  --cluster my-eks-cluster \
  --name spot-group \
  --instance-types t3.large,m5.large \
  --spot

GCP: Create a node pool that uses preemptible VMs:

gcloud container node-pools create preemptible-pool \
  --cluster=my-gke-cluster \
  --preemptible \
  --machine-type=e2-standard-4 \
  --num-nodes=2

Azure: Enable Spot VMs in a node pool (the min and max counts require the cluster autoscaler to be enabled on the pool):

az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-aks \
  --name spotpool \
  --node-vm-size Standard_D2s_v3 \
  --priority Spot \
  --eviction-policy Delete \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 10

Monitor eviction rates; if they exceed 10 %, you may need to increase the On-Demand share.
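To see what a 70/30 split is worth, and when the 10 % eviction threshold is crossed, a rough calculation helps. The prices below are hypothetical placeholders in cents per node-hour, not real cloud pricing, and the function names are illustrative.

```python
def blended_hourly_cost(nodes, on_demand_price, spot_price, spot_share=0.30):
    """Hourly cost of a fleet where about spot_share of the nodes run on Spot."""
    spot_nodes = round(nodes * spot_share)
    return (nodes - spot_nodes) * on_demand_price + spot_nodes * spot_price


def eviction_rate(evicted, launched):
    """Fraction of launched Spot nodes that were reclaimed in the same window."""
    return evicted / launched if launched else 0.0


# Hypothetical prices in cents per node-hour: 10 on-demand, 3 spot.
all_on_demand = blended_hourly_cost(10, 10, 3, spot_share=0.0)
mixed = blended_hourly_cost(10, 10, 3)
print(all_on_demand, mixed)

# Compare an observed eviction rate with the 10 % threshold.
print(eviction_rate(2, 25) > 0.10)
```

Keeping prices as integer cents avoids floating-point noise in the comparison; substitute current prices for your region and instance types.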

Continuous Monitoring and Alerts

Automation is not set-and-forget; you need a feedback loop.

Prometheus alerts for pods running >80 % of their request for >15 minutes:

```yaml
alert: PodHighUtilization
# kube-state-metrics v2+ metric name; v1.x exposed
# kube_pod_container_resource_requests_cpu_cores instead.
expr: |
  sum(rate(container_cpu_usage_seconds_total{namespace="default", container!=""}[5m])) by (pod)
    /
  sum(kube_pod_container_resource_requests{namespace="default", resource="cpu"}) by (pod)
    > 0.8
for: 15m
labels:
  severity: warning
annotations:
  summary: "Pod {{ $labels.pod }} is using >80% of its CPU request"
```
Kubernetes Event Exporter can forward autoscaler scale‑up/scale‑down events to Slack or email.
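The effect of the 15-minute hold in the alert is easy to misread, so here is a small model of it. This sketch assumes one evaluation per minute and a hypothetical 250m request; it mimics the rule that the condition must hold continuously for the whole window before the alert fires, and that any dip resets the clock.

```python
def fires(samples, request_cores, threshold=0.8, hold_minutes=15, interval_minutes=1):
    """Return True if usage/request stays above threshold for hold_minutes.

    samples holds CPU usage in cores, one value per evaluation, oldest first.
    """
    breach_start = None
    for i, usage in enumerate(samples):
        if usage / request_cores > threshold:
            if breach_start is None:
                breach_start = i  # condition just became true
            if (i - breach_start) * interval_minutes >= hold_minutes:
                return True
        else:
            breach_start = None  # any dip resets the pending period
    return False


REQUEST = 0.25  # hypothetical 250m CPU request

print(fires([0.22] * 16, REQUEST))                       # sustained breach
print(fires([0.22] * 10 + [0.10] + [0.22] * 10, REQUEST))  # interrupted breach
```

Short bursts therefore never page anyone; only pods that sit above 80 % of their request for the full window are surfaced as candidates for a larger request.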

By regularly reviewing these alerts, you keep pod requests tight and the autoscaler responsive, preventing both over‑provisioned nodes and unnecessary scale‑up spikes.

How CloudBudgetMaster helps – Our platform continuously scans your clusters, flags pods with low usage-to-request ratios, shows the exact dollar impact, and auto-generates the kubectl set resources commands needed to right-size them. It also visualizes autoscaler activity across EKS, GKE, and AKS so you can see savings in real time.