Right‑sizing EC2 Instances: A Step‑by‑Step Guide to Stop Overpaying
1. Pull the data you actually need
Before you can decide whether an instance is oversized, you need hard numbers. Two AWS services give you the most reliable view:
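- Amazon CloudWatch – provides CPUUtilization, NetworkIn/NetworkOut, DiskReadOps/DiskWriteOps (instance‑store volumes only), and custom metrics. Granularity is 5 minutes by default and 1 minute with detailed monitoring enabled.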
- AWS Compute Optimizer – analyzes historical utilization and returns concrete instance‑type recommendations.
Quick CLI pull of recent CPU and network usage
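aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-0abcd1234efgh5678 \
--statistics Average \
--period 3600 \
--start-time $(date -u -d '-7 days' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--output json > cpu.json
Replace the instance ID with your own. A single call returns at most 1,440 data points, so a 300‑second period over seven days (2,016 points) is rejected; the hourly period used here stays well under the limit. The date -d syntax is GNU date; on macOS use date -u -v-7d instead. Do the same for NetworkIn and NetworkOut if network bandwidth is a concern.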
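Once cpu.json exists, a few lines of Python can reduce it to the two numbers that matter for sizing. This is a minimal sketch that assumes the usual get-metric-statistics output shape (a Datapoints list whose entries carry an Average field); the function name summarize_datapoints and the inline sample are illustrative, not part of any AWS tooling.

```python
import json
from statistics import mean


def summarize_datapoints(datapoints):
    """Return (mean, highest) of the per-period 'Average' values, or None if empty.

    Note: the second value is the highest period average, not the true
    instantaneous peak; request the Maximum statistic if you need that.
    """
    values = [dp["Average"] for dp in datapoints if "Average" in dp]
    if not values:
        return None
    return round(mean(values), 2), round(max(values), 2)


# Tiny hand-written sample in the same shape as cpu.json. In practice,
# load the real file with json.load() and pass payload["Datapoints"].
sample = json.loads(
    '{"Datapoints": [{"Average": 4.0}, {"Average": 12.0}, {"Average": 8.0}]}'
)
print(summarize_datapoints(sample["Datapoints"]))
```

The same function works unchanged on the NetworkIn and NetworkOut files, since they share the Datapoints layout.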
Pull Compute Optimizer recommendations
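aws compute-optimizer get-ec2-instance-recommendations \
--account-ids 123456789012 \
--region us-east-1 \
--output json > optimizer.json
The account must be opted in to Compute Optimizer before this returns data. The JSON contains an instanceRecommendations array; each entry lists recommendationOptions, and each option carries a rank (1 is the top recommendation), a performanceRisk score, and, where available, an estimated monthly savings figure. Keep it handy for the next step.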
2. Identify truly under‑utilized instances
A common mistake is to look at a single metric (e.g., CPU < 10 %) and assume the instance can be downsized. Instead, evaluate a balanced set:
- CPUUtilization – average < 20 % over the last 30 days?
- Memory – not directly visible in CloudWatch unless you installed the CloudWatch Agent. If you have it, look for average < 30 %.
- Network – sustained bandwidth < 10 % of the instance type’s baseline network bandwidth.
- EBS IOPS – check VolumeReadOps/VolumeWriteOps (in the AWS/EBS namespace) if you’re using provisioned IOPS.
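The checklist above can be sketched as a single predicate, so that no one metric decides on its own. The thresholds are the ones from the list; the Utilization class and the function name are hypothetical, and the sketch assumes that missing memory data (no CloudWatch Agent) should not block a candidate, which you may prefer to treat more strictly. EBS IOPS is left out because the list gives no threshold for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utilization:
    cpu_avg_pct: float                      # 30-day average CPUUtilization
    network_pct_of_limit: float             # sustained bandwidth as % of the instance limit
    memory_avg_pct: Optional[float] = None  # None when no CloudWatch Agent data exists


def is_downsize_candidate(u: Utilization) -> bool:
    """True only when every metric we have is below its threshold."""
    checks = [u.cpu_avg_pct < 20, u.network_pct_of_limit < 10]
    if u.memory_avg_pct is not None:
        checks.append(u.memory_avg_pct < 30)
    return all(checks)


# Low CPU alone is not enough: the second instance is memory-bound.
print(is_downsize_candidate(Utilization(12.0, 3.0, 25.0)))
print(is_downsize_candidate(Utilization(5.0, 3.0, 70.0)))
```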
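Create a simple spreadsheet or use jq to list the instances whose top recommendation carries little performance risk. In the get-ec2-instance-recommendations output, performanceRisk is a number from 0 (lowest) to 4 (highest), not a label:
jq -r '.instanceRecommendations[] | select((.recommendationOptions | length) > 0 and .recommendationOptions[0].performanceRisk <= 1) | [.instanceArn, .currentInstanceType, .recommendationOptions[0].instanceType] | @tsv' optimizer.json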
Those are your primary candidates for right‑sizing.
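If you would rather rank the candidates by money saved, the same file can be processed in Python. This sketch assumes the get-ec2-instance-recommendations response shape (instanceRecommendations, a numeric performanceRisk, and savingsOpportunity.estimatedMonthlySavings.value); the function name, the default thresholds, and the sample data are illustrative and hand-written, not real output.

```python
def top_candidates(payload, max_risk=1.0, min_monthly_savings=5.0, limit=5):
    """Return up to `limit` rows of (instance_id, current_type, target_type,
    monthly_savings), largest estimated savings first."""
    rows = []
    for rec in payload.get("instanceRecommendations", []):
        options = rec.get("recommendationOptions") or []
        if not options:
            continue
        # Rank 1 is the preferred option; fall back to 1 if the field is absent.
        best = min(options, key=lambda o: o.get("rank", 1))
        savings = ((best.get("savingsOpportunity") or {})
                   .get("estimatedMonthlySavings") or {}).get("value", 0.0)
        risk = best.get("performanceRisk", 4.0)  # treat a missing score as high risk
        if risk <= max_risk and savings >= min_monthly_savings:
            instance_id = rec["instanceArn"].split("/")[-1]
            rows.append((instance_id, rec.get("currentInstanceType"),
                         best["instanceType"], savings))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:limit]


# Hand-written sample in the assumed response shape.
sample = {
    "instanceRecommendations": [
        {"instanceArn": "arn:aws:ec2:us-east-1:123456789012:instance/i-aaa",
         "currentInstanceType": "m5.large",
         "recommendationOptions": [
             {"rank": 1, "instanceType": "t3.medium", "performanceRisk": 1.0,
              "savingsOpportunity": {"estimatedMonthlySavings":
                                     {"currency": "USD", "value": 30.0}}}]},
        {"instanceArn": "arn:aws:ec2:us-east-1:123456789012:instance/i-bbb",
         "currentInstanceType": "c5.xlarge",
         "recommendationOptions": [
             {"rank": 1, "instanceType": "c5.large", "performanceRisk": 3.0,
              "savingsOpportunity": {"estimatedMonthlySavings":
                                     {"currency": "USD", "value": 60.0}}}]},
    ]
}
for row in top_candidates(sample):
    print(*row, sep="\t")
```

Only the first instance survives the filter here: the second would save more, but its risk score is above the cutoff.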
3. Choose the target instance type
When you have a candidate, compare the current type with the recommended one:
| Current | vCPU | Memory (GiB) | Network | Storage | Recommended |
|---|---|---|---|---|---|
| m5.large | 2 | 8 | Up to 10 Gbps | EBS‑only | t3.medium |
| c5.xlarge | 4 | 8 | Up to 10 Gbps | EBS‑only | c5.large |
Use the AWS CLI to list specs for any type:
aws ec2 describe-instance-types \
--instance-types t3.medium c5.large \
--query 'InstanceTypes[*].{Type:InstanceType,VCpus:VCpuInfo.DefaultVCpus,Memory:MemoryInfo.SizeInMiB}' \
--output table
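Pick the smallest type that still meets all three thresholds:
- CPU – the observed peak load, rescaled to the smaller vCPU count, stays below saturation (for example, a 30 % peak on 4 vCPUs becomes roughly 60 % on 2 vCPUs of the same generation).
- Memory – peak memory used, in GiB, fits in the new type with some headroom.
- Network – observed peak throughput stays within the new type’s baseline bandwidth.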
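One way to make "smallest type that still fits" concrete is to rescale the observed peak onto each candidate and keep the smallest one that leaves headroom. The sketch below is an illustration, not a sizing standard: it assumes equal per‑vCPU performance across candidates, ignores network and burstable CPU credits, and the 80 % CPU ceiling and 1.25× memory headroom are arbitrary defaults you should tune.

```python
from typing import NamedTuple, Optional, Sequence


class InstanceSpec(NamedTuple):
    name: str
    vcpus: int
    memory_gib: float


def smallest_fit(current: InstanceSpec, peak_cpu_pct: float, peak_mem_gib: float,
                 candidates: Sequence[InstanceSpec],
                 cpu_ceiling_pct: float = 80.0,
                 mem_headroom: float = 1.25) -> Optional[InstanceSpec]:
    """Return the smallest candidate that absorbs the observed peaks, or None."""
    # Convert the CPU peak into "busy vCPUs" so it can be projected onto any size.
    busy_vcpus = current.vcpus * peak_cpu_pct / 100.0
    fits = [
        c for c in candidates
        if busy_vcpus / c.vcpus * 100.0 <= cpu_ceiling_pct
        and peak_mem_gib * mem_headroom <= c.memory_gib
    ]
    return min(fits, key=lambda c: (c.vcpus, c.memory_gib), default=None)


c5_large = InstanceSpec("c5.large", 2, 4.0)
c5_xlarge = InstanceSpec("c5.xlarge", 4, 8.0)

# A 30 % CPU peak on 4 vCPUs projects to 60 % on 2 vCPUs, so c5.large fits.
print(smallest_fit(c5_xlarge, 30.0, 2.5, [c5_large, c5_xlarge]))
```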
4. Resize safely with the CLI
Changing the type of an EBS‑backed instance requires a stop/start cycle. Follow these steps:
- Stop the instance – data on instance‑store volumes is lost when the instance stops, so copy anything you need to EBS or S3 first. The public IPv4 address also changes unless you use an Elastic IP.
aws ec2 stop-instances --instance-ids i-0abcd1234efgh5678
- Wait for the stopped state
aws ec2 wait instance-stopped --instance-ids i-0abcd1234efgh5678
- Modify the instance type
aws ec2 modify-instance-attribute \
--instance-id i-0abcd1234efgh5678 \
--instance-type "{\"Value\":\"t3.medium\"}"
- Start the instance
aws ec2 start-instances --instance-ids i-0abcd1234efgh5678
- Validate – after the instance is running, re‑run the CloudWatch queries for a few hours to confirm performance remains acceptable.
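If you prefer to script the same sequence in Python, it maps directly onto boto3 calls. The sketch below takes the client as a parameter, assumed to behave like boto3.client("ec2"), so it can be exercised with a stub before touching a real instance; the function name is illustrative and there is no rollback or error handling.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop, retype, and restart one EBS-backed instance.

    `ec2` is assumed to be a boto3 EC2 client (or a stand-in with the same
    methods). Each waiter blocks until the instance reaches that state.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # The type can only be changed while the instance is stopped.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```

A call would look like resize_instance(boto3.client("ec2"), "i-0abcd1234efgh5678", "t3.medium"). Passing the client in, rather than creating it inside the function, is a design choice that keeps the region and credentials under the caller's control.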
If the instance belongs to an Auto Scaling Group, you must update the launch template or launch configuration instead of modifying a single instance. Example for a launch template:
aws ec2 modify-launch-template \
--launch-template-id lt-0a1b2c3d4e5f6g7h \
--default-version $(aws ec2 create-launch-template-version \
--launch-template-id lt-0a1b2c3d4e5f6g7h \
--source-version 1 \
--launch-template-data '{"InstanceType":"t3.medium"}' \
--query 'LaunchTemplateVersion.VersionNumber' \
--output text)
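After the new default version is set, start an instance refresh to roll it out. The group does not replace running instances on its own, and it only picks up the change if it references the $Default (or $Latest) version of the template.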
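The launch‑template change and the rollout can also be sketched with boto3. The function below takes both clients as parameters, assumed to behave like boto3.client("ec2") and boto3.client("autoscaling"); it further assumes the group tracks the template's $Default version. The function name and the 90 % MinHealthyPercentage are illustrative choices, not recommendations.

```python
def roll_out_instance_type(ec2, autoscaling, launch_template_id, asg_name,
                           new_type, source_version="1"):
    """Create a template version with a new instance type, make it the
    default, and start an instance refresh on the Auto Scaling group.

    Returns (new_version_number, instance_refresh_id).
    """
    version = ec2.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion=source_version,
        # Only the overridden field is needed; the rest comes from the source version.
        LaunchTemplateData={"InstanceType": new_type},
    )["LaunchTemplateVersion"]["VersionNumber"]

    ec2.modify_launch_template(
        LaunchTemplateId=launch_template_id,
        DefaultVersion=str(version),
    )

    refresh = autoscaling.start_instance_refresh(
        AutoScalingGroupName=asg_name,
        Preferences={"MinHealthyPercentage": 90},
    )
    return version, refresh["InstanceRefreshId"]
```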
5. Lock in the savings with a plan
Right‑sizing alone reduces the hourly rate, but you can capture the new lower baseline with a Compute Savings Plan or a Reserved Instance (RI):
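- Savings Plan – a Compute Savings Plan applies across instance families, sizes, regions, and operating systems. You commit to a dollar amount per hour, so size the commitment to the resized fleet’s expected hourly spend rather than to a vCPU count.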
- Standard RI – best when you know the exact instance family and region will stay constant.
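Savings Plans are purchased by offering ID, so look up the matching offering first and then create the plan for the new footprint:
aws savingsplans describe-savings-plans-offerings \
--plan-types Compute \
--payment-options "Partial Upfront" \
--durations 94608000 \
--currencies USD \
--query 'searchResults[].offeringId' \
--output text
aws savingsplans create-savings-plan \
--savings-plan-offering-id <offering-id> \
--commitment 0.05
The --durations value is the term in seconds (94608000 is three years). Replace <offering-id> with the ID returned by the first command, and adjust the --commitment value to the projected hourly spend of the resized fleet. The purchase is a commitment for the full term, so double‑check both values before running the second command.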
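The commitment figure itself is simple arithmetic: sum the hourly cost of the resized fleet and decide how much of it to cover. The sketch below is a rough illustration; the rates in the example are hypothetical placeholders (use the Savings Plan rates for your actual types and region), and the coverage parameter is an assumption that lets you commit to less than the full baseline in case usage drops.

```python
def hourly_commitment(fleet, coverage=1.0):
    """Return the hourly dollar commitment for a fleet.

    fleet:    iterable of (instance_count, hourly_rate_usd) pairs
    coverage: fraction of the baseline to commit to, in (0, 1]
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(count * rate for count, rate in fleet)
    return round(total * coverage, 4)


# Hypothetical rates: 4 instances at $0.03/h and 2 at $0.05/h.
fleet = [(4, 0.03), (2, 0.05)]
print(hourly_commitment(fleet))                # full baseline
print(hourly_commitment(fleet, coverage=0.8))  # commit to 80 % of it
```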
6. Automate the loop
Manual checks are fine for a handful of instances, but production environments often have dozens. Set up a recurring Lambda that:
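1. Calls the Compute Optimizer GetEC2InstanceRecommendations API (get_ec2_instance_recommendations in boto3).
2. Filters for a low performanceRisk score (for example, 1 or below on the 0–4 scale) and estimated monthly savings above $5.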
3. Opens a ticket in your change‑management system with the recommended target type.
4. Optionally tags the instance with RightSize:Pending so you can track progress.
A minimal Python Lambda example (a sketch without pagination or full error handling):
import boto3

opt = boto3.client('compute-optimizer')
ec2 = boto3.client('ec2')

def lambda_handler(event, context):
    # Only the first page is read here; follow nextToken for large fleets.
    resp = opt.get_ec2_instance_recommendations()
    for rec in resp['instanceRecommendations']:
        instance_id = rec['instanceArn'].split('/')[-1]
        options = rec.get('recommendationOptions', [])
        if not options:
            continue
        top = options[0]
        # performanceRisk is numeric: 0 is the lowest risk, 4 the highest.
        if top['performanceRisk'] <= 1:
            ec2.create_tags(Resources=[instance_id],
                            Tags=[{'Key': 'RightSize', 'Value': top['instanceType']}])
Schedule the function with an EventBridge rule using the expression cron(0 2 * * ? *) to run nightly at 02:00 UTC.
How CloudBudgetMaster helps – Our platform continuously scans EC2 fleets, surfaces the exact dollar impact of each over‑provisioned instance, and offers a one‑click resize that respects your existing Savings Plans, so you can lock in savings without leaving the dashboard.