Find and Right‑Size Idle Amazon RDS Databases Safely
1. Pull a Complete Inventory of Your RDS Instances
Start with a single AWS CLI call that returns every DB instance in the current region (the CLI paginates automatically; repeat with --region for each region you use):
aws rds describe-db-instances \
--query 'DBInstances[*].[DBInstanceIdentifier,Engine,DBInstanceClass,MultiAZ,TagList]' \
--output table
The output gives you the identifier, engine type, current instance class, Multi‑AZ flag, and any tags. Export this to CSV for later cross‑reference:
aws rds describe-db-instances --output json > rds-inventory.json
jq -r '.DBInstances[] | [.DBInstanceIdentifier, .Engine, .DBInstanceClass, .MultiAZ] | @csv' rds-inventory.json > rds-inventory.csv
Having a static list is essential before you start mutating anything.
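Because describe-db-instances queries one region per call, a fleet spread over several regions needs a loop. The following is a minimal sketch, assuming the caller may run ec2 describe-regions; the function name inventory_all_regions is hypothetical and not part of the AWS CLI.

```shell
# Sketch: print "region,identifier,engine,class,multi-az" for every region.
# Assumes configured AWS CLI credentials with ec2:DescribeRegions and
# rds:DescribeDBInstances permissions.
inventory_all_regions() {
  for region in $(aws ec2 describe-regions \
      --query 'Regions[].RegionName' --output text); do
    # Text output is tab-separated, one instance per line.
    aws rds describe-db-instances --region "$region" \
      --query 'DBInstances[].[DBInstanceIdentifier,Engine,DBInstanceClass,MultiAZ]' \
      --output text |
      awk -v r="$region" 'NF { print r "," $1 "," $2 "," $3 "," $4 }'
  done
}
```

Calling inventory_all_regions > rds-inventory-all-regions.csv would give one file covering every enabled region.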
2. Identify Low‑Utilization Candidates
RDS publishes many CloudWatch metrics; three groups are the most useful for spotting idle behavior:
- CPUUtilization – average % of CPU used.
- DatabaseConnections – number of active client connections.
- ReadIOPS / WriteIOPS – input‑output operations per second.
Run a 30‑day window query for each instance. Set DB_ID to the instance identifier first; the date -d syntax below is GNU date (on macOS, use date -u -v-30d instead):
DB_ID=my-idle-db
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value="$DB_ID" \
--statistics Average \
--period 86400 \
--start-time "$(date -u -d '-30 days' +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--output json > "${DB_ID}-cpu.json"
Repeat for DatabaseConnections, ReadIOPS, and WriteIOPS. A practical rule of thumb:
- CPUUtilization < 5 % average
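- DatabaseConnections < 2 for the whole period (check the Maximum statistic, since an average can hide short bursts)
- ReadIOPS + WriteIOPS < 10 on average (IOPS is already a per‑second rate, so compare daily averages rather than daily totals)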
If all three thresholds are met, the instance is a strong idle candidate.
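The rule of thumb can be scripted so it is applied the same way to every instance. This is a sketch under stated assumptions, not a definitive implementation: metric_30d and is_idle_candidate are hypothetical helper names, date -d requires GNU date, and the connection count is read from the Maximum statistic so that brief bursts are not averaged away.

```shell
# Print one number summarising 30 days of daily datapoints.
# usage: metric_30d DB_ID METRIC STAT   (STAT is Average or Maximum)
metric_30d() {
  if [ "$3" = "Maximum" ]; then
    q='max(Datapoints[].Maximum)'
  else
    q='avg(Datapoints[].Average)'
  fi
  aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS --metric-name "$2" \
    --dimensions "Name=DBInstanceIdentifier,Value=$1" \
    --statistics "$3" --period 86400 \
    --start-time "$(date -u -d '-30 days' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query "$q" --output text
}

# usage: is_idle_candidate CPU_AVG MAX_CONNECTIONS READ_IOPS WRITE_IOPS
# Exit status: 0 = all thresholds met, 1 = not idle, 2 = non-numeric input
# (for example "None" when CloudWatch returned no datapoints).
is_idle_candidate() {
  awk -v cpu="$1" -v conn="$2" -v rd="$3" -v wr="$4" 'BEGIN {
    num = "^[0-9.eE+-]+$"
    if (cpu !~ num || conn !~ num || rd !~ num || wr !~ num) exit 2
    exit !(cpu + 0 < 5 && conn + 0 < 2 && rd + wr < 10)
  }'
}
```

A typical call would be is_idle_candidate "$(metric_30d "$DB_ID" CPUUtilization Average)" "$(metric_30d "$DB_ID" DatabaseConnections Maximum)" "$(metric_30d "$DB_ID" ReadIOPS Average)" "$(metric_30d "$DB_ID" WriteIOPS Average)". Treating missing data as an error rather than as "idle" is a deliberate choice: an instance with no datapoints deserves a manual look.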
3. Validate Business Context
Metrics alone can be misleading. Before you downsize:
- Check tags – Look for environment=prod or critical=true. Production workloads often tolerate lower utilization during off‑peak hours.
- Review backup windows – Ensure the instance isn’t the only source for a nightly backup that other services depend on.
- Confirm no scheduled jobs – Search Amazon EventBridge rules (formerly CloudWatch Events), Step Functions, or cron‑like Lambda triggers that might spin up connections briefly.
- Consult owners – A quick Slack or ticket comment can surface upcoming load spikes (e.g., a marketing campaign).
Only proceed when you have documented confirmation that the workload is truly idle.
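The tag check is easy to automate as a guard that runs before any modification. A minimal sketch, assuming the tag keys and values mentioned above (environment=prod, critical=true) match your tagging scheme; has_protective_tag is a hypothetical helper name and takes the instance ARN.

```shell
# usage: has_protective_tag DB_INSTANCE_ARN
# Exit status 0 if the instance carries environment=prod or critical=true
# (compared case-insensitively), 1 otherwise.
has_protective_tag() {
  aws rds list-tags-for-resource --resource-name "$1" \
    --query 'TagList[].[Key,Value]' --output text |
    awk -F'\t' '
      tolower($1) == "environment" && tolower($2) == "prod" { found = 1 }
      tolower($1) == "critical"    && tolower($2) == "true" { found = 1 }
      END { exit !found }'
}
```

A wrapper script could then skip any instance for which has_protective_tag succeeds and route it to its owner for manual review instead.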
4. Choose the Right‑Sizing Path
There are three safe options:
- Scale Down the Instance Class – Move from db.m5.xlarge to db.m5.large, or from db.m5.large to db.t3.medium if a burstable class suits the workload (the db.m5 family has no size below large).
- Convert to a Burstable Class – For workloads that occasionally need CPU, db.t3 or db.t4g can provide cost savings while still handling spikes.
- Terminate and Replace with a Snapshot – If the database is a static test or development copy, snapshot it and delete the instance.
Example: Downsize via CLI
aws rds modify-db-instance \
--db-instance-identifier my-idle-db \
--db-instance-class db.t3.medium \
--apply-immediately
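--apply-immediately applies the change right away; without it, the modification waits for the next maintenance window. Changing the instance class restarts the database, so expect a brief outage either way (typically shorter on Multi‑AZ deployments, where RDS fails over to the standby). If the instance needs high availability and is not yet Multi‑AZ, add --multi-az to the same command.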
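Before changing the class, it helps to record the current one so the change can be reversed quickly. The sketch below is one possible wrapper, not an AWS-provided command: resize_with_rollback is a hypothetical name, and it only prints the rollback command rather than running it.

```shell
# usage: resize_with_rollback DB_ID TARGET_CLASS
# Prints the command that would restore the previous class, then applies
# the new class immediately.
resize_with_rollback() {
  id=$1
  target=$2
  prev=$(aws rds describe-db-instances --db-instance-identifier "$id" \
    --query 'DBInstances[0].DBInstanceClass' --output text) || return 1
  echo "rollback: aws rds modify-db-instance --db-instance-identifier $id --db-instance-class $prev --apply-immediately"
  aws rds modify-db-instance --db-instance-identifier "$id" \
    --db-instance-class "$target" --apply-immediately >/dev/null
}
```

For example, resize_with_rollback my-idle-db db.t3.medium | tee -a resize.log keeps the rollback command next to the change record.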
Example: Create a Snapshot Before Deletion
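SNAPSHOT_ID=my-idle-db-predelete-$(date +%Y%m%d)
aws rds create-db-snapshot \
--db-instance-identifier my-idle-db \
--db-snapshot-identifier "$SNAPSHOT_ID"
# Snapshot creation is asynchronous: wait until it completes before deleting.
aws rds wait db-snapshot-available \
--db-snapshot-identifier "$SNAPSHOT_ID"
# --skip-final-snapshot is acceptable only because the manual snapshot above exists.
aws rds delete-db-instance \
--db-instance-identifier my-idle-db \
--skip-final-snapshot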
Keep the snapshot for at least 30 days; you can restore it later if needed.
5. Test the Change in a Staging Account
Never right‑size directly in production without a safety net:
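- Clone the instance into a staging account: share a snapshot with that account (aws rds modify-db-snapshot-attribute with --attribute-name restore), then restore it there using aws rds restore-db-instance-from-db-snapshot.
- Apply the same downsize and run a synthetic workload (e.g., sysbench for MySQL or pgbench for PostgreSQL) for 24 hours.
- Monitor CloudWatch for any metric spikes. If CPU utilization, or IOPS measured against the storage's provisioned or baseline limit, stays above 80 % for more than 5 minutes, reconsider the target class.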
Document the test results and obtain sign‑off before applying the change to the production instance.
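The "80 % for more than 5 minutes" criterion can be checked mechanically against one‑minute datapoints from the test run. This is a minimal sketch, assuming one value per line in time order; sustained_breach is a hypothetical helper name, and the limit and duration default to the figures above.

```shell
# usage: sustained_breach [LIMIT] [MINUTES] < values.txt
# Reads one metric value per line (one per minute, oldest first).
# Exit status 0 if more than MINUTES consecutive values exceed LIMIT.
sustained_breach() {
  awk -v limit="${1:-80}" -v minutes="${2:-5}" '
    $1 + 0 > limit { run++; if (run > minutes) hit = 1; next }
    { run = 0 }
    END { exit !hit }'
}

# Possible way to feed it (24 hours at 60-second resolution is 1440
# datapoints, the most a single get-metric-statistics call returns):
#   aws cloudwatch get-metric-statistics --namespace AWS/RDS \
#     --metric-name CPUUtilization --statistics Average --period 60 \
#     --dimensions Name=DBInstanceIdentifier,Value="$DB_ID" \
#     --start-time "$START" --end-time "$END" \
#     --query 'sort_by(Datapoints,&Timestamp)[].Average' --output text |
#     tr '\t' '\n' | sustained_breach 80 5
```

Counting consecutive datapoints, rather than any six datapoints above the limit, keeps isolated spikes from failing the test.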
6. Automate Ongoing Monitoring
After right‑sizing, set up a CPU alarm to catch cases where the smaller instance becomes undersized (set DB_ID to the instance identifier first):
aws cloudwatch put-metric-alarm \
--alarm-name "RDS-HighCPU-${DB_ID}" \
--metric-name CPUUtilization \
--namespace AWS/RDS \
--dimensions Name=DBInstanceIdentifier,Value="${DB_ID}" \
--threshold 70 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 3 \
--period 300 \
--statistic Average \
--alarm-actions arn:aws:sns:us-east-1:123456789012:OpsAlerts
If the alarm fires, you have a clear signal that the instance may need to be upsized again.
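To cover a whole fleet, the same alarm can be created for every identifier in the inventory CSV. The sketch below assumes the CSV layout produced by the jq command in step 1 (quoted identifier in the first column); create_cpu_alarms is a hypothetical function name and the SNS topic ARN is passed in by the caller.

```shell
# usage: create_cpu_alarms INVENTORY_CSV SNS_TOPIC_ARN
# Creates (or overwrites) one high-CPU alarm per instance in the CSV.
create_cpu_alarms() {
  csv=$1
  topic_arn=$2
  cut -d, -f1 "$csv" | tr -d '"' | while IFS= read -r id; do
    [ -n "$id" ] || continue
    # </dev/null keeps the CLI from consuming the loop's input.
    aws cloudwatch put-metric-alarm \
      --alarm-name "RDS-HighCPU-$id" \
      --namespace AWS/RDS --metric-name CPUUtilization \
      --dimensions "Name=DBInstanceIdentifier,Value=$id" \
      --statistic Average --period 300 --evaluation-periods 3 \
      --threshold 70 --comparison-operator GreaterThanThreshold \
      --alarm-actions "$topic_arn" </dev/null
  done
}
```

Because put-metric-alarm replaces an existing alarm with the same name, rerunning the function after the inventory changes is safe.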
How CloudBudgetMaster helps – Our platform continuously scans your RDS fleet, flags instances that meet the idle criteria above, calculates the exact monthly dollar impact of each right‑size or termination, and surfaces a one‑click remediation plan, so you can act fast without manual metric hunting.