Detecting Zombie Infrastructure: Find Forgotten Cloud Resources

June 25, 2026·5 min read·CloudBudgetMaster

What is a "zombie" resource?

A zombie (or orphan) resource is any cloud asset that is still running, allocated, or reserved but provides no business value. It often lives in a project or account that no longer has an owner, and because it is billable, it silently inflates the monthly spend.

Typical signs:
- No recent CloudWatch/Stackdriver metrics.
- No tags, or tags that belong to a decommissioned team.
- Stale creation dates (6+ months old).
- No attached workloads (e.g., a load balancer with zero targets).

Detecting these items requires systematic inventory, filtering by activity, and then validation with the responsible team.
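
The signs above can be turned into a simple checklist function. The following is a minimal sketch, not a definitive detector: the `Resource` fields, the `retired_teams` set, and the 180-day cutoff (standing in for "6+ months") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Resource:
    # Hypothetical inventory record; field names are illustrative only.
    resource_id: str
    created: datetime
    tags: dict = field(default_factory=dict)
    has_recent_metrics: bool = True
    attached_workloads: int = 1


def zombie_signs(res, now, retired_teams=frozenset(), max_age_days=180):
    """Return the typical zombie signs that this resource shows."""
    signs = []
    if not res.has_recent_metrics:
        signs.append("no recent metrics")
    # Either no tags at all, or an owner tag that names a decommissioned team.
    if not res.tags or res.tags.get("owner") in retired_teams:
        signs.append("missing or stale tags")
    if now - res.created > timedelta(days=max_age_days):
        signs.append("stale creation date")
    if res.attached_workloads == 0:
        signs.append("no attached workloads")
    return signs
```

A resource that shows several signs at once is a stronger candidate than one that shows a single sign, but none of them replaces validation with the responsible team.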

1. Scan AWS for common zombies

a) EC2 instances without traffic

aws ec2 describe-instances \
  --filters Name=instance-state-name,Values=running \
  --query 'Reservations[].Instances[].{ID:InstanceId,Launch:LaunchTime,AZ:Placement.AvailabilityZone}' \
  --output table

Cross-reference the output with VPC Flow Logs or the CloudWatch NetworkIn metric. Any instance whose total NetworkIn stays below 1 KB over the last 30 days is a candidate.
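
The 1 KB check can be sketched as a small helper. It assumes the datapoints come from `aws cloudwatch get-metric-statistics` called with `--statistics Sum`, whose JSON output carries a `Datapoints` list with a `Sum` field per entry. The function names and the 1024-byte default are illustrative assumptions.

```python
def total_bytes(datapoints):
    """Add up the 'Sum' statistic across CloudWatch datapoints."""
    return sum(dp.get("Sum", 0.0) for dp in datapoints)


def is_traffic_candidate(datapoints, threshold_bytes=1024):
    """True when total inbound traffic over the window is under the threshold.

    An empty list counts as zero traffic, so confirm separately that the
    instance existed for the whole window before treating it as idle.
    """
    return total_bytes(datapoints) < threshold_bytes
```

Feed it the `Datapoints` array for one instance at a time, for example after loading the CLI output with `json.load`.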

b) EBS volumes not attached for >30 days

aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query "Volumes[?CreateTime<'$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%S)'].{ID:VolumeId,Size:Size,AZ:AvailabilityZone,Created:CreateTime}" \
  --output table

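The query filters on creation time, because describe-volumes does not report when a volume was detached. If the volume size is >10 GiB, consider snapshotting it and then deleting it.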
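
If you prefer to filter offline, the same age check can be applied to saved `describe-volumes` JSON. This is a sketch under stated assumptions: it reads the `VolumeId`, `State`, and `CreateTime` fields of each entry in the `Volumes` list, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def old_available_volumes(volumes, now, min_age_days=30):
    """Return IDs of unattached volumes created before the cutoff."""
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and parse_ts(v["CreateTime"]) < cutoff
    ]
```

Like the CLI query, this uses creation time as a proxy. Finding the real detach time would need another source such as CloudTrail events.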

c) Elastic IPs (EIPs) that are allocated but not associated

aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].{PublicIP:PublicIp,AllocationId:AllocationId}' \
  --output table

Each unattached EIP costs $0.005 / hour. Multiply by 720 hours to see the monthly impact.
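
As a worked example of that multiplication, here is a small sketch that uses `Decimal` to avoid floating-point rounding. The hourly rate is the one quoted above; the helper name and the 12-address example are assumptions for illustration.

```python
from decimal import Decimal

HOURS_PER_MONTH = 720  # 30 days x 24 hours, the approximation used above


def monthly_cost(hourly_rate, count=1, hours=HOURS_PER_MONTH):
    """Monthly cost in dollars for `count` resources billed per hour."""
    total = Decimal(hourly_rate) * hours * count
    return total.quantize(Decimal("0.01"))  # round to cents


one_eip = monthly_cost("0.005")         # a single idle EIP
twelve_eips = monthly_cost("0.005", 12)  # a dozen forgotten EIPs
```

One idle address comes to $3.60 per month and twelve come to $43.20, which is how small line items add up unnoticed.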

d) NAT Gateways with zero bytes processed

aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --statistics Sum \
  --period 86400 \
  --start-time "$(date -u -d "30 days ago" +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --dimensions Name=NatGatewayId,Value=<gw-id>

BytesOutToDestination counts the bytes sent out through the gateway. If every daily sum is zero for the whole period, the NAT Gateway is idle and can be deleted.

2. Scan GCP for orphaned assets

a) Compute Engine VMs with no CPU usage

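gcloud compute instances list --format="json" | jq '[.[] | select(.status=="RUNNING") | {name, zone, creationTimestamp}]' > running.json

# Query the Cloud Monitoring API for daily mean CPU utilization over the last 30 days
curl -s -G "https://monitoring.googleapis.com/v3/projects/$(gcloud config get-value project)/timeSeries" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type="compute.googleapis.com/instance/cpu/utilization"' \
  --data-urlencode "aggregation.alignmentPeriod=86400s" --data-urlencode "aggregation.perSeriesAligner=ALIGN_MEAN" \
  --data-urlencode "interval.startTime=$(date -u -d '30 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --data-urlencode "interval.endTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)" > cpu.json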

Join the two JSON files on instance name; any instance whose average cpu/utilization stays below 0.01 (1%) over the last 30 days is a zombie candidate.
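
The join itself can be sketched in a few lines. This assumes the instance records are loaded as a list of dicts with a `name` key, and that the metric data is a list of time series shaped like the Cloud Monitoring API's TimeSeries objects (`metric.labels.instance_name` plus `points[].value.doubleValue`). The function names are hypothetical.

```python
from statistics import mean


def average_cpu_by_instance(time_series):
    """Map instance name -> mean cpu/utilization across all returned points."""
    averages = {}
    for series in time_series:
        name = series["metric"]["labels"]["instance_name"]
        values = [p["value"]["doubleValue"] for p in series.get("points", [])]
        if values:
            averages[name] = mean(values)
    return averages


def cpu_zombie_candidates(running, time_series, threshold=0.01):
    """Names of running instances whose average CPU is under the threshold.

    Instances with no metric data are skipped, not flagged: missing data
    is not the same as an idle machine.
    """
    averages = average_cpu_by_instance(time_series)
    return [
        vm["name"]
        for vm in running
        if vm["name"] in averages and averages[vm["name"]] < threshold
    ]
```

Instance names are only unique within a project and zone, so a multi-project scan would need a wider join key than the name alone.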

b) Persistent disks not attached

gcloud compute disks list --filter='-users:*' --format='table(name,zone,sizeGb,creationTimestamp)'

If the disk is >100 GB, snapshot it before deletion.

c) Static external IPs that are not in use

gcloud compute addresses list --filter='status=RESERVED' --format='table(name,region,address,creationTimestamp)'

Each reserved IP costs $0.004 / hour. Identify those older than 90 days.

3. Scan Azure for forgotten resources

a) Virtual machines with low network I/O

Get-AzVM -Status | Where-Object {$_.PowerState -eq "VM running"} | ForEach-Object {
  # Hourly totals of inbound bytes over the last 30 days
  $metrics = Get-AzMetric -ResourceId $_.Id -MetricName "Network In Total" -AggregationType Total -TimeGrain 01:00:00 -StartTime (Get-Date).AddDays(-30) -EndTime (Get-Date)
  $avg = ($metrics.Data | Measure-Object -Property Total -Average).Average
  # Emit the VM when it averages under 1 KB of inbound traffic per hour
  if ($avg -lt 1KB) { $_ }
}

b) Unattached managed disks

Get-AzDisk | Where-Object {$null -eq $_.ManagedBy} | Select-Object Name, ResourceGroupName, DiskSizeGB, TimeCreated

c) Public IPs not associated with a NIC or Load Balancer

Get-AzPublicIpAddress | Where-Object {$null -eq $_.IpConfiguration} | Select-Object Name, IpAddress, Location, PublicIpAllocationMethod

Each idle Standard Public IP costs $0.005 / hour.

4. Validate before deletion

Tag check – Ensure the resource has a tag like owner or cost-center. If missing, ping the Slack channel #cloud-costs with the resource ID and ask for ownership.
Snapshot/backup – For storage (EBS, Persistent Disk, Managed Disk), create a snapshot:
- AWS: aws ec2 create-snapshot --volume-id vol-12345678 --description "pre-cleanup snapshot"
- GCP: gcloud compute disks snapshot my-disk --snapshot-names my-disk-snap
- Azure: az snapshot create --resource-group rg-prod --source /subscriptions/.../myDisk --name myDiskSnap
Dry-run delete – AWS EC2 commands accept a --dry-run flag that checks permissions without making the change. gcloud and the Azure CLI offer no general equivalent, so list the IDs and confirm them manually.
Document – Record the action in a shared spreadsheet or in your IaC repo (e.g., in the commit message that removes the resource from your Terraform configuration).
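
The four checks can be combined into a single gate that a cleanup script consults before it deletes anything. This is a minimal sketch: the `Candidate` record and its field names are assumptions, and how each flag gets set (Slack reply, snapshot ID, and so on) is left to your process.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical record for one resource awaiting cleanup.
    resource_id: str
    tags: dict = field(default_factory=dict)
    is_storage: bool = False
    snapshot_id: Optional[str] = None
    owner_confirmed: bool = False
    dry_run_ok: bool = False
    documented: bool = False


def deletion_blockers(c):
    """Return the validation steps that still block deletion."""
    blockers = []
    has_owner_tag = bool(c.tags.get("owner") or c.tags.get("cost-center"))
    # Untagged resources need an explicit answer from the ownership ping.
    if not has_owner_tag and not c.owner_confirmed:
        blockers.append("ownership unresolved")
    if c.is_storage and not c.snapshot_id:
        blockers.append("storage has no snapshot")
    if not c.dry_run_ok:
        blockers.append("dry run or manual ID check not done")
    if not c.documented:
        blockers.append("action not documented")
    return blockers
```

Delete only when the returned list is empty; anything else goes back to the owning team or the #cloud-costs channel.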

5. Automate the hunt

Scheduled Lambda / Cloud Function – Run the AWS snippets daily, push results to an S3 bucket, and send a Slack webhook if any candidate exceeds a cost threshold.
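Terraform import guard – Add a lifecycle { prevent_destroy = true } to critical resources so that a cleanup run cannot destroy them by accident. Remove a resource from the configuration only after the zombie check passes; note that terraform state rm only makes Terraform forget the resource and leaves it running.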
Policy as code – Use AWS Config managed rules such as eip-attached and ec2-volume-inuse-check, combined with a custom rule that flags instances with NetworkIn < 1 KB for 30 days.
Cross‑cloud dashboard – Export all findings to a CSV, ingest into a Grafana panel, and set alerts on “zombie count > 0”.
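
The cost-threshold alert in the first item could look roughly like this. The sketch only builds the message: it assumes each finding is a dict with hypothetical `id` and `monthly_cost` keys, and it relies on Slack incoming webhooks accepting a JSON body with a `text` field. Posting the payload and scheduling the function are left out.

```python
import json
from decimal import Decimal


def build_alert(candidates, threshold):
    """Return a Slack webhook payload (JSON string), or None if nothing
    exceeds the monthly cost threshold."""
    over = [c for c in candidates if Decimal(c["monthly_cost"]) > threshold]
    if not over:
        return None
    lines = [f"{c['id']}: ${Decimal(c['monthly_cost']):.2f}/month" for c in over]
    text = "Zombie candidates over threshold:\n" + "\n".join(lines)
    return json.dumps({"text": text})
```

Returning None when nothing crosses the threshold keeps the channel quiet on days with no findings worth a ping.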

6. Ongoing governance

Tag enforcement – Require every new resource to have owner, environment, and ttl tags via IAM policies or Service Catalog.
Quarterly review – Run the detection scripts before each fiscal quarter and retire any lingering zombies.
Cost allocation reports – Use AWS Cost Explorer, GCP Billing Export, and Azure Cost Management to verify that the monthly spend for the identified resource types drops after cleanup.
Education – Add a short “Zombie Awareness” slide to onboarding for engineers and product managers.
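
Tag enforcement is easy to verify in a pipeline before a resource is ever created. The following sketch checks a tag dictionary against the three required keys named above; the function name is hypothetical, and tag values are assumed to be strings.

```python
REQUIRED_TAGS = ("owner", "environment", "ttl")


def missing_tags(tags):
    """Return the required tag keys that are absent or blank."""
    return [key for key in REQUIRED_TAGS if not (tags.get(key) or "").strip()]
```

A CI step can fail the build whenever `missing_tags` returns a non-empty list, which catches untagged resources before they become tomorrow's zombies.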

Even with disciplined processes, manual checks slip. CloudBudgetMaster continuously scans AWS, GCP, and Azure accounts, flags zombie resources in real time, and shows the exact dollar impact per item, letting you remediate with a single click.