EBS cost optimization: volumes, IOPS/throughput, and snapshots
Before optimizing, identify which driver actually dominates your EBS bill: unused volume GB, over-provisioned gp3 performance, snapshot retention growth, or orphaned disks. Without that, teams cut the wrong thing.
This page is for production intervention: volume cleanup, right-sizing, gp2 to gp3 migration, performance rollback, and snapshot lifecycle control.
EBS cost is rarely mysterious: it is mostly GB-month of provisioned (not used) storage, plus provisioned IOPS and throughput for some volume types, plus snapshot storage. The waste comes from unattached volumes, oversized volumes, and performance settings that are higher than the workload requires.
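The cost structure above can be sketched as a small estimator for a single gp3 volume. This is a minimal sketch, not a billing calculator: the rates are illustrative placeholders to be replaced with the values on the EBS pricing page for your region, and the names `Gp3Rates` and `gp3_monthly_cost` are hypothetical. The 3,000 IOPS and 125 MiB/s included baseline is the gp3 default.

```python
from dataclasses import dataclass

GP3_BASELINE_IOPS = 3000   # included with every gp3 volume
GP3_BASELINE_MIBPS = 125   # included with every gp3 volume


@dataclass(frozen=True)
class Gp3Rates:
    # Illustrative placeholder rates in USD per month; replace with your region's prices.
    per_gb: float = 0.08
    per_extra_iops: float = 0.005
    per_extra_mibps: float = 0.04


def gp3_monthly_cost(size_gb, iops, throughput_mibps, rates=Gp3Rates()):
    """Estimate monthly cost: provisioned storage plus performance above the baseline."""
    storage = size_gb * rates.per_gb
    extra_iops = max(0, iops - GP3_BASELINE_IOPS) * rates.per_extra_iops
    extra_tput = max(0, throughput_mibps - GP3_BASELINE_MIBPS) * rates.per_extra_mibps
    return round(storage + extra_iops + extra_tput, 2)
```

Running the same volume through the function at its current and proposed settings shows which term (GB, IOPS, or throughput) accounts for the difference.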
EBS savings checklist
- Right-size: remove over-provisioned volumes.
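- gp2 to gp3: gp3 is typically cheaper per GB than gp2 and includes a 3,000 IOPS and 125 MiB/s baseline that does not depend on volume size.
- Snapshots and orphans: prune snapshot retention and delete unattached volumes.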
Step 0: identify your dominant driver
- Capacity: large volumes provisioned far above actual usage.
- Performance: provisioned IOPS/throughput set above what workloads use.
- Snapshots: long retention and frequent snapshots on large changing datasets.
If the bill boundary is still fuzzy, go back to EBS pricing before changing production settings.
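One way to make Step 0 concrete is to bucket billing line items into the three drivers and see which has the largest share. The sketch below assumes you can export (usage type, cost) pairs from your billing data and that the usage type strings contain substrings such as `VolumeUsage`, `VolumeP-IOPS`, `VolumeP-Throughput`, and `SnapshotUsage`. Verify those against your own export before relying on the result. The function names are hypothetical.

```python
from collections import defaultdict


def classify_usage_type(usage_type):
    """Map a billing usage type string to a cost driver bucket (assumed substrings)."""
    if "SnapshotUsage" in usage_type:
        return "snapshots"
    if "VolumeP-IOPS" in usage_type or "VolumeP-Throughput" in usage_type:
        return "performance"
    if "VolumeUsage" in usage_type:
        return "capacity"
    return "other"


def dominant_driver(line_items):
    """line_items: iterable of (usage_type, cost). Returns (driver, share_of_total)."""
    totals = defaultdict(float)
    for usage_type, cost in line_items:
        totals[classify_usage_type(usage_type)] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return None, 0.0
    driver = max(totals, key=totals.get)
    return driver, totals[driver] / grand_total
```

If no bucket clearly dominates, that is itself a signal to revisit the pricing breakdown before changing anything.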
High-leverage savings levers
- Delete unattached volumes: orphaned volumes accumulate after instance termination and migrations.
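- Right-size GB: reduce volume size where safe, after validating used space and growth. EBS cannot shrink a volume in place, so this means creating a smaller volume and migrating the data.
- Choose the right type: gp3 lets you set IOPS and throughput independently of volume size, which usually gives better cost control than gp2.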
- Right-size IOPS/throughput: set based on measured utilization, not defaults.
- Snapshot lifecycle: keep only what you need; avoid keeping daily snapshots forever.
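For the first lever, a minimal sketch of finding unattached volumes is shown below. The filtering function is pure and takes a list of dicts shaped like the `Volumes` entries returned by boto3's EC2 `describe_volumes` call. The `fetch_volumes` helper is a hypothetical wrapper that needs boto3 and AWS credentials, so treat it as a starting point rather than a finished tool.

```python
def unattached_volumes(volumes):
    """Return (volume_id, size_gb) for volumes in the 'available' state, largest first."""
    orphans = [
        (v["VolumeId"], v["Size"])
        for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]
    return sorted(orphans, key=lambda item: item[1], reverse=True)


def fetch_volumes(region_name):
    """Fetch all volumes in one region. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the pure function above runs without it

    ec2 = boto3.client("ec2", region_name=region_name)
    volumes = []
    for page in ec2.get_paginator("describe_volumes").paginate():
        volumes.extend(page["Volumes"])
    return volumes
```

The output is a review list, not a delete list: confirm ownership and consider taking a final snapshot before removing anything.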
Common cost traps
- Oversized root volumes (default AMI settings) across large fleets.
- Provisioned performance far above actual usage, especially when it was set "just in case".
- Snapshots without lifecycle policies, retained indefinitely.
- Staging/dev volumes with production-sized disks and retention policies.
Snapshot cost drivers (what actually increases snapshot GB)
- Change rate: snapshots are incremental and store only the blocks changed since the previous snapshot, so write-heavy workloads grow snapshot usage faster.
- Retention: keeping daily snapshots for months is usually the largest contributor to stored snapshot GB.
- Copies: copied snapshots across regions or accounts create additional stored GB.
If snapshots are a top line item, start by reviewing retention and copies before touching performance settings.
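The interaction of change rate, retention, and copies can be approximated with a rough model: one full baseline plus the changed blocks for each additional retained snapshot, multiplied by the number of locations holding the chain. This is a deliberately simplified sketch. Real stored GB depends on which blocks overlap between snapshots, and `estimated_snapshot_gb` and its parameters are hypothetical names.

```python
def estimated_snapshot_gb(used_gb, daily_changed_gb, retained_daily_snapshots, copies=0):
    """Rough estimate of stored snapshot GB for one volume with daily snapshots.

    used_gb: data on the volume at the first snapshot (the full baseline)
    daily_changed_gb: blocks changed between consecutive daily snapshots
    copies: number of additional regions or accounts holding the same chain
    """
    if retained_daily_snapshots <= 0:
        return 0.0
    per_location = used_gb + daily_changed_gb * (retained_daily_snapshots - 1)
    return float(per_location * (1 + copies))
```

Comparing the estimate at 30 and 90 retained snapshots, with and without a copy, shows why retention and copies are the first things to review.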
Right-sizing workflow (practical)
- List top volumes by GB-month cost and identify unattached volumes.
- For each volume class, measure used space, growth, and p95 IOPS/throughput.
- Decide: reduce size, change type (gp2 vs gp3), or reduce provisioned performance.
- Validate in canary, then roll across the fleet with monitoring and rollback.
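The measurement step in this workflow can be sketched as a p95 calculation followed by a proposed gp3 setting. The sketch assumes you already have per-interval IOPS and throughput samples (for example, exported from your monitoring system). The 25% headroom is an arbitrary illustrative default, and `p95` and `gp3_target` are hypothetical helper names. It never proposes less than the gp3 included baseline of 3,000 IOPS and 125 MiB/s.

```python
import math

GP3_BASELINE_IOPS = 3000
GP3_BASELINE_MIBPS = 125


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def gp3_target(iops_samples, mibps_samples, headroom=1.25):
    """Propose gp3 settings: p95 utilization plus headroom, floored at the baseline."""
    iops = max(GP3_BASELINE_IOPS, math.ceil(p95(iops_samples) * headroom))
    mibps = max(GP3_BASELINE_MIBPS, math.ceil(p95(mibps_samples) * headroom))
    return {"iops": iops, "throughput_mibps": mibps}
```

Treat the result as a canary candidate, not a final setting, and make sure the sample window includes a busy period.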
Related: gp2 vs gp3 cost, gp3 IOPS and throughput sizing.
Use a simple measure-change-remeasure loop
- Measure the baseline for active volume GB, gp3 IOPS and throughput settings, retained snapshot GB, and orphaned disks.
- Change one production lever at a time so the next billing comparison is readable.
- Remeasure the same workload window and keep only the changes that improve spend without creating latency or restore risk.
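The keep-or-revert decision in this loop can be written down as an explicit rule so it is applied the same way to every change. The sketch below assumes you record monthly cost and p95 latency for the same workload window before and after a change. The 10% latency tolerance and the name `keep_change` are illustrative assumptions, and restore risk still needs a separate check.

```python
def keep_change(before, after, max_latency_regression=0.10):
    """Decide whether to keep a change.

    before/after: dicts with 'monthly_cost' and 'p95_latency_ms' measured over
    the same workload window. Keep only if cost went down and p95 latency did
    not regress by more than the allowed fraction.
    """
    saves_money = after["monthly_cost"] < before["monthly_cost"]
    latency_limit = before["p95_latency_ms"] * (1 + max_latency_regression)
    latency_ok = after["p95_latency_ms"] <= latency_limit
    return saves_money and latency_ok
```

Because only one lever changes per iteration, a False result points directly at the change to roll back.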
Validation checklist
- For each volume class, measure used space and growth rate (busy month included).
- Measure IOPS and throughput utilization before changing performance settings.
- For gp2 to gp3 changes, validate latency and throughput under representative load.
- After snapshot policy changes, validate restore requirements (RPO/RTO) are still met.
Sources
- EBS pricing: aws.amazon.com/ebs/pricing
- EBS volume types: docs.aws.amazon.com