ECS cost model beyond compute: the checklist that prevents surprise bills
Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.
This page is a supporting total-cost checklist for ECS. It assumes baseline compute is already modeled and focuses on the load balancer, logging, networking, storage, and registry line items that budgets often miss.
Use it after the compute baseline is credible. It does not replace the ECS bill-boundary page or the production intervention page.
ECS cost surprises usually happen when teams budget only for compute (EC2 instance-hours or Fargate vCPU/GB-hours). In real services, the big costs often come from load balancers, logs, and networking. Use this checklist to build a more accurate total cost model.
1) Compute (baseline)
- ECS on Fargate: vCPU-hours + memory GB-hours (running tasks).
- ECS on EC2: instance-hours (plus EBS if you attach volumes).
- Idle capacity: you pay for what is provisioned, not what is used, so the gap between peak and average demand often matters more than list price.
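The compute baseline can be sketched as simple arithmetic. This is a minimal illustration, not a pricing tool: `fargate_monthly_cost` and `idle_fraction` are hypothetical helpers, the rates are placeholders you must replace with current regional prices, and 730 hours per month is a common approximation. The key input is average (not peak) running tasks.

```python
# Minimal sketch: monthly Fargate compute from AVERAGE running tasks.
# Rates are placeholders supplied by the caller, not current AWS prices.
HOURS_PER_MONTH = 730  # common monthly approximation (8,760 hours / 12)


def fargate_monthly_cost(avg_tasks: float, vcpu_per_task: float, gb_per_task: float,
                         vcpu_hour_rate: float, gb_hour_rate: float,
                         hours: float = HOURS_PER_MONTH) -> float:
    """vCPU-hours plus memory GB-hours for avg_tasks tasks running for `hours`."""
    vcpu_hours = avg_tasks * vcpu_per_task * hours
    gb_hours = avg_tasks * gb_per_task * hours
    return vcpu_hours * vcpu_hour_rate + gb_hours * gb_hour_rate


def idle_fraction(provisioned: float, avg_used: float) -> float:
    """Share of provisioned capacity (e.g. EC2 sized for peak) that sits idle on average."""
    return 1 - avg_used / provisioned


# Hypothetical inputs: 4 tasks of 0.5 vCPU / 1 GB at made-up rates.
cost = fargate_monthly_cost(avg_tasks=4, vcpu_per_task=0.5, gb_per_task=1.0,
                            vcpu_hour_rate=0.04, gb_hour_rate=0.005)
print(round(cost, 2))                             # 73.0
print(idle_fraction(provisioned=10, avg_used=4))  # 0.6
```

For ECS on EC2, the same idea applies to instance-hours: a cluster sized for a peak of 10 instances that averages 4 instances of real demand leaves 60% of the paid capacity idle.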
2) Load balancers (baseline + capacity)
- Hourly baseline per ALB/NLB.
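- Capacity units (LCUs for ALB, NLCUs for NLB) driven by new connections, active connections, processed bytes, and, for ALB, rule evaluations; each hour is billed on whichever dimension is highest.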
- Multiple quiet services can still create a large baseline if each has its own load balancer.
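Two things drive the load balancer line: capacity units billed on the highest usage dimension each hour, and a fixed hourly baseline per load balancer. The sketch below assumes the ALB per-LCU allowances as we understand them from the Elastic Load Balancing pricing page (verify before relying on them); it ignores the free allowance for the first rules evaluated. `lcus_for_hour`, `lb_baseline`, and the hourly rate are hypothetical.

```python
HOURS_PER_MONTH = 730  # common monthly approximation

# Assumed ALB per-LCU allowances; verify against the current pricing page.
ALB_PER_LCU = {
    "new_conn_per_sec": 25,
    "active_conn_per_min": 3000,
    "gb_processed_per_hour": 1.0,
    "rule_evals_per_sec": 1000,
}


def lcus_for_hour(usage: dict[str, float], per_lcu: dict[str, float]) -> float:
    """LCUs billed for one hour: the highest usage/allowance ratio across dimensions."""
    return max(usage[dim] / per_lcu[dim] for dim in per_lcu)


def lb_baseline(num_lbs: int, hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Fixed hourly charge for every load balancer, independent of traffic."""
    return num_lbs * hourly_rate * hours


usage = {"new_conn_per_sec": 50, "active_conn_per_min": 1500,
         "gb_processed_per_hour": 0.5, "rule_evals_per_sec": 200}
lcus = lcus_for_hour(usage, ALB_PER_LCU)  # 2.0: new connections dominate

# Hypothetical hourly rate: 12 quiet services with one LB each vs one shared LB.
per_service = lb_baseline(12, hourly_rate=0.02)
shared = lb_baseline(1, hourly_rate=0.02)
```

The comparison shows why "one load balancer per service" matters: the baseline scales with the number of load balancers even when each one carries almost no traffic.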
3) Logs and metrics (often bigger than compute)
- Log ingestion is proportional to request volume and verbosity.
- Retention creates a steady GB-month storage baseline.
- Query scan costs spike during incidents and under heavy dashboard use.
- Metrics costs grow with custom metric cardinality and dashboard polling.
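Log cost follows directly from request volume and verbosity, which makes it easy to sanity-check. This minimal sketch uses hypothetical helpers and placeholder rates; it treats retention as a steady state reached once the retention window has filled, and exposes a `compression_ratio` parameter as an assumption because stored size may differ from ingested size.

```python
# Minimal sketch: log volume and monthly log cost from request-level assumptions.
# All rates and the compression ratio are hypothetical placeholders.

def log_gb_per_day(requests_per_day: float, lines_per_request: float,
                   bytes_per_line: float) -> float:
    """Ingested log volume in GB/day (1 GB = 1e9 bytes here)."""
    return requests_per_day * lines_per_request * bytes_per_line / 1e9


def retained_gb(gb_per_day: float, retention_days: int,
                compression_ratio: float = 1.0) -> float:
    """Steady-state stored GB once the retention window has filled."""
    return gb_per_day * retention_days / compression_ratio


def monthly_log_cost(gb_per_day: float, retention_days: int, ingest_rate: float,
                     storage_rate: float, scanned_gb: float, scan_rate: float,
                     compression_ratio: float = 1.0, days: int = 30) -> dict[str, float]:
    """Split the monthly log bill into ingestion, retention, and query scans."""
    return {
        "ingestion": gb_per_day * days * ingest_rate,
        "retention": retained_gb(gb_per_day, retention_days, compression_ratio) * storage_rate,
        "queries": scanned_gb * scan_rate,
    }


daily = log_gb_per_day(10_000_000, lines_per_request=2, bytes_per_line=500)  # 10.0
costs = monthly_log_cost(daily, retention_days=30, ingest_rate=0.5,
                         storage_rate=0.03, scanned_gb=2_000, scan_rate=0.005)
```

Doubling `lines_per_request` (for example, by enabling debug logging) doubles both ingestion and the retained baseline, which is why verbosity belongs in the budget assumptions.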
4) Networking (NAT, egress, cross-AZ)
- NAT gateway: an hourly charge plus a per-GB processing charge; processing can spike when fleets pull images and package updates through the gateway.
- Internet egress: downloads, external APIs, public traffic.
- Cross-AZ transfer: chatty microservices and uneven target placement.
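The networking lines can be approximated from traffic paths. In this sketch, all helper names and rates are hypothetical. The cross-AZ estimate assumes callers pick targets uniformly across AZs, so with three AZs about two-thirds of calls cross a zone boundary. It also assumes, by default, that the per-GB charge applies on both the sending and receiving side. Check both assumptions against your routing and current pricing.

```python
HOURS_PER_MONTH = 730  # common monthly approximation


def nat_monthly_cost(num_gateways: int, processed_gb: float, hourly_rate: float,
                     per_gb_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Hourly charge for every gateway plus per-GB processing."""
    return num_gateways * hours * hourly_rate + processed_gb * per_gb_rate


def egress_cost(gb: float, per_gb_rate: float) -> float:
    """Internet egress at a single flat rate (real pricing may be tiered)."""
    return gb * per_gb_rate


def cross_az_fraction(num_azs: int) -> float:
    """Share of calls crossing AZs if callers pick targets uniformly across AZs."""
    return 1 - 1 / num_azs


def billable_cross_az_gb(service_to_service_gb: float, num_azs: int,
                         charged_both_sides: bool = True) -> float:
    """Cross-AZ GB to price; doubled when both directions are charged."""
    crossing = service_to_service_gb * cross_az_fraction(num_azs)
    return crossing * 2 if charged_both_sides else crossing


# Hypothetical traffic and made-up rates.
nat = nat_monthly_cost(num_gateways=3, processed_gb=2_000, hourly_rate=0.05, per_gb_rate=0.05)
egress = egress_cost(500, per_gb_rate=0.1)
xaz_gb = billable_cross_az_gb(service_to_service_gb=300, num_azs=3)
```

Keeping callers in the same AZ as their targets lowers `cross_az_fraction` directly, which is why uneven target placement shows up on the bill.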
5) Storage (EBS and snapshots)
- If you run ECS on EC2 and attach volumes, include EBS GB-month charges plus any provisioned IOPS or throughput.
- Model snapshots separately: change rate × retention is the key driver.
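Because EBS snapshots are incremental, stored snapshot data grows with the daily change rate times the number of snapshots retained, not with volume size times snapshot count. This rough sketch assumes one snapshot per day, a stable amount of used data, and distinct changed blocks each day. `ebs_monthly_cost`, `snapshot_stored_gb`, and the rates are hypothetical.

```python
# Minimal sketch: EBS volumes plus snapshot storage for ECS on EC2.
# Rates are hypothetical placeholders.

def ebs_monthly_cost(num_volumes: int, volume_gb: float, gb_month_rate: float,
                     perf_cost_per_volume: float = 0.0) -> float:
    """GB-month charge plus any provisioned IOPS/throughput cost, per volume."""
    return num_volumes * (volume_gb * gb_month_rate + perf_cost_per_volume)


def snapshot_stored_gb(used_gb: float, daily_change_gb: float, snapshots_retained: int) -> float:
    """Rough steady state for daily incremental snapshots: one full copy of the
    used data plus each additional retained day's changed blocks."""
    return used_gb + daily_change_gb * max(snapshots_retained - 1, 0)


print(snapshot_stored_gb(used_gb=100, daily_change_gb=5, snapshots_retained=7))   # 130
print(snapshot_stored_gb(used_gb=100, daily_change_gb=5, snapshots_retained=30))  # 245
volumes = ebs_monthly_cost(num_volumes=10, volume_gb=100, gb_month_rate=0.1)
```

Moving from 7 to 30 retained snapshots nearly doubles stored data in this example even though the volume itself did not grow.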
6) Container registry behavior (ECR is not just storage)
- Storage grows with retention and CI push frequency.
- Pull traffic spikes during cluster scaling and cold starts.
- Cross-region pulls and NAT paths can change transfer costs materially.
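Registry cost has two halves: stored image data that grows with CI pushes and retention, and pull volume that spikes when many tasks start at once. This minimal sketch uses hypothetical helpers. `shared_layer_fraction` is an assumption for how much of each push is deduplicated against layers already stored, and `cached_fraction` is an assumption for starts that reuse layers already on the host.

```python
# Minimal sketch: ECR storage growth and scale-out pull volume.
# All inputs are hypothetical.

def ecr_stored_gb(pushes_per_day: float, image_gb: float, retained_days: int,
                  shared_layer_fraction: float = 0.0) -> float:
    """Stored image data if every push is kept for retained_days;
    shared_layer_fraction is the share of each push that is already stored."""
    return pushes_per_day * retained_days * image_gb * (1 - shared_layer_fraction)


def scale_out_pull_gb(new_tasks: int, image_gb: float, cached_fraction: float = 0.0) -> float:
    """GB pulled when new_tasks start; cached_fraction covers starts that reuse
    layers already present on the host."""
    return new_tasks * image_gb * (1 - cached_fraction)


stored = ecr_stored_gb(pushes_per_day=20, image_gb=0.5, retained_days=90)            # 900.0
deduped = ecr_stored_gb(20, 0.5, 90, shared_layer_fraction=0.8)                      # about 180
burst = scale_out_pull_gb(new_tasks=200, image_gb=0.5)                               # 100.0
```

If those pulls leave the VPC through a NAT gateway, the same GB also belong in the NAT processed-GB estimate, and cross-region pulls add transfer charges on top.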
7) Put it together (sum line items)
- Pick compute model (Fargate vs EC2) and estimate average monthly capacity.
- Add load balancer baseline and estimate capacity units.
- Add logs (ingestion + retention) and validate verbosity assumptions.
- Add networking (NAT/egress/cross-AZ) for the real traffic paths.
- Add storage (EBS + snapshots) and registry behavior (ECR storage + pulls) where relevant.
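Summing the line items is the whole model. The sketch below is a hypothetical container for the monthly estimate. It reports the total and the share that is not compute, which is a quick way to see whether a compute-only budget would have missed most of the bill. The example figures are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class EcsMonthlyEstimate:
    """Hypothetical monthly line items for one ECS service (same currency)."""
    compute: float
    load_balancers: float
    logs_metrics: float
    networking: float
    storage: float = 0.0
    registry: float = 0.0

    def total(self) -> float:
        """Sum of every line item declared on the dataclass."""
        return sum(getattr(self, f.name) for f in fields(self))

    def non_compute_share(self) -> float:
        """Fraction of the total that a compute-only budget would miss."""
        return 1 - self.compute / self.total()


estimate = EcsMonthlyEstimate(compute=400, load_balancers=120, logs_metrics=250,
                              networking=180, storage=30, registry=20)
print(estimate.total())  # 1000
```

Using `dataclasses.fields` keeps `total()` correct if you add another line item later.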
If you still need to establish the core compute boundary, return to ECS pricing. If you already know the waste is coming from scaling behavior, go to ECS autoscaling cost pitfalls.
Validation checklist
- Validate average (not peak) running tasks/instances for a representative week.
- Validate log ingestion GB/day and retention settings for the service.
- Validate NAT processed GB and egress GB for the service traffic paths.
- Validate how many load balancers exist and whether they are “shared” or “one per service”.
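The checklist above amounts to comparing modeled assumptions with measured values. This minimal sketch, with hypothetical helpers and keys, computes average versus peak from hourly samples and flags any assumption whose measured value differs by more than a relative tolerance (25% here, an arbitrary choice), or that has not been measured yet.

```python
from statistics import mean


def avg_and_peak(hourly_samples: list[float]) -> tuple[float, float]:
    """Average and peak running tasks (or instances) over a sample window."""
    return mean(hourly_samples), max(hourly_samples)


def flag_drift(assumed: dict[str, float], measured: dict[str, float],
               tolerance: float = 0.25) -> list[str]:
    """List assumptions whose measured value differs by more than `tolerance` (relative)."""
    flagged = []
    for key, expected in assumed.items():
        actual = measured.get(key)
        if actual is None:
            flagged.append(f"{key}: not measured yet")
        elif expected == 0:
            if actual != 0:
                flagged.append(f"{key}: assumed 0, measured {actual}")
        elif abs(actual - expected) / abs(expected) > tolerance:
            flagged.append(f"{key}: assumed {expected}, measured {actual}")
    return flagged


# Hypothetical keys and numbers for one service.
assumed = {"avg_tasks": 4, "log_gb_per_day": 10, "nat_gb_per_month": 2_000, "load_balancers": 1}
measured = {"avg_tasks": 4.5, "log_gb_per_day": 18, "nat_gb_per_month": 2_100}
for line in flag_drift(assumed, measured):
    print(line)
```

Here the log volume (80% over) and the unmeasured load balancer count are flagged, while average tasks and NAT GB fall within tolerance.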
Sources
- ECS pricing: aws.amazon.com/ecs/pricing
- Fargate pricing: aws.amazon.com/fargate/pricing