ECS cost model beyond compute: the checklist that prevents surprise bills

ECS cost surprises usually happen when teams budget only for compute (EC2 instance-hours or Fargate vCPU/GB-hours). In real services, the big costs often come from load balancers, logs, and networking. Use this checklist to build a more accurate total cost model.

1) Compute (baseline)

  • ECS on Fargate: vCPU-hours + memory GB-hours (running tasks).
  • ECS on EC2: instance-hours (plus EBS if you attach volumes).
  • Idle capacity: you pay for provisioned capacity, so the gap between peak and average utilization often matters more than the list price.
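
The compute items above reduce to simple arithmetic. The sketch below is a minimal illustration, assuming a 730-hour month; the function names and every rate are hypothetical placeholders, not real prices, so substitute the current rates for your region.

```python
HOURS_PER_MONTH = 730  # assumed averaging convention for a month


def fargate_monthly_cost(avg_tasks, vcpu_per_task, gb_per_task,
                         vcpu_hour_rate, gb_hour_rate,
                         hours=HOURS_PER_MONTH):
    """Fargate compute: vCPU-hours plus memory GB-hours for running tasks."""
    vcpu_hours = avg_tasks * vcpu_per_task * hours
    gb_hours = avg_tasks * gb_per_task * hours
    return vcpu_hours * vcpu_hour_rate + gb_hours * gb_hour_rate


def idle_fraction(avg_used, provisioned):
    """Share of provisioned capacity that is paid for but not used."""
    if provisioned <= 0:
        raise ValueError("provisioned must be positive")
    return 1 - avg_used / provisioned


# Placeholder rates for illustration only, not real prices.
fargate_cost = fargate_monthly_cost(
    avg_tasks=4, vcpu_per_task=0.5, gb_per_task=1,
    vcpu_hour_rate=0.04, gb_hour_rate=0.004,
)

# EC2 example: capacity sized for a peak of 10 units, average use of 6.
idle = idle_fraction(avg_used=6, provisioned=10)
```

Feeding the average task count rather than the peak into the first function keeps the estimate aligned with what is actually billed; the idle fraction shows how much of an EC2 fleet's cost buys headroom.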

2) Load balancers (baseline + capacity)

  • Hourly baseline per ALB/NLB.
  • Capacity units driven by connections, throughput, and rule evaluations.
  • Multiple quiet services can still create a large baseline if each has its own load balancer.
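
To see how the baseline adds up, here is a rough sketch that separates the hourly baseline from the capacity-unit charge. It assumes the billed capacity units in a given hour follow the busiest dimension, which is how ALB and NLB capacity units are generally described; the dimension names, rates, and usage figures are hypothetical placeholders.

```python
HOURS_PER_MONTH = 730  # assumed averaging convention for a month


def billed_capacity_units(units_by_dimension):
    """Assumption: the dimension with the highest usage sets the billed units."""
    return max(units_by_dimension.values())


def lb_monthly_cost(lb_count, hourly_rate, avg_capacity_units,
                    capacity_unit_hour_rate, hours=HOURS_PER_MONTH):
    """Return (baseline, capacity) monthly cost for a set of load balancers."""
    baseline = lb_count * hourly_rate * hours
    capacity = avg_capacity_units * capacity_unit_hour_rate * hours
    return baseline, capacity


# Hypothetical average usage, already expressed in capacity units.
units = billed_capacity_units(
    {"connections": 0.4, "throughput": 1.5, "rule_evaluations": 0.2}
)

# Placeholder rates: ten quiet services, one load balancer each, versus one shared.
per_service_baseline, _ = lb_monthly_cost(10, 0.02, 0.0, 0.008)
shared_baseline, shared_capacity = lb_monthly_cost(1, 0.02, units, 0.008)
```

With these placeholder inputs the ten dedicated load balancers cost ten times the shared baseline before any traffic is served, which is the effect the last bullet describes.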

3) Logs and metrics (can exceed compute for verbose services)

  • Log ingestion scales with request volume and verbosity (bytes logged per request).
  • Retention creates a steady GB-month storage baseline.
  • Query scan costs spike during incidents and heavy dashboards.
  • Metrics costs grow with custom metric cardinality and dashboard polling.
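
A quick way to sanity-check the log items is to derive GB/day from request volume and bytes logged per request, then price ingestion and steady-state retention separately. This is a sketch under simple assumptions: no compression, a 30-day month, and placeholder rates that are not real prices. Query scan and metrics costs are left out.

```python
def log_gb_per_day(requests_per_day, log_bytes_per_request):
    """Daily log volume in GB (decimal GB, 1e9 bytes)."""
    return requests_per_day * log_bytes_per_request / 1e9


def log_monthly_cost(gb_per_day, retention_days,
                     ingest_rate_per_gb, storage_rate_per_gb_month,
                     days_in_month=30):
    """Return (ingestion, storage) monthly cost.

    Storage assumes steady state: once retention has filled up,
    roughly gb_per_day * retention_days is stored at any time.
    """
    ingestion = gb_per_day * days_in_month * ingest_rate_per_gb
    stored_gb = gb_per_day * retention_days
    storage = stored_gb * storage_rate_per_gb_month
    return ingestion, storage


# Hypothetical service: 50 million requests/day, about 2 KB of logs each.
daily_gb = log_gb_per_day(50_000_000, 2_000)

# Placeholder rates for illustration only.
ingestion_cost, storage_cost = log_monthly_cost(
    daily_gb, retention_days=90,
    ingest_rate_per_gb=0.50, storage_rate_per_gb_month=0.03,
)
```

Halving the bytes logged per request halves both figures, which is why verbosity assumptions are worth validating before anything else.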

4) Networking (NAT, egress, cross-AZ)

  • NAT gateway: an hourly charge plus a per-GB processing charge; processed GB can spike when fleets pull images and package updates through NAT.
  • Internet egress: downloads, external APIs, public traffic.
  • Cross-AZ transfer: driven by chatty microservices and uneven target placement across zones.
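
The networking items can be estimated per traffic path. The following sketch assumes image pulls travel through NAT and that each path has a single effective per-GB rate; all rates, volumes, and names are hypothetical placeholders, not real prices.

```python
HOURS_PER_MONTH = 730  # assumed averaging convention for a month


def nat_monthly_cost(gateways, hourly_rate, processed_gb, per_gb_rate,
                     hours=HOURS_PER_MONTH):
    """NAT gateway: hourly charge per gateway plus a charge per GB processed."""
    return gateways * hourly_rate * hours + processed_gb * per_gb_rate


def transfer_cost(gb, per_gb_rate):
    """Generic per-GB transfer charge (egress or cross-AZ)."""
    return gb * per_gb_rate


# Assumption: 3,000 task launches/month, each pulling a 0.8 GB image via NAT.
image_pull_gb = 3000 * 0.8

nat_cost = nat_monthly_cost(gateways=2, hourly_rate=0.05,
                            processed_gb=image_pull_gb, per_gb_rate=0.05)
egress_cost = transfer_cost(gb=500, per_gb_rate=0.09)
cross_az_cost = transfer_cost(gb=10_000, per_gb_rate=0.02)

network_total = nat_cost + egress_cost + cross_az_cost
```

Keeping the three paths as separate inputs makes it easier to check each one against measured GB later.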

5) Storage (EBS and snapshots)

  • If you run ECS on EC2 and attach volumes, include EBS GB-month and performance settings.
  • Model snapshots separately: change rate × retention is the key driver.
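
As a rough illustration of "change rate × retention", the sketch below assumes incremental snapshots: one full copy of the used data plus the changed blocks for each retained snapshot. It ignores overlap between changes, so it tends to overestimate; the rate is a placeholder, not a real price.

```python
def snapshot_stored_gb(used_gb, daily_change_rate, retained_daily_snapshots):
    """Approximate stored snapshot data under an incremental model.

    Assumption: the first snapshot holds all used data and each later
    one adds only that day's changed blocks.
    """
    changed_gb = used_gb * daily_change_rate * retained_daily_snapshots
    return used_gb + changed_gb


def snapshot_monthly_cost(stored_gb, rate_per_gb_month):
    return stored_gb * rate_per_gb_month


# Hypothetical volume: 200 GB used, 2% daily change, 30 daily snapshots kept.
stored = snapshot_stored_gb(200, 0.02, 30)
snapshot_cost = snapshot_monthly_cost(stored, rate_per_gb_month=0.05)
```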

6) Container registry behavior (ECR is not just storage)

  • Storage grows with retention and CI push frequency.
  • Pull traffic spikes during cluster scaling and cold starts.
  • Cross-region pulls and NAT paths can change transfer costs materially.
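
Registry storage growth can be approximated from push frequency and retention. This sketch assumes each CI push adds a fixed amount of new, non-shared layer data and that images are removed after a retention window; both inputs are hypothetical and worth measuring for your own pipeline.

```python
def registry_stored_gb(pushes_per_day, new_layer_gb_per_push, retention_days):
    """Steady-state registry storage if every push is kept for retention_days."""
    return pushes_per_day * new_layer_gb_per_push * retention_days


# Hypothetical pipeline: 20 pushes/day, 0.15 GB of new layers per push.
kept_30_days = registry_stored_gb(20, 0.15, 30)
kept_180_days = registry_stored_gb(20, 0.15, 180)
```

Under this model storage is linear in retention, so a lifecycle rule that shortens the window reduces stored GB proportionally.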

7) Put it together (sum line items)

  1. Pick a compute model (Fargate vs EC2) and estimate average monthly capacity.
  2. Add load balancer baseline and estimate capacity units.
  3. Add logs (ingestion + retention) and validate verbosity assumptions.
  4. Add networking (NAT/egress/cross-AZ) for the real traffic paths.
  5. Add storage (EBS + snapshots) and registry behavior (ECR storage + pulls) where relevant.
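
The steps above can be sketched as a small roll-up that sums the line items and reports each one's share of the total. The figures below are made-up placeholders chosen only to show the shape of the output, not measurements.

```python
def summarize(line_items):
    """Return (total, rows) with rows sorted by cost, largest first.

    Each row is (name, monthly_cost, share_of_total).
    """
    total = sum(line_items.values())
    rows = sorted(line_items.items(), key=lambda kv: kv[1], reverse=True)
    return total, [
        (name, cost, cost / total if total else 0.0) for name, cost in rows
    ]


# Hypothetical monthly estimates, one per checklist section.
estimate = {
    "compute": 70.0,
    "load_balancers": 146.0,
    "logs": 1770.0,
    "networking": 193.0,
    "storage_and_registry": 16.0,
}

total, rows = summarize(estimate)
non_compute_share = 1 - estimate["compute"] / total

for name, cost, share in rows:
    print(f"{name:22s} {cost:10.2f} {share:6.1%}")
```

Sorting by cost puts the largest line item first, which is usually the best place to start validating assumptions.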

Validation checklist

  • Validate average (not peak) running tasks/instances for a representative week.
  • Validate log ingestion GB/day and retention settings for the service.
  • Validate NAT processed GB and egress GB for the service traffic paths.
  • Validate how many load balancers exist and whether they are “shared” or “one per service”.
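
For the first validation item, a minimal sketch: given task counts sampled at regular intervals over a representative week, compare the average with the peak. The sample values are hypothetical; in practice they would come from whatever monitoring you already have.

```python
def average_and_peak(samples):
    """Return (average, peak) for evenly spaced running-task samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(samples) / len(samples), max(samples)


# Hypothetical samples of running tasks, one per day for a week.
running_tasks = [4, 4, 6, 12, 8, 4, 4]

avg_tasks, peak_tasks = average_and_peak(running_tasks)
avg_to_peak = avg_tasks / peak_tasks
```

A low average-to-peak ratio means a peak-based budget overstates Fargate spend, while an EC2 fleet sized for the peak carries that gap as idle capacity.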

FAQ

Why is compute-only a bad ECS estimate?
Because ECS bills often include large non-compute line items: ALB/NLB baseline and capacity units, log ingestion and retention, NAT gateway processed traffic, and internet egress.

What line item is most commonly underestimated?
Logs and networking. Verbose services can generate large ingestion and retention costs, and container fleets can generate substantial NAT processed traffic (image pulls, updates, external APIs).

Last updated: 2026-01-27