ECS task sizing: how to pick CPU and memory (and estimate task count)
ECS task sizing is a balancing act: too small and you thrash (restarts, timeouts, throttling), too large and you pay for idle capacity. The best approach is to size from measured averages, keep headroom with a deliberate utilization target, and validate the result with a busy-week scenario.
Step 1: measure demand (use a representative window)
- CPU: average and p95 usage for the service (not only peak).
- Memory: average and p95 usage (include deploy/cold-cache windows).
- Traffic shape: steady, bursty, or time-of-day (helps estimate average tasks).
- Error/retry signals: timeouts and retries can multiply load and invalidate "normal week" sizing.
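Once you have exported utilization samples (for example, per-minute CPU readings from CloudWatch), the average and p95 in the checklist above reduce to simple arithmetic. A minimal sketch, assuming the samples are already in a plain list; the sample values are illustrative:

```python
# Summarize exported utilization samples into average and p95.
# Assumes per-minute samples already pulled into a list (e.g. from CloudWatch).
import math

def summarize(samples):
    """Return (average, p95) for a list of utilization samples."""
    ordered = sorted(samples)
    avg = sum(ordered) / len(ordered)
    # Nearest-rank p95: value at the 95th-percentile position.
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return avg, ordered[rank]

# Illustrative CPU utilization samples (fraction of 1 vCPU):
cpu_samples = [0.42, 0.38, 0.55, 0.61, 0.47, 0.83, 0.44, 0.50, 0.39, 0.91]
avg, p95 = summarize(cpu_samples)
print(f"avg={avg:.2f} p95={p95:.2f}")  # -> avg=0.55 p95=0.91
```

Note the gap between average (0.55) and p95 (0.91) here: sizing from the average alone would leave no room for the bursts.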
Step 2: pick per-task vCPU and memory
- Choose the smallest task size that meets stability and latency targets at steady load.
- Prefer scaling out (more tasks) over a single huge task when it improves utilization and reduces tail latency.
- Keep memory headroom for spikes, GC, caching, and temporary buffers. Memory is often the real limiter.
A common failure mode is to pick "round numbers" (1 vCPU / 2 GB) and never revisit them. Treat task size as a model input that evolves with the service.
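The "smallest task size that meets targets" guidance can be mechanized. A sketch, assuming per-task p95 usage is known and using common Fargate vCPU/memory combinations (verify the valid pairs against current Fargate documentation; the 1.3x headroom factor is an illustrative default):

```python
# Pick the smallest listed Fargate size that covers per-task p95 usage
# plus headroom. SIZES lists common (vCPU, GB) combinations; verify
# against current Fargate documentation before relying on them.
SIZES = [(0.25, 0.5), (0.25, 1), (0.25, 2),
         (0.5, 1), (0.5, 2), (0.5, 4),
         (1, 2), (1, 4), (1, 8),
         (2, 4), (2, 8), (2, 16),
         (4, 8), (4, 16), (4, 30)]

def smallest_fit(p95_vcpu, p95_mem_gb, headroom=1.3):
    """Return the first size where p95 usage * headroom fits both dimensions."""
    for vcpu, mem in SIZES:
        if p95_vcpu * headroom <= vcpu and p95_mem_gb * headroom <= mem:
            return vcpu, mem
    raise ValueError("demand exceeds largest listed size; scale out instead")

# Observed per-task p95: 0.6 vCPU, 2.5 GB -> needs 0.78 vCPU / 3.25 GB
print(smallest_fit(0.6, 2.5))  # -> (1, 4)
```

Because the list is ordered smallest-first, the first fit is the cheapest listed size that satisfies both dimensions; often memory, not CPU, forces the step up.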
Step 3: pick a utilization target
The utilization target is the planning headroom you keep so scaling and deploys complete without timeouts.
- Lower target (more headroom): more stable, higher cost.
- Higher target (less headroom): cheaper, but more sensitive to bursts and slow dependencies.
- For spiky services, separate baseline and burst capacity instead of one target.
Step 4: estimate average task count (the billing driver)
The bill tracks average running tasks over time, not the peak moment. A practical sizing model is:
tasks ~= max(
  cpu_demand / (task_vcpu * target_utilization),
  mem_demand / (task_mem_gb * target_utilization)
)
Worked example (order-of-magnitude)
- Average CPU demand: 6 vCPU
- Average memory demand: 18 GB
- Task size: 1 vCPU / 3 GB
- Target utilization: 0.7
- CPU-based tasks: 6 / (1 * 0.7) ~= 8.6, round up to 9
- Memory-based tasks: 18 / (3 * 0.7) ~= 8.6, round up to 9
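The sizing model above translates directly to code: tasks are driven by whichever dimension, CPU or memory, requires more replicas, rounded up. The numbers match the worked example:

```python
# The max(cpu, mem) sizing model, rounded up to whole tasks.
import math

def estimate_tasks(cpu_demand_vcpu, mem_demand_gb,
                   task_vcpu, task_mem_gb, target_utilization):
    cpu_tasks = cpu_demand_vcpu / (task_vcpu * target_utilization)
    mem_tasks = mem_demand_gb / (task_mem_gb * target_utilization)
    return math.ceil(max(cpu_tasks, mem_tasks))

# Worked example: 6 vCPU / 18 GB demand, 1 vCPU / 3 GB tasks, 0.7 target.
print(estimate_tasks(6, 18, 1, 3, 0.7))  # -> 9
```

Running it with a 2 vCPU / 6 GB task size halves the count to 5, which is the trade-off the next paragraph discusses.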
If you size tasks larger (2 vCPU / 6 GB) but your service is not actually CPU-bound, you can reduce task count, but you also reduce packing flexibility and increase idle capacity within each task. Validate with real utilization.
Convert task sizing to cost (Fargate vs EC2)
- ECS on Fargate: vCPU-hours + memory GB-hours for running tasks.
- ECS on EC2: instance-hours (plus EBS and snapshots if you attach volumes).
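For Fargate, converting average task count to a monthly estimate is one multiplication per billing dimension. A sketch; the per-unit prices below are illustrative placeholders, not current AWS pricing, so substitute your region's published rates:

```python
# Convert average Fargate task count to a monthly compute estimate.
# Prices are assumed placeholders, NOT current AWS rates.
VCPU_HOUR = 0.04      # assumed $/vCPU-hour (placeholder)
GB_HOUR = 0.0045      # assumed $/GB-hour (placeholder)
HOURS_PER_MONTH = 730

def fargate_monthly(avg_tasks, task_vcpu, task_mem_gb):
    per_task_hour = task_vcpu * VCPU_HOUR + task_mem_gb * GB_HOUR
    return avg_tasks * per_task_hour * HOURS_PER_MONTH

# 9 average tasks of 1 vCPU / 3 GB:
print(round(fargate_monthly(9, 1, 3), 2))
```

Note that the input is average running tasks, not peak: a service that bursts to 20 tasks for an hour a day bills very differently from one that holds 20 all day.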
Cost pitfalls that look like "bad sizing"
- Noisy scaling: CPU% spikes trigger oscillation and keep average tasks high.
- Retry storms: timeouts multiply requests, tasks, logs, and transfer.
- Logs: ingestion + retention can exceed compute for high-traffic or verbose services.
- NAT/egress: image pulls and external calls can create large variable network costs.
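The logs pitfall is easy to sanity-check numerically. A sketch, with an assumed placeholder ingestion price (not current CloudWatch Logs pricing):

```python
# Sanity-check whether log ingestion rivals compute for a verbose service.
LOG_INGEST_PER_GB = 0.50   # assumed $/GB ingested (placeholder)

def monthly_log_cost(gb_per_day):
    return gb_per_day * 30 * LOG_INGEST_PER_GB

# A service logging 40 GB/day:
print(monthly_log_cost(40))  # -> 600.0
```

At these assumed rates, 40 GB/day of logs costs more per month than the nine-task compute estimate above, which is why verbose services often look "badly sized" when the real problem is logging.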
Full model: ECS cost model beyond compute.
Validation checklist
- Validate average and p95 CPU/memory over at least 7 days (include a busy day).
- Validate task count over time (average vs peak) and compare to the sizing model.
- Validate latency and error rate during deploys and scaling events.
- Validate non-compute: log ingestion GB/day and NAT processed GB during busy windows.
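The "compare to the sizing model" item can run as a periodic check. A minimal sketch; the 20% drift tolerance is an illustrative threshold, not a standard:

```python
# Flag when observed average task count drifts from the sizing model.
def check_model(observed_avg_tasks, predicted_tasks, tolerance=0.2):
    """Return True if observation is within +/- tolerance of the model."""
    if predicted_tasks <= 0:
        raise ValueError("predicted_tasks must be positive")
    drift = abs(observed_avg_tasks - predicted_tasks) / predicted_tasks
    return drift <= tolerance

print(check_model(11, 9))   # ~22% over model -> False, revisit sizing
print(check_model(9.5, 9))  # within 20% -> True
```

A failed check does not say which input is wrong; it says the model and reality disagree, which is the cue to re-measure demand and revisit the utilization target.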
Sources
- ECS task definitions: docs.aws.amazon.com
- ECS service autoscaling: docs.aws.amazon.com