Fargate cost optimization (high-leverage fixes)
Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.
Optimization starts only after you know which of idle task-hours, oversized task definitions, logging overhead, or networking drag is the real Fargate cost driver; otherwise teams tune the wrong lever.
This page is for production intervention: task right-sizing, idle reduction, scaling cleanup, log control, and networking-cost containment.
Fargate cost optimization is usually about two things: reduce idle (average running tasks) and avoid hidden line items (logs and networking). Use this checklist to find the biggest levers first, then validate savings in billing after you ship.
First: understand what you’re paying for
- Compute: vCPU-hours + memory GB-hours for running tasks.
- Infrastructure around it: load balancers, logs, NAT/egress, and data transfer.
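The compute line above can be sketched as simple arithmetic. The per-vCPU-hour and per-GB-hour rates below are placeholder examples, not authoritative figures; look up current pricing for your region before trusting the numbers.

```python
# Back-of-envelope Fargate compute cost: vCPU-hours plus memory GB-hours.
# RATES ARE PLACEHOLDERS -- verify current per-region pricing before use.
VCPU_HOUR = 0.04048   # assumed example rate (Linux/x86)
GB_HOUR = 0.004445    # assumed example rate

def monthly_compute_cost(avg_tasks, vcpu_per_task, gb_per_task, hours=730):
    """Cost driven by AVERAGE running tasks, not peak."""
    vcpu_hours = avg_tasks * vcpu_per_task * hours
    gb_hours = avg_tasks * gb_per_task * hours
    return vcpu_hours * VCPU_HOUR + gb_hours * GB_HOUR

# 10 tasks averaging 0.5 vCPU / 1 GB each, running all month:
print(round(monthly_compute_cost(10, 0.5, 1.0), 2))  # ~180.20 at these rates
```

Note that the driver is `avg_tasks`, not peak: halving the average running tasks halves this line regardless of how high traffic spikes.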
Tool: Fargate cost calculator
If the bill boundary is still fuzzy, go back to Fargate pricing before changing production settings.
1) Reduce average running tasks (the biggest lever for many teams)
- Scale to real demand: keep min capacity honest; if traffic is low overnight, don’t run peak baseline.
- Schedule non-prod: dev/test often doesn’t need 730 hours/month.
- Batch and queue: for bursty workloads, process in batches so you run fewer tasks for fewer hours.
A simple check: if your peak task count is 50 but the average is 5, you have room to reduce idle. Measure average running tasks before you tweak vCPU.
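The "schedule non-prod" lever is easy to quantify. A minimal sketch, assuming a dev environment that only needs weekday business hours:

```python
# Fraction of a full week's task-hours consumed by a schedule.
# Pure arithmetic -- no AWS calls; actual scheduling would be done with
# scheduled scaling actions or a stop/start automation of your choice.
def scheduled_fraction(hours_per_day, days_per_week):
    return (hours_per_day * days_per_week) / (24 * 7)

weekday_12h = scheduled_fraction(12, 5)          # 60 of 168 hours
savings_pct = (1.0 - weekday_12h) * 100
print(f"{savings_pct:.0f}% fewer task-hours")    # ~64% fewer
```

A 12x5 schedule removes roughly two-thirds of the task-hours, which usually beats any amount of vCPU micro-tuning on the same environment.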
2) Rightsize vCPU and memory (use p50/p95, not peak-only)
- Measure steady CPU/memory usage and reduce oversizing.
- Keep headroom for deploys and incident windows; avoid sizing to a single perfect number.
- Prefer multiple smaller tasks when it improves utilization (within latency and connection constraints).
Related: task sizing workflow
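Rightsizing can be sketched as: take p95 usage, add headroom, then snap up to a valid Fargate vCPU/memory combination. The table below is a simplified subset of the common tiers (memory ranges per vCPU size), intended as an illustration rather than a complete reference:

```python
import math

# Simplified table of common Fargate sizes: vCPU -> (min GB, max GB).
# Check the ECS documentation for the full set of valid combinations.
VALID = {0.25: (0.5, 2), 0.5: (1, 4), 1: (2, 8), 2: (4, 16), 4: (8, 30)}

def rightsize(p95_vcpu, p95_gb, headroom=1.3):
    """Smallest valid size covering p95 usage plus ~30% headroom."""
    need_cpu, need_gb = p95_vcpu * headroom, p95_gb * headroom
    for cpu in sorted(VALID):
        lo, hi = VALID[cpu]
        if cpu >= need_cpu and hi >= need_gb:
            return cpu, max(lo, math.ceil(need_gb))
    raise ValueError("workload exceeds single-task sizes in this table")

print(rightsize(0.3, 1.2))  # p95 of 0.3 vCPU / 1.2 GB -> (0.5, 2)
```

The headroom factor is the judgment call: it covers deploys and incident windows, which is why sizing to a single "perfect" p95 number is a trap.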
3) Fix autoscaling (scale on real signals, not noisy CPU percent)
- Use request rate, queue depth, or latency as primary signals for services that are not CPU-bound.
- Avoid scaling loops: deploy storms, retry storms, and aggressive cooldown settings can inflate tasks for hours.
- Confirm that your autoscaling target produces the average task count you planned (cost is driven by average).
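For queue-driven workers, the scaling math is more direct than CPU percent. A sketch of the underlying calculation (per-task throughput and drain-window numbers are illustrative assumptions):

```python
import math

# Desired task count so a queue backlog drains within a target window,
# bounded by min/max capacity. This is the signal a queue-depth policy
# approximates, versus scaling on noisy CPU%.
def desired_tasks(backlog, msgs_per_task_per_min, drain_minutes,
                  min_tasks=1, max_tasks=50):
    need = math.ceil(backlog / (msgs_per_task_per_min * drain_minutes))
    return max(min_tasks, min(max_tasks, need))

# 1200 queued messages, each task handles 60/min, drain within 5 minutes:
print(desired_tasks(1200, 60, 5))  # -> 4
```

Because cost follows the average task count, check what this policy produces over a full day, not just during the backlog spike.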
4) Use pricing levers: Spot and commitment discounts
- Fargate Spot: good for fault-tolerant, retryable workloads (queues, batch, workers).
- Savings Plans: useful when you have a predictable baseline and want a lower effective rate.
Don’t apply a commitment until you’ve reduced idle; otherwise you lock in waste at a discount.
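The effect of moving a share of tasks to Spot is a blended rate. The ~70% discount below is an illustrative assumption; actual Fargate Spot pricing varies, and only interruption-tolerant workloads belong there:

```python
# Blended hourly rate when a share of tasks runs on Fargate Spot.
# spot_discount=0.70 is an ASSUMED example discount, not a quoted rate.
def blended_rate(on_demand_rate, spot_share, spot_discount=0.70):
    spot_rate = on_demand_rate * (1 - spot_discount)
    return on_demand_rate * (1 - spot_share) + spot_rate * spot_share

# 60% of normalized hours on Spot:
print(round(blended_rate(1.00, 0.6), 3))  # -> 0.58
```

This also shows why reducing idle comes first: a commitment or Spot mix applied to inflated task-hours just discounts the waste.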
5) Reduce logging cost (often a larger win than micro-optimizing CPU)
- Drop noisy debug logs in production; keep structured “what changed” logs.
- Sample high-volume access logs where acceptable.
- Set retention intentionally; delete what you’ll never query.
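A rough model of the logging line, assuming per-GB ingestion plus per-GB-month storage pricing (rates below are placeholders, and storage is approximated as a steady-state retention window):

```python
# Log cost sketch: ingestion dominates, retention compounds it.
# RATES ARE PLACEHOLDERS -- verify current log pricing for your region.
INGEST_PER_GB = 0.50
STORAGE_PER_GB_MONTH = 0.03

def monthly_log_cost(gb_per_day, retention_days, sample_rate=1.0):
    ingested = gb_per_day * 30 * sample_rate
    stored = gb_per_day * retention_days * sample_rate  # steady-state window
    return ingested * INGEST_PER_GB + stored * STORAGE_PER_GB_MONTH

# 20 GB/day at 90-day retention, full volume vs. 25% sampling:
print(round(monthly_log_cost(20, 90), 2),
      round(monthly_log_cost(20, 90, 0.25), 2))  # ~354.0 vs ~88.5
```

At these example rates, sampling a high-volume stream to 25% saves more per month than shaving a quarter-vCPU off a handful of tasks.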
6) Reduce networking surprises (NAT, cross-AZ, and egress)
- Avoid routing AWS-service traffic through NAT when a private endpoint or private networking path is available.
- Watch cross-AZ chatter for chatty microservices and service discovery patterns (it can become a steady baseline).
- Model internet egress explicitly for public APIs and downloads; don’t assume it’s “small”.
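To see why NAT routing matters, compare per-GB processing plus hourly charges for the same traffic through a NAT gateway versus a private endpoint. All rates below are illustrative assumptions (and note that gateway endpoints for some services carry no data-processing charge at all):

```python
# NAT gateway vs. private endpoint for the same AWS-service traffic.
# ALL RATES ARE ASSUMED EXAMPLES -- check current pricing for your region.
NAT_PER_GB, NAT_HOURLY = 0.045, 0.045
ENDPOINT_PER_GB, ENDPOINT_HOURLY = 0.01, 0.01   # interface-endpoint style

def monthly_transfer_cost(gb, per_gb, hourly, hours=730):
    return gb * per_gb + hourly * hours

traffic_gb = 5000  # e.g., registry/object-store pulls from private subnets
print(round(monthly_transfer_cost(traffic_gb, NAT_PER_GB, NAT_HOURLY), 2),
      round(monthly_transfer_cost(traffic_gb, ENDPOINT_PER_GB, ENDPOINT_HOURLY), 2))
```

Even with made-up rates, the shape of the result is the point: per-GB processing scales with traffic, so high-volume internal traffic through NAT becomes a steady baseline cost.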
Common pitfalls
- Optimizing vCPU before fixing average running tasks and autoscaling behavior.
- Sizing from peak-only and then paying for idle headroom 730 hours/month.
- Ignoring load balancer count and log ingestion volume (they often dominate).
- Running non-prod always-on without schedules.
- Not validating savings in billing (you can “optimize” performance and still not reduce spend).
How to validate savings
- Compare vCPU-hours and GB-hours before/after (compute usage, not only total bill).
- Check average running tasks: if it didn’t drop, expect limited savings.
- Verify log ingestion GB/day and retention costs didn’t grow after changes.
- Re-check NAT processed GB and cross-AZ transfer after routing changes.
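The validation list above amounts to a before/after diff on the metrics that drive spend. A minimal sketch (the metric names are illustrative, not billing-export field names):

```python
# Percent change per cost driver; negative means that line GREW after
# the change -- exactly the regression this checklist is hunting for.
def savings_report(before, after):
    return {k: round((before[k] - after[k]) / before[k] * 100, 1)
            for k in before}

before = {"vcpu_hours": 3650, "gb_hours": 7300, "log_gb": 600, "nat_gb": 5000}
after  = {"vcpu_hours": 2400, "gb_hours": 4800, "log_gb": 620, "nat_gb": 900}
print(savings_report(before, after))  # log_gb comes out negative: it grew
```

Comparing driver metrics rather than the total bill keeps one noisy line (say, a log-volume regression) from hiding a real compute win, or vice versa.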
Use a simple measure-change-remeasure loop
- Measure the baseline for average running tasks, task size, log ingestion, and networking lines.
- Change one production lever at a time so the next billing comparison is readable.
- Remeasure the same workload window and keep only the changes that reduce spend without hurting scaling or latency behavior.