Lambda concurrency and cold starts (cost pitfalls)
Concurrency and cold starts matter for both latency and cost. Cold starts often increase duration, which increases GB-seconds. Provisioned concurrency can reduce cold starts, but it adds a baseline “always-on” cost. This page explains how to think about the trade-off and what to measure.
Concurrency model inputs
- Peak RPS: the highest sustained request rate you need to serve.
- Average duration: milliseconds per request; together with RPS, this determines how many invocations run concurrently.
- Provisioned window: the hours of the day during which cold starts actually hurt (e.g., business hours for a user-facing API).
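The inputs above combine via Little's law: average concurrency ≈ arrival rate × time in system. A minimal sketch (all traffic numbers below are illustrative assumptions, not measurements):

```python
# Rough concurrency estimate via Little's law: concurrency ≈ RPS × duration.
# The example inputs (200 RPS, 120 ms) are illustrative assumptions.

def estimated_concurrency(peak_rps: float, avg_duration_ms: float) -> float:
    """Average number of in-flight invocations at peak traffic."""
    return peak_rps * (avg_duration_ms / 1000.0)

# Example: 200 RPS at 120 ms average duration.
print(estimated_concurrency(200, 120))  # 24.0 concurrent invocations
```

This estimate is a sizing aid for provisioned concurrency, not a guarantee: bursty arrivals can briefly exceed the average.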
How concurrency relates to cost
- On-demand: cost ≈ requests + GB-seconds. Concurrency is a result of traffic; it’s not a separate fee.
- Provisioned concurrency: adds baseline cost because you pay to keep capacity warm.
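The on-demand model above can be sketched as a small estimator. The `PRICE_*` constants are placeholder assumptions; always check the current AWS pricing page before relying on the numbers:

```python
# Sketch of the on-demand Lambda cost model: requests + GB-seconds.
# PRICE_* values are placeholder assumptions, not current AWS rates.

PRICE_PER_MILLION_REQUESTS = 0.20   # assumed, USD
PRICE_PER_GB_SECOND = 0.0000166667  # assumed, USD

def monthly_on_demand_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Approximate monthly cost from request count, duration, and memory."""
    gb_seconds = requests * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    request_cost = (requests / 1_000_000) * PRICE_PER_MILLION_REQUESTS
    return request_cost + gb_seconds * PRICE_PER_GB_SECOND

# Example: 10M requests/month, 120 ms average duration, 512 MB memory.
print(round(monthly_on_demand_cost(10_000_000, 120, 512), 2))  # ~12.0
```

Note that concurrency never appears as an input: for on-demand, it falls out of traffic and duration rather than being billed directly.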
For current per-request and per-GB-second rates, see the AWS Lambda pricing page.
Why cold starts can raise your bill
If a cold start adds extra initialization time, it increases the billed duration for that invocation. If cold starts happen often (spiky traffic, low steady usage), the “extra duration” can become a meaningful portion of monthly GB-seconds.
- Spiky workloads: many cold starts per day/week.
- Large bundles and heavy init: bigger cold start penalty.
- Downstream dependency latency: cold starts often correlate with other slow paths.
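To see whether cold starts matter for your bill, estimate the extra GB-seconds they add. A minimal sketch, assuming cold-start initialization counts toward billed duration (as the section above describes); the cold-start count and init time are illustrative:

```python
# Extra monthly GB-seconds attributable purely to cold-start initialization.
# The inputs (50k cold starts, 800 ms init, 1024 MB) are illustrative assumptions.

def cold_start_gb_seconds(cold_starts: int, init_ms: float, memory_mb: int) -> float:
    """GB-seconds billed for initialization time alone."""
    return cold_starts * (init_ms / 1000.0) * (memory_mb / 1024.0)

extra = cold_start_gb_seconds(50_000, 800, 1024)
print(extra)  # 40000.0 extra GB-seconds per month
```

Compare that figure against your total monthly GB-seconds: if it is a few percent, cold starts are a latency problem more than a cost problem; if it is a large fraction, trimming init time or adding provisioned concurrency may pay for itself.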
A practical way to decide on provisioned concurrency
- Identify the latency-sensitive path (user-facing API vs background job).
- Measure how often cold starts happen and how much duration they add.
- Estimate the monthly baseline cost of provisioned concurrency for the hours you need it.
- Decide if the SLA/UX improvement is worth the baseline.
Tip: apply provisioned concurrency only during business hours for endpoints that need it; don’t blanket-enable.
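The decision above needs a baseline-cost number to compare against. A sketch of that estimate, using an assumed provisioned-concurrency rate (placeholder, not a current AWS price) and a business-hours-only window:

```python
# Baseline monthly cost of provisioned concurrency for a limited window.
# PRICE_PC_PER_GB_SECOND is an assumed placeholder rate, not a real AWS price.

PRICE_PC_PER_GB_SECOND = 0.0000041667  # assumed, USD

def provisioned_baseline_cost(instances: int, memory_mb: int, hours_per_month: float) -> float:
    """Cost of keeping `instances` warm for `hours_per_month`."""
    gb = memory_mb / 1024.0
    return instances * gb * hours_per_month * 3600 * PRICE_PC_PER_GB_SECOND

# Example: 10 warm instances, 1024 MB, business hours only (~22 days x 10 h).
print(round(provisioned_baseline_cost(10, 1024, 220), 2))  # ~33.0
```

Weigh this baseline against the measured cold-start duration cost and the value of lower tail latency; enabling it only for the hours you need (as the tip suggests) can cut the baseline by two-thirds versus 24/7.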
Cost pitfalls to watch for
- Enabling provisioned concurrency globally (adds baseline cost everywhere).
- Traffic so low and intermittent that instances go cold between requests, causing frequent cold starts and long-tail latency.
- Retry storms during incidents multiplying invocations (and cold starts) in a short time window.
- Large initialization work (loading big dependencies, scanning config) inflating duration.
- Running functions in a VPC and then discovering NAT/egress costs and longer startup times.
What to measure (so you can validate a change)
- Invocation count and duration distribution (p50/p95) before vs after.
- Error rate and retries (spikes often explain spend jumps).
- For provisioned concurrency: baseline hours enabled vs actual demand windows.
- Log ingestion GB/day (cold start debugging often increases log volume).
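For the before/after duration comparison, p50 and p95 can be computed directly from duration samples pulled from your logs or metrics. A minimal sketch using the standard library (the sample list is illustrative):

```python
# Compute p50/p95 from a list of invocation durations (ms).
import statistics

def p50_p95(durations_ms: list[float]) -> tuple[float, float]:
    """Return (p50, p95) using inclusive quantiles."""
    qs = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return qs[49], qs[94]

# Illustrative samples: four warm invocations plus one cold-start outlier.
samples = [100, 110, 120, 130, 900]
print(p50_p95(samples))
```

Note how a single cold-start outlier barely moves p50 but dominates p95, which is why tail percentiles, not averages, are the right thing to watch when validating a provisioned-concurrency change.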
Related guides
AWS Lambda cost optimization (high-leverage fixes)
A practical Lambda cost optimization checklist: reduce GB-seconds (duration × memory), control retries, right-size concurrency, and avoid hidden logging and networking costs.
AWS Lambda pricing (what to include)
A practical checklist for estimating AWS Lambda-style costs: requests, duration × memory (GB-seconds), provisioned concurrency when used, logs, and common hidden line items.
Lambda vs Fargate cost: a practical comparison (unit economics)
Compare Lambda vs Fargate cost with unit economics: cost per 1M requests (Lambda) versus average running tasks (Fargate), plus the non-compute line items that often dominate (logs, load balancers, transfer).
Aurora Serverless v2 pricing: how to estimate ACUs and avoid surprise bills
A practical way to estimate Aurora Serverless v2 costs: ACU-hours, storage GB-month, backups/retention, and how to model peaks so your estimate survives real traffic.
AWS ECS Pricing & Cost Guide (EC2 vs Fargate drivers)
ECS cost model for compute, storage, and networking. Compare EC2 vs Fargate and identify real cost drivers.
AWS Fargate pricing (cost model + pricing calculator)
A practical Fargate pricing guide and calculator companion: what drives compute cost (vCPU-hours + GB-hours), how to estimate average running tasks, and the non-compute line items that usually matter (logs, load balancers, data transfer).
FAQ
Does concurrency itself cost money?
Not directly for on-demand Lambda functions: cost comes from invocations and GB-seconds, and concurrency is just a consequence of traffic. Provisioned concurrency, however, does add a baseline cost, because you pay for pre-initialized capacity whether or not it is used.
How do cold starts affect cost?
Cold starts often increase duration (more GB-seconds). If cold starts happen frequently, they can noticeably increase monthly compute cost and worsen latency.
When is provisioned concurrency worth it?
For latency-sensitive paths with frequent cold starts where the business value of lower tail latency justifies the baseline cost. It’s usually not worth it for spiky background jobs.
Last updated: 2026-02-07