Lambda concurrency and cold starts (cost pitfalls)

Concurrency and cold starts matter for both latency and cost. Cold starts often increase duration, which increases GB-seconds. Provisioned concurrency can reduce cold starts, but it adds a baseline “always-on” cost. This page explains how to think about the trade-off and what to measure.

Concurrency model inputs

  • Peak RPS: the highest sustained request rate during your busiest traffic window.
  • Average duration: milliseconds per request; together with peak RPS it determines how much concurrency you need.
  • Provisioned window: the hours of the day during which cold starts actually hurt users.
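The first two inputs combine via Little's Law: required concurrency is roughly peak RPS times average duration in seconds. A minimal sketch, with illustrative numbers that are assumptions rather than measurements:

```python
def required_concurrency(peak_rps: float, avg_duration_ms: float) -> float:
    """Approximate number of simultaneously running execution environments
    (Little's Law: concurrency ≈ arrival rate × time in system)."""
    return peak_rps * (avg_duration_ms / 1000.0)

# Example: 200 req/s at 150 ms average duration ≈ 30 concurrent executions.
print(required_concurrency(200, 150))  # → 30.0
```

This is why shaving average duration (smaller bundles, less init work) also lowers the concurrency you need to provision for.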

How concurrency relates to cost

  • On-demand: cost ≈ requests + GB-seconds. Concurrency is a result of traffic; it’s not a separate fee.
  • Provisioned concurrency: adds baseline cost because you pay to keep capacity warm.
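The on-demand formula above can be sketched in a few lines. The per-request and per-GB-second rates below are illustrative placeholders, not current prices; substitute the rates from the Lambda pricing page for your region.

```python
def monthly_on_demand_cost(
    requests: int,
    avg_duration_ms: float,
    memory_gb: float,
    price_per_million_requests: float = 0.20,   # illustrative rate; check current pricing
    price_per_gb_second: float = 0.0000166667,  # illustrative rate; check current pricing
) -> float:
    """On-demand cost ≈ request charge + GB-second charge.
    Note that concurrency does not appear anywhere in the formula."""
    request_cost = (requests / 1_000_000) * price_per_million_requests
    gb_seconds = requests * (avg_duration_ms / 1000.0) * memory_gb
    return request_cost + gb_seconds * price_per_gb_second

# Example: 10M requests/month, 150 ms average, 512 MB ≈ $14.50 at these rates.
print(round(monthly_on_demand_cost(10_000_000, 150, 0.5), 2))
```

The key takeaway is structural: duration and memory multiply each other, so either lever cuts the bill.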


Why cold starts can raise your bill

If a cold start adds extra initialization time, it increases the billed duration for that invocation. If cold starts happen often (spiky traffic, low steady usage), the “extra duration” can become a meaningful portion of monthly GB-seconds.

  • Spiky workloads: many cold starts per day/week.
  • Large bundles and heavy init: bigger cold start penalty.
  • Downstream dependency latency: cold starts often correlate with other slow paths.
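To size the "extra duration" effect, multiply cold starts by the initialization time they add. A sketch with assumed numbers (cold-start count and init time are things you'd measure from your own logs):

```python
def cold_start_gb_seconds(cold_starts: int, init_ms: float, memory_gb: float) -> float:
    """Extra GB-seconds billed because cold invocations also run init code."""
    return cold_starts * (init_ms / 1000.0) * memory_gb

# Assumed scenario: 50,000 cold starts/month, each adding 800 ms at 1 GB
# ≈ 40,000 extra GB-seconds of billed compute.
print(cold_start_gb_seconds(50_000, 800, 1.0))
```

Compare that figure against your total monthly GB-seconds to see whether cold starts are a rounding error or a real line item.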

A practical way to decide on provisioned concurrency

  1. Identify the latency-sensitive path (user-facing API vs background job).
  2. Measure how often cold starts happen and how much duration they add.
  3. Estimate the monthly baseline cost of provisioned concurrency for the hours you need it.
  4. Decide if the SLA/UX improvement is worth the baseline.
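Steps 3 and 4 reduce to a break-even comparison: the baseline cost of warm capacity versus the cold-start compute it would eliminate. The rates below are illustrative placeholders, and the scenario numbers are assumptions:

```python
GB_SECOND_RATE = 0.0000166667               # on-demand compute rate (illustrative)
PROVISIONED_GB_SECOND_RATE = 0.0000041667   # warm-capacity rate (illustrative)

def provisioned_baseline(concurrency: int, memory_gb: float, hours: float) -> float:
    """Monthly cost of keeping `concurrency` environments warm for `hours`."""
    return concurrency * memory_gb * hours * 3600 * PROVISIONED_GB_SECOND_RATE

def cold_start_overhead(cold_starts: int, init_ms: float, memory_gb: float) -> float:
    """Monthly compute cost attributable to cold-start initialization."""
    return cold_starts * (init_ms / 1000.0) * memory_gb * GB_SECOND_RATE

# Assumed scenario: 10 warm environments at 1 GB for business hours only
# (~260 h/month) vs 50,000 cold starts adding 800 ms each.
baseline = provisioned_baseline(10, 1.0, 260)       # ≈ $39/month
overhead = cold_start_overhead(50_000, 800, 1.0)    # ≈ $0.67/month
```

In raw dollars, provisioned concurrency usually costs more than the cold-start compute it removes, so the decision in step 4 hinges on whether the tail-latency improvement is worth that difference, not on compute savings.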

Tip: apply provisioned concurrency only during business hours for endpoints that need it; don’t blanket-enable.
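One way to implement the business-hours tip is scheduled scaling through Application Auto Scaling, which supports Lambda provisioned concurrency as a scalable dimension. This is a configuration sketch, not a drop-in script: the function name, alias, capacities, and cron windows are hypothetical and need to match your own deployment and timezone.

```python
import boto3

client = boto3.client("application-autoscaling")

# The resource ID targets a hypothetical function "checkout-api" via its
# alias "live"; provisioned concurrency must be configured on an alias or
# a published version, never on $LATEST.
resource_id = "function:checkout-api:live"

client.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=0,
    MaxCapacity=10,
)

# Warm up 10 environments at 08:00 on weekdays...
client.put_scheduled_action(
    ServiceNamespace="lambda",
    ScheduledActionName="business-hours-up",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    Schedule="cron(0 8 ? * MON-FRI *)",
    ScalableTargetAction={"MinCapacity": 10, "MaxCapacity": 10},
)

# ...and release them at 19:00 so nights and weekends stay on-demand.
client.put_scheduled_action(
    ServiceNamespace="lambda",
    ScheduledActionName="business-hours-down",
    ResourceId=resource_id,
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    Schedule="cron(0 19 ? * MON-FRI *)",
    ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
)
```

Scoping the warm window like this keeps the baseline cost proportional to the hours where cold starts actually matter.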

Cost pitfalls to watch for

  • Enabling provisioned concurrency globally (adds baseline cost everywhere).
  • Low, intermittent traffic that lets warm execution environments expire, triggering frequent cold starts and long-tail latency.
  • Retry storms during incidents multiplying invocations (and cold starts) in a short time window.
  • Large initialization work (loading big dependencies, scanning config) inflating duration.
  • Running functions in a VPC and then discovering NAT/egress costs and longer startup times.

What to measure (so you can validate a change)

  • Invocation count and duration distribution (p50/p95) before vs after.
  • Error rate and retries (spikes often explain spend jumps).
  • For provisioned concurrency: baseline hours enabled vs actual demand windows.
  • Log ingestion GB/day (cold start debugging often increases log volume).
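For the before/after duration comparison, percentiles matter more than averages because cold starts live in the tail. A minimal sketch, assuming you can export a sample of per-invocation durations (for example from your metrics or log tooling) as a list of milliseconds:

```python
import statistics

def duration_percentiles(durations_ms: list[float]) -> tuple[float, float]:
    """Return (p50, p95) of a duration sample in milliseconds."""
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]  # 50th and 95th percentile cut points

# Example with a synthetic sample of 1..100 ms:
p50, p95 = duration_percentiles([float(i) for i in range(1, 101)])
print(p50, p95)
```

Run it on samples from before and after the change; a drop in p95 with a flat p50 is the signature of fewer or cheaper cold starts.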

FAQ

Does concurrency itself cost money?
Not directly for on-demand Lambdas. Cost usually comes from invocations and GB-seconds. However, provisioned concurrency adds a baseline cost because you pay for pre-initialized capacity.
How do cold starts affect cost?
Cold starts often increase duration (more GB-seconds). If cold starts happen frequently, they can noticeably increase monthly compute cost and worsen latency.
When is provisioned concurrency worth it?
For latency-sensitive paths with frequent cold starts where the business value of lower tail latency justifies the baseline cost. It’s usually not worth it for spiky background jobs.

Last updated: 2026-02-07