AWS Lambda pricing (what to include)
This is the AWS Lambda bill-boundary page: use it when you need to decide what belongs inside the Lambda bill before you debate cold starts, retries, or runtime tuning.
Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.
This guide is about bill boundaries: request charges, GB-seconds, provisioned concurrency baseline cost, and the adjacent logging, networking, and downstream service costs that should be tracked beside Lambda rather than blended into it.
Go back to the serverless parent guide if the broader system model is unclear and you still need to map event amplification, observability, downstream services, or transfer before pricing Lambda itself.
Start here: what belongs inside the Lambda bill
- Inside the Lambda bill: requests, billed duration, configured memory, and provisioned concurrency when enabled.
- Beside the Lambda bill: log ingestion and retention, NAT or egress transfer, queue traffic, database reads, and storage activity.
- Why the split matters: teams make worse budget decisions when they hide CloudWatch, VPC transfer, or downstream database spend inside a single "Lambda" bucket.
1) Start with the two core line items
- Requests: billed per invocation (often priced per 1M requests).
- Compute: billed as GB-seconds (billed duration × configured memory in GB; duration is metered in 1 ms increments).
Tool: AWS Lambda cost calculator
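If it helps to see the arithmetic before opening a calculator, here is a minimal sketch of those two line items. The rates are illustrative placeholders (roughly us-east-1 x86 on-demand), not a substitute for the current pricing page:

```python
# Minimal sketch of the two core Lambda line items.
# Prices are illustrative; check the current AWS pricing page.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # $ per request
PRICE_PER_GB_SECOND = 0.0000166667     # $ per GB-second

def lambda_core_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Request charge plus compute charge (GB-seconds)."""
    gb = memory_mb / 1024
    gb_seconds = requests * (avg_duration_ms / 1000) * gb
    return requests * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND

# Example: 10M requests/month, 120 ms average billed duration, 512 MB.
print(f"${lambda_core_cost(10_000_000, 120, 512):,.2f}/month")  # ~$12.00
```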
2) Model duration as a range (not a single number)
Duration usually has a long tail. For planning, keep at least two scenarios:
- Typical: a representative median or steady period.
- Busy/incident: p95-ish or “known heavy window” where retries, cold starts, or downstream latency increases duration.
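Pricing both scenarios side by side makes the spread visible. A minimal sketch, with assumed rates, traffic, and durations:

```python
# Sketch: price the same function under two duration scenarios instead of one.
# Rates, request counts, and durations are illustrative assumptions.
PRICE_PER_GB_SECOND = 0.0000166667

def compute_cost(requests, duration_ms, memory_mb):
    return requests * (duration_ms / 1000) * (memory_mb / 1024) * PRICE_PER_GB_SECOND

scenarios = {
    "typical (median)":        dict(requests=10_000_000, duration_ms=120),
    "busy/incident (p95-ish)": dict(requests=12_000_000, duration_ms=450),
}
for name, s in scenarios.items():
    print(f"{name}: ${compute_cost(memory_mb=512, **s):,.2f} compute/month")
# typical ~$10; busy ~$45 -- more than 4x from duration and retry inflation alone
```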
3) Memory settings directly change compute cost
Lambda compute cost scales with configured memory, even if your function rarely uses it. Over-allocating memory increases GB-seconds linearly. Under-allocating can increase duration and still raise cost.
- Right-size memory using measured duration vs memory experiments.
- Watch for “downstream bound” functions: adding memory might not reduce duration if the bottleneck is a database/API.
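The sweep below sketches why measurement matters: doubling memory that halves duration is roughly cost-neutral, while memory added past the bottleneck just doubles the bill. The durations are made-up stand-ins for your own load-test numbers:

```python
# Sketch: cost per invocation = duration x memory, so faster-but-bigger can be
# cheaper. Durations are invented examples; substitute measured values.
PRICE_PER_GB_SECOND = 0.0000166667  # illustrative rate

measured = {  # memory_mb -> observed avg billed duration (ms)
    256: 820,
    512: 410,
    1024: 230,
    2048: 225,  # downstream-bound: more memory stops helping here
}
for memory_mb, duration_ms in measured.items():
    cost = (duration_ms / 1000) * (memory_mb / 1024) * PRICE_PER_GB_SECOND
    print(f"{memory_mb:>5} MB: {duration_ms:>4} ms -> ${cost * 1e6:,.2f} per 1M invocations")
# 256 and 512 MB land near the same cost; 2048 MB roughly doubles it.
```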
4) Add provisioned concurrency if you use it
Provisioned concurrency can improve cold-start behavior, but it adds a baseline cost. Treat it like “always-on capacity” and model it separately from on-demand invocations.
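A minimal sketch of that baseline, assuming an illustrative provisioned-concurrency rate; the request and duration charges for traffic served on that capacity come on top:

```python
# Sketch: provisioned concurrency as always-on capacity. You pay for the
# configured concurrency for every second it is enabled, invoked or not.
# The rate is an illustrative placeholder; check current pricing.
PC_PRICE_PER_GB_SECOND = 0.0000041667

def pc_baseline_cost(concurrency: int, memory_mb: int, hours_enabled: float) -> float:
    gb_seconds = concurrency * (memory_mb / 1024) * hours_enabled * 3600
    return gb_seconds * PC_PRICE_PER_GB_SECOND

# Example: 10 warm instances at 1 GB, enabled 24x7 for a 30-day month.
print(f"${pc_baseline_cost(10, 1024, 24 * 30):,.2f}/month baseline")  # ~$108
```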
If you need to model cold-start frequency, warm capacity windows, or duration inflation, continue with the concurrency and cold starts guide.
5) Common Lambda-adjacent costs
- Logging: ingestion + retention; large JSON logs add up quickly (estimated in the sketch after this list).
- Networking: NAT gateway, egress, and cross-AZ transfer, depending on architecture.
- Downstream services: databases, queues, storage; they often dominate total spend in real systems.
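Logging is the easiest of these to estimate up front. A sketch, assuming an illustrative ingestion rate and a made-up per-invocation log volume:

```python
# Sketch: CloudWatch Logs ingestion from per-invocation log volume.
# $0.50/GB is an illustrative ingestion rate; retention storage bills separately.
LOG_INGEST_PER_GB = 0.50

def log_ingest_cost(requests: int, bytes_per_invocation: int) -> float:
    gb = requests * bytes_per_invocation / 1e9
    return gb * LOG_INGEST_PER_GB

# Example: 10M invocations emitting ~4 KB of JSON logs each -> 40 GB/month.
print(f"${log_ingest_cost(10_000_000, 4_000):,.2f}/month ingestion")  # ~$20
```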
6) Sanity-check with unit economics
Convert your estimate into “cost per 1M requests” for a typical duration/memory combination. It makes regressions obvious and helps compare to always-on compute.
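A sketch of that conversion, with illustrative rates and a hypothetical $50/month always-on box for comparison:

```python
# Sketch: collapse the estimate into one unit number. Rates and the
# always-on comparison price are illustrative assumptions.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667

def cost_per_million(duration_ms: float, memory_mb: int) -> float:
    per_invocation = (PRICE_PER_REQUEST
                      + (duration_ms / 1000) * (memory_mb / 1024) * PRICE_PER_GB_SECOND)
    return per_invocation * 1_000_000

unit = cost_per_million(duration_ms=120, memory_mb=512)
print(f"${unit:,.2f} per 1M requests")  # ~$1.20

# Rough break-even against a hypothetical $50/month always-on instance.
print(f"break-even: ~{50 / unit:,.1f}M requests/month")  # ~41.7M
```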
Once the bill boundary is clear, use the Lambda cost optimization guide for production changes, or compare deployment shapes with Lambda vs Fargate cost.
Common pitfalls
- Budgeting from a single duration number and ignoring long-tail behavior.
- Over-allocating memory “just in case” without measuring the duration trade-off.
- Retry storms inflating invocation count and total duration during incidents (quantified in the sketch after this list).
- Ignoring log volume and retention until it becomes a top driver.
- Forgetting data transfer and NAT patterns when functions run in a VPC.
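The retry pitfall is easy to quantify. A sketch assuming independent failures and Lambda's default of two retries for asynchronous invocations:

```python
# Sketch: retries amplify billed invocations. With failure rate f and up to
# k retries, expected attempts per request = 1 + f + f^2 + ... + f^k.
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    return sum(failure_rate ** i for i in range(max_retries + 1))

# Normal day vs incident: 1% vs 60% failure rate, 2 async retries.
for f in (0.01, 0.60):
    print(f"failure rate {f:.0%}: {expected_attempts(f, 2):.2f}x invocations")
# 1% -> ~1.01x; 60% -> ~1.96x, nearly double the requests and duration
```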
How to validate after you deploy
- Confirm billed request count and billed duration match your model for a representative week.
- Check p50/p95 duration and error/retry rates; duration inflation often explains spend spikes (scripted in the sketch below).
- Measure log ingestion GB/day and confirm retention settings match your policy.
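To script the duration check, a minimal boto3 sketch. It assumes configured AWS credentials, uses "my-function" as a placeholder name, and reads the Duration metric, which is measured (not billed) duration, so treat it as an approximation:

```python
# Sketch: fetch a week of p50/p95 Duration for one function via CloudWatch.
from datetime import datetime, timedelta, timezone

import boto3

cw = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

resp = cw.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Duration",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],  # placeholder
    StartTime=start,
    EndTime=end,
    Period=7 * 24 * 3600,  # one datapoint covering the week
    ExtendedStatistics=["p50", "p95"],
)
for dp in resp["Datapoints"]:
    print({k: round(v, 1) for k, v in dp["ExtendedStatistics"].items()}, "ms")
```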