AWS Lambda cost optimization (high-leverage fixes)
Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.
Optimization should start only after you know which of these is the real Lambda cost driver: GB-seconds, memory sizing, retry storms, logging overhead, or the provisioned concurrency baseline. Otherwise teams tune the wrong lever.
This page covers production interventions: duration reduction, memory right-sizing, retry cleanup, logging control, and selective concurrency tuning.
Lambda cost optimization is mostly about reducing GB-seconds (duration × memory) and preventing accidental multipliers (retries, verbose logs, and expensive networking paths). Use this checklist only after the billing model and measurement story are credible.
High-leverage Lambda knobs
- Memory vs duration: right-size to reduce total GB-seconds.
- Provisioned concurrency: use only for latency-critical paths.
- Request volume: reduce retries and noisy clients first.
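The billing model behind these knobs reduces to simple arithmetic: GB-seconds = invocations × duration × memory, plus a per-request charge. A minimal estimator sketch; the two rates below are illustrative assumptions (roughly the published x86 on-demand rates at time of writing), not a pricing quote, so check the current AWS Lambda pricing page before relying on them.

```python
# Assumed example rates -- verify against current AWS Lambda pricing.
GB_SECOND_RATE = 0.0000166667     # USD per GB-second (assumption)
REQUEST_RATE = 0.20 / 1_000_000   # USD per request (assumption)

def lambda_compute_cost(invocations: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Estimate USD compute cost for a batch of invocations."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * GB_SECOND_RATE + invocations * REQUEST_RATE

# 10M invocations at 120 ms average on 512 MB:
cost = lambda_compute_cost(10_000_000, 120, 512)
```

Every lever in this guide attacks one factor of that product: duration, memory, or invocation count.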
Before you touch production knobs
- Confirm the bill boundary: keep Lambda charges separate from logging, transfer, and downstream service cost.
- Name the dominant driver: duration, memory, retries, logging, or baseline warm capacity, each of which points to a different fix.
- Freeze a baseline window: compare against a representative period so you can tell whether the intervention helped.
1) Reduce duration (fix the slow parts first)
- Remove cold-start bloat: smaller bundles, fewer dependencies, faster initialization.
- Reduce downstream latency: cache hot reads and avoid chatty per-request calls.
- Batch work where possible (especially for stream/queue processing).
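One concrete way to cut warm-path duration is to hoist expensive initialization out of the handler so it runs once per container instead of once per request. A minimal sketch, where `load_config` is a hypothetical stand-in for any slow setup (SDK clients, config fetches, model loads):

```python
import functools

@functools.lru_cache(maxsize=1)
def load_config() -> dict:
    # Hypothetical slow setup: runs once per warm container,
    # then warm invocations reuse the cached result.
    return {"table": "orders", "timeout_s": 2}

def handler(event, context=None):
    cfg = load_config()  # cache hit on every warm invocation
    return {"table": cfg["table"], "ok": True}
```

The same pattern applies at module scope (create clients once, outside the handler); the cache decorator just makes the one-time intent explicit.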
2) Right-size memory by testing (duration vs memory curve)
Memory affects both cost and performance. Try a few memory sizes and compare total cost per 1M requests (or per job run), not just duration.
- Measure p50 and p95 duration at each memory size.
- Pick the best “cost per unit of work” point, not the smallest memory number.
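The sweep above reduces to a cost-per-unit-of-work comparison. A sketch with made-up sweep numbers and the same assumed GB-second rate as before; note how a mid-size memory setting can beat the smallest one because the duration drop outweighs the memory increase:

```python
GB_SECOND_RATE = 0.0000166667  # USD per GB-second (assumed example rate)

def cost_per_million(memory_mb: int, avg_duration_ms: float) -> float:
    """Compute cost of 1M requests at a given memory size."""
    gb_seconds = 1_000_000 * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * GB_SECOND_RATE

# Hypothetical sweep: more memory also means more CPU, so duration shrinks.
sweep = {128: 800.0, 256: 380.0, 512: 180.0, 1024: 160.0}  # MB -> avg ms
best = min(sweep, key=lambda mb: cost_per_million(mb, sweep[mb]))
```

In this made-up sweep, 512 MB wins even though it is 4x the smallest option; that is why you pick the best cost point, not the smallest memory number.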
3) Eliminate retries and wasted invocations (the common spike driver)
- Set timeouts and retry policies intentionally; don’t let defaults amplify incidents.
- Use idempotency where retries are unavoidable.
- Watch for upstream retry storms: one incident can multiply invocations and duration.
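Idempotency is what makes retries safe instead of expensive. A minimal sketch: the in-memory `processed` set stands in for a durable store (for example, a DynamoDB table with a conditional write); a plain dict or set only deduplicates within one warm container.

```python
# Stand-in for a durable idempotency store -- per-container only.
processed: set = set()

def handler(event: dict) -> dict:
    key = event["idempotency_key"]
    if key in processed:
        # A retry landed: skip the real work instead of redoing it.
        return {"status": "duplicate", "key": key}
    # ... do the real work exactly once ...
    processed.add(key)
    return {"status": "done", "key": key}
```

With this shape, a retry storm costs one cheap lookup per duplicate instead of a full re-execution of the downstream work.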
4) Be deliberate about concurrency and cold starts
- Cold starts often increase duration; frequent cold starts can raise GB-seconds.
- Provisioned concurrency can reduce cold starts but adds baseline cost.
- Separate “SLA paths” from background jobs; they need different concurrency choices.
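The "adds baseline cost" point is worth pricing out before enabling anything: provisioned concurrency bills for every configured GB-second whether or not requests arrive. A sketch with an assumed example rate (roughly the published x86 rate at time of writing; verify against current AWS pricing):

```python
PC_GB_SECOND_RATE = 0.0000041667  # USD per provisioned GB-second (assumption)

def provisioned_baseline_cost(concurrency: int, memory_mb: int, hours: float) -> float:
    """Always-on cost of keeping `concurrency` instances warm."""
    gb_seconds = concurrency * (memory_mb / 1024) * hours * 3600
    return gb_seconds * PC_GB_SECOND_RATE

# 10 warm instances at 1024 MB for a 730-hour month:
monthly = provisioned_baseline_cost(10, 1024, 730)
```

That baseline accrues even at zero traffic, which is why it belongs only on latency-critical paths, ideally with a schedule that scales it down off-peak.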

5) Reduce logging and networking bills
- Logs: reduce verbosity, sample high-volume logs, and set retention deliberately.
- Networking: avoid routing through NAT by accident; model egress and transfer explicitly.
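Sampling is the highest-leverage logging fix, and its payoff is easy to estimate up front. A sketch of monthly ingestion cost before and after sampling; the $0.50/GB ingestion rate is an assumed example (verify against current CloudWatch Logs pricing):

```python
LOG_INGEST_RATE = 0.50  # USD per GB ingested (assumed example rate)

def monthly_log_cost(gb_per_day: float, sample_rate: float = 1.0) -> float:
    """Ingestion cost over a 30-day month; sample_rate=0.1 keeps 10% of logs."""
    return gb_per_day * sample_rate * 30 * LOG_INGEST_RATE

before = monthly_log_cost(40)        # 40 GB/day, unsampled
after = monthly_log_cost(40, 0.1)    # same volume, 10% sampling
```

Retention charges come on top of ingestion, so set retention deliberately as well; sampling only shrinks what you ingest, not what you keep.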
6) Validate savings (don’t guess)
- Compare billed GB-seconds and request count before/after changes.
- Check duration distribution (p50/p95) and error/retry rate; spikes usually correlate with these.
- Verify log ingestion GB/day and retention dropped if that was a target.
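Raw before/after totals mislead when traffic changes between windows, so normalize to GB-seconds per request before declaring a win. A minimal sketch with hypothetical window numbers:

```python
def gb_seconds_per_request(gb_seconds: float, requests: int) -> float:
    """Normalize billed compute by traffic so windows are comparable."""
    return gb_seconds / requests

def savings_pct(before: float, after: float) -> float:
    """Percent improvement in the normalized metric."""
    return (before - after) / before * 100

b = gb_seconds_per_request(600_000, 10_000_000)   # baseline window
a = gb_seconds_per_request(540_000, 12_000_000)   # after change, more traffic
pct = savings_pct(b, a)
```

In this made-up case, absolute GB-seconds dropped only 10% while traffic grew 20%, so the per-request improvement is 25%: the normalized view is what proves the intervention worked.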
Use a simple measure-change-remeasure loop
- Measure the baseline for requests, GB-seconds, retries, log volume, and any provisioned concurrency window.
- Change one production lever at a time so the next billing comparison is readable.
- Remeasure the same window and keep only the interventions that improve both spend clarity and operational behavior.
Common pitfalls
- Reducing memory without measuring duration: less memory also means less CPU, so duration can rise enough to increase total GB-seconds.
- Optimizing code but leaving retries/timeouts to multiply invocations.
- Using provisioned concurrency broadly instead of only for latency-sensitive paths.
- Ignoring logs and transfer until they become top line items.
- Not re-checking after traffic growth (optimization needs a feedback loop).