Load balancer cost optimization (high-leverage fixes)
Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.
Optimization starts only after you know whether load balancer-hours, unit-hours (LCU/NLCU), bytes processed, connection churn, or retry-driven spikes are the real cost driver; otherwise teams consolidate, compress, or tune the wrong path. This page is for production intervention: load balancer consolidation, connection-pattern cleanup, byte reduction, rule simplification, and retry control.
Do not optimize yet if the model is still weak
- If you do not know whether LB-hours or unit-hours dominate, go back to the pricing page.
- If you do not know which usage dimension is dominant, go back to the estimate page.
- If you only need to understand how LCU or NLCU works, use the explainer page.
Fast optimization checks
- Consolidate: reduce one-LB-per-service patterns.
- Cross-zone: disable cross-zone load balancing where safe to cut cross-AZ data transfer.
- Idle LBs: remove unused listeners and delete LBs left behind by test environments.
Step 1: reduce the number of load balancers (LB-hours)
- Consolidate services behind shared ingress where feasible (especially in Kubernetes).
- Delete abandoned LBs from migrations and experiments (they often remain “quietly expensive”).
- Standardize patterns: “one public LB per environment” is usually cheaper than “one per microservice”.
Start by listing all LBs and tagging ownership; cost reduction is often “delete what nobody owns”.
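The ownership pass above can be sketched as a small script. The record fields and prices here are assumptions for illustration; in practice you would populate the list from your cloud provider's API (for example, an ELB describe call plus its tags endpoint) and use your region's actual LB-hour rate.

```python
# Sketch: flag load balancers with no "owner" tag as deletion candidates,
# and price the always-on hours they burn. Field names and the hourly rate
# are illustrative assumptions, not a real inventory or a quoted price.

HOURS_PER_MONTH = 730
HOURLY_RATE = 0.0225  # illustrative $/LB-hour; check your region's pricing

def unowned_lbs(lbs):
    """Return LBs missing an 'owner' tag, sorted by name for a stable report."""
    return sorted(
        (lb for lb in lbs if not lb.get("tags", {}).get("owner")),
        key=lambda lb: lb["name"],
    )

def monthly_hour_cost(count, rate=HOURLY_RATE):
    """LB-hours cost for `count` always-on load balancers."""
    return count * HOURS_PER_MONTH * rate

# Hypothetical inventory: one owned LB, two leftovers from migrations/experiments.
lbs = [
    {"name": "prod-public", "tags": {"owner": "platform"}},
    {"name": "old-migration-lb", "tags": {}},
    {"name": "experiment-2023", "tags": {}},
]

candidates = unowned_lbs(lbs)
print([lb["name"] for lb in candidates])             # ['experiment-2023', 'old-migration-lb']
print(round(monthly_hour_cost(len(candidates)), 2))  # 32.85
```

Even two forgotten LBs cost real money every month before any traffic flows through them, which is why the ownership sweep usually pays for itself first.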
Step 2: reduce LCU/NLCU drivers (requests, connections, bytes)
- Reduce connection churn: keep-alive and fewer short timeouts reduce new connections.
- Reduce bytes processed: compress payloads, avoid routing large downloads through the LB, offload to CDN/object storage.
- Reduce request amplification: cache hot responses and avoid “polling every second” patterns.
- Simplify routing rules: avoid unnecessary rule complexity that adds evaluation overhead.
If you can’t tell which driver dominates, run the LCU estimator from metrics and look at which dimension is highest: estimate LCU/NLCU.
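A minimal sketch of that "which dimension is highest" check, using the commonly published ALB LCU divisors (25 new connections/s, 3,000 active connections/min, 1 GB/h processed, 1,000 rule evaluations/s; verify these against current AWS pricing before relying on them):

```python
# Sketch: score each ALB LCU dimension from hourly metrics and report the
# dominant one. Divisors are the commonly published ALB LCU dimensions;
# the sample metrics are made up for illustration.

LCU_DIVISORS = {
    "new_conns_per_sec": 25.0,
    "active_conns_per_min": 3000.0,
    "processed_gb_per_hour": 1.0,
    "rule_evals_per_sec": 1000.0,
}

def dominant_lcu(metrics):
    """Return (dimension, LCU score) for the highest-scoring dimension."""
    scores = {dim: metrics[dim] / div for dim, div in LCU_DIVISORS.items()}
    top = max(scores, key=scores.get)
    return top, scores[top]

# Hypothetical hour: churny clients without keep-alive drive new connections.
metrics = {
    "new_conns_per_sec": 40,
    "active_conns_per_min": 2400,
    "processed_gb_per_hour": 0.9,
    "rule_evals_per_sec": 500,
}
print(dominant_lcu(metrics))  # ('new_conns_per_sec', 1.6)
```

In this made-up hour, new connections dominate, so keep-alive tuning would pay off before compression or rule cleanup would.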
Step 3: remove incident multipliers (the most common “spike” root cause)
- Fix retry storms: set sane timeouts, jittered backoff, and circuit breakers for downstream outages.
- Rate-limit abusive clients and bot traffic (a small amount of unwanted traffic can dominate LCU).
- Watch for deploy storms: rolling deploys can temporarily multiply connections and error retries.
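The "jittered backoff" fix above can be sketched as a single function. This is one common variant (capped exponential backoff with full jitter); the base and cap values are illustrative, not prescriptive.

```python
import random

# Sketch: capped exponential backoff with full jitter. Full jitter keeps
# retrying clients from synchronizing, so an outage does not turn into a
# coordinated retry storm against the load balancer. Parameter values are
# illustrative assumptions.

def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Delay before retry `attempt` (0-based): uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))

# 100 clients retrying after the same failure spread out instead of
# hammering the LB in lockstep:
delays = [backoff_delay(3) for _ in range(100)]
assert all(0 <= d < 4.0 for d in delays)  # ceiling is base 0.5 * 2**3 = 4.0
```

The cap matters as much as the jitter: without it, late retries back off so far that recovery drags, and with neither, every retry wave lands on the LB at once and shows up directly as an LCU spike.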
Step 4: quantify savings before changing architecture
Optimization gets easier when you model the before/after in the same terms:
- LB-hours saved = LBs removed × 730 hours/month (or your scheduled hours)
- Usage saved = (avg LCU/NLCU before − after) × hours/month
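The two formulas above can be combined into one before/after model. The rates here are illustrative placeholders; substitute your region's actual LB-hour and LCU/NLCU prices.

```python
# Sketch of the before/after savings model. Both rates are illustrative
# assumptions, not quoted prices.

HOURS = 730             # always-on hours/month; use your scheduled hours if lower
LB_HOUR_RATE = 0.0225   # $/LB-hour (illustrative)
LCU_RATE = 0.008        # $/LCU-hour (illustrative)

def monthly_savings(lbs_removed, avg_lcu_before, avg_lcu_after):
    """Dollar savings/month = LB-hours saved + usage saved, in the same terms."""
    lb_hours_saved = lbs_removed * HOURS
    usage_saved = (avg_lcu_before - avg_lcu_after) * HOURS
    return lb_hours_saved * LB_HOUR_RATE + usage_saved * LCU_RATE

# Hypothetical change: remove 3 idle LBs and cut average LCU from 4.0 to 2.5.
print(round(monthly_savings(3, 4.0, 2.5), 2))
```

Modeling both levers in dollars before touching architecture makes it obvious when a consolidation is worth the migration risk and when it is noise.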
Validation plan (what to measure for a week)
- LB count and hours (did LB-hours actually drop?)
- LCU/NLCU drivers (connections, bytes, rule evals) for avg and p95
- Incident windows: did retries and errors drop after fixes?
- Related side effects: cross-AZ transfer and NAT/egress if routing changed
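The avg-and-p95 comparison in the plan above can be computed from a week of hourly samples with a few lines. The sample data is made up, and the p95 here uses a simple nearest-rank index rather than interpolation.

```python
# Sketch: avg and p95 of hourly LCU samples for the validation week.
# Nearest-rank p95; the sample series is invented for illustration.

def avg_and_p95(samples):
    ordered = sorted(samples)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return sum(ordered) / len(ordered), ordered[p95_index]

# Mostly quiet hours with a few spikes: the average looks fine, but p95
# shows what the spiky hours (retries, deploys, bots) actually cost.
lcu_per_hour = [2.0] * 95 + [9.0] * 5
avg, p95 = avg_and_p95(lcu_per_hour)
print(avg, p95)  # 2.35 9.0
```

Comparing both numbers before and after a change is the point: a fix that lowers the average but leaves p95 untouched has not removed the incident multiplier, only smoothed the quiet hours.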
The safest loop is measure, change one lever, re-measure, then confirm the bill moved where you expected instead of simply shifting cost into another network surface.
Related cost domains: NAT gateway costs and VPC data transfer.