Load balancer LCU/NLCU explained (for cost estimates)
Many load balancers charge (1) a fixed hourly fee plus (2) a usage fee billed in capacity unit-hours. For budgeting, you don't need perfect precision; you need a defendable average units/hour figure and a peak scenario so incident hours don't blow up the plan.
What “capacity unit-hours” means
Think of LCU/NLCU as a normalized “how busy was this load balancer this hour?” score. The unit is typically derived from multiple dimensions, and the billed unit-hours often follow the maximum of those dimensions.
- Connections: new connections and/or active connections
- Bytes processed: traffic volume through the LB
- Request processing: rules/routing work (depends on product and configuration)
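The max-of-dimensions rule above can be sketched in a few lines. The normalization factors below are illustrative assumptions, not any provider's actual rates; real dimension caps vary by product and configuration, so check your pricing page before using them.

```python
# Sketch of the "max of dimensions" billing model.
# NOTE: these caps are placeholder assumptions, not real published rates.
UNITS_PER_DIMENSION = {
    "new_conns_per_sec": 25.0,      # assumed: 1 unit covers 25 new conns/sec
    "active_conns": 3000.0,         # assumed: 1 unit covers 3000 active conns
    "gb_processed_per_hour": 1.0,   # assumed: 1 unit covers 1 GB/hour
}

def units_for_hour(metrics: dict) -> float:
    """Billed units for one hour = the max across normalized dimensions."""
    return max(metrics[k] / cap for k, cap in UNITS_PER_DIMENSION.items())

# Here connection churn, not bytes, sets the bill: 50/25 = 2.0 units.
print(units_for_hour({"new_conns_per_sec": 50,
                      "active_conns": 1500,
                      "gb_processed_per_hour": 0.8}))  # → 2.0
```

Because billing follows the max, lowering a non-dominant dimension buys nothing; the whole game is identifying and shrinking the winner.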
Why the same RPS can produce very different unit-hours
- Payload size: 1 kB responses vs 1 MB downloads are not comparable.
- Connection churn: short timeouts and frequent reconnects inflate new connections.
- Long-lived connections: streaming/WebSockets increase active connections.
- Incidents: retries can multiply requests and connections without increasing “real” business volume.
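The payload-size point is easy to quantify: at identical RPS, the bytes dimension moves by the ratio of response sizes. A quick sanity check (plain arithmetic, no provider-specific rates assumed):

```python
# Same RPS, wildly different bytes processed per hour.
def gb_per_hour(rps: float, resp_bytes: float) -> float:
    """Convert request rate and mean response size to GB/hour (decimal GB)."""
    return rps * resp_bytes * 3600 / 1e9

small = gb_per_hour(100, 1_000)      # 100 rps of 1 kB responses
large = gb_per_hour(100, 1_000_000)  # 100 rps of 1 MB downloads
print(small, large)  # → 0.36 360.0 — a 1000x gap at the same RPS
```

This is why request-count-only estimates go wrong: two services at 100 RPS can differ by three orders of magnitude on the bytes dimension.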
A practical mental model for optimization
- If you have many LBs, LB-hours dominate: reduce load balancer count.
- If you have a few hot LBs, unit-hours dominate: reduce bytes processed and connection churn.
- If you have spikes, the “peak scenario” dominates: fix retries and bot traffic.
See also the optimization playbook: load balancer cost optimization.
How to estimate units/hour (without getting lost)
- Pick a representative week and a peak definition (p95 hour or incident hour).
- Collect driver metrics: new connections/sec, active connections, bytes processed (GB/hour).
- Use a calculator to convert driver metrics to units/hour.
- Price units/hour + fixed LB hourly fee, then validate after a week.
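The pricing step above can be sketched end to end. The prices and the 730-hour month below are placeholder assumptions for illustration, not any provider's actual rates; the point is the shape of the calculation, with peak hours priced separately so incidents are visible in the plan.

```python
# Hedged monthly cost sketch: fixed LB-hours plus unit-hours.
# All prices are placeholder assumptions, not real published rates.
HOURS_PER_MONTH = 730
PRICE_PER_LB_HOUR = 0.025     # assumed fixed hourly fee per load balancer
PRICE_PER_UNIT_HOUR = 0.008   # assumed fee per capacity unit-hour

def monthly_estimate(num_lbs: int,
                     avg_units_per_hour: float,
                     peak_units_per_hour: float,
                     peak_hours: int) -> float:
    """Fixed fees for every LB-hour, plus unit-hours split normal/peak."""
    fixed = num_lbs * HOURS_PER_MONTH * PRICE_PER_LB_HOUR
    normal_hours = HOURS_PER_MONTH - peak_hours
    usage = (avg_units_per_hour * normal_hours
             + peak_units_per_hour * peak_hours) * PRICE_PER_UNIT_HOUR * num_lbs
    return fixed + usage

# 3 LBs averaging 4 units/hour, with 10 incident hours at 20 units/hour:
print(round(monthly_estimate(3, 4, 20, 10), 2))  # → 128.67
```

Running the same function twice, once with the p95 hour and once with the incident hour as the peak scenario, gives you a defendable budget range rather than a single fragile number.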
Common pitfalls
- Budgeting the whole month from peak unit-hours, which hides the real average.
- Ignoring payload size and estimating from requests only.
- Missing retry storms and noisy clients as the “hidden multiplier”.
- Mixing units (GB vs GiB, bits vs bytes) and silently breaking the estimate.
- Not re-checking after architecture changes (CDN, compression, routing).
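The GB-vs-GiB pitfall is worth seeing in numbers: monitoring tools often report binary gibibytes while pricing pages bill decimal gigabytes, a silent ~7% gap on a bytes-driven estimate.

```python
# GB vs GiB: a ~7% gap that silently skews a bytes-driven estimate.
GB = 1e9        # decimal gigabyte (what pricing pages typically bill in)
GiB = 2 ** 30   # binary gibibyte (what many monitoring tools report in)

bytes_reported = 500 * GiB       # dashboard says "500 GiB/hour"
gb_billed = bytes_reported / GB  # what the bill will actually be based on
print(round(gb_billed, 2))       # → 536.87, not 500
```

Pick one unit convention at the start of the estimate and convert everything into it before doing any arithmetic.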
Quick diagnostic: which driver is dominating?
When you look at a week of metrics, one driver is usually “the max” most of the time. You can often predict the culprit from traffic shape:
- Bytes dominate: large responses, downloads, missing compression, no CDN offload.
- New connections dominate: short timeouts, clients reconnecting, lack of keep-alive.
- Active connections dominate: streaming, long polling, WebSockets.
- Rules dominate: complex routing rules that evaluate frequently.
Once you know the dominant driver, optimization becomes targeted instead of guesswork.
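The diagnostic above can be automated: over a week of hourly samples, count which normalized dimension "wins" the max most often. The normalization caps here are the same illustrative assumptions as earlier, not real rates.

```python
# Sketch: find the dominant billing driver across hourly samples.
# Caps are placeholder assumptions; substitute your provider's actual rates.
from collections import Counter

CAPS = {
    "new_conns_per_sec": 25.0,
    "active_conns": 3000.0,
    "gb_processed_per_hour": 1.0,
}

def dominant_driver(hourly_samples: list) -> str:
    """Return the dimension that sets the max in the most hours."""
    wins = Counter(
        max(sample, key=lambda k: sample[k] / CAPS[k])
        for sample in hourly_samples
    )
    return wins.most_common(1)[0][0]

samples = [
    {"new_conns_per_sec": 10, "active_conns": 500, "gb_processed_per_hour": 3.0},
    {"new_conns_per_sec": 12, "active_conns": 700, "gb_processed_per_hour": 2.5},
    {"new_conns_per_sec": 80, "active_conns": 400, "gb_processed_per_hour": 0.5},
]
print(dominant_driver(samples))  # → gb_processed_per_hour (wins 2 of 3 hours)
```

If no single driver wins a clear majority of hours, treat the top two as joint targets: optimizing only one may just hand the max to the other.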