Kubernetes requests & limits: practical sizing (and cost impact)
The Kubernetes scheduler places pods based on requests, not limits. That is why most capacity planning starts with requests and then uses limits to reason about burst and risk. If you mix them up, you usually end up with oversized clusters, unpredictable performance, or both.
1) Requests drive node count
A simple approach: total requests = pods x per-pod request. Divide that by allocatable capacity per node and round up. Our calculator does exactly that, and uses the larger of the CPU-based and memory-based node counts.
Tool: Kubernetes Requests & Limits Calculator
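The arithmetic above can be sketched in a few lines (the numbers here are illustrative, not a recommendation):

```python
import math

def nodes_needed(pod_count: int, per_pod_request: float, allocatable_per_node: float) -> int:
    """Total requests / allocatable per node, rounded up:
    a fractional node still means one more node."""
    total_request = pod_count * per_pod_request
    return math.ceil(total_request / allocatable_per_node)

# e.g. 120 pods requesting 250m CPU each, on nodes with 3.4 allocatable cores
print(nodes_needed(120, 0.25, 3.4))  # -> 9
```

The `math.ceil` is the part people skip in back-of-envelope math, and it matters most on small clusters where one extra node is a large relative cost.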
2) Leave allocatable headroom
Nodes are not 100% allocatable. System overhead, daemonsets, and kubelet reservations reduce usable capacity. Planning with 85-95% allocatable is common depending on your environment.
- If you run many daemonsets or have strict headroom targets, use a lower allocatable %.
- If you have a stable workload and validated overhead, you can increase allocatable %.
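As an illustration, the headroom adjustment is just a multiplier on node capacity (the 90% default below is an assumption in the middle of the 85-95% range; substitute your own measured figure):

```python
def allocatable(node_capacity: float, allocatable_pct: float = 0.90) -> float:
    """Usable capacity after system overhead, daemonsets, and kubelet
    reservations. 0.90 is an assumed default; tune it per environment."""
    return node_capacity * allocatable_pct

# A 16-core node at 90% allocatable leaves 14.4 cores for your workloads.
print(allocatable(16.0))  # -> 14.4
```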
3) CPU vs memory: why one of them usually dominates
A cluster can be CPU-bound or memory-bound. The safe workflow is to calculate node count from CPU requests and from memory requests, then take the larger number.
- CPU-heavy services: watch p95 latency when requests are too low (pods get packed too densely and contend), and CFS throttling when limits are too tight.
- Memory-heavy services: watch OOM kills when limits are too tight (and watch wasted memory when requests are too high).
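A minimal sketch of that workflow, with illustrative per-pod numbers (250m CPU, 512Mi memory) showing a CPU-bound cluster:

```python
import math

def nodes_for(total_request: float, allocatable_per_node: float) -> int:
    return math.ceil(total_request / allocatable_per_node)

# 120 pods x (250m CPU, 512Mi memory); node allocatable: 3.4 cores, 12 GiB
nodes_cpu = nodes_for(120 * 0.25, 3.4)   # -> 9
nodes_mem = nodes_for(120 * 0.5, 12.0)   # -> 5  (512Mi = 0.5 GiB)

# The binding constraint wins: this cluster is CPU-bound at 9 nodes.
print(max(nodes_cpu, nodes_mem))  # -> 9
```

Flip the per-pod shape (say, 2Gi memory per pod) and memory becomes the binding constraint instead; running both calculations makes the dominant resource explicit rather than assumed.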
4) Limits matter for burst behavior
- CPU limits can throttle bursts (good for fairness, bad if you rely on bursts for latency).
- Memory limits cause OOM kills when a pod exceeds its limit (instability risk).
Limits help you manage risk, but they are not a stable baseline for node count unless your workload frequently runs at the limit.
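To see why limits are a poor sizing baseline, compare the node count implied by requests with the worst case implied by limits (illustrative numbers; a 4:1 limit-to-request ratio is an assumption):

```python
import math

def nodes(total: float, allocatable: float) -> int:
    return math.ceil(total / allocatable)

pods, alloc_cpu = 120, 3.4
request_cpu, limit_cpu = 0.25, 1.0  # 250m request, 1-core limit per pod

print(nodes(pods * request_cpu, alloc_cpu))  # -> 9   what the scheduler packs against
print(nodes(pods * limit_cpu, alloc_cpu))    # -> 36  only needed if all pods peak at once
```

Sizing for the limit-based number means paying for 4x the nodes to cover a simultaneous-peak scenario that most workloads never hit; sizing for requests and accepting some throttling during rare bursts is usually the better trade.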
5) Two constraints people forget (and then undercount nodes)
- Max pods per node: CNI/IP limits and kubelet settings cap pods/node even if CPU/memory looks fine.
- Topology constraints: zone spread, affinities, taints, and disruption budgets reduce packing efficiency.
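The pods-per-node cap is easy to bolt onto the resource-based estimate: take the larger of the two. (110 below is the kubelet's default `maxPods`; CNI/IP limits are often lower, so check your own environment.)

```python
import math

def nodes_with_pod_cap(pod_count: int, nodes_from_resources: int,
                       max_pods_per_node: int = 110) -> int:
    """Nodes needed once the per-node pod cap is applied on top of
    CPU/memory sizing. 110 is the kubelet default; verify yours."""
    nodes_from_pod_cap = math.ceil(pod_count / max_pods_per_node)
    return max(nodes_from_resources, nodes_from_pod_cap)

# 600 small pods fit on 4 nodes by CPU/memory, but a 110-pod cap forces 6.
print(nodes_with_pod_cap(600, 4))  # -> 6
```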
Worked sizing template (copy/paste)
- Pick representative per-pod requests (baseline, not peak).
- Compute total CPU and memory requests = pods x per-pod requests.
- Compute allocatable per node = node capacity x allocatable%.
- Compute nodes_cpu = ceil(total CPU requests / allocatable CPU per node), nodes_mem likewise, then take max(nodes_cpu, nodes_mem).
- Add a peak scenario (deployments, incident retries, seasonal traffic) and compare.
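The whole template collapses into one function; the workload numbers below are placeholders for your own representative pods and node type:

```python
import math

def size_cluster(pods: int, cpu_req: float, mem_req: float,
                 node_cpu: float, node_mem: float,
                 alloc_pct: float = 0.90) -> int:
    """Template steps in order: totals -> allocatable per node ->
    per-resource node counts (rounded up) -> take the max."""
    alloc_cpu = node_cpu * alloc_pct
    alloc_mem = node_mem * alloc_pct
    nodes_cpu = math.ceil(pods * cpu_req / alloc_cpu)
    nodes_mem = math.ceil(pods * mem_req / alloc_mem)
    return max(nodes_cpu, nodes_mem)

# Baseline: 120 pods x (250m CPU, 512Mi) on 4-core / 16 GiB nodes.
baseline = size_cluster(pods=120, cpu_req=0.25, mem_req=0.5, node_cpu=4, node_mem=16)
# Peak scenario: 1.5x pods (deploy surge, retries, seasonal traffic).
peak = size_cluster(pods=180, cpu_req=0.25, mem_req=0.5, node_cpu=4, node_mem=16)
print(baseline, peak)  # -> 9 13
```

Comparing baseline and peak side by side tells you whether headroom or autoscaling is the cheaper way to cover the gap.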
Common sizing pitfalls
- Ignoring daemonset overhead: per-node agents eat capacity (logging, CNI, monitoring).
- Forgetting max pods per node: IP limits and kubelet settings can cap pods/node.
- Using peak traffic 24/7: budget with average usage, sanity-check with peak scenarios.
Next: turn node count into dollars
After you estimate node count, price it with Kubernetes Node Cost Calculator and then add other line items using the Kubernetes cost checklist: what to include.