Kubernetes requests & limits: practical sizing (and cost impact)
Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.
This page is the Kubernetes node-sizing workflow: it moves in sequence from requests to allocatable capacity, headroom, and a defensible node count.
Scheduling is based on requests. That is why most capacity planning starts with requests and then uses limits to reason about burst and risk. If you mix them up, you usually end up with oversized clusters (or unpredictable performance).
If the open question is simply why requests and limits play different roles, go to the concept clarifier page first and then come back to this workflow. Concept clarifier.
Go back to the Kubernetes parent page if the wider budget map is still unclear and you have not yet separated node sizing from traffic, load balancers, and observability. Kubernetes costs.
1) Requests drive node count
A simple approach: total requests = pods x per-pod request. Then divide by allocatable per node. Our calculator does that and uses the larger of CPU-based and memory-based counts.
Tool: Kubernetes Requests & Limits Calculator
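The step above is simple enough to sketch directly. This is a minimal first-pass version of the same arithmetic the calculator does; all the workload numbers (40 pods, 250m CPU, 512Mi memory, 4-vCPU/16Gi nodes) are hypothetical placeholders, not recommendations.

```python
import math

def nodes_needed(pods, cpu_req_m, mem_req_mi, node_cpu_m, node_mem_mi):
    """First-pass node count: total requests / per-node capacity,
    computed separately for CPU and memory, then take the larger."""
    nodes_cpu = math.ceil(pods * cpu_req_m / node_cpu_m)
    nodes_mem = math.ceil(pods * mem_req_mi / node_mem_mi)
    return max(nodes_cpu, nodes_mem)

# Hypothetical: 40 pods at 250m CPU / 512Mi each, on 4-vCPU / 16Gi nodes.
# CPU needs ceil(10000/4000) = 3 nodes; memory needs ceil(20480/16384) = 2.
print(nodes_needed(40, 250, 512, 4000, 16384))  # 3
```

Note this version divides by raw node capacity; the next step tightens it by using allocatable capacity instead.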
2) Leave allocatable headroom
Nodes are not 100% allocatable. System overhead, daemonsets, and kubelet reservations reduce usable capacity. Planning with 85-95% allocatable is common depending on your environment.
- If you run many daemonsets or have strict headroom targets, use a lower allocatable %.
- If you have a stable workload and validated overhead, you can increase allocatable %.
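As a quick sanity check on the headroom step, here is the allocatable arithmetic on a hypothetical 8-vCPU / 32Gi node at a 90% allocatable assumption:

```python
# Hypothetical node: 8 vCPU (8000m) and 32Gi (32768Mi) raw capacity.
node_cpu_m, node_mem_mi = 8000, 32768
allocatable_pct = 0.90  # assumed; validate against your own nodes

# Capacity you can actually schedule against, after overhead.
alloc_cpu_m = node_cpu_m * allocatable_pct    # 7200m
alloc_mem_mi = node_mem_mi * allocatable_pct  # ~29491Mi
print(alloc_cpu_m, alloc_mem_mi)
```

In practice, compare this estimate against the real `allocatable` values your nodes report, since kubelet and system reservations vary by environment.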
3) CPU vs memory: why one of them usually dominates
A cluster can be CPU-bound or memory-bound. The safe workflow is to calculate node count from CPU requests and from memory requests, then take the larger number.
- CPU-heavy services: watch throttling and p95 latency when requests are too low.
- Memory-heavy services: watch OOM kills when limits are too tight (and watch wasted memory when requests are too high).
4) Limits matter for burst behavior
- CPU limits can throttle bursts (good for fairness, bad if you rely on bursts for latency).
- Memory limits can cause OOM kills if pods exceed limits (risk and instability).
Limits help you manage risk, but they are not a stable baseline for node count unless your workload frequently runs at the limit.
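To make the CPU-throttling point concrete: on Linux, a CPU limit maps to a CFS bandwidth quota, i.e. how much CPU time the container may consume per scheduling period (100ms by default). The sketch below shows that mapping; the 500m limit is a hypothetical example.

```python
CFS_PERIOD_US = 100_000  # default Linux CFS bandwidth period: 100ms

def cfs_quota_us(cpu_limit_millicores):
    """CPU limit -> CFS quota: CPU time the container may use per period."""
    return int(cpu_limit_millicores / 1000 * CFS_PERIOD_US)

# A 500m limit allows 50ms of CPU time per 100ms window. A latency-sensitive
# burst that needs more is throttled until the next period begins.
print(cfs_quota_us(500))  # 50000 microseconds
```

This is why CPU limits that look generous on average can still hurt tail latency: throttling is enforced per 100ms window, not per second.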
5) Two constraints people forget (and then undercount nodes)
- Max pods per node: CNI/IP limits and kubelet settings cap pods/node even if CPU/memory looks fine.
- Topology constraints: zone spread, affinities, taints, and disruption budgets reduce packing efficiency.
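The max-pods constraint is easy to bolt onto the resource-based count: the final node count must satisfy the pod-density cap too. A minimal sketch, with hypothetical numbers:

```python
import math

def nodes_with_pod_cap(total_pods, nodes_from_resources, max_pods_per_node):
    """Final node count must satisfy the pod-density cap
    as well as the CPU/memory-based estimate."""
    nodes_for_pods = math.ceil(total_pods / max_pods_per_node)
    return max(nodes_from_resources, nodes_for_pods)

# Hypothetical: 120 small pods fit on 2 nodes by CPU/memory alone,
# but a 30-pods-per-node cap (CNI IP limits) forces 4 nodes.
print(nodes_with_pod_cap(120, 2, 30))  # 4
```

Topology constraints are harder to capture in a formula; treat them as a packing-efficiency discount you validate against the scheduler's actual behavior.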
Worked sizing template (copy/paste)
- Pick representative per-pod requests (baseline, not peak).
- Compute total CPU and memory requests = pods x per-pod requests.
- Compute allocatable per node = node capacity x allocatable%.
- Compute nodes_cpu and nodes_mem, then take max(nodes_cpu, nodes_mem).
- Add a peak scenario (deployments, incident retries, seasonal traffic) and compare.
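The template above can be written out end to end. All inputs here (60 pods at 200m/768Mi, 8-vCPU/32Gi nodes, 90% allocatable, a 1.5x peak) are hypothetical; substitute your own measured values.

```python
import math

def size_cluster(pods, cpu_req_m, mem_req_mi,
                 node_cpu_m, node_mem_mi, allocatable_pct=0.90):
    """Worked template: total requests -> allocatable per node
    -> max(nodes_cpu, nodes_mem)."""
    total_cpu = pods * cpu_req_m
    total_mem = pods * mem_req_mi
    alloc_cpu = node_cpu_m * allocatable_pct
    alloc_mem = node_mem_mi * allocatable_pct
    nodes_cpu = math.ceil(total_cpu / alloc_cpu)
    nodes_mem = math.ceil(total_mem / alloc_mem)
    return max(nodes_cpu, nodes_mem)

# Baseline: 60 pods at 200m / 768Mi on 8-vCPU / 32Gi nodes.
baseline = size_cluster(60, 200, 768, 8000, 32768)
# Peak scenario: assume 1.5x pod count during deploys / incident retries.
peak = size_cluster(90, 200, 768, 8000, 32768)
print(baseline, peak)  # 2 3
```

Comparing the baseline and peak counts tells you whether autoscaling headroom or a standing buffer is the cheaper way to absorb the difference.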
Keep this page at the workflow layer: cost items beyond nodes belong on the beyond-nodes checklist, and concept-level confusion belongs on the requests-vs-limits explainer.
Common sizing pitfalls
- Ignoring daemonset overhead: per-node agents eat capacity (logging, CNI, monitoring).
- Forgetting max pods per node: IP limits and kubelet settings can cap pods/node.
- Using peak traffic 24/7: budget with average usage, sanity-check with peak scenarios.
Next: turn node count into dollars
After you estimate node count, price it with Kubernetes Node Cost Calculator and then add other line items using the Kubernetes costs guide: what to include.