Kubernetes requests vs limits: why requests drive node count (and cost)
If you search for a Kubernetes cost calculator, you will quickly run into "requests" and "limits". The short version: requests are what the scheduler uses for capacity planning; limits are guardrails for bursting and safety. Mixing them up often leads to oversized clusters, or to undersized ones with unpredictable performance.
Sizing decision rules
- Requests: use for scheduling and node count.
- Limits: use for burst control, not capacity.
- Overhead: include daemonsets and system reserve.
Requests: the scheduling baseline
Requests are the resources a pod asks for. Kubernetes tries to ensure that capacity exists for the sum of requests on each node (plus overhead). That makes requests the right baseline for "how many nodes do I need?"
- CPU requests affect packing and scheduling decisions.
- Memory requests are often the real limiter: CPU is compressible (a starved container is throttled), but memory is not (exceeding it means eviction or an OOM kill).
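The scheduling baseline is just the sum of per-pod requests. A minimal sketch, with hypothetical pod counts and request sizes:

```python
# Sketch: requests, not limits, set the scheduling baseline.
# All numbers below are hypothetical example values.

PODS = 30
CPU_REQUEST_M = 250    # millicores requested per pod
MEM_REQUEST_MI = 512   # MiB requested per pod

# The scheduler must find room for the sum of requests (plus overhead).
total_cpu_m = PODS * CPU_REQUEST_M    # 7500 millicores
total_mem_mi = PODS * MEM_REQUEST_MI  # 15360 MiB

print(f"scheduling baseline: {total_cpu_m}m CPU, {total_mem_mi}Mi memory")
```

This total, not the sum of limits, is what you compare against node allocatable capacity.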
Limits: the ceiling (risk and stability)
Limits cap how much a container can use. CPU limits can cause throttling; memory limits can cause OOM kills. Limits affect performance and risk, but they are not used for scheduling capacity the same way requests are.
- If CPU limits are too low, p95 latency can spike when traffic bursts.
- If memory limits are too low, OOM churn can create retries, errors, and extra traffic.
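To see why low CPU limits translate into latency spikes, it helps to know how a limit is enforced: Kubernetes maps a CPU limit onto a Linux CFS quota, i.e. a slice of CPU time per scheduling period (100 ms by default). A sketch of that arithmetic:

```python
# Sketch: how a CPU limit maps to CFS throttling.
# Kubernetes converts a CPU limit into a CFS quota per period:
# a 500m limit allows 50 ms of CPU time per 100 ms window, after
# which the container is throttled until the next period.

CFS_PERIOD_US = 100_000  # default kernel CFS period, in microseconds

def cfs_quota_us(cpu_limit_millicores: int) -> int:
    """CPU time (microseconds) the container may use per CFS period."""
    return cpu_limit_millicores * CFS_PERIOD_US // 1000

print(cfs_quota_us(500))   # 50000 us per 100000 us period
```

A burst that needs more CPU than the quota allows simply waits for the next period, which is exactly where tail-latency spikes come from.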
Common mistakes (and how they inflate cost)
- Using limits as requests: many teams set limits to 2-4x requests for burstability; treating that as a baseline inflates node estimates.
- Ignoring overhead: kube-system, daemonsets, and headroom reduce allocatable capacity.
- Assuming perfect packing: affinities, max pods/node, and topology spread constraints raise node count beyond the math minimum.
- Using peak traffic 24/7: plan baseline and peak as separate scenarios.
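The first mistake above is easy to quantify. A sketch with hypothetical numbers, assuming limits are set to 3x requests for burst headroom:

```python
# Sketch (hypothetical numbers): node estimate from requests vs. from limits.
# If limits are ~3x requests for burstability, sizing from limits
# roughly triples the node count -- and the bill.

import math

PODS = 60
CPU_REQUEST_M, CPU_LIMIT_M = 250, 750  # limits = 3x requests
NODE_ALLOCATABLE_CPU_M = 3600          # e.g. a 4-core node minus overhead

nodes_from_requests = math.ceil(PODS * CPU_REQUEST_M / NODE_ALLOCATABLE_CPU_M)
nodes_from_limits = math.ceil(PODS * CPU_LIMIT_M / NODE_ALLOCATABLE_CPU_M)

print(nodes_from_requests, nodes_from_limits)  # 5 13
```

Same workload, nearly 3x the nodes, purely from using the wrong number as the baseline.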
A simple sizing workflow
- Pick representative requests (baseline month, not incident peak).
- Estimate total requests = pods x per-pod requests (CPU and memory).
- Apply allocatable % to node capacity (leave headroom for overhead).
- Compute node count from CPU and memory, then take the larger number.
- Add a peak scenario and compare the delta (autoscaling vs always-on capacity).
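The workflow above can be sketched as a few lines of arithmetic. All inputs here are hypothetical example values; plug in your own:

```python
# Sketch of the sizing workflow, with hypothetical inputs.
import math

# Steps 1-2: representative per-pod requests and pod count (baseline, not peak).
pods = 40
cpu_req_m, mem_req_mi = 300, 768       # per-pod requests

total_cpu_m = pods * cpu_req_m
total_mem_mi = pods * mem_req_mi

# Step 3: reduce raw node capacity to allocatable
# (daemonsets, system reserve, headroom).
node_cpu_m, node_mem_mi = 4000, 16384  # e.g. a 4 vCPU / 16 GiB node
allocatable = 0.80                     # assume ~80% usable after overhead

# Step 4: node count per resource, then take the larger.
nodes_cpu = math.ceil(total_cpu_m / (node_cpu_m * allocatable))
nodes_mem = math.ceil(total_mem_mi / (node_mem_mi * allocatable))
nodes = max(nodes_cpu, nodes_mem)

print(f"CPU needs {nodes_cpu} nodes, memory needs {nodes_mem}; provision {nodes}")
```

For step 5, rerun the same math with peak pod counts and compare the delta: that gap is what autoscaling saves you over always-on capacity.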
Tool: Kubernetes Requests & Limits Calculator. Once you have node count, price it with Kubernetes Node Cost.