Azure Kubernetes Service (AKS) pricing: what to include
AKS cost planning fails when you count only nodes. A realistic estimate works through a checklist covering compute, networking, storage, and observability, and it checks the packing assumptions behind the node count.
0) Define the cluster scope (what exists and where)
- Environments: prod + staging + dev clusters (non-prod baselines are commonly missed).
- Node pools: system pools + user pools, plus GPU/spot pools if used.
- Peak vs average: autoscaling makes averages look cheap; peaks decide capacity and cost.
1) Node pools (VM hours)
Model each node pool separately: VM size, node count, and hours per month. If you autoscale, model both the average and the peak node count.
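The per-pool model above can be sketched in a few lines of Python. This is a minimal sketch, not a pricing tool: the pool names, node counts, and hourly rates are hypothetical placeholders (not real Azure prices), and 730 hours per month is an assumed approximation.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumed approximation: 8,760 hours per year / 12


@dataclass
class NodePool:
    name: str
    price_per_hour: float  # placeholder $/hour for the pool's VM size, not a real price
    avg_nodes: float       # time-weighted average node count over the month
    peak_nodes: int        # highest node count the autoscaler reaches


def monthly_cost(pool: NodePool, nodes: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `nodes` nodes of this pool for `hours` hours."""
    return nodes * pool.price_per_hour * hours


# Hypothetical pools: a fixed system pool and an autoscaled user pool.
pools = [
    NodePool("system", price_per_hour=0.10, avg_nodes=3, peak_nodes=3),
    NodePool("user", price_per_hour=0.20, avg_nodes=6, peak_nodes=10),
]

avg_total = sum(monthly_cost(p, p.avg_nodes) for p in pools)
peak_total = sum(monthly_cost(p, p.peak_nodes) for p in pools)

print(f"average month: ${avg_total:,.2f}")
print(f"if peak were sustained: ${peak_total:,.2f}")
```

The gap between the two totals shows how far the bill can move when scale-outs become frequent, so it is worth reporting both numbers instead of the average alone.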
Tool: Compute instance cost calculator.
2) Workload packing (requests/limits)
Requests determine how the scheduler packs pods onto nodes, and therefore node count; limits only cap what a running container may use. Over-requesting CPU/memory is one of the most common cost leaks in Kubernetes.
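To see how requests translate into node count, here is a rough sketch. It assumes identical pods, packs on CPU and memory requests only, and subtracts a per-node overhead for DaemonSets and system pods. The function name `nodes_needed` and all figures are hypothetical; it ignores per-node pod limits, affinity rules, and fragmentation.

```python
import math


def nodes_needed(pod_count, cpu_request_m, mem_request_mib,
                 node_alloc_cpu_m, node_alloc_mem_mib,
                 overhead_cpu_m=0, overhead_mem_mib=0):
    """Estimate node count for identical pods, packed by requests only."""
    usable_cpu = node_alloc_cpu_m - overhead_cpu_m
    usable_mem = node_alloc_mem_mib - overhead_mem_mib
    # The scarcer resource decides how many pods fit on one node.
    pods_per_node = min(usable_cpu // cpu_request_m, usable_mem // mem_request_mib)
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on this node size")
    return math.ceil(pod_count / pods_per_node)


# Hypothetical node: 3800m CPU / 13000 MiB allocatable,
# with 300m / 1000 MiB consumed by DaemonSets and system pods.
node = dict(node_alloc_cpu_m=3800, node_alloc_mem_mib=13000,
            overhead_cpu_m=300, overhead_mem_mib=1000)

over_requested = nodes_needed(40, cpu_request_m=500, mem_request_mib=1024, **node)
right_sized = nodes_needed(40, cpu_request_m=250, mem_request_mib=512, **node)

print(over_requested, right_sized)
```

With these made-up numbers, halving the requests halves the node count, which is why measuring actual usage before trusting the requests is worth the effort.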
Tool: Requests/limits helper.
3) Networking and egress
Egress from pods (to the internet, to other regions, or to external services) is frequently underestimated. Split internal traffic from billable egress. If a CDN sits in front of the cluster, check the cache hit ratio, because every cache miss still generates origin egress.
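A small sketch of that split follows. The traffic volumes and per-GB rates are hypothetical placeholders, and the function names are illustrative. Traffic you classify as non-billable is simply left out, so verify that classification for your own topology.

```python
def origin_egress_gb(user_traffic_gb: float, cache_hit_ratio: float) -> float:
    """Traffic the CDN fetches from the cluster: only cache misses reach the origin."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return user_traffic_gb * (1.0 - cache_hit_ratio)


def egress_cost(internet_gb: float, cross_region_gb: float,
                internet_rate: float, cross_region_rate: float) -> float:
    """Billable egress, priced separately per destination type."""
    return internet_gb * internet_rate + cross_region_gb * cross_region_rate


# Hypothetical month: 10,000 GB served through a CDN at a 90% hit ratio,
# 500 GB sent directly to external services, 2,000 GB replicated cross-region.
cdn_origin_gb = origin_egress_gb(10_000, cache_hit_ratio=0.90)
internet_gb = cdn_origin_gb + 500

cost = egress_cost(internet_gb, cross_region_gb=2_000,
                   internet_rate=0.08, cross_region_rate=0.02)  # placeholder $/GB

print(f"origin egress: {cdn_origin_gb:,.0f} GB, egress cost: ${cost:,.2f}")
```

Rerunning the sketch with a lower hit ratio shows how sensitive the estimate is to cache behavior.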
Tool: Egress cost calculator.
4) Load balancing and ingress
Many clusters pay for ingress and L7 features separately (load balancers, gateways, WAF, and access logs). Budget ingress explicitly so it does not hide inside "Kubernetes cost".
Related: Application Gateway pricing.
5) Storage (persistent volumes)
Persistent storage cost is driven by GB-months. Include snapshots and backups if you rely on them, and check how long they are retained.
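As a rough illustration, the sketch below adds retained snapshot data to the volume cost. It assumes incremental snapshots in steady state, so retained data is roughly the daily changed GB times the retention window. All rates and sizes are hypothetical placeholders.

```python
def storage_cost(pv_gb: float, pv_rate: float,
                 snapshot_gb_per_day: float, retention_days: int,
                 snapshot_rate: float) -> float:
    """Monthly cost of persistent volumes plus retained snapshot data."""
    # Steady-state assumption: each day's incremental snapshot is kept
    # for `retention_days`, then deleted.
    retained_snapshot_gb = snapshot_gb_per_day * retention_days
    return pv_gb * pv_rate + retained_snapshot_gb * snapshot_rate


# Hypothetical: 2,000 GB of volumes, 50 GB of changed data snapshotted per day.
short_retention = storage_cost(2_000, 0.10, 50, retention_days=14, snapshot_rate=0.05)
long_retention = storage_cost(2_000, 0.10, 50, retention_days=90, snapshot_rate=0.05)

print(f"14-day retention: ${short_retention:,.2f}")
print(f"90-day retention: ${long_retention:,.2f}")
```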
Tool: Storage pricing (generic).
6) Observability (logs/metrics)
Logging and metrics scale with pod count and verbosity. Model ingestion and retention separately; log costs can exceed node compute costs if you ingest too much.
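One way to keep ingestion and retention separate is sketched below. The per-pod log volume, the rates, and the function name `log_cost` are hypothetical placeholders. The sketch assumes a steady state where stored volume equals daily volume times retention days, and it leaves out query charges.

```python
def log_cost(pods: int, gb_per_pod_per_day: float, retention_days: int,
             ingest_rate: float, retention_rate: float,
             days_in_month: int = 30) -> dict:
    """Split monthly log cost into ingestion and retention components."""
    daily_gb = pods * gb_per_pod_per_day
    ingested_gb = daily_gb * days_in_month   # billed per GB ingested
    stored_gb = daily_gb * retention_days    # billed per GB-month retained
    return {
        "ingestion": ingested_gb * ingest_rate,
        "retention": stored_gb * retention_rate,
    }


# Hypothetical: 200 pods, 90-day retention, placeholder rates.
verbose = log_cost(200, gb_per_pod_per_day=0.05, retention_days=90,
                   ingest_rate=2.00, retention_rate=0.10)
sampled = log_cost(200, gb_per_pod_per_day=0.025, retention_days=90,
                   ingest_rate=2.00, retention_rate=0.10)

print(verbose)
print(sampled)
```

Because both components scale with pod count and per-pod volume, sampling or lowering verbosity reduces both at once, while shortening retention only reduces the second.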
Tool: Log cost calculator.
Worked estimate template (copy/paste)
- Compute = sum over node pools of (nodes * $/hour * hours per month), calculated once for the baseline and once for the peak
- Egress = internet GB/month * $/GB + cross-region GB/month * $/GB
- Storage = PV GB-month + snapshots/backups (if used)
- Observability = ingestion GB/month + retention GB-month + query scans
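The template can be turned into a small script, sketched here with hypothetical inputs. Every rate and volume below is a placeholder to be replaced with your own measurements and current prices; the function name `estimate` and the dictionary keys are illustrative.

```python
HOURS_PER_MONTH = 730  # assumed approximation


def estimate(pools, node_key, egress, storage, observability):
    """Sum the template lines for one scenario ('baseline' or 'peak')."""
    compute = sum(p[node_key] * p["rate"] * HOURS_PER_MONTH for p in pools)
    lines = {
        "compute": compute,
        "egress": egress,
        "storage": storage,
        "observability": observability,
    }
    lines["total"] = sum(lines.values())
    return lines


# Placeholder inputs, one entry per node pool ($/hour and node counts).
pools = [
    {"rate": 0.10, "baseline": 3, "peak": 3},
    {"rate": 0.20, "baseline": 6, "peak": 10},
]
egress = 1_500 * 0.08 + 2_000 * 0.02            # internet + cross-region
storage = 2_000 * 0.10 + 700 * 0.05             # PV GB-month + retained snapshots
observability = 300 * 2.00 + 900 * 0.10 + 0.0   # ingestion + retention + query scans

baseline = estimate(pools, "baseline", egress, storage, observability)
peak = estimate(pools, "peak", egress, storage, observability)

for name, result in (("baseline", baseline), ("peak", peak)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

Keeping each line separate in the output makes it easy to see which driver dominates before you start optimizing.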
Common pitfalls
- Over-requesting CPU/memory and paying for empty headroom.
- Forgetting DaemonSets and system overhead (CNI, monitoring, logging) that consume capacity on every node.
- Budgeting only for the autoscaling average and missing frequent peak scale-outs.
- Missing outbound transfer and origin egress when using CDNs.
- Letting logs grow without sampling, retention, and query guardrails.
Validation checklist
- Validate kube-system overhead and DaemonSets (logging, CNI, monitoring) that consume node capacity.
- Validate autoscaling behavior: average vs peak nodes, and whether scale-outs are frequent.
- Validate egress boundaries (internet vs internal) and measure real outbound GB in a representative window.