Azure Kubernetes Service (AKS) pricing: what to include

AKS cost planning fails when you only count nodes. A realistic estimate works through a checklist covering compute, networking, storage, and observability, and validates the packing assumptions behind the node count.

0) Define the cluster scope (what exists and where)

  • Environments: prod + staging + dev clusters (non-prod baselines are commonly missed).
  • Node pools: system pools + user pools, plus GPU/spot pools if used.
  • Peak vs average: autoscaling makes averages look cheap; peaks decide capacity and cost.

1) Node pools (VM hours)

Model each node pool separately: VM size, node count, and hours per month. If you autoscale, model both the average and the peak node count.
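
As a rough illustration of the per-pool model, the sketch below computes an expected (average) and a ceiling (peak) monthly cost for each pool. The pool names, node counts, and hourly rates are made-up placeholders, not Azure prices, and 730 hours per month is a common approximation rather than a billing rule.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # approximation: 8,760 hours per year / 12


@dataclass
class NodePool:
    name: str
    hourly_rate: float  # $/hour per node (placeholder, not a real Azure price)
    avg_nodes: float    # time-weighted average node count
    peak_nodes: int     # highest node count the autoscaler reaches


def monthly_cost(pool: NodePool, nodes: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `nodes` nodes of this pool for `hours` hours."""
    return nodes * pool.hourly_rate * hours


# Hypothetical pools for illustration only.
pools = [
    NodePool("system", hourly_rate=0.10, avg_nodes=3, peak_nodes=3),
    NodePool("user", hourly_rate=0.20, avg_nodes=6.5, peak_nodes=12),
]

for pool in pools:
    expected = monthly_cost(pool, pool.avg_nodes)
    ceiling = monthly_cost(pool, pool.peak_nodes)
    print(f"{pool.name}: expected ${expected:,.2f}/month, ceiling ${ceiling:,.2f}/month")
```

The gap between the two figures for an autoscaled pool shows how much the budget depends on how often the pool actually runs at peak.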

Tool: Compute instance cost calculator.

2) Workload packing (requests/limits)

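Requests determine how the scheduler packs pods onto nodes, and therefore the node count; limits cap what a container can use at runtime but do not reserve capacity. Over-requesting CPU/memory is one of the most common cost leaks in Kubernetes.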
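
To see how requests translate into node count, here is a minimal sketch. It assumes identical pods, a single node size, and a fixed per-node overhead for DaemonSets; it ignores max-pods-per-node limits and scheduling fragmentation. All numbers are illustrative, and `nodes_needed` is a hypothetical helper, not part of any Kubernetes tooling.

```python
import math


def nodes_needed(pod_count, cpu_request_m, mem_request_mib,
                 node_alloc_cpu_m, node_alloc_mem_mib,
                 overhead_cpu_m=0, overhead_mem_mib=0):
    """Estimate nodes needed when pods are packed by their requests.

    CPU is in millicores and memory in MiB. The overhead arguments stand in
    for per-node DaemonSets (logging, CNI, monitoring).
    """
    usable_cpu = node_alloc_cpu_m - overhead_cpu_m
    usable_mem = node_alloc_mem_mib - overhead_mem_mib
    # The tighter of the two resources decides how many pods fit on a node.
    pods_per_node = min(usable_cpu // cpu_request_m, usable_mem // mem_request_mib)
    if pods_per_node < 1:
        raise ValueError("a single pod does not fit on this node size")
    return math.ceil(pod_count / pods_per_node)


# 40 pods on nodes with 3800m CPU / 13000 MiB allocatable (assumed values),
# minus 300m / 500 MiB of per-node overhead.
print(nodes_needed(40, 500, 1024, 3800, 13000, 300, 500))  # 6
print(nodes_needed(40, 250, 512, 3800, 13000, 300, 500))   # 3
```

In this example, halving the requests halves the node count, which is why right-sizing requests against measured usage is worth doing before pricing node pools.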

Tool: Requests/limits helper.

3) Networking and egress

Egress from pods (to the internet, to other regions, or to external services) is frequently underestimated. Split internal traffic from billable egress and validate cache behavior when using CDNs.
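
One way to keep internal and billable traffic apart is to list each flow with its own rate, as in the sketch below. The flow labels, volumes, and $/GB rates are placeholders. Treating same-region pod traffic as free and CDN cache misses as billable origin egress are assumptions to check against your own bill.

```python
def origin_egress_gb(cdn_served_gb, cache_hit_ratio):
    """GB the CDN fetches from the origin, assuming every miss is a full origin fetch."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return cdn_served_gb * (1.0 - cache_hit_ratio)


def billable_egress_cost(flows):
    """Sum the cost of flows given as (label, gb_per_month, rate_per_gb)."""
    return sum(gb * rate for _label, gb, rate in flows)


# Hypothetical monthly flows; rates are placeholders, not Azure prices.
flows = [
    ("pod-to-pod, same region", 5000, 0.00),  # assumed non-billable here
    ("direct to internet", 500, 0.08),
    ("cross-region replication", 200, 0.02),
    ("CDN origin fetches", origin_egress_gb(2000, 0.90), 0.08),
]

print(f"Billable egress: ${billable_egress_cost(flows):,.2f}/month")
```

Rerunning the calculation with a lower cache hit ratio shows how sensitive the total is to cache behavior.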

Tool: Egress cost calculator.

4) Load balancing and ingress

Many clusters pay for ingress and L7 features separately (load balancers, gateways, WAF, and access logs). Budget ingress explicitly so it does not hide inside "Kubernetes cost".

Related: Application Gateway pricing.

5) Storage (persistent volumes)

Persistent storage is a GB-month driver. Include snapshots/backups if you rely on them, and validate retention windows.

Tool: Storage pricing (generic).

6) Observability (logs/metrics)

Logging and metrics volume scales with pod count and verbosity. Model ingestion and retention separately; log costs can exceed node compute costs if you ingest too much.
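
The sketch below keeps ingestion and retention as separate terms. It assumes a flat log volume per pod per day, a 30-day month, and steady-state retention (stored volume equals daily volume times retention days). The rates are placeholders rather than real Azure Monitor prices.

```python
def observability_cost(pods, gb_per_pod_per_day, ingest_rate_per_gb,
                       retention_days, retention_rate_per_gb_month,
                       days_in_month=30):
    """Return (ingestion_cost, retention_cost) per month under the stated assumptions."""
    daily_gb = pods * gb_per_pod_per_day
    ingested_gb = daily_gb * days_in_month   # GB ingested per month
    retained_gb = daily_gb * retention_days  # GB held at steady state
    ingestion_cost = ingested_gb * ingest_rate_per_gb
    retention_cost = retained_gb * retention_rate_per_gb_month
    return ingestion_cost, retention_cost


# 200 pods at 0.5 GB/pod/day, kept for 90 days (all values hypothetical).
ingestion, retention = observability_cost(
    pods=200, gb_per_pod_per_day=0.5, ingest_rate_per_gb=2.50,
    retention_days=90, retention_rate_per_gb_month=0.10,
)
print(f"ingestion ${ingestion:,.2f}/month, retention ${retention:,.2f}/month")
```

With these placeholder inputs ingestion dominates, so reducing verbosity or sampling moves the total more than shortening retention does.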

Tool: Log cost calculator.

Worked estimate template (copy/paste)

  • Compute = sum(node pool nodes * $/hour * hours), calculated once for baseline and once for peak
  • Egress = outbound GB/month (internet + cross-region) * $/GB
  • Storage = PV GB-month + snapshots/backups (if used)
  • Observability = ingestion GB/month + retention GB-month + query scans
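
The template above can be sketched as a small function that is run once with baseline node counts and once with peak node counts. Every input value and key name here is a hypothetical placeholder. For simplicity only compute changes between the two scenarios, although egress and log volume may also rise at peak.

```python
HOURS_PER_MONTH = 730  # approximation, not a billing rule


def monthly_estimate(pools, inputs, hours=HOURS_PER_MONTH):
    """pools: list of (node_count, hourly_rate); inputs: dict of non-compute drivers."""
    compute = sum(nodes * rate * hours for nodes, rate in pools)
    egress = inputs["egress_gb"] * inputs["egress_rate"]
    storage = inputs["pv_gb"] * inputs["pv_rate"] + inputs["snapshot_cost"]
    observability = (inputs["ingest_gb"] * inputs["ingest_rate"]
                     + inputs["retained_gb"] * inputs["retention_rate"]
                     + inputs["query_cost"])
    lines = {"compute": compute, "egress": egress,
             "storage": storage, "observability": observability}
    lines["total"] = sum(lines.values())
    return lines


# Placeholder drivers and rates for illustration only.
inputs = {
    "egress_gb": 900, "egress_rate": 0.08,
    "pv_gb": 1000, "pv_rate": 0.10, "snapshot_cost": 25.0,
    "ingest_gb": 300, "ingest_rate": 2.50,
    "retained_gb": 900, "retention_rate": 0.10, "query_cost": 10.0,
}
baseline = monthly_estimate([(3, 0.10), (6, 0.20)], inputs)
peak = monthly_estimate([(3, 0.10), (12, 0.20)], inputs)

for name, result in (("baseline", baseline), ("peak", peak)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

Keeping each line item as its own key makes it easy to see which driver dominates before refining any single input.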

Common pitfalls

  • Over-requesting CPU/memory and paying for empty headroom.
  • Forgetting DaemonSets and system overhead (CNI, monitoring, logging) that consume capacity on every node.
  • Budgeting only average autoscaling and missing frequent peak scale-outs.
  • Missing outbound transfer and origin egress when using CDNs.
  • Letting logs grow without sampling, retention, and query guardrails.

Validation checklist

  • Validate kube-system overhead and DaemonSets (logging, CNI, monitoring) that consume node capacity.
  • Validate autoscaling behavior: average vs peak nodes, and whether scale-outs are frequent.
  • Validate egress boundaries (internet vs internal) and measure real outbound GB in a representative window.

FAQ

What usually drives AKS cost?
Node compute (VMs) is typically the largest line item, but egress, load balancing, persistent storage, and logging can be large in production.

How do I estimate quickly?
Estimate node-hours by node pool, then add separate estimates for egress, ingress/L7 components, storage GB-month, and log ingestion.

How do I validate?
Validate requests/limits and overhead, then compare estimated node utilization against real cluster metrics during peak periods.

Last updated: 2026-01-27