EKS vs GKE vs AKS cost: a practical comparison checklist (beyond node price)
The best way to compare EKS vs GKE vs AKS cost is to normalize capacity first (node count) and then add the costs that live "around the cluster": load balancers, egress/NAT, storage, and observability. This avoids apples-to-oranges comparisons that look at node price only.
0) Start with the same line items (for every provider)
- Worker nodes: node count × $/hour × billable hours (baseline + peak).
- Control plane: fixed fees, multi-cluster strategy, and management overhead.
- Load balancers / ingress: hourly + data processing charges depending on product.
- Storage: volumes, snapshots, object storage, and growth over time.
- Observability: log ingestion and retention, metric series count, and scan/query volume.
- Egress: internet egress, NAT, and cross-zone/region transfer legs.
1) Normalize capacity (node count) from workload requests
Start from pod requests/limits and a realistic allocatable headroom. Do not size from "average CPU" alone; Kubernetes is constrained by bin packing, headroom, and spikes.
Tools: Requests & limits, Node cost, Cluster cost.
- Build two scenarios: average month and peak month.
- If you run autoscaling, model the fraction of time you are at peak (peaks are rarely 24/7).
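The sizing steps above can be sketched in a few lines. This is a rough lower-bound estimate, not a scheduler simulation: it assumes pods pack perfectly into the usable capacity of each node, and the function names, the 20% headroom default, the 730-hour month, and every number in the example are illustrative assumptions rather than provider figures.

```python
import math

def nodes_needed(cpu_requests, mem_requests_gib,
                 node_alloc_cpu, node_alloc_mem_gib, headroom=0.2):
    """Estimate node count from summed pod requests.

    headroom is the fraction of each node's allocatable capacity kept free
    (assumed value). Real bin packing is never perfect, so treat the result
    as a lower bound.
    """
    usable_cpu = node_alloc_cpu * (1 - headroom)
    usable_mem = node_alloc_mem_gib * (1 - headroom)
    by_cpu = math.ceil(cpu_requests / usable_cpu)
    by_mem = math.ceil(mem_requests_gib / usable_mem)
    # The scarcer resource decides how many nodes are required.
    return max(by_cpu, by_mem)

def node_hours(baseline_nodes, peak_nodes, peak_fraction, hours_per_month=730):
    """Billable node-hours when extra peak nodes run only part of the month."""
    extra = max(peak_nodes - baseline_nodes, 0)
    return (baseline_nodes + extra * peak_fraction) * hours_per_month

# Hypothetical workload: average-month and peak-month request totals,
# on nodes with 15.5 allocatable vCPU and 58 GiB allocatable memory.
baseline = nodes_needed(120, 400, node_alloc_cpu=15.5, node_alloc_mem_gib=58)
peak = nodes_needed(200, 640, node_alloc_cpu=15.5, node_alloc_mem_gib=58)
hours = node_hours(baseline, peak, peak_fraction=0.15)
print(baseline, peak, hours)
```

Using the same request totals and headroom for every provider keeps the node count comparable; only the allocatable capacity per node type should differ.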
2) Normalize pricing (effective $/hour)
Use an effective blended $/hour that matches your plan: on-demand, commitments, spot mix, and discounts. Early-stage models should prioritize being directionally correct over being SKU-perfect.
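One way to compute an effective blended rate is a weighted average over the purchase plans you expect to use. The sketch below assumes each plan is described by its share of node-hours and its discount relative to on-demand; the plan names, shares, discounts, and the 0.20 on-demand rate are placeholders, not real provider prices.

```python
def blended_rate(on_demand_rate, mix):
    """Effective $/hour across purchase plans.

    mix maps a plan name to (share_of_node_hours, discount_vs_on_demand).
    Shares must sum to 1; discounts are fractions such as 0.30 for 30%.
    """
    total_share = sum(share for share, _ in mix.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("plan shares must sum to 1")
    return sum(share * on_demand_rate * (1 - discount)
               for share, discount in mix.values())

# Hypothetical mix: 30% on-demand, 50% committed at 30% off, 20% spot at 65% off.
rate = blended_rate(0.20, {
    "on_demand": (0.30, 0.00),
    "committed": (0.50, 0.30),
    "spot": (0.20, 0.65),
})
print(round(rate, 4))
```

Running the same mix against each provider's on-demand rate gives a directionally correct comparison without resolving individual SKUs.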
3) Add the "around the cluster" costs (where winners change)
Many teams discover that node price is not the deciding factor. These line items often dominate once traffic grows.
- Load balancers: hourly + data processing; traffic-heavy services pay more here than expected.
- NAT/egress: chatty microservices and cross-zone patterns create hidden transfer legs.
- Observability: logs and metrics scale with traffic and retention, not with node count.
- Storage: persistent volumes, snapshots, and growth are long-lived multipliers.
Tools: Egress, Log cost, Metrics series, Storage growth.
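To see why observability follows traffic and retention rather than node count, it helps to compute the underlying volumes before applying any price. The sketch below assumes flat daily ingestion, no compression, and a 30-day month; the function names and input numbers are illustrative only.

```python
def log_volumes(ingest_gb_per_day, retention_days, days_in_month=30):
    """Return (GB ingested per month, GB held in storage at steady state)."""
    ingested = ingest_gb_per_day * days_in_month
    # At steady state the store holds one day's ingestion per retained day.
    stored = ingest_gb_per_day * retention_days
    return ingested, stored

def metric_samples_per_month(active_series, scrape_interval_s, days_in_month=30):
    """Samples written per month for a given series count and scrape interval."""
    samples_per_series_per_day = 86_400 / scrape_interval_s
    return active_series * samples_per_series_per_day * days_in_month

# Hypothetical inputs: 50 GB/day of logs, 200k active series scraped every 30 s.
ingested_14d, stored_14d = log_volumes(50, retention_days=14)
ingested_28d, stored_28d = log_volumes(50, retention_days=28)
samples = metric_samples_per_month(200_000, scrape_interval_s=30)
print(stored_14d, stored_28d, samples)
```

Doubling retention doubles stored volume while ingestion and node count stay the same, which is the kind of drift the model should make visible.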
4) Worksheet template (copy/paste)
- Nodes = baseline node count + extra nodes at peak × % of time at peak
- Control plane = fixed monthly fee (multiply by number of clusters)
- Load balancers = count × hours + data processed (if applicable)
- Egress = GB/month by destination (internet, cross-zone, cross-region) + GB processed by NAT
- Logs = GB/day ingestion + retention days + GB scanned by queries
- Metrics = active series × sample rate × retention
- Storage = provisioned volume GB + snapshot GB + expected monthly growth
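The worksheet can also be kept as a small script so the same structure is filled in once per provider. This is a minimal sketch under stated assumptions: the class name, field names, the 730-hour month, and all rates and quantities are hypothetical placeholders, and logs, metrics, and storage are entered as already-priced monthly amounts.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation, an assumption of this sketch

@dataclass
class ClusterWorksheet:
    node_hours: float          # billable node-hours per month
    node_rate: float           # effective blended $/hour per node
    clusters: int
    control_plane_fee: float   # $/month per cluster
    lb_count: int
    lb_hourly: float           # $/hour per load balancer
    lb_gb: float               # GB/month processed by load balancers
    lb_per_gb: float           # $/GB processed
    egress_gb: dict            # destination -> GB/month
    egress_rate: dict          # destination -> $/GB
    logs: float                # $/month, priced separately
    metrics: float             # $/month, priced separately
    storage: float             # $/month, priced separately

    def line_items(self) -> dict:
        lb = (self.lb_count * HOURS_PER_MONTH * self.lb_hourly
              + self.lb_gb * self.lb_per_gb)
        egress = sum(gb * self.egress_rate[dest]
                     for dest, gb in self.egress_gb.items())
        return {
            "nodes": self.node_hours * self.node_rate,
            "control_plane": self.clusters * self.control_plane_fee,
            "load_balancers": lb,
            "egress": egress,
            "logs": self.logs,
            "metrics": self.metrics,
            "storage": self.storage,
        }

def around_cluster_share(items: dict) -> float:
    """Fraction of the total that is neither nodes nor control plane."""
    total = sum(items.values())
    return (total - items["nodes"] - items["control_plane"]) / total

# Placeholder numbers for one provider; repeat with each provider's rates.
example = ClusterWorksheet(
    node_hours=8000, node_rate=0.15, clusters=2, control_plane_fee=70,
    lb_count=3, lb_hourly=0.03, lb_gb=5000, lb_per_gb=0.01,
    egress_gb={"internet": 2000, "cross_zone": 10000},
    egress_rate={"internet": 0.09, "cross_zone": 0.01},
    logs=400, metrics=150, storage=200,
)
items = example.line_items()
print(items, round(around_cluster_share(items), 2))
```

Keeping the output as named line items rather than a single total makes it easy to see which item changes the ranking between providers.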
Common pitfalls
- Comparing only node price and ignoring control plane + "around the cluster" costs.
- Using peak node count 24/7 when autoscaling is bursty (overstates cost).
- Ignoring egress and NAT: chatty services pay for transfer legs.
- Letting observability grow unbounded (logs retention drift, high-cardinality metrics).
- Not validating with a representative week (data from one good day is not enough).
How to validate
- Validate allocatable headroom and bin packing (do pods actually fit the way you think?).
- Validate autoscaling patterns: how often are you at peak?
- Validate egress by destination and identify the top talkers.
- Validate log ingestion and retention; estimate scans from dashboard/alert frequency.