Azure NAT Gateway cost: model hours, GB processed, and the real spike drivers

NAT Gateway pricing is simple on paper and tricky in production. The math is "hours + GB processed", but the bill is driven by what traverses NAT during peak periods: deploys, scale-outs, incidents, and dependency retries. This guide helps you build a model you can validate and improve.

0) Inventory what routes through NAT (the boundary)

Before you estimate anything, list the outbound paths that actually use NAT.

  • Which subnets have a NAT gateway attached, and do any user-defined routes send 0.0.0.0/0 elsewhere (for example to a firewall) instead?
  • Which workloads live in those subnets (AKS nodes, VMSS, build runners, jump boxes)?
  • Which destinations are hit at scale (container registries, package repos, external APIs, SaaS, telemetry)?

If you cannot answer this, you will estimate the wrong thing. NAT costs are not "internet egress" in general; they are "traffic that your routing sends through NAT".

1) Baseline hours (count NAT gateways)

The baseline is the number of NAT gateways times hours per month. If you have multiple environments or regions, model each separately (prod/stage/dev). If your architecture is hub-and-spoke, be explicit about whether one NAT is shared or whether each spoke has its own.
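
As a rough sketch of the baseline line, the arithmetic is gateways × hours × hourly rate, kept separate per environment. The hourly rate below is a placeholder assumption rather than a quoted Azure price, and the environment names and gateway counts are illustrative.

```python
# Baseline: resource-hours only, before any traffic is counted.
# HOURLY_RATE_USD is a placeholder assumption; take the real rate from
# the Azure pricing page for your region.
HOURLY_RATE_USD = 0.045


def baseline_hours_cost(gateways_by_env: dict[str, int], days: int = 30,
                        hourly_rate: float = HOURLY_RATE_USD) -> dict[str, float]:
    """Return the monthly hours charge per environment."""
    hours = 24 * days
    return {env: count * hours * hourly_rate
            for env, count in gateways_by_env.items()}


# Illustrative counts: two prod gateways (one per region), one each for stage/dev.
costs = baseline_hours_cost({"prod": 2, "stage": 1, "dev": 1})
for env, usd in costs.items():
    print(f"{env}: ${usd:.2f}/month")
print(f"total: ${sum(costs.values()):.2f}/month")
```

Keeping the result keyed by environment makes it easy to see when a dev or stage gateway that nobody uses costs the same baseline as a prod one.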

2) GB processed (the throughput driver)

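The practical way to estimate GB processed is to list the big outbound flows and estimate each flow's monthly GB. Do not guess with one blended number if you have a few dominant flows. Most of these flows are downloads, so count the response bytes on connections your workloads initiate, not just the request bytes.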

  • Container image pulls: AKS/VMSS churn, new node pools, autoscaling.
  • Package downloads: OS updates, language deps, CI caches.
  • External APIs: request volume × response sizes (retries amplify).
  • Telemetry/log shipping: logs and metrics exporters often run "always on".
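
The flow list above can be turned into a small per-flow estimate. This is a minimal sketch: the helper names are hypothetical and every input (node launches, image size, request volume, response size, telemetry rate) is an assumed value to be replaced with your own measurements.

```python
# Per-flow monthly GB estimates. Every number below is an illustrative
# assumption; replace with measurements from your own environment.

def image_pull_gb(node_launches: int, gb_pulled_per_node: float) -> float:
    """Each new node pulls its images once with a cold cache."""
    return node_launches * gb_pulled_per_node


def api_gb(requests: int, avg_response_kb: float) -> float:
    """Request volume x response size, converted from KB to GB (decimal)."""
    return requests * avg_response_kb / 1_000_000


def always_on_gb(gb_per_hour: float, days: int = 30) -> float:
    """Steady exporters such as log and metric shippers."""
    return gb_per_hour * 24 * days


flows = {
    "image pulls": image_pull_gb(node_launches=400, gb_pulled_per_node=2.5),
    "package downloads": 300.0,  # assumed flat estimate, e.g. from CI history
    "external APIs": api_gb(requests=50_000_000, avg_response_kb=20),
    "telemetry": always_on_gb(gb_per_hour=1.5),
}

# Print the largest flows first so the dominant drivers are obvious.
for name, gb in sorted(flows.items(), key=lambda kv: -kv[1]):
    print(f"{name:>18}: {gb:8.1f} GB")
print(f"{'total':>18}: {sum(flows.values()):8.1f} GB")
```

Sorting by size is deliberate: if two or three flows account for most of the total, those are the ones worth measuring precisely.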

3) Model the peak month (retries + churn)

A realistic model has at least two scenarios: baseline and peak. Peak is where NAT costs surprise teams.

  • Retries/timeouts: each retry repeats the full payload transfer; add a multiplier for incident windows.
  • Node churn: new nodes pull images and dependencies with cold caches.
  • Cold-start spikes: deployment rollouts can temporarily increase outbound dependency calls.
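
One way to sketch the peak scenario is to add retry and churn add-ons to the baseline. The function names are hypothetical and all inputs are assumed values; here retry_rate means retries per original request, so a rate of 2.0 models an incident where each request is retried twice.

```python
# Peak-month sketch: baseline GB plus add-ons from incident retries and
# node churn. All inputs are assumed values for illustration.

def retry_addon_gb(flow_gb_per_hour: float, incident_hours: float,
                   retry_rate: float) -> float:
    """Extra GB from retries during incident windows.

    retry_rate is retries per original request; each retry is assumed
    to repeat the full payload transfer.
    """
    return flow_gb_per_hour * incident_hours * retry_rate


def churn_addon_gb(extra_node_launches: int, gb_pulled_per_node: float) -> float:
    """Extra GB from nodes that start with cold caches."""
    return extra_node_launches * gb_pulled_per_node


baseline_gb = 3000.0
retry_gb = retry_addon_gb(flow_gb_per_hour=5.0, incident_hours=12, retry_rate=2.0)
churn_gb = churn_addon_gb(extra_node_launches=200, gb_pulled_per_node=2.5)
peak_gb = baseline_gb + retry_gb + churn_gb

print(f"baseline: {baseline_gb:.0f} GB")
print(f"peak:     {peak_gb:.0f} GB (+{retry_gb:.0f} retries, +{churn_gb:.0f} churn)")
```

Keeping the two add-ons as separate terms shows which one dominates the peak, which in turn points to the lever worth pulling first.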

4) Practical levers to reduce cost (without breaking security)

  • Cache the big outbound flows: registry mirrors, package proxies, artifact caching.
  • Move large dependencies off the hot path: avoid downloading large artifacts at runtime.
  • Reduce retries: fix timeout budgets and backoff; retries are paid traffic.
  • Consider private access for high-volume Azure services, and compare the NAT charges you would avoid against what Private Link would cost.
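
To gauge the caching lever, a minimal sketch is to scale a flow by the share of fetches that still miss the cache. It assumes the cache is reachable without crossing NAT (for example, inside the same virtual network); the per-GB rate is a placeholder, not a quoted price, and the hit rate is an assumed value.

```python
# Placeholder price per GB processed; an assumption, not a quoted rate.
PER_GB_RATE_USD = 0.045


def gb_via_nat_after_cache(flow_gb: float, cache_hit_rate: float) -> float:
    """GB that still crosses NAT when a share of fetches is served from a cache.

    Assumes cache hits are served from inside the network and never
    traverse NAT; only misses do.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return flow_gb * (1.0 - cache_hit_rate)


before_gb = 1000.0  # e.g. monthly image pulls, illustrative
after_gb = gb_via_nat_after_cache(before_gb, cache_hit_rate=0.75)
saved_usd = (before_gb - after_gb) * PER_GB_RATE_USD

print(f"via NAT before: {before_gb:.0f} GB, after: {after_gb:.0f} GB")
print(f"estimated saving: ${saved_usd:.2f}/month")
```

The saving only materializes if the hit rate holds during churn, so measure it during a deploy or scale-out rather than at steady state.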

Worked estimate template (copy/paste)

  • NAT gateways = count per env/region
  • Hours/month = 24 × days
  • Baseline GB/month = sum of big outbound flows routed via NAT
  • Peak add-on GB = deploy + incident windows (image pulls + retries)
  • Retry multiplier = 1 + retry_rate, where retry_rate is retries per original request (apply to the affected flows only)
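
The template can also be written as a small record so that baseline and peak are computed from the same inputs. This is a sketch under stated assumptions: the class and field names are hypothetical, both rates are placeholders rather than quoted prices, and incident_flow_gb is assumed to be already counted once inside baseline_gb.

```python
from dataclasses import dataclass


@dataclass
class NatEstimate:
    gateways: int            # count for one env/region
    days: int                # days in the month being modeled
    baseline_gb: float       # sum of big outbound flows routed via NAT
    deploy_gb: float         # extra image pulls and downloads in deploy windows
    incident_flow_gb: float  # GB of affected flows inside incident windows
    retry_rate: float        # retries per original request on those flows
    hourly_rate: float       # placeholder price per gateway-hour
    per_gb_rate: float       # placeholder price per GB processed

    def hours_cost(self) -> float:
        return self.gateways * 24 * self.days * self.hourly_rate

    def monthly_gb(self, peak: bool = False) -> float:
        if not peak:
            return self.baseline_gb
        # Multiplier is 1 + retry_rate; subtract the original traffic because
        # it is already part of baseline_gb, leaving only the retry bytes.
        retry_gb = self.incident_flow_gb * (1 + self.retry_rate) - self.incident_flow_gb
        return self.baseline_gb + self.deploy_gb + retry_gb

    def monthly_cost(self, peak: bool = False) -> float:
        return self.hours_cost() + self.monthly_gb(peak) * self.per_gb_rate


prod = NatEstimate(gateways=2, days=30, baseline_gb=3000.0, deploy_gb=500.0,
                   incident_flow_gb=60.0, retry_rate=2.0,
                   hourly_rate=0.045, per_gb_rate=0.045)
print(f"baseline: ${prod.monthly_cost():.2f}")
print(f"peak:     ${prod.monthly_cost(peak=True):.2f}")
```

Create one record per environment or region and sum the results, rather than blending everything into a single set of inputs.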

How to validate

  • Validate routing: which subnets and workloads are actually using NAT.
  • Validate the top outbound destinations and their bytes in a representative week.
  • Validate deploy/scale-out periods: compare GB/hour during peak vs baseline.
  • After changes, re-measure outbound GB and confirm the model moves in the same direction as the bill.
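
The peak-versus-baseline comparison and the model check can be sketched as two small calculations. The hourly samples below are made-up values standing in for whatever your flow logs or metrics export; the function names are hypothetical.

```python
from statistics import median


def peak_to_baseline_ratio(gb_per_hour: list[float]) -> float:
    """Ratio of the busiest hour to the median hour in a sample window."""
    return max(gb_per_hour) / median(gb_per_hour)


def model_error(modeled_gb: float, measured_gb: float) -> float:
    """Signed relative error; negative means the model under-estimates."""
    return (modeled_gb - measured_gb) / measured_gb


# Made-up hourly samples: steady traffic with one deploy spike.
samples = [4.0, 4.5, 5.0, 4.0, 20.0, 5.5, 4.5]
ratio = peak_to_baseline_ratio(samples)
error = model_error(modeled_gb=3000.0, measured_gb=3750.0)

print(f"peak/baseline ratio: {ratio:.1f}x")
print(f"model error: {error:+.0%}")
```

The median is used for the baseline so that a single spike does not inflate it; a consistently negative model error usually means a flow is missing from the inventory.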

FAQ

Why do NAT bills spike?
Retry storms, dependency timeouts, container image pulls during node churn, and large outbound downloads can multiply GB processed quickly. NAT sits on the hot path for outbound traffic, so incident windows are the peak scenario.

How do I estimate quickly?
Count NAT gateways and hours per month, then estimate monthly outbound GB that routes through NAT. Add a separate peak line for deploy/incident windows and apply a retry multiplier.

How do I validate?
Validate which subnets route through NAT, then measure outbound GB from a representative window (flow logs, monitoring, or egress metrics). Verify that the biggest flows are the ones you think they are (registries, package downloads, APIs, telemetry).

What is the most common mistake?
Modeling only the steady-state egress and ignoring deploy/scale-out behavior (node churn + cold caches), which can create short, expensive spikes.

Last updated: 2026-01-27