NAT Gateway vs VPC endpoints cost: when PrivateLink wins
NAT Gateway is often the first surprise “networking bill” teams see. A common fix is to keep AWS-service traffic private with VPC endpoints (Interface endpoints / PrivateLink, or Gateway endpoints where available). This guide shows how to compare costs without hand-waving: estimate NAT processed GB, estimate endpoint-hours, and model how much NAT traffic endpoints actually remove.
What you’re comparing (two different cost shapes)
- NAT Gateway: gateway-hours + GB processed (traffic-driven; spikes with downloads and retries).
- VPC endpoints: endpoint-hours (often per AZ) + sometimes per-GB processing (more fixed; scales with sprawl).
Step 1: estimate NAT processed GB (baseline + peak)
You need a rough monthly processed-GB number. Use CloudWatch metrics or VPC Flow Logs when possible (see: estimate NAT GB processed).
Keep two scenarios: a normal week (baseline) and an incident/busy week (peaks often dominate monthly variance).
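The two-scenario estimate can be sketched in a few lines. This is a minimal illustration, assuming you have already exported daily NAT processed-GB samples (e.g. from the `AWS/NATGateway` CloudWatch metrics); the helper name and the sample data are made up for the example:

```python
# Sketch: turn daily NAT processed-GB samples into a baseline-week
# and a peak-week monthly estimate. Sample data is illustrative.

def monthly_estimates(daily_gb, days_per_month=30):
    """Return (baseline_monthly_gb, peak_monthly_gb).

    Baseline extrapolates the median daily value; peak extrapolates
    the worst rolling 7-day window to a full month.
    """
    samples = sorted(daily_gb)
    median = samples[len(samples) // 2]
    baseline = median * days_per_month

    # Worst rolling 7-day window captures incident/busy weeks.
    worst_week = max(
        sum(daily_gb[i:i + 7]) for i in range(len(daily_gb) - 6)
    )
    peak = worst_week / 7 * days_per_month
    return baseline, peak

# Illustrative month: three quiet weeks plus one busy week (GB/day).
daily = [40] * 21 + [40, 120, 300, 280, 90, 50, 45]
baseline_gb, peak_gb = monthly_estimates(daily)
```

Carry both numbers forward: if the peak-month estimate is several times the baseline, the endpoint decision should be modeled against both.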
Step 2: estimate how much of that GB endpoints can remove
Endpoints don’t remove all NAT traffic — they remove the portion that is headed to services you can keep on the private path. A practical way to estimate is to group NAT egress by destination.
- AWS-service destinations: good candidates for endpoints/private access.
- Public internet/SaaS: endpoints won’t help; you still need NAT or other egress.
If you can’t attribute yet, start with scenarios (30% / 60% / 90% NAT reduction) and refine later.
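The scenario split above is simple enough to encode directly. A minimal sketch, assuming only a monthly NAT GB total and the 30% / 60% / 90% reduction fractions from the text:

```python
# Sketch: split monthly NAT GB into the portion endpoints could
# remove (moved) and the residual internet/SaaS egress that still
# needs NAT, under each reduction scenario.

def residual_nat_gb(monthly_gb, reduction_scenarios=(0.30, 0.60, 0.90)):
    """Map each NAT-reduction fraction to (moved_gb, residual_gb)."""
    return {
        r: (monthly_gb * r, monthly_gb * (1 - r))
        for r in reduction_scenarios
    }

scenarios = residual_nat_gb(1200)  # 1200 GB/month is illustrative
```

Once flow-log attribution lands, replace the scenario fractions with the measured share of AWS-service destinations.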
Step 3: compare monthly costs with a scenario table
- NAT cost = gateway-hours × $/hour + GB processed × $/GB
- Endpoint cost = endpoint-hours × $/hour (+ optional per-GB) + remaining NAT cost for residual internet egress
The breakeven question is usually: does the endpoint remove enough processed GB to beat its fixed hourly baseline?
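The two formulas above can be wired into a small comparison. All dollar rates below are placeholders for illustration, not current AWS pricing; check the NAT Gateway and PrivateLink pricing pages for your region:

```python
# Sketch: monthly NAT-only cost vs NAT + interface endpoints.
# All $ rates are illustrative placeholders, NOT current AWS pricing.

HOURS = 730  # approximate hours per month

def nat_cost(gb, hourly=0.045, per_gb=0.045, gateways=1):
    # NAT cost = gateway-hours x $/hour + GB processed x $/GB
    return gateways * HOURS * hourly + gb * per_gb

def endpoint_cost(gb_through, endpoints, azs, hourly=0.01, per_gb=0.01):
    # Interface endpoints typically bill per endpoint, per AZ, per hour,
    # plus an optional per-GB processing charge.
    return endpoints * azs * HOURS * hourly + gb_through * per_gb

def compare(monthly_gb, reduction, endpoints, azs):
    """Return (status-quo NAT cost, cost after adding endpoints)."""
    moved = monthly_gb * reduction
    residual = monthly_gb - moved
    before = nat_cost(monthly_gb)
    after = nat_cost(residual) + endpoint_cost(moved, endpoints, azs)
    return before, after

before, after = compare(monthly_gb=1200, reduction=0.60,
                        endpoints=3, azs=2)
```

With these placeholder rates, 1,200 GB/month and three endpoints across two AZs, the endpoint-hours baseline exceeds the NAT GB it removes, which is exactly the breakeven question: rerun the comparison with your own volumes and rates before deciding.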
When endpoints usually win
- Large fleets pulling the same artifacts (images, packages) from AWS services repeatedly.
- High-volume private workloads calling AWS APIs frequently (steady GB processed through NAT).
- Teams that can standardize endpoints and avoid endpoint sprawl (ownership and lifecycle).
When endpoints may not win
- Most NAT traffic is to the public internet or third-party SaaS (endpoints don’t remove it).
- Non-prod sprawl creates many endpoints across many AZs that stay always-on.
- Routing changes introduce cross-AZ transfer that offsets NAT savings.
Don’t forget transfer boundaries (where “savings” can move)
Some migrations reduce NAT GB processed but increase cross-AZ transfer or internet egress on a different line item. Model transfer explicitly if your traffic crosses boundaries (see: VPC data transfer guide).
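One way to make that boundary shift explicit is to net the new cross-AZ charge against the NAT saving. A hedged sketch; both per-GB rates are illustrative placeholders, and `cross_az_per_gb` here models both directions combined:

```python
# Sketch: net monthly saving after accounting for new cross-AZ
# transfer introduced by routing changes. Rates are illustrative.

def net_savings(nat_gb_removed, cross_az_gb_added,
                nat_per_gb=0.045, cross_az_per_gb=0.02):
    """Saving from removed NAT GB minus the new cross-AZ charge."""
    return (nat_gb_removed * nat_per_gb
            - cross_az_gb_added * cross_az_per_gb)

# Illustrative: 700 GB moved off NAT, but 400 GB now crosses AZs.
saving = net_savings(700, 400)
```

If `net_savings` comes out near zero or negative, the migration moved spend rather than removing it.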
Validation checklist (after rollout)
- Confirm NAT processed GB dropped by the expected percentage.
- Confirm endpoint-hours match the intended footprint (no unexpected AZ/sprawl).
- Check for shifted spend: cross-AZ transfer and internet egress.
- Re-check incident windows; retries can recreate NAT spikes even after endpoints.