GCP Cloud Run Pricing & Cost (requests, CPU/memory, egress)

Cloud Run-style platforms are predictable when you model the measurable drivers: requests, CPU/memory time (duration), and outbound transfer. The most common mistakes are overlooking response size and logging, and failing to model peak periods, where retries multiply costs.

0) Pick the right unit of analysis

  • Requests/month: the driver for HTTP services.
  • Duration: the multiplier that turns requests into compute.
  • Concurrency: affects instance count and can change per-request CPU time under contention.
  • Egress: response bytes and external calls create transfer costs.
  • Logs: per-request bytes logged × request volume (often overlooked).

1) Requests (monthly volume)

Convert request rate to monthly requests and keep baseline + peak separate. Peaks include bot spikes, marketing events, and incident retry storms.

Tool: RPS to monthly requests.
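As a minimal sketch, the conversion can be expressed as a small helper. The function name, parameters, and the ~730 hours/month convention are illustrative assumptions, not an official API:

```python
def monthly_requests(baseline_rps: float, peak_rps: float = 0.0,
                     peak_hours: float = 0.0, retry_factor: float = 1.0) -> float:
    """Convert request rates into a monthly request count.

    baseline_rps runs all month; peak_rps applies only for peak_hours;
    retry_factor inflates everything (e.g. 1.1 for ~10% retries).
    Hypothetical helper; uses the common ~730 hours/month convention.
    """
    seconds_per_month = 730 * 3600
    baseline = baseline_rps * seconds_per_month
    peak = peak_rps * peak_hours * 3600
    return (baseline + peak) * retry_factor

# Example: 50 RPS baseline, plus 200 RPS across 20 peak hours, with ~10% retries
total = monthly_requests(50, peak_rps=200, peak_hours=20, retry_factor=1.1)
```

Keeping baseline and peak as separate arguments makes it easy to re-run the estimate for an incident month without touching the baseline numbers.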

2) Duration and concurrency (compute time)

Use percentiles (p50 and p95) instead of one average so you can model normal vs slow-path behavior. Slow paths often come from upstream latency (DB/APIs), cold starts, and throttling.

  • If you increase concurrency, validate latency and CPU contention; concurrency is not free for CPU-bound handlers.
  • Track retries/timeouts: they multiply both request count and total compute time.
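A two-bucket blend of p50 and p95 is a rough but useful sketch of the ideas above. The `slow_share` split is an assumption (real latency distributions should come from your metrics), and all names here are hypothetical:

```python
def compute_seconds(requests: float, p50_s: float, p95_s: float,
                    slow_share: float = 0.05,
                    retry_factor: float = 1.0) -> float:
    """Blend p50/p95 durations into total request-seconds.

    Two-bucket model, not a real latency distribution: slow_share of
    requests are assumed to take ~p95; the rest take ~p50.
    retry_factor inflates both buckets, since retries repeat compute.
    """
    fast = requests * (1 - slow_share) * p50_s
    slow = requests * slow_share * p95_s
    return (fast + slow) * retry_factor

# 10M requests/month, p50 = 80 ms, p95 = 600 ms, 5% slow path
secs = compute_seconds(10_000_000, 0.08, 0.6)
```

Note how the 5% slow path contributes a large share of total seconds; this is why sizing from the average alone under-counts compute.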

3) Response transfer and egress

If each request returns significant data, egress becomes a major driver. Estimate average response size and multiply by monthly requests, then split transfer by destination if needed.

Tools: Response transfer, Egress cost.

  • Model heavy-tail endpoints separately (downloads, exports) so they do not disappear into a blended average.
  • Include external dependency calls that return large payloads (they create egress too).
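The egress estimate, with the heavy tail split out per the bullet above, can be sketched like this. Parameter names are illustrative, and the binary-unit conversion (1 GB = 1024³ bytes) is an assumption; check whether your bill uses GB or GiB:

```python
def egress_gb(requests: float, avg_response_kb: float,
              heavy_requests: float = 0.0,
              heavy_response_mb: float = 0.0) -> float:
    """Monthly egress in GB, with heavy-tail endpoints modeled separately.

    Hypothetical helper; uses binary units (1 GB = 1024**3 bytes).
    """
    normal_gb = requests * avg_response_kb / 1024**2      # KB -> GB
    heavy_gb = heavy_requests * heavy_response_mb / 1024  # MB -> GB
    return normal_gb + heavy_gb

# 10M API responses at ~20 KB plus 50k exports at ~10 MB each:
# the small export tail can move more bytes than the bulk of the traffic.
blended = egress_gb(10_000_000, 20, heavy_requests=50_000, heavy_response_mb=10)
```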

4) Logs (often the second spike)

Logging can outweigh compute if you log too much per request. Estimate log bytes per request and multiply by request volume, then model retention and scan/search if you query heavily.

Tools: Log ingestion, Tiered log storage, Log scan.
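A small sketch of the log estimate, returning both ingestion and stored volume since the two are usually billed separately. The helper and its parameters are hypothetical assumptions, not a provider API:

```python
def log_gb_per_month(requests: float, avg_log_bytes_per_request: float,
                     retention_months: float = 1.0) -> tuple[float, float]:
    """Monthly log volume estimate.

    Returns (ingested_gb, stored_gb_months). Ingestion and retention
    are typically separate line items, so both are reported.
    Hypothetical helper; 1 GB = 1024**3 bytes.
    """
    ingested_gb = requests * avg_log_bytes_per_request / 1024**3
    stored_gb_months = ingested_gb * retention_months
    return ingested_gb, stored_gb_months

# 10M requests at ~2 KB of logs each, kept for 3 months
ingested, stored = log_gb_per_month(10_000_000, 2048, retention_months=3)
```

Measuring `avg_log_bytes_per_request` from a real sample of production logs is the key step; per-request verbosity varies wildly between services.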

Worked estimate template (copy/paste)

  • Requests/month = baseline + peak (include retries)
  • Duration = p50 + p95 scenario (seconds)
  • Egress GB/month = requests/month × avg response size (GB) (split heavy endpoints)
  • Log GB/month = requests/month × avg log bytes/request (baseline + peak)
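The template above can be collected into one function that returns the four driver quantities. It deliberately returns quantities rather than dollars: unit prices vary by region and change over time, so none are hard-coded here. All names are illustrative:

```python
def cloud_run_monthly_drivers(requests: float, p50_s: float, p95_s: float,
                              slow_share: float, avg_response_kb: float,
                              log_bytes_per_request: float) -> dict[str, float]:
    """Combine the worked-estimate template into one set of driver quantities.

    Returns raw quantities, not costs: multiply each by your region's
    current unit price. Hypothetical helper; 1 GB = 1024**3 bytes.
    """
    blended_s = (1 - slow_share) * p50_s + slow_share * p95_s
    return {
        "requests_millions": requests / 1e6,
        "compute_seconds": requests * blended_s,
        "egress_gb": requests * avg_response_kb / 1024**2,
        "log_gb": requests * log_bytes_per_request / 1024**3,
    }

# 10M requests, p50 = 80 ms / p95 = 600 ms with a 5% slow path,
# ~20 KB responses, ~1 KB of logs per request
drivers = cloud_run_monthly_drivers(10_000_000, 0.08, 0.6, 0.05, 20, 1024)
```

Run it twice, once with baseline inputs and once with the peak scenario, so the two cost shapes stay separate as the guide recommends.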

Common pitfalls

  • Using only average duration and ignoring p95 (slow path drives cost and capacity).
  • Ignoring retries/timeouts, which multiply requests, duration, and downstream calls.
  • Not modeling response size (egress dominates for large payloads).
  • Verbose logs per request (ingestion dominates at scale).
  • Not separating baseline vs peak months (incidents change the cost shape).

How to validate

  • Validate p50/p95 latency and cold start behavior.
  • Validate retries/timeouts and incident traffic windows.
  • Validate top endpoints by bytes (not just by request count).
  • Validate log bytes per request and retention settings.


Related guides

Cloud CDN pricing (GCP): bandwidth, requests, and origin egress (cache fill)
A practical Cloud CDN cost model: edge bandwidth, request volume, and origin egress (cache fill). Includes validation steps for hit rate by path, heavy-tail endpoints, and purge/deploy events that reduce hit rate.
Cloud Functions pricing (GCP): invocations, duration, egress, and log volume
A practical Cloud Functions cost model: invocations, execution time, outbound transfer, and logs. Includes a workflow to estimate baseline + peak and validate retries, cold starts, and log bytes per invocation.
GCP load balancing pricing: hours, requests, traffic processed, and egress
A driver-based approach to load balancer cost: hours, request volume, traffic processed, and (separately) outbound egress. Includes a worked estimate template, pitfalls, and a workflow to estimate GB from RPS and response size.
Google Kubernetes Engine (GKE) pricing: nodes, networking, storage, and observability
GKE cost is not just nodes: include node pools, autoscaling, requests/limits (bin packing), load balancing/egress, storage, and logs/metrics. Includes a worked estimate template, pitfalls, and validation steps to keep clusters right-sized.
Artifact Registry pricing (GCP): storage + downloads + egress (practical estimate)
A practical Artifact Registry cost model: stored GB-month baseline, download volume from CI/CD and cluster churn, and outbound transfer. Includes a workflow to estimate GB-month from retention and validate layer sharing and peak pull storms.
Cloud Armor pricing (GCP): model baseline traffic, attack spikes, and logging
A practical Cloud Armor estimate: baseline request volume plus an attack scenario (peak RPS × duration). Includes validation steps for spikes, rule footprint, and the secondary cost driver most teams miss: logs and analytics during incidents.

FAQ

What usually drives Cloud Run cost?
CPU/memory time is the core driver, but request volume, egress, and log ingestion can dominate for high-traffic or large-response services.
How do I estimate quickly?
Estimate monthly requests, request duration (p50/p95), and response size. Add separate line items for outbound transfer and logs/retention.
What is the most common mistake?
Sizing from one average duration and ignoring retries and the slow path. Incidents often increase both request volume and duration at the same time.
How do I validate?
Validate p50/p95 latency, concurrency behavior, retries/timeouts, top endpoints by bytes, and log bytes per request.

Last updated: 2026-02-23