GCP Cloud Run Pricing Guide: Cost Calculator Inputs for Requests, CPU, and Egress

Reviewed by CloudCostKit Editorial Team. Last updated: 2026-03-12. Editorial policy and methodology.

Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.


This is the Cloud Run service-behavior and pricing decision page. Use it when you are budgeting Cloud Run from how the service behaves in production: request volume, execution time, concurrency, outbound transfer, and logging.

Go back to the serverless parent guide if the broader architecture model is still unclear, or if you need to map retries, downstream amplification, and observability before narrowing the estimate to Cloud Run.

What Cloud Run teams usually pay for in the real world

Most teams start with request count because it is easy to understand, but request count alone is not the bill. Cloud Run cost is usually shaped by a stack of related drivers that need to be read together.

  • Requests tell you how often the service executes, but they only become meaningful after you split baseline traffic from peak and retry traffic.
  • CPU and memory time decide how expensive each request becomes, which is why a slow path or poor concurrency setting can change the monthly total quickly.
  • Egress matters when responses are large, clients are global, or the service is exporting files rather than returning small JSON payloads.
  • Logs matter because request-by-request observability scales linearly with traffic and can become a second bill long before the application feels large.

The practical takeaway is simple: if you only model requests, you will usually undercount the expensive month. If you model requests, execution time, transfer, and logs together, Cloud Run becomes much easier to budget.

Build the first estimate from service behavior, not from one average

A useful Cloud Run estimate starts with the inputs that map cleanly to how the service behaves under normal and stressed conditions. The goal is not to create a perfect finance model on day one. The goal is to capture the drivers that explain why this service is cheap in one month and surprisingly expensive in another.

  • Requests per month: convert traffic into a monthly number, but keep baseline and peak separated instead of blending them.
  • Duration by percentile: use p50 and p95 so you can see what normal execution looks like and what the slow path costs.
  • CPU and memory shape: capture what each request actually consumes while it is running, especially for handlers that are not purely I/O-bound.
  • Concurrency behavior: note whether higher concurrency improves efficiency or causes latency and contention that erase the savings.
  • Response size and egress: isolate heavy endpoints, downloads, exports, or media responses so they do not disappear inside one blended average.
  • Log bytes per request: estimate logging separately from application payload size because verbose logs often scale differently from response bodies.

Tools that help with these inputs: RPS to monthly requests, response transfer, egress cost, log cost.
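The inputs above can be combined into a first-pass monthly sketch. Everything in this block is an assumption for illustration: the function name, parameter names, and especially the unit prices are placeholders, not current Cloud Run list prices; substitute the provider's published rates before using the numbers.

```python
# First-pass Cloud Run monthly estimate. All unit prices below are
# illustrative placeholders -- substitute current provider rates.

PRICE_PER_VCPU_SECOND = 0.000024   # assumed rate per vCPU-second
PRICE_PER_GIB_SECOND = 0.0000025   # assumed rate per GiB-second of memory
PRICE_PER_MILLION_REQ = 0.40       # assumed rate per million requests
PRICE_PER_EGRESS_GIB = 0.12        # assumed rate per GiB of egress
PRICE_PER_LOG_GIB = 0.50           # assumed rate per GiB of log ingestion

def monthly_estimate(requests, p50_s, p95_s, vcpus, mem_gib,
                     concurrency, resp_kib, log_kib):
    # Weight duration toward p50 but keep the slow path visible.
    avg_duration_s = 0.9 * p50_s + 0.1 * p95_s
    # Concurrent requests share one instance's CPU/memory clock;
    # this assumes idealized packing, so treat compute as a lower bound.
    instance_seconds = requests * avg_duration_s / concurrency
    compute = instance_seconds * (vcpus * PRICE_PER_VCPU_SECOND
                                  + mem_gib * PRICE_PER_GIB_SECOND)
    req_fee = requests / 1e6 * PRICE_PER_MILLION_REQ
    egress = requests * resp_kib / (1024 ** 2) * PRICE_PER_EGRESS_GIB
    logs = requests * log_kib / (1024 ** 2) * PRICE_PER_LOG_GIB
    return {"compute": compute, "requests": req_fee,
            "egress": egress, "logs": logs}
```

Run it once per traffic shape (baseline and peak as separate calls, not blended) so the estimate keeps the two periods apart.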

The strongest habit on this page is to separate baseline service behavior from peak behavior. A launch week, retry storm, or upstream slowdown does not just increase one line item. It changes requests, duration, transfer, and log volume together.

How compute, concurrency, transfer, and logs interact

This is the part many quick guides skip. Cloud Run does not become expensive because one input goes up in isolation. It gets expensive when several cost drivers reinforce each other.

  • High latency with low effective concurrency means compute time dominates faster because every request holds resources longer.
  • Large responses can push network transfer ahead of compute, especially for export, download, or media endpoints.
  • Retry storms and timeouts multiply requests, compute time, downstream calls, and log volume together.
  • Verbose request logging creates a second scale curve that keeps rising even if the application code itself is lightweight.
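As a rough illustration of the transfer-versus-compute crossover described above, the sketch below compares the two costs for a single request. The unit prices and function name are illustrative assumptions, not quoted rates.

```python
# Sketch: when does outbound transfer overtake compute for one request?
# Both unit prices are illustrative placeholders, not quoted rates.

PRICE_PER_VCPU_SECOND = 0.000024  # assumed rate per vCPU-second
PRICE_PER_EGRESS_GIB = 0.12       # assumed rate per GiB of egress

def egress_dominates(duration_s, vcpus, response_mib):
    """True if one response's transfer cost exceeds its compute cost."""
    compute_cost = duration_s * vcpus * PRICE_PER_VCPU_SECOND
    egress_cost = (response_mib / 1024) * PRICE_PER_EGRESS_GIB
    return egress_cost > compute_cost

# A small JSON response (~20 KiB): compute dominates.
# A 50 MiB export with the same handler time: transfer dominates.
```

This is why blending export endpoints into an average response size hides the driver that actually moves the bill.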

Concurrency deserves special attention because it changes the economics of the service. Higher concurrency can make an I/O-bound handler much more efficient, but the same setting can harm a CPU-bound endpoint, increase tail latency, and create a misleadingly optimistic estimate. When you model Cloud Run, treat concurrency as an operating choice that needs evidence, not as a constant you can pick once and forget.

  • CPU-bound handlers: lower concurrency is often safer because it protects p95 latency and keeps contention visible.
  • I/O-bound handlers: higher concurrency can improve cost efficiency, but only if latency stays controlled under load.
  • Mixed workloads: split batch-like or heavy endpoints into separate services so one concurrency decision does not distort the entire estimate.
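A minimal sketch of why the concurrency setting changes the economics, under the assumption of idealized request packing (real autoscaling bills whole-instance time, so this is a lower bound, not a forecast):

```python
# Sketch: billable instance time for the same traffic at different
# concurrency settings. Assumes perfect request packing, which real
# autoscalers never achieve -- a lower bound, not a forecast.

def billable_instance_seconds(requests, avg_duration_s, concurrency):
    # Concurrent requests share one instance's CPU/memory clock.
    return requests * avg_duration_s / concurrency

cpu_bound = billable_instance_seconds(1_000_000, 0.2, 1)   # needs a whole vCPU
io_bound = billable_instance_seconds(1_000_000, 0.2, 80)   # mostly waiting on I/O
# Same traffic, 80x less billable time -- but only if p95 latency
# actually holds at concurrency 80.
```

The gap between those two numbers is exactly the "misleadingly optimistic estimate" risk: the savings are real only if the latency evidence supports the higher setting.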

Scenario planning: the fastest way to avoid a weak budget

A Cloud Run estimate becomes useful when it explains more than one month shape. Instead of asking "what is my Cloud Run cost," ask what a normal month, a peak month, and a bad month look like. That framing usually catches the same risks that later show up in billing surprises.

  • Baseline month: expected traffic, normal latency, standard log volume. Gives you the operating floor for a stable period.
  • Peak month: higher request volume, larger response mix, busier endpoints. Shows whether network and logs scale faster than compute.
  • Failure month: retries, timeouts, slow upstreams, noisy logging. Reveals how incidents turn one service into several simultaneous cost spikes.

If your estimate only covers a blended average month, it will look cleaner than the real system. That is exactly why first-pass Cloud Run budgets tend to fail during launches or incidents.
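The three month shapes can be expressed as multipliers over one set of baseline inputs. The multipliers below are illustrative assumptions, not benchmarks; replace them with your own peak-week and incident data.

```python
# Sketch: scale one set of baseline inputs into three month shapes.
# Every multiplier is illustrative -- substitute measured peak and
# incident data from your own dashboards.

SCENARIOS = {
    "baseline": {"req": 1.0, "dur": 1.0, "log": 1.0},
    "peak":     {"req": 2.5, "dur": 1.2, "log": 1.5},  # launch traffic
    "failure":  {"req": 1.8, "dur": 2.0, "log": 4.0},  # retries, noisy logs
}

def month_shape(requests, avg_duration_s, log_kib_per_req, scenario):
    m = SCENARIOS[scenario]
    return {
        "requests": requests * m["req"],
        "avg_duration_s": avg_duration_s * m["dur"],
        "log_kib_per_req": log_kib_per_req * m["log"],
    }

# In the failure month, compute scales with req x dur (~3.6x) and log
# volume with req x log (~7.2x): the drivers multiply, they do not add.
```

Feeding each shape through the same cost model is what turns a single blended average into a budget that survives a launch or an incident.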

What usually goes wrong, and how to validate before you trust the estimate

Most weak Cloud Run estimates fail for operational reasons, not because the spreadsheet math is hard. Teams often have the right variables but the wrong shape of data behind them.

  • One average duration is used for every endpoint, which hides the slow path that drives cost and tail latency.
  • Retry traffic appears in dashboards but never makes it into the budget model.
  • Large-response endpoints are blended into a harmless-looking average response size.
  • Logging assumptions stay frozen even after traffic growth or more verbose instrumentation.
  • Baseline and peak periods are merged, which makes the model look stable when the system is not.

Before you sign off on the estimate, validate the service against real operating signals rather than intuition.

  • Check p50 and p95 latency for the endpoints that dominate traffic or spend.
  • Check concurrency behavior under load so you know whether your efficiency assumptions survive real traffic.
  • Check the top endpoints by response bytes, not just by request count.
  • Check retries, timeout windows, and incident periods separately from normal traffic.
  • Check log bytes per request and retention settings so observability is not treated as a rounding error.

A practical sign-off rule works well here: every major number in the model should map back to something measurable in production or a billing export. If you cannot explain where a number comes from, the estimate is not ready for budget decisions yet.
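One way to enforce that sign-off rule is to back each model input out of measured monthly totals. The dictionary keys below are hypothetical field names, not a real billing-export schema; map them to whatever your billing and log exports actually provide.

```python
# Sketch: derive per-request model inputs from measured monthly totals,
# so every number in the estimate traces to an export. The keys are
# hypothetical field names, not a real billing-export schema.

def derived_inputs(monthly):
    req = monthly["requests"]
    return {
        # instance-seconds x avg concurrency ~= total request-seconds
        "avg_duration_s": monthly["instance_seconds"]
                          * monthly["avg_concurrency"] / req,
        "resp_kib_per_req": monthly["egress_bytes"] / req / 1024,
        "log_kib_per_req": monthly["log_bytes"] / req / 1024,
    }

measured = {
    "requests": 12_000_000,
    "instance_seconds": 180_000,  # billable instance time
    "avg_concurrency": 8,
    "egress_bytes": 240e9,
    "log_bytes": 36e9,
}
inputs = derived_inputs(measured)
# If a derived number contradicts the team's assumption (e.g. log bytes
# per request is 3x what the model uses), fix the model before sign-off.
```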

Next actions if you are budgeting Cloud Run now

If you are building a review packet for finance or for an internal architecture discussion, pair those calculators with the cloud cost estimation checklist so your estimate is tied to measurable inputs and a validation step.

Related guides

Cloud Functions pricing (GCP): invocations, duration, egress, and log volume
A practical Cloud Functions cost model: invocations, execution time, outbound transfer, and logs. Includes a workflow to estimate baseline + peak and validate retries, cold starts, and log bytes per invocation.
Cloud CDN pricing (GCP): bandwidth, requests, and origin egress (cache fill)
A practical Cloud CDN cost model: edge bandwidth, request volume, and origin egress (cache fill). Includes validation steps for hit rate by path, heavy-tail endpoints, and purge/deploy events that reduce hit rate.
GCP load balancing pricing: hours, requests, traffic processed, and egress
A driver-based approach to load balancer cost: hours, request volume, traffic processed, and (separately) outbound egress. Includes a worked estimate template, pitfalls, and a workflow to estimate GB from RPS and response size.
Google Kubernetes Engine (GKE) pricing: nodes, networking, storage, and observability
GKE cost is not just nodes: include node pools, autoscaling, requests/limits (bin packing), load balancing/egress, storage, and logs/metrics. Includes a worked estimate template, pitfalls, and validation steps to keep clusters right-sized.
Cloud Armor pricing (GCP): model baseline traffic, attack spikes, and logging
A practical Cloud Armor estimate: baseline request volume plus an attack scenario (peak RPS × duration). Includes validation steps for spikes, rule footprint, and the secondary cost driver most teams miss: logs and analytics during incidents.
Serverless costs explained: invocations, duration, requests, and downstream spend
A practical serverless cost model: invocations and duration (compute time), request-based add-ons, networking/egress, and the log/metric drivers that often dominate totals.

FAQ

What usually drives Cloud Run cost?
CPU/memory time is the core driver, but request volume, egress, and log ingestion can dominate for high-traffic or large-response services.
Is Cloud Run priced per request or by CPU and memory?
Treat request volume as the entry point, but cost is usually shaped by CPU time, memory time, request duration, and egress. There is not one single blended 'price per request' that stays accurate across workloads.
How do I estimate quickly?
Estimate monthly requests, request duration (p50/p95), and response size. Add separate line items for outbound transfer and logs/retention.
What is the most common mistake?
Sizing from one average duration and ignoring retries and the slow path. Incidents often increase both request volume and duration at the same time.
How do I validate?
Validate p50/p95 latency, concurrency behavior, retries/timeouts, top endpoints by bytes, and log bytes per request.

Last updated: 2026-03-12. Reviewed against CloudCostKit methodology and current provider documentation. See the Editorial Policy.