GCP Cloud Run Pricing Guide: Cost Calculator Inputs for Requests, CPU, and Egress
Start with a calculator if you need a first-pass estimate, then use this guide to validate the assumptions and catch the billing traps.
This is the Cloud Run service behavior and pricing decision page. Use it when you are budgeting Cloud Run from the way the service behaves in production: request volume, execution time, concurrency, outbound transfer, and logging.
Go back to the serverless parent guide if the broader architecture model is still unclear, or if you need to map retries, downstream amplification, and observability before narrowing the estimate to Cloud Run.
What Cloud Run teams usually pay for in the real world
Most teams start with request count because it is easy to understand, but request count alone is not the bill. Cloud Run cost is usually shaped by a stack of related drivers that need to be read together.
- Requests tell you how often the service executes, but they only become meaningful after you split baseline traffic from peak and retry traffic.
- CPU and memory time decide how expensive each request becomes, which is why a slow path or poor concurrency setting can change the monthly total quickly.
- Egress matters when responses are large, clients are global, or the service is exporting files rather than returning small JSON payloads.
- Logs matter because request-by-request observability scales linearly with traffic and can become a second bill long before the application feels large.
The practical takeaway is simple: if you only model requests, you will usually undercount the expensive month. If you model requests, execution time, transfer, and logs together, Cloud Run becomes much easier to budget.
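To make that takeaway concrete, here is a minimal sketch of a monthly cost model that reads all four drivers together. The unit rates are placeholders, not current GCP prices; substitute the published rates for your region and tier before using the output for budgeting.

```python
def monthly_cost(
    requests: float,              # billable requests per month
    cpu_seconds_per_req: float,   # vCPU-seconds consumed per request
    mem_gb: float,                # memory allocated to each instance (GiB)
    avg_duration_s: float,        # billed instance time per request (seconds)
    egress_gb: float,             # outbound transfer per month (GB)
    log_gb: float,                # ingested log volume per month (GB)
    # Placeholder unit rates in USD -- replace with real published pricing:
    rate_per_million_req: float = 0.40,
    rate_vcpu_second: float = 0.000024,
    rate_gib_second: float = 0.0000025,
    rate_egress_gb: float = 0.12,
    rate_log_gb: float = 0.50,
) -> dict:
    """Return the monthly cost broken out per driver, so no single
    line item can hide behind a blended total."""
    return {
        "requests": requests / 1e6 * rate_per_million_req,
        "cpu": requests * cpu_seconds_per_req * rate_vcpu_second,
        "memory": requests * avg_duration_s * mem_gb * rate_gib_second,
        "egress": egress_gb * rate_egress_gb,
        "logs": log_gb * rate_log_gb,
    }
```

The point of returning a per-driver breakdown rather than one number is that it shows which driver dominates, which is exactly the question a request-only model cannot answer.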
Build the first estimate from service behavior, not from one average
A useful Cloud Run estimate starts with the inputs that map cleanly to how the service behaves under normal and stressed conditions. The goal is not to create a perfect finance model on day one. The goal is to capture the drivers that explain why this service is cheap in one month and surprisingly expensive in another.
- Requests per month: convert traffic into a monthly number, but keep baseline and peak separated instead of blending them.
- Duration by percentile: use p50 and p95 so you can see what normal execution looks like and what the slow path costs.
- CPU and memory shape: capture what each request actually consumes while it is running, especially for handlers that are not purely I/O-bound.
- Concurrency behavior: note whether higher concurrency improves efficiency or causes latency and contention that erase the savings.
- Response size and egress: isolate heavy endpoints, downloads, exports, or media responses so they do not disappear inside one blended average.
- Log bytes per request: estimate logging separately from application payload size because verbose logs often scale differently from response bodies.
Tools that help with these inputs: an RPS-to-monthly-requests converter, a response transfer estimator, an egress cost calculator, and a log cost calculator.
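The RPS-to-monthly conversion is simple arithmetic, but the habit worth encoding is keeping baseline and peak traffic as separate terms. A minimal sketch, where the specific rates and hours are illustrative inputs you would replace with your own measurements:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 for a 30-day month

def monthly_requests(baseline_rps: float, peak_rps: float,
                     peak_hours_per_month: float) -> dict:
    """Convert requests-per-second into monthly request counts,
    keeping baseline and peak traffic as separate line items
    instead of blending them into one average."""
    peak_seconds = peak_hours_per_month * 3600
    baseline_seconds = SECONDS_PER_MONTH - peak_seconds
    return {
        "baseline": baseline_rps * baseline_seconds,
        "peak": peak_rps * peak_seconds,
    }
```

A blended average would hide that 20 peak hours at 8x baseline RPS can contribute a fifth or more of the monthly total while representing under 3% of the month.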
The strongest habit on this page is to separate baseline service behavior from peak behavior. A launch week, retry storm, or upstream slowdown does not just increase one line item. It changes requests, duration, transfer, and log volume together.
How compute, concurrency, transfer, and logs interact
This is the part many quick guides skip. Cloud Run does not become expensive because one input goes up in isolation. It gets expensive when several cost drivers reinforce each other.
- High latency with low effective concurrency means compute time dominates faster because every request holds resources longer.
- Large responses can push network transfer ahead of compute, especially for export, download, or media endpoints.
- Retry storms and timeouts multiply requests, compute time, downstream calls, and log volume together.
- Verbose request logging creates a second scale curve that keeps rising even if the application code itself is lightweight.
Concurrency deserves special attention because it changes the economics of the service. Higher concurrency can make an I/O-bound handler much more efficient, but the same setting can harm a CPU-bound endpoint, increase tail latency, and create a misleadingly optimistic estimate. When you model Cloud Run, treat concurrency as an operating choice that needs evidence, not as a constant you can pick once and forget.
- CPU-bound handlers: lower concurrency is often safer because it protects p95 latency and keeps contention visible.
- I/O-bound handlers: higher concurrency can improve cost efficiency, but only if latency stays controlled under load.
- Mixed workloads: split batch-like or heavy endpoints into separate services so one concurrency decision does not distort the entire estimate.
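A rough way to see why concurrency changes the economics: with request-based billing, requests served concurrently share one instance's billed CPU and memory, so billed instance time is approximately total request-seconds divided by effective concurrency. This sketch is a simplification that ignores idle time, cold starts, and min-instances, and the numbers you feed it should come from load tests, not from the configured concurrency limit:

```python
def billed_vcpu_seconds(requests: float, avg_duration_s: float,
                        effective_concurrency: float, vcpu: float = 1.0) -> float:
    """Approximate billed vCPU-seconds: concurrent requests share one
    instance, so total request-seconds are divided by the concurrency
    the service actually achieves under load (not the configured cap)."""
    request_seconds = requests * avg_duration_s
    instance_seconds = request_seconds / max(effective_concurrency, 1)
    return instance_seconds * vcpu
```

The same million requests at 500 ms cost ten times more compute when a CPU-bound handler forces effective concurrency down to 1 than when an I/O-bound handler sustains 10, which is why the estimate needs evidence for the concurrency number rather than the configured setting.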
Scenario planning: the fastest way to avoid a weak budget
A Cloud Run estimate becomes useful when it explains more than one shape of month. Instead of asking "what is my Cloud Run cost," ask what a normal month, a peak month, and a bad month look like. That framing usually catches the same risks that later show up in billing surprises.
| Scenario | What changes | Why it matters |
|---|---|---|
| Baseline month | Expected traffic, normal latency, standard log volume | Gives you the operating floor for a stable period |
| Peak month | Higher request volume, larger response mix, busier endpoints | Shows whether network and logs scale faster than compute |
| Failure month | Retries, timeouts, slow upstreams, noisy logging | Reveals how incidents turn one service into several simultaneous cost spikes |
If your estimate only covers a blended average month, it will look cleaner than the real system. That is exactly why first-pass Cloud Run budgets tend to fail during launches or incidents.
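The three scenarios in the table can be expressed as multipliers over the same baseline inputs, which makes it obvious when a driver scales faster than requests. The baseline values and multipliers below are illustrative assumptions; in practice you would pull them from dashboards and billing exports:

```python
# Illustrative baseline inputs for one service (replace with real data).
BASE = {"requests": 10e6, "egress_gb": 100.0, "log_gb": 50.0}

# Multipliers per scenario. Note the failure month: requests rise 2.5x
# but log volume rises 8x, because retries and noisy error logging
# scale together during incidents.
SCENARIOS = {
    "baseline": {"req_x": 1.0, "egress_x": 1.0, "log_x": 1.0},
    "peak":     {"req_x": 3.0, "egress_x": 4.0, "log_x": 3.0},
    "failure":  {"req_x": 2.5, "egress_x": 2.0, "log_x": 8.0},
}

def scenario_inputs(name: str) -> dict:
    """Scale the baseline inputs by the chosen scenario's multipliers."""
    s = SCENARIOS[name]
    return {
        "requests": BASE["requests"] * s["req_x"],
        "egress_gb": BASE["egress_gb"] * s["egress_x"],
        "log_gb": BASE["log_gb"] * s["log_x"],
    }
```

Running all three scenarios through the same cost model is the cheap way to discover, before launch, that the failure month is dominated by logs rather than compute.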
What usually goes wrong, and how to validate before you trust the estimate
Most weak Cloud Run estimates fail for operational reasons, not because the spreadsheet math is hard. Teams often have the right variables but the wrong shape of data behind them.
- One average duration is used for every endpoint, which hides the slow path that drives cost and tail latency.
- Retry traffic appears in dashboards but never makes it into the budget model.
- Large-response endpoints are blended into a harmless-looking average response size.
- Logging assumptions stay frozen even after traffic growth or more verbose instrumentation.
- Baseline and peak periods are merged, which makes the model look stable when the system is not.
Before you sign off on the estimate, validate the service against real operating signals rather than intuition.
- Check p50 and p95 latency for the endpoints that dominate traffic or spend.
- Check concurrency behavior under load so you know whether your efficiency assumptions survive real traffic.
- Check the top endpoints by response bytes, not just by request count.
- Check retries, timeout windows, and incident periods separately from normal traffic.
- Check log bytes per request and retention settings so observability is not treated as a rounding error.
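The last check, log bytes per request, is the one most often left as a guess, yet it converts directly into a monthly figure. A minimal sketch, with a placeholder per-GiB rate that you should replace with your provider's current logging price:

```python
def monthly_log_gib(requests: float, log_bytes_per_request: float) -> float:
    """Convert per-request log volume into monthly GiB ingested."""
    return requests * log_bytes_per_request / (1024 ** 3)

def monthly_log_cost(requests: float, log_bytes_per_request: float,
                     rate_per_gib: float = 0.50) -> float:
    """Monthly log spend; rate_per_gib is a placeholder, not a real price."""
    return monthly_log_gib(requests, log_bytes_per_request) * rate_per_gib
```

At 100 million requests a month, the difference between 200 bytes and 2 KiB of logs per request is roughly an order of magnitude in log spend, which is why observability deserves its own line in the model rather than a rounding allowance.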
A practical sign-off rule works well here: every major number in the model should map back to something measurable in production or a billing export. If you cannot explain where a number comes from, the estimate is not ready for budget decisions yet.
Next actions if you are budgeting Cloud Run now
If you are building a review packet for finance or for an internal architecture discussion, pair those calculators with the cloud cost estimation checklist so your estimate is tied to measurable inputs and a validation step.