Cloud Spanner cost estimation: capacity, storage, backups, and multi-region traffic
Spanner-like databases are capacity-planned: cost follows the capacity you provision, not only the traffic you serve. A reliable estimate separates four drivers: provisioned capacity, stored data plus indexes, backups and retention, and multi-region network traffic. Underestimates usually come from sizing on averages while ignoring peak windows and the slow path (the slowest queries and timeouts, which are what force extra headroom).
0) What to measure (the inputs that matter)
- Capacity baseline + peak: capacity-hours per month (or equivalent) across normal and peak periods.
- Average storage GB-month: data GB + index overhead, averaged across the month (not end-of-month size).
- Backups/retention: backup GB-month and retention windows.
- Traffic topology: which clients/services are cross-zone or cross-region.
1) Capacity: baseline vs peak (capacity-hours)
Model at least two scenarios: a baseline month and a peak month. If you scale capacity over time, estimate capacity-hours rather than using one static number.
- Baseline: steady reads/writes plus background work.
- Peak: deployments, reprocessing/backfills, incident retry storms.
- Slow path: slow (p95) queries and timeouts are often what force extra headroom.
Tool (for capacity-hours math): Compute instance cost.
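As a minimal sketch of the capacity-hours math, the Python below assumes a 730-hour month and a single peak level. The function name, parameters, and example numbers are illustrative assumptions, not values from any pricing page.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours


def capacity_hours(baseline_units: float, peak_units: float, peak_hours: float,
                   hours_in_month: float = HOURS_PER_MONTH) -> float:
    """Capacity-hours for a month that runs at baseline outside peak windows."""
    if not 0 <= peak_hours <= hours_in_month:
        raise ValueError("peak_hours must be between 0 and hours_in_month")
    baseline_hours = hours_in_month - peak_hours
    return baseline_units * baseline_hours + peak_units * peak_hours


# Hypothetical month: 3 capacity units normally, 6 units for 40 peak hours.
scaled = capacity_hours(baseline_units=3, peak_units=6, peak_hours=40)
static_baseline = 3 * HOURS_PER_MONTH  # what a baseline-only estimate assumes
static_peak = 6 * HOURS_PER_MONTH      # what holding peak capacity all month implies
print(scaled, static_baseline, static_peak)  # prints: 2310 2190 4380
```

Multiply the capacity-hours figure by your provider's current price per capacity-hour to get the capacity line item. The gap between the three numbers shows why a single static value misstates the bill in either direction.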
2) Storage: data + index overhead (GB-month)
Storage is predictable if you model growth and index overhead explicitly. The safest approximation is to treat indexes as a separate multiplier on data size (even a rough percentage is better than ignoring them).
- Data GB: current size and monthly growth rate.
- Index overhead: estimate as a % of data if you do not have exact numbers.
- Average GB-month: use mid-month average for linear growth, not end-of-month size.
Tool: Database storage growth.
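The storage inputs above can be sketched as a small calculation. This assumes linear growth within the month and a flat index overhead percentage; the function names and the example figures are hypothetical.

```python
def average_gb_month(start_data_gb: float, monthly_growth_gb: float,
                     index_overhead_pct: float) -> float:
    """Average stored GB over one month of linear growth, including indexes."""
    # For linear growth, the monthly average equals the mid-month size.
    mid_month_data_gb = start_data_gb + monthly_growth_gb / 2
    return mid_month_data_gb * (1 + index_overhead_pct / 100)


def end_of_month_gb(start_data_gb: float, monthly_growth_gb: float,
                    index_overhead_pct: float) -> float:
    """End-of-month size, shown only to compare against the average."""
    return (start_data_gb + monthly_growth_gb) * (1 + index_overhead_pct / 100)


# Hypothetical dataset: 1000 GB of data, growing 100 GB/month, 30% index overhead.
avg = average_gb_month(1000, 100, 30)
end = end_of_month_gb(1000, 100, 30)
print(round(avg, 1), round(end, 1))  # prints: 1365.0 1430.0
```

Using the end-of-month figure here would overstate billed storage by about 65 GB-month for the same dataset.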
3) Backups and retention
Backups are typically billed separately from primary storage. Long retention and compliance requirements can turn backups into a meaningful line item even when the primary dataset is stable.
- Keep retention settings explicit (days/months).
- Plan for restore validation (you need a realistic drill, not just a checkbox).
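To see how retention drives the backup line, here is a rough sketch. It assumes every backup is a full-size copy and that the retention window is already full (steady state). Services that store backups incrementally will bill less, so treat the result as an upper bound. All names and numbers are hypothetical.

```python
import math


def retained_backups(retention_days: int, interval_days: int = 1) -> int:
    """Number of backups held at steady state for a given retention window."""
    if retention_days < 0 or interval_days <= 0:
        raise ValueError("retention_days must be >= 0 and interval_days > 0")
    return math.ceil(retention_days / interval_days)


def backup_gb_month(backup_size_gb: float, retention_days: int,
                    interval_days: int = 1) -> float:
    """Upper-bound backup GB-month, assuming full-size copies (an assumption)."""
    return backup_size_gb * retained_backups(retention_days, interval_days)


# Hypothetical: 500 GB backups taken daily, retained for 30 vs 90 days.
print(backup_gb_month(500, retention_days=30))  # prints: 15000
print(backup_gb_month(500, retention_days=90))  # prints: 45000
```

Tripling the retention window triples the backup GB-month even though the primary dataset has not changed.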
4) Multi-region patterns: transfer and replication
Distributed topologies change the bill shape. If readers/writers are cross-region, model outbound transfer separately and avoid blending it into "capacity".
Tools: Egress cost, Cross-region transfer.
- Separate inter-zone vs inter-region so you target the right optimization lever.
- If you front services with a CDN, avoid double-counting edge bandwidth and origin egress.
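One way to keep inter-zone and inter-region traffic separate is to tag each flow with its scope and total per scope. The sketch below assumes you already have monthly GB per flow from flow logs or service metrics. The `Flow` type, scope labels, and sample flows are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    gb_per_month: float
    scope: str  # one of "same-zone", "inter-zone", "inter-region"


def transfer_by_scope(flows: list[Flow]) -> dict[str, float]:
    """Sum monthly GB per scope so each scope can be priced separately."""
    totals: dict[str, float] = {}
    for flow in flows:
        totals[flow.scope] = totals.get(flow.scope, 0.0) + flow.gb_per_month
    return totals


# Hypothetical flows for one month.
flows = [
    Flow("api to db, same zone", 5000, "same-zone"),
    Flow("batch to db, other zone", 1200, "inter-zone"),
    Flow("remote readers to db", 800, "inter-region"),
    Flow("cross-region replication", 300, "inter-region"),
]
totals = transfer_by_scope(flows)
for scope, gb in sorted(totals.items()):
    print(f"{scope}: {gb:.0f} GB/month")
```

Keeping the totals per scope makes it clear whether co-locating clients in the same zone or reducing cross-region reads is the lever worth pulling.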
Worked estimate template (copy/paste)
- Capacity = baseline capacity-hours + peak capacity-hours (per month)
- Primary storage = avg (data + index) GB-month
- Backups = backup GB-month (retention window)
- Network transfer = outbound GB/month (split inter-zone vs inter-region)
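The template can also be kept as a small script so each driver stays a separate line item. The rates below are placeholders chosen only to make the arithmetic visible, not real prices. The function and key names are hypothetical; substitute rates from your provider's current pricing page.

```python
# Placeholder rates (NOT real prices); replace with your provider's values.
PLACEHOLDER_RATES = {
    "capacity_hour": 1.00,
    "storage_gb_month": 0.50,
    "backup_gb_month": 0.10,
    "inter_zone_gb": 0.01,
    "inter_region_gb": 0.05,
}


def monthly_estimate(capacity_hours: float, storage_gb_month: float,
                     backup_gb_month: float, inter_zone_gb: float,
                     inter_region_gb: float,
                     rates: dict[str, float] = PLACEHOLDER_RATES) -> dict[str, float]:
    """Return one line item per cost driver plus a total."""
    lines = {
        "capacity": capacity_hours * rates["capacity_hour"],
        "primary_storage": storage_gb_month * rates["storage_gb_month"],
        "backups": backup_gb_month * rates["backup_gb_month"],
        "inter_zone_transfer": inter_zone_gb * rates["inter_zone_gb"],
        "inter_region_transfer": inter_region_gb * rates["inter_region_gb"],
    }
    lines["total"] = sum(lines.values())
    return lines


# Hypothetical inputs for a single month.
estimate = monthly_estimate(capacity_hours=2310, storage_gb_month=1365,
                            backup_gb_month=15000, inter_zone_gb=1200,
                            inter_region_gb=1100)
for item, cost in estimate.items():
    print(f"{item}: {cost:.2f}")
```

Run it once with baseline inputs and once with peak-month inputs to get the two scenarios side by side.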
Common pitfalls
- Sizing from averages and ignoring peak windows (deploys, backfills, incident retries).
- Ignoring inefficient queries and index patterns (the slow path drives headroom).
- Using end-of-month storage instead of average GB-month for growing datasets.
- Forgetting multi-region client traffic and cross-region replication patterns.
- Letting long backup retention grow silently month over month.
How to validate (practical checklist)
- Validate peak windows and incident behavior (retries and timeouts multiply operations).
- Validate query efficiency before you buy more capacity (slow queries, hot keys, missing indexes).
- Validate storage growth, retention policies, and index overhead.
- Validate cross-zone/cross-region traffic using flow logs or service metrics.
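For the first checklist item, a rough way to quantify how retries multiply operations is the expected number of attempts per request. The sketch assumes each attempt fails independently with the same probability, which real incidents often violate, so treat it as a lower bound on amplification during correlated failures. The function name and inputs are hypothetical.

```python
def expected_attempts(failure_probability: float, max_retries: int) -> float:
    """Expected attempts per request when each attempt fails independently.

    Attempt k+1 happens only if the first k attempts all failed, so the
    expectation is the sum of p**k for k = 0..max_retries.
    """
    if not 0 <= failure_probability <= 1:
        raise ValueError("failure_probability must be between 0 and 1")
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return sum(failure_probability ** k for k in range(max_retries + 1))


# Hypothetical incident: half of all attempts time out, clients retry up to 3 times.
print(expected_attempts(0.5, 3))  # prints: 1.875
```

In that scenario the database sees close to twice the normal operation count, which is the load the peak capacity estimate needs to cover.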