RDS snapshot retention policy: cost model and safe defaults

Snapshot retention is a trade-off between recovery objectives and cost. Most cost blowups come from long retention combined with high churn, plus manual snapshots that never expire.

1) Define what you actually need (RPO/RTO → retention)

  • Operational recovery: typical restore windows (days to weeks).
  • Compliance retention: long-term retention if required (months to years).
  • RPO/RTO: how far back you must be able to restore, and how quickly.

If you can’t describe the use case for long-term retention (audit requirement, contract, policy), you probably don’t need it on every database.

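2) Model cost with churn × retention

When data changes substantially between backups, incremental backup storage scales roughly with daily changed GB × retention days. For example, 20 GB of daily change kept for 14 days is about 280 GB of retained backup data.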

If you do not have reliable churn measurements yet, model both a low-churn and a high-churn scenario to bracket the likely cost.
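The churn × retention model can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a billing calculator: the function name, the churn figures, and the price per GB-month are all placeholders, and the sketch ignores any free backup allowance and the size of the initial full backup.

```python
def estimate_backup_cost(daily_changed_gb, retention_days, price_per_gb_month):
    """Rough backup footprint and monthly cost under the churn x retention model."""
    if daily_changed_gb < 0 or retention_days < 0 or price_per_gb_month < 0:
        raise ValueError("inputs must be non-negative")
    # Retained incremental data grows with how much changes per day
    # and how many days of history are kept.
    retained_gb = daily_changed_gb * retention_days
    return retained_gb, retained_gb * price_per_gb_month


# Hypothetical inputs: replace with your own measurements and your region's rate.
PRICE_PER_GB_MONTH = 0.10  # placeholder, not a quoted price
SCENARIOS = {"low churn": 5, "high churn": 50}  # daily changed GB (assumed)

for name, churn_gb in SCENARIOS.items():
    gb, cost = estimate_backup_cost(churn_gb, 14, PRICE_PER_GB_MONTH)
    print(f"{name}: ~{gb:.0f} GB retained, ~${cost:.2f}/month")
```

Running both scenarios side by side shows how wide the cost range is before you commit to a retention window; if the spread is large, measuring real churn is worth the effort.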

3) Avoid the common retention traps (the “silent” costs)

  • Same retention everywhere (dev/staging backups that linger for months).
  • Manual snapshots without a lifecycle policy.
  • Frequent snapshots for fast-changing datasets without cost guardrails.
  • Cross-region copies that are never cleaned up after a project ends.

4) Use two tiers: short operational + targeted long-term

Keep the short tier for day-to-day recovery. Add long-term retention only where required and keep it scoped (critical databases, monthly snapshots, etc.).

  • Example operational tier (prod): 7–14 days; (staging): 3–7 days; (dev): 1–3 days.
  • Example long-term tier: monthly snapshots kept for 6–12 months, only for regulated or business-critical databases.
  • Prefer explicit ownership: tag snapshots with owner/team and enforce lifecycle rules so “no owner” snapshots expire.
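One way to make the two tiers explicit is to encode them as data and derive each snapshot's expiry from it. The sketch below is illustrative only: it uses the upper bound of each example range above, and the three-day grace period for unowned snapshots and the function names are assumptions, not recommendations from any AWS API.

```python
from datetime import datetime, timedelta, timezone

# Illustrative defaults: upper bounds of the example ranges above.
OPERATIONAL_RETENTION_DAYS = {"prod": 14, "staging": 7, "dev": 3}
LONG_TERM_RETENTION_DAYS = 365  # monthly snapshots kept for roughly 12 months
UNOWNED_RETENTION_DAYS = 3      # assumed grace period for snapshots with no owner


def retention_days(environment, tier="operational", owner=None):
    """Return how many days a snapshot should be kept under the two-tier policy."""
    if not owner:
        # "No owner" snapshots expire quickly regardless of tier.
        return UNOWNED_RETENTION_DAYS
    if tier == "long-term":
        return LONG_TERM_RETENTION_DAYS
    # Unknown environments fall back to the shortest operational window.
    shortest = min(OPERATIONAL_RETENTION_DAYS.values())
    return OPERATIONAL_RETENTION_DAYS.get(environment, shortest)


def is_expired(created_at, environment, tier="operational", owner=None, now=None):
    """True if the snapshot is older than its policy window."""
    now = now or datetime.now(timezone.utc)
    return now - created_at > timedelta(days=retention_days(environment, tier, owner))
```

Keeping the windows in one table makes exceptions visible in code review: lengthening retention for an environment is a one-line change that someone has to justify.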

Cost guardrails (prevent retention from drifting)

  • Set a monthly review: list snapshots by age and owner and delete anything that violates policy.
  • Alert on backup storage growth (GB-month) by account/environment so drift is visible within days, not quarters.
  • Require a reason for exceptions (long retention) and tie it to an audit ticket or compliance requirement so it can be revisited.
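The monthly review can be partly automated. The following sketch assumes snapshot records shaped like the entries returned by boto3's `describe_db_snapshots` (`DBSnapshotIdentifier`, `SnapshotCreateTime`, `TagList`); the tag names `owner`, `env`, and `retention-exception` and the age limits are hypothetical and should be replaced with your own conventions. It only reports violations and deletes nothing.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE_DAYS = {"prod": 14, "staging": 7, "dev": 3}  # illustrative policy


def find_violations(snapshots, now=None):
    """Return (snapshot_id, owner, age_days, reason) for snapshots that break policy."""
    now = now or datetime.now(timezone.utc)
    violations = []
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("TagList", [])}
        owner = tags.get("owner")  # hypothetical tag name
        age_days = (now - snap["SnapshotCreateTime"]).days
        snap_id = snap["DBSnapshotIdentifier"]
        if not owner:
            violations.append((snap_id, None, age_days, "no owner"))
            continue
        if tags.get("retention-exception"):  # hypothetical tag holding a ticket ID
            continue  # documented exception: revisit via the linked ticket
        limit = MAX_AGE_DAYS.get(tags.get("env"), min(MAX_AGE_DAYS.values()))
        if age_days > limit:
            violations.append((snap_id, owner, age_days, f"older than {limit} days"))
    # Oldest first, so the review starts with the longest-lived snapshots.
    return sorted(violations, key=lambda v: v[2], reverse=True)
```

Treating a missing owner tag as a violation in its own right means orphaned snapshots surface in the first review after they are created, rather than after they have aged past a limit.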

Validation checklist (don’t shorten retention blindly)

  • Test restore workflows (point-in-time recovery and snapshot restore) for the retention window you propose.
  • Use Cost Explorer to compare backup storage GB-month before/after the policy change.
  • Audit manual snapshots monthly (or automate cleanup) so they can’t accumulate indefinitely.
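The before/after comparison and the growth alert both reduce to simple arithmetic on GB-month figures. A minimal sketch, assuming you have already exported backup storage usage per period (for example from a billing report); the function names and the 20% threshold are illustrative choices.

```python
def gb_month_change(before_gb_month, after_gb_month):
    """Return (absolute change, percent change) in backup storage GB-month."""
    if before_gb_month <= 0:
        raise ValueError("before_gb_month must be positive")
    delta = after_gb_month - before_gb_month
    return delta, 100.0 * delta / before_gb_month


def growth_alerts(series, max_growth_pct=20.0):
    """Return labels of periods whose usage grew more than max_growth_pct
    over the previous period. `series` is a list of (label, gb_month) pairs
    in chronological order."""
    alerts = []
    for (_, prev), (label, cur) in zip(series, series[1:]):
        if prev > 0 and 100.0 * (cur - prev) / prev > max_growth_pct:
            alerts.append(label)
    return alerts


# Hypothetical figures for illustration only.
delta, pct = gb_month_change(1000, 600)
print(f"change: {delta} GB-month ({pct:.1f}%)")
print(growth_alerts([("2025-10", 500), ("2025-11", 520), ("2025-12", 700)]))
```

Comparing like-for-like periods matters here: a retention change takes up to one full retention window to show its effect, so measure after the old backups have aged out.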

FAQ

What retention policy keeps costs predictable?
Use short operational retention (days to weeks) and keep long-term retention only where required. Model costs using churn × retention and validate with real snapshot growth.

Why do manual snapshots often create surprise bills?
Because they can accumulate without a lifecycle policy. Long-lived manual snapshots can quietly dominate backup GB-month over time.

Should every environment have the same retention?
Usually no. Prod often needs longer operational retention, while dev/staging can use much shorter retention to avoid paying for non-critical history.

How do I pick a safe default if I'm unsure?
Start with a modest operational retention window (for example, 7–14 days), implement a lifecycle policy for long-term retention, and validate restore needs with real incident and recovery data.

Last updated: 2026-01-27