Azure Event Hubs Pricing & Cost Guide (throughput, retention, egress)
Event streaming cost planning works best when you model bytes and replays. The same stream can be read multiple times by consumers, and that replay multiplier is where estimates often break down.
Event Hubs pricing quick model
- Ingest GB/month = events/sec × bytes/event × seconds/month (then convert to GB).
- Retained GB = ingest GB/day × retention days.
- Replay GB/month = ingest GB/month × replay multiplier (how many times consumers re-read the same data via replays/backfills).
0) Define the stream scope
- Producers: which services emit events and at what peak rate.
- Consumers: number of consumer groups and whether they reprocess data.
- Retention: how long data is kept and how often replays happen.
1) Ingestion volume (GB/month)
Estimate events/second * bytes/event to get bytes/second, then convert to GB/day and GB/month. Model top sources separately (audit logs, telemetry, clickstream) instead of using one blended average.
Tool: Ingestion calculator.
- Keep a peak scenario: ingestion spikes during incidents, deploys, and backfills.
- Track event size distribution: a small fraction of "large events" can dominate GB/month.
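The ingestion estimate above can be sketched as a small calculator. All rates, event sizes, and the 3× burst factor below are illustrative placeholders, not Azure figures:

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # ~2.6M seconds (30-day month approximation)

def ingest_gb(events_per_sec: float, bytes_per_event: float, seconds: float) -> float:
    """Raw ingest volume in GB (1 GB = 1e9 bytes) over a time window."""
    return events_per_sec * bytes_per_event * seconds / 1e9

# Model top sources separately instead of one blended average (numbers are made up).
sources = {
    "audit_logs":  (500, 1_200),   # events/sec, bytes/event
    "telemetry":   (2_000, 400),
    "clickstream": (1_000, 800),
}

gb_per_month = {
    name: ingest_gb(eps, size, SECONDS_PER_MONTH)
    for name, (eps, size) in sources.items()
}
total_gb_month = sum(gb_per_month.values())

# Keep a peak scenario: e.g. a 3x burst day during an incident or backfill.
peak_gb_day = sum(
    ingest_gb(eps * 3, size, SECONDS_PER_DAY) for eps, size in sources.values()
)
```

Keeping per-source lines makes it obvious when one large-event source (here, audit logs at 1,200 bytes/event) dominates GB/month despite a lower event rate.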
2) Retention
Retention is the "how long do we keep data" multiplier. Long retention increases stored data and makes replays/backfills more likely.
Tool: Retention storage.
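The retention multiplier is one line of arithmetic; the 190 GB/day input below is an illustrative placeholder:

```python
def retained_gb(ingest_gb_per_day: float, retention_days: int) -> float:
    """Steady-state stored volume: each day's ingest is kept for retention_days."""
    return ingest_gb_per_day * retention_days

# Illustrative: the same 190 GB/day stream at 7-day vs 90-day retention.
short_retention = retained_gb(190, 7)    # 1,330 GB retained
long_retention = retained_gb(190, 90)    # 17,100 GB retained
```

The comparison is the point: extending retention from 7 to 90 days multiplies stored data by ~13× while the ingest volume is unchanged.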
3) Consumer replays and downstream costs
The hidden cost is re-reading and processing the same data. If you have multiple consumer groups, frequent replays, or backfills, model a replay multiplier and validate consumer lag patterns.
- Replay multiplier: how many times the same day of data is reprocessed (debugging, re-indexing, ML training).
- Downstream: compute, logs, and data transfer in the consumer pipelines often dominate the Event Hubs line item.
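A replay multiplier can be derived per consumer group rather than guessed globally. The group names and re-read counts below are hypothetical:

```python
# Hypothetical consumer groups; each value counts how many times the group
# reads the month's data (1.0 = single pass, >1.0 = replays/backfills,
# <1.0 = occasional partial re-reads).
consumer_reads_per_month = {
    "search_indexer": 1.0,   # reads once, keeps up with the stream
    "ml_training":    3.0,   # re-reads the window for each training run
    "debug_replays":  0.5,   # occasional partial backfills
}

ingest_gb_month = 5_700  # from the ingestion step (illustrative)

read_gb = {name: ingest_gb_month * reads
           for name, reads in consumer_reads_per_month.items()}
total_read_gb = sum(read_gb.values())                  # GB read by all consumers
replay_multiplier = total_read_gb / ingest_gb_month    # 4.5x in this example
```

Here a single ML training pipeline turns 5.7 TB of ingest into ~25.7 TB read, which is why downstream compute and transfer often dwarf the Event Hubs line item.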
Worked estimate template (copy/paste)
- Ingest GB/month = events/sec * bytes/event * seconds/month / 1e9 (seconds/month ≈ 2.6e6; 1 GB = 1e9 bytes)
- Retained GB = ingest GB/day * retention days (order-of-magnitude)
- Replay GB/month = ingest GB/month * replay multiplier (if consumers reprocess)
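The template lines above can be combined into one end-to-end estimator; every input in the example call is an illustrative placeholder:

```python
def estimate_volumes(events_per_sec: float, bytes_per_event: float,
                     retention_days: int, replay_multiplier: float) -> dict:
    """Order-of-magnitude volume estimate (1 GB = 1e9 bytes, 30-day month)."""
    gb_day = events_per_sec * bytes_per_event * 86_400 / 1e9
    gb_month = gb_day * 30
    return {
        "ingest_gb_month": gb_month,
        "retained_gb": gb_day * retention_days,
        "replay_gb_month": gb_month * replay_multiplier,
    }

# Illustrative: 2,000 events/sec at 500 bytes, 7-day retention, consumers
# reading the data twice on average.
est = estimate_volumes(events_per_sec=2_000, bytes_per_event=500,
                       retention_days=7, replay_multiplier=2.0)
```

Run the same function for an average and a peak scenario and keep both in the model, per the pitfalls below.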
Common pitfalls
- Using average events/sec and missing burst traffic (capacity and cost depend on peaks).
- Ignoring consumer groups and replays/backfills (multipliers matter).
- Using one blended bytes/event when a few event types dominate size.
- Missing downstream cost: logs and compute in consumers can exceed the streaming bill.
- Keeping long retention by default "just in case".
How to validate
- Validate peak ingestion vs average; keep the peak scenario in the model.
- Validate consumer lag and replay/backfill frequency.
- Validate retention settings and whether stored data is actually used.