Capacity Planning

Advanced ⏱ 16 min #sre #capacity #scaling

Ensure you have enough resources to meet demand reliably, without overspending.

📖Theory

Capacity planning makes sure you can serve demand reliably while controlling cost. The process:

  1. Forecast demand — from growth trends, seasonality, and planned events
  2. Load test to find each component’s limits and per-unit capacity
  3. Provision with headroom — run below max so spikes and failures don’t tip you over
  4. Autoscale for elastic demand; reserve a buffer for the N+1 case (survive losing one unit/zone)
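
Step 1 can be sketched in a few lines. This is a minimal illustration, assuming demand grows by a fixed percentage each month and that seasonality and planned events act as simple multipliers; the function and parameter names are hypothetical, and real forecasts usually rest on historical traffic data.

```python
def forecast_peak_rps(
    baseline_rps: float,
    monthly_growth: float,
    months_ahead: int,
    seasonal_multiplier: float = 1.0,
    event_multiplier: float = 1.0,
) -> float:
    """Project peak demand (assumed model: compound growth times multipliers)."""
    if baseline_rps < 0 or months_ahead < 0:
        raise ValueError("baseline_rps and months_ahead must be non-negative")
    # Growth trend: compound the monthly rate over the planning horizon.
    grown = baseline_rps * (1 + monthly_growth) ** months_ahead
    # Seasonality and planned events scale the trend value.
    return grown * seasonal_multiplier * event_multiplier


# A planned event expected to bring 6x today's baseline of 2,000 req/s.
print(forecast_peak_rps(2_000, 0.0, 0, event_multiplier=6.0))
```

The forecast is only an input: the load-test and headroom steps decide how many units that demand translates into.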

Key tension: too little capacity causes outages; too much wastes money. Watch saturation (one of the four Golden Signals) and lead times: some resources, such as physical hardware or cloud quota increases, can't be acquired instantly.
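
Lead time matters because capacity has to be ordered before it is needed. The sketch below estimates how long current capacity lasts and whether that window is shorter than the lead time. It assumes steady compound monthly growth, and all names are illustrative rather than taken from any tool.

```python
import math


def months_until_saturation(
    current_rps: float, capacity_rps: float, monthly_growth: float
) -> float:
    """Months of compound growth until demand reaches capacity."""
    if current_rps <= 0 or capacity_rps <= 0:
        raise ValueError("current_rps and capacity_rps must be positive")
    if current_rps >= capacity_rps:
        return 0.0  # already saturated
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking demand never reaches capacity
    return math.log(capacity_rps / current_rps) / math.log(1 + monthly_growth)


def must_order_now(
    current_rps: float,
    capacity_rps: float,
    monthly_growth: float,
    lead_time_months: float,
) -> bool:
    """True when capacity would run out before a new order could arrive."""
    runway = months_until_saturation(current_rps, capacity_rps, monthly_growth)
    return runway <= lead_time_months


# Demand doubling monthly, capacity at 2x current demand, 3-month lead time.
print(must_order_now(1_000, 2_000, 1.0, lead_time_months=3))
```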

🌍Real-World Example
Capacity for a Black Friday spike:
  Baseline: 2,000 req/s, autoscaled
  Forecast: peak of 6x baseline = 12,000 req/s
  Load test: one instance handles 500 req/s safely
  Plan: 24 instances at peak (12,000 / 500) + N+1 buffer + headroom for the unknown
  Pre-scale before the event to cover autoscaling lag and provisioning lead time
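
The arithmetic behind this plan can be checked with a short sketch. The 20% headroom and the single spare instance are assumptions chosen for illustration; the example itself does not state how much headroom to add. Integer math avoids floating-point rounding surprises.

```python
def instances_for_peak(
    peak_rps: int,
    per_instance_rps: int,
    headroom_pct: int = 0,
    spare_units: int = 0,
) -> int:
    """Instances needed for peak load, plus optional headroom and spares."""
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    padded_demand = peak_rps * (100 + headroom_pct)
    per_unit = per_instance_rps * 100
    base = -(-padded_demand // per_unit)  # ceiling division on integers
    return base + spare_units


baseline_rps = 2_000
peak_rps = 6 * baseline_rps  # 12,000 req/s

# Bare minimum at peak: 12,000 / 500
print(instances_for_peak(peak_rps, 500))  # 24

# With an assumed 20% headroom and one spare instance for N+1
print(instances_for_peak(peak_rps, 500, headroom_pct=20, spare_units=1))  # 30
```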
✍️Hands-On Exercise
  1. Outline the steps of capacity planning.
  2. Explain N+1 provisioning and why it matters.
  3. Why provision headroom rather than run at 100% utilization?
  4. How does load testing feed capacity decisions?
🧾Cheat Sheet
  Step        Detail
  Forecast    Growth, seasonality, events
  Load test   Find per-unit limits
  Headroom    Run below max
  N+1         Survive losing one unit
  Autoscale   Elastic demand
  Saturation  Golden signal to watch
  Lead time   Some capacity isn't instant
💬Common Interview Questions
What is N+1 capacity planning?

Provisioning enough capacity to keep serving demand even after losing one component (instance, zone, or region) — so a single failure doesn’t cause an outage.
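
A quick way to test an N+1 claim is to remove one unit and see whether the rest still covers peak demand. The sketch below assumes identical units and evenly balanced load; the function name is hypothetical.

```python
def survives_loss(
    units: int, per_unit_rps: int, peak_rps: int, lost: int = 1
) -> bool:
    """True if the remaining units can still serve peak demand."""
    remaining = max(units - lost, 0)
    return remaining * per_unit_rps >= peak_rps


# 24 instances exactly cover 12,000 req/s, so losing one causes a shortfall.
print(survives_loss(24, 500, 12_000))  # False

# One extra instance makes the same failure a non-event.
print(survives_loss(25, 500, 12_000))  # True
```

The same check applies at the zone level: treat each zone as a unit with its own aggregate capacity.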

Why not run systems at 100% utilization?

There’s no headroom for traffic spikes, failures, or deploys — any of which would immediately overload the system. Headroom turns a spike or failure into a non-event.
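
The effect of headroom can be put in numbers. This sketch assumes load spreads evenly across identical units and treats utilization as a fraction between 0 and 1; the helper names are illustrative.

```python
def max_absorbable_spike(utilization: float) -> float:
    """Largest traffic multiplier that fits before reaching 100% utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


def utilization_after_loss(utilization: float, units: int, lost: int = 1) -> float:
    """Per-unit utilization once the same load lands on fewer units."""
    if lost >= units:
        raise ValueError("at least one unit must remain")
    return utilization * units / (units - lost)


# At 100% utilization no spike fits: the multiplier is 1.0 (no growth at all).
print(max_absorbable_spike(1.0))

# At 50% utilization traffic can double before the system saturates.
print(max_absorbable_spike(0.5))

# Four units at 100%: losing one pushes the survivors past their limit.
print(utilization_after_loss(1.0, 4) > 1.0)
```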
