Multi-AZ architecture is frequently applied as a default: a standard pattern that every production workload should follow. This default is often right. But applying Multi-AZ indiscriminately across all workloads adds significant cost (in cross-AZ data transfer, in duplicated infrastructure, and in operational complexity) for availability guarantees that many workloads do not actually need.
What Multi-AZ actually protects against
Multi-AZ architecture protects against availability zone failures: events where an AZ, which consists of one or more physically separate data centers within a region, becomes unavailable. Full AZ outages do occur, but they are infrequent; more often an incident degrades a subset of services within the AZ. The resilience benefit of Multi-AZ is real and meaningful for workloads where downtime has direct business impact.
Multi-AZ does not protect against region-level failures, which require Multi-Region architecture with significantly higher complexity and cost. It also does not protect against application-level failures — a bad deployment that crashes your application pods simultaneously across all AZs is not mitigated by Multi-AZ infrastructure.
The cost of Multi-AZ
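The primary costs of Multi-AZ architecture are cross-AZ data transfer charges and duplicated infrastructure for stateful components. Cross-AZ traffic is billed at $0.01 per gigabyte in each direction, so a gigabyte sent between AZs effectively costs $0.02, and high-throughput microservices architectures can accumulate substantial data transfer costs simply from service-to-service communication across AZ boundaries. RDS Multi-AZ deployments roughly double the cost of the database tier, because a standby instance and its storage run alongside the primary. EKS with nodes spread across three AZs does not by itself triple the node count, but surviving the loss of one AZ at full load means the remaining two must carry the whole workload, which implies provisioning roughly 1.5 times the baseline capacity.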
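To get a feel for the data transfer figure, the following is a rough sketch of a monthly estimate. It assumes the $0.01 per gigabyte rate is charged in each direction (worth confirming against current AWS pricing) and that service-to-service calls pick their target uniformly across AZs; the function names and the traffic volume are hypothetical.

```python
def cross_az_fraction(az_count: int) -> float:
    """Share of calls that leave the caller's AZ, assuming targets are chosen uniformly."""
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    return (az_count - 1) / az_count


def monthly_cross_az_cost(
    service_traffic_gb: float,
    az_count: int,
    rate_per_gb: float = 0.01,   # per-direction rate quoted in the text
    billed_directions: int = 2,  # assumption: charged on both the sending and receiving side
) -> float:
    """Estimate the monthly cross-AZ transfer charge for service-to-service traffic."""
    crossing_gb = service_traffic_gb * cross_az_fraction(az_count)
    return crossing_gb * rate_per_gb * billed_directions


if __name__ == "__main__":
    # Hypothetical workload: 50 TB of internal traffic per month across three AZs.
    estimate = monthly_cross_az_cost(service_traffic_gb=50_000, az_count=3)
    print(f"Estimated monthly cross-AZ transfer cost: ${estimate:,.2f}")
```

With `az_count=1` the estimate drops to zero, which is the transfer saving a Single-AZ deployment would capture under these assumptions.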
When Single-AZ is the right answer
Single-AZ architecture is appropriate for several categories of workload. Development and staging environments rarely have availability requirements that justify Multi-AZ cost: a development database going offline for 30 minutes during an AZ failure is an inconvenience, not a business incident. Batch processing workloads with restart capability can tolerate AZ failures without Multi-AZ infrastructure, since a failed batch job can be restarted in a healthy AZ, provided its input data and checkpoints live in regional storage such as S3 rather than on volumes bound to the failed AZ. Internal tooling with low criticality and defined maintenance windows can similarly accept AZ-level failure without business impact.
Making the decision explicit
The most important principle is that the Multi-AZ decision should be explicit and documented for every production workload — not applied by default or omitted by oversight. The question to answer for each workload is: what is the cost of an AZ failure for this specific service, and does that cost justify the ongoing Multi-AZ premium? Where the answer is clearly yes, Multi-AZ is essential. Where the answer is no, Single-AZ with a documented recovery procedure is the more economically rational choice.
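One way to make that question concrete is to compare the expected yearly cost of AZ outages with the yearly Multi-AZ premium. The sketch below is illustrative only: the `Workload` fields, the `recommend` helper, and every figure in the example are hypothetical inputs that each team would need to estimate for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    downtime_cost_per_hour: float             # business cost of one hour offline
    expected_az_outage_hours_per_year: float  # team's own estimate, not an AWS figure
    multi_az_premium_per_year: float          # extra spend for transfer and redundancy


def expected_outage_cost(workload: Workload) -> float:
    """Expected yearly loss from AZ failures if the workload stays Single-AZ."""
    return workload.downtime_cost_per_hour * workload.expected_az_outage_hours_per_year


def recommend(workload: Workload) -> str:
    """Return 'multi-az' when the expected outage loss exceeds the premium."""
    if expected_outage_cost(workload) > workload.multi_az_premium_per_year:
        return "multi-az"
    return "single-az"


if __name__ == "__main__":
    # Hypothetical workloads with made-up numbers.
    workloads = [
        Workload("checkout-api", 20_000.0, 2.0, 15_000.0),
        Workload("staging-db", 50.0, 2.0, 6_000.0),
    ]
    for w in workloads:
        print(f"{w.name}: {recommend(w)} "
              f"(expected outage cost ${expected_outage_cost(w):,.0f}, "
              f"premium ${w.multi_az_premium_per_year:,.0f})")
```

A purely financial comparison like this is a starting point rather than a verdict; contractual or regulatory availability requirements may settle the question regardless of the numbers. Recording the inputs alongside the result gives each workload the documented decision described above.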