AWS recently published a postmortem on the October 2025 us-east-1 outage [1]. A DNS race condition in DynamoDB cascaded across EC2, Lambda, Redshift, and NLB, leading to ~14 hours of degraded operations for new instance launches and knock-on effects on multiple services.
Has anyone quantitatively modelled AWS’s effective availability once you account for inter-service dependencies inside their control plane and data plane?
In other words: if EC2 depends on DynamoDB, and Lambda depends on EC2 + NLB, what’s the composite availability in practice?
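To make the question concrete, here is a minimal sketch of the naive serial model, assuming every dependency is a hard one and failures are independent. Under those assumptions a service's composite availability is the product of its own standalone availability and that of everything it transitively depends on. The dependency edges are the ones from the question above; the availability figures and the names `STANDALONE`, `DEPENDS_ON`, `dependency_closure`, and `composite_availability` are made up for illustration and are not AWS numbers.

```python
from math import prod

# Hypothetical standalone availabilities, chosen for illustration only.
# These are NOT published AWS figures.
STANDALONE = {
    "dynamodb": 0.9999,
    "ec2": 0.9999,
    "nlb": 0.9999,
    "lambda": 0.9995,
}

# Hard-dependency edges as stated in the question:
# EC2 -> DynamoDB, Lambda -> EC2 + NLB.
DEPENDS_ON = {
    "dynamodb": [],
    "ec2": ["dynamodb"],
    "nlb": [],
    "lambda": ["ec2", "nlb"],
}

MINUTES_PER_YEAR = 365 * 24 * 60


def dependency_closure(service, graph):
    """Return the service plus everything it transitively depends on."""
    seen = set()
    stack = [service]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # count shared dependencies once, tolerate cycles
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


def composite_availability(service, standalone, graph):
    """Serial model: the service is up only if its whole closure is up.

    Assumes independent failures and that every edge is a hard dependency.
    """
    return prod(standalone[s] for s in dependency_closure(service, graph))


if __name__ == "__main__":
    for service in DEPENDS_ON:
        a = composite_availability(service, STANDALONE, DEPENDS_ON)
        downtime = (1 - a) * MINUTES_PER_YEAR
        print(f"{service:9s} composite={a:.6f} (~{downtime:.0f} min/year down)")
```

This is only a baseline. The incident itself suggests both assumptions are shaky: failures were correlated through a shared root cause, and some dependencies looked soft (new launches degraded rather than everything going down), so a realistic model would need per-edge failure modes rather than a single product.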
[1] - https://aws.amazon.com/message/101925/