Amazon has posted their summary of this week’s S3 disruption in us-east-1. While this was just 1 of 60 services in 1 of 16 regions, it had an outsized impact on operations. A number of AWS components and third-party services depend on S3 in us-east-1, and the outage caused widespread service disruptions across the internet.
S3 was the first publicly available AWS service, and us-east-1 was the first AWS region, which helps explain why so much was built on S3 in this particular region.
In the summary, Amazon transparently details what went wrong as well as the measures they’re taking to ensure that this class of mistake cannot recur. The lesson I’m taking from this is to expect failures, but to ensure that you never fail the same way twice.