The library. Every incident, structured.
A growing archive of public postmortems, broken down into a consistent shape: what broke, why it cascaded, and what to take from it. New incidents added regularly.
28+ incidents · 11+ years · 13 organizations
2 results · filtered
topic: aws
id · incident · org · date · duration · severity · tags

FM-010 · When Slack's Autoscaler Made the Network Outage Worse · Slack · 2021-01-04 · ~3h 40m · SEV-2 · aws, transit-gateway, autoscaling
On the first Monday after the holiday break, an AWS Transit Gateway saturated under Slack's return-to-work traffic. Packet loss hit the web tier just as autoscaling tried to add 1,200 instances, and the provisioning service collapsed under its own quota and file-descriptor limits.

FM-012 · When Heroku's Whole Platform Shared One AWS Failure Domain · Heroku · 2011-04-21 · ~3 days · SEV-1 · aws, ebs, us-east
When AWS's US-East EBS storage entered a re-mirroring storm, Heroku's dynos, Heroku Postgres databases, and management API all failed together. The platform had no other region to run in and no path to recover without AWS.