The library. Every incident, structured.
A growing archive of public postmortems, broken down into a consistent shape: what broke, why it cascaded, and what to take from it. New incidents added regularly.
28+ incidents · 11+ years · 13 organizations
1 result · filtered
topic: transit-gateway
id: FM-010
incident: When Slack's Autoscaler Made the Network Outage Worse
On the first Monday after the holiday break, an AWS Transit Gateway saturated under Slack's return-to-work traffic. Packet loss hit the web tier just as autoscaling tried to add 1,200 instances, and the provisioning service collapsed under its own quota and file-descriptor limits.
org: Slack
date: 2021-01-04
duration: ~3h 40m
severity: SEV-2
tags: aws, transit-gateway, autoscaling