The library. Every incident, structured.
A growing archive of public postmortems, broken down into a consistent shape: what broke, why it cascaded, and what to take from it. New incidents added regularly.
28+
incidents
11+
years
13
organizations
/
sort
2 results · filtered
topic: automation
id
incident
org
date
duration
severity
tags
FM-020
The DynamoDB DNS Race That Emptied US-EAST-1A DynamoDB DNS automation race emptied the US-EAST-1 regional endpoint, then cascaded into EC2 launch, NLB, Lambda, and console failures.
AWS
2025-10-19
14h 32m
SEV-1
dnsautomationdynamodb
FM-014
The Automation Bug That Took Google's Network Control Plane OfflineA bug in Google's datacenter maintenance automation descheduled the network control plane in multiple physical locations at once. BGP withdrew within minutes, and traffic flowed onto an oversubscribed fail-static path until engineers could rebuild the configuration.
Google Cloud
2019-06-02
4h 25m
SEV-1
networkingautomationbgp