The library. Every incident, structured.
A growing archive of public postmortems, broken down into a consistent shape: what broke, why it cascaded, and what to take from it. New incidents added regularly.
28+ incidents · 11+ years · 13 organizations
4 results · filtered by topic: storage
id
incident
org
date
duration
severity
tags
FM-025
One storage outage broke many Cloudflare products
Cloudflare products shared a key-value database called Workers KV. When its backing storage failed, uncached reads and writes stopped, taking authentication, builds, AI requests, and other features with them.
Cloudflare
2025-06-12
2h 28m
SEV-1
storage · kv · third-party
FM-003
The S3 Command That Took Out us-east-1
A maintenance command with the wrong scope argument removed too much S3 subsystem capacity in us-east-1, forcing the index and placement subsystems through full restarts.
AWS
2017-02-28
4h 17m
SEV-1
s3 · us-east-1 · storage
FM-012
When Heroku's Whole Platform Shared One AWS Failure Domain
When AWS's US-East EBS storage entered a re-mirroring storm, Heroku's dynos, Heroku Postgres databases, and management API all failed together. The platform had no other region to run in and no path to recover without AWS.
Heroku
2011-04-21
~3 days
SEV-1
aws · ebs · us-east
FM-013
The EBS Self-Repair Storm That Couldn't Stop Itself
A network change in US-East shifted traffic to a low-capacity path. EBS nodes that lost their replication connection began re-mirroring at the same time, exhausting free capacity and stranding volumes in a loop the cluster could not exit on its own.
AWS
2011-04-21
~4 days
SEV-1
ebs · rds · us-east