Library — Failure Modes — Failure Modes

/

sort

newsletter

Get the next incident.

One production failure case study every week. Read the pattern before it shows up in your own system.

4 results · filtered

topic: storage

id

incident

org

date

duration

severity

tags

One storage outage broke many Cloudflare productsCloudflare products shared a key-value database called Workers KV. When its backing storage failed, uncached reads and writes stopped, taking authentication, builds, AI requests, and other features with them.

storagekvthird-party

The S3 Command That Took Out us-east-1A maintenance command with the wrong scope argument removed too much S3 subsystem capacity in us-east-1, forcing the index and placement subsystems through full restarts.

s3us-east-1storage

When Heroku's Whole Platform Shared One AWS Failure DomainWhen AWS's US-East EBS storage entered a re-mirroring storm, Heroku's dynos, Heroku Postgres databases, and management API all failed together. The platform had no other region to run in and no path to recover without AWS.

The EBS Self-Repair Storm That Couldn't Stop ItselfA network change in US-East shifted traffic to a low-capacity path. EBS nodes that lost their replication connection began re-mirroring at the same time, exhausting free capacity and stranding volumes in a loop the cluster could not exit on its own.