The library. Every incident, structured.
A growing archive of public postmortems, broken down into a consistent shape: what broke, why it cascaded, and what to take from it. New incidents added regularly.
28+ incidents · 11+ years · 13 organizations
3 results · filtered
topic: cache
id
incident
org
date
duration
severity
tags
FM-025
One storage outage broke many Cloudflare products
Cloudflare products shared a key-value database called Workers KV. When its backing storage failed, uncached reads and writes stopped, taking authentication, builds, AI requests, and other features with them.
Cloudflare
2025-06-12
2h 28m
SEV-1
storage, kv, third-party
FM-027
The Runner Cache Bug That Queued Ubuntu CI Jobs
A backend cache misconfiguration after failover caused duplicate GitHub Actions job assignments, reducing Ubuntu-24 runner capacity for public repos.
GitHub
2025-05-28
5h
SEV-2
ci, scheduler, failover
FM-011
The Consul Restart That Turned Slack's Cache Cold
An incremental Consul agent upgrade caused memcached nodes to be deregistered and replaced. The replacements came up empty, cache hit rates collapsed, and scatter queries from the cold cache overloaded the database.
Slack
2022-02-22
~5h
SEV-2
consul, memcached, cache