GitHub partially deployed a feature flag intended to scale down rate limiting for a subset of Copilot users. An edge case put the global limiter configuration into an invalid state, and requests were rejected with 403 errors.
GitHub Copilot's rate limiter was supposed to protect the service by deciding which requests could enter. That made it part of the availability path: if the limiter rejected healthy traffic, the rest of Copilot could be working and users would still see failures. Protective infrastructure becomes production infrastructure the moment every request has to pass through it.
On September 15, 2025, GitHub partially deployed a feature flag to the global rate limiter. The flag was intended to scale down rate limiting for a subset of users. Instead, an undetected edge case put the configuration into an invalid state and caused 100% request limiting on the affected path.
The symptom was direct: Copilot returned HTTP 403 errors and degraded availability for the majority of features. This was not a model-serving capacity problem or an API backend outage. The admission control layer made the decision before requests could reach those systems.
GitHub reverted the feature flag, and recovery was immediate by 18:20 UTC. Feature flags create intermediate states that full and rollback states never touch. A partial rollout was the only configuration that triggered this bug — which is why the test suite missed it, and why tests for admission-control flags need to cover mixed, missing, rollback, and subset states before they guard global traffic.
The flag triggered behavior that unintentionally limited 100% of requests, returning 403 errors.// GitHub availability report, September 2025
A partial flag rollout created an invalid limiter state.
The immediate cause was a feature flag partially deployed to GitHub Copilot's global rate limiter. The flag was intended to scale down rate limiting for a subset of users, but an undetected edge case put the rate-limiting configuration into an invalid state that rejected requests with 403 errors.
The deeper cause was insufficient validation around rate-limit scaling states. The rollout path allowed a partial flag state to affect global request admission, while tests and monitors did not catch the configuration edge case before it reached users.
What turned a limiter adjustment into request denial.
01
The component sat on the admission path.
Rate limiters fail before application logic can recover. A bad limiter state can make healthy downstream systems appear unavailable by rejecting traffic at the front door.
The feature flag did not need to reach all users to create impact. A partial rollout was enough to produce a configuration combination the limiter logic did not handle.
03
Tests missed the invalid scaling edge case.
The intended behavior was to reduce limiting for a subset of users. Pre-production coverage did not include the state that instead limited every request.
Treat rate limiters as critical availability dependencies.Admission controls need defensive defaults, config validation, and fast rollback because they can deny requests while every backend is healthy.
02
Test partial flag states explicitly.Feature flags create intermediate states. Add tests for mixed old-new config, missing values, subset rollout, and rollback transitions before flags touch global control points.
03
Alert on traffic rejection shape, not only volume.A sudden rise in `403` responses from a limiter should page separately from normal user or quota denials. Pair anomaly monitors with flag rollout events so operators can revert quickly.