FM-016 · GitHub · 2026-04-27 · impact 6h 31m · SEV-2

The Search Layer That Slowed GitHub.

A wave of anonymous scraping traffic did not take down GitHub, but it found a shared dependency hiding in plain sight. The load-balancing tier in front of search saturated, and ordinary workflows across the product started timing out.

search · scraping · load-balancing · abuse · rate-limits

summary

Most users think of search as a box at the top of the page. On GitHub, it was also a dependency underneath everyday work: finding issues, loading pull requests, querying repositories, checking Actions, browsing packages, and surfacing Dependabot alerts. On April 27, 2026, that hidden dependency became visible when anonymous scraping traffic saturated the load-balancing tier in front of search, and product workflows that did not look like search began returning timeouts and errors.

GitHub search began degrading at 16:15 UTC. Monitoring detected a drop in search results, and GitHub declared an incident at 16:21 UTC. The worst period came early: between 16:15 and 18:00 UTC, some search targets saw up to 65% of searches time out or return an error.

The traffic pattern mattered more than the daily total. GitHub later traced the incident to anonymous distributed scraping traffic crafted to avoid public API rate limits. The requests came from more than 600,000 unique IP addresses, with matching actor information across them. That traffic made up 30% of the day's total search traffic and was concentrated within a four-hour window. The load-balancing tier in front of search took the burst until it saturated.

The first signal told GitHub that search results were dropping. It did not immediately tell GitHub that concentrated scraping was the source of the pressure. The report says existing monitoring did not classify the increased scraping as a risk, and engineers discovered that dimension while working the incident. That distinction changes the response: a result-drop alert starts a reliability investigation; an abuse-pattern alert can start traffic restriction before the load balancers hit the wall.

GitHub mitigated the incident by relieving load-balancer pressure, scaling the load-balancing tier, blocking anomalous traffic, and tuning the balancers. It marked the incident mitigated at 21:33 UTC and resolved at 22:46 UTC. The lasting lesson is not only that search needed more headroom. The failure path ran through the gap between API rate limits and anonymous traffic controls, and through a shared load-balancing tier that many product workflows quietly depended on.

This scraping traffic made up 30% of the day’s total search traffic // GitHub availability report, April 2026

timeline · UTC

From the first signal to all-clear in 6h 31m.

16:15 UTC

Search connectivity degrades

GitHub search services begin experiencing degraded connectivity. Features that rely on search data, including Issues, Pull Requests, Projects, Repositories, Actions, Package Registry, and Dependabot Alerts, start seeing intermittent failures.

16:21 UTC

GitHub declares an incident

Monitoring detects a drop in search results, and GitHub declares an incident six minutes after the reported start of impact.

18:00 UTC

Worst error window ends

Between 16:15 and 18:00 UTC, some search targets see up to 65% of searches time out or return an error.

During mitigation

Scraping pattern identified

GitHub discovers that anonymous distributed scraping traffic accounts for 30% of the day's total search traffic, concentrated within a four-hour period and originating from more than 600,000 unique IP addresses.

21:33 UTC

Incident marked mitigated

After scaling the load-balancing tier, blocking anomalous traffic, relieving pressure on the load balancers, and tuning balancer behavior, GitHub marks the incident mitigated.

22:46 UTC

Incident resolved

GitHub finishes monitoring the affected systems and declares the incident resolved. The customer-visible impact window lasted from 16:15 UTC to 22:46 UTC.

root cause

Search capacity was protected by limits the scraping path avoided.

GitHub search depended on a load-balancing tier deployed in front of the search infrastructure. That tier accepted search requests before they reached the systems that answered them. It also served more than the visible search page: issues, pull requests, projects, repositories, Actions, packages, and Dependabot alerts all relied on search data for user-facing workflows. When the tier saturated, those workflows saw intermittent failures even though the pressure came through search traffic.

The trigger was a large influx of anonymous distributed scraping traffic crafted to avoid GitHub's public API rate limits. The traffic came from more than 600,000 unique IP addresses and shared matching actor information across requests. It represented 30% of the day's search traffic, and GitHub says it was concentrated within a four-hour period. That combination of volume and timing saturated the load-balancing tier in front of search.

GitHub's existing monitoring detected the customer-visible symptom: search results were dropping. It did not classify the increased scraping as a risk before responders were already mitigating the incident. Engineers discovered that dimension while working the outage. Recovery required relieving load-balancer pressure, scaling the load-balancing tier, blocking anomalous traffic, and tuning balancer behavior.

contributing factors

What let search degradation become a product-wide failure.

Anonymous traffic bypassed the intended control point

GitHub had public API rate limits, but the scraping traffic was crafted to avoid them. That left another path able to create heavy search load without the same constraint. The service was protected at one entry point, while the cost was paid deeper in the system.

Traffic volume was concentrated, not merely large

The scraping accounted for 30% of the day's total search traffic, concentrated within a four-hour period. Daily totals can make a system look healthier than it is. Short bursts are what overload connection handling, load balancers, and shared dependencies.
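A rough calculation shows why concentration matters more than the daily total. The sketch below assumes, purely for illustration, that the non-scraping 70% of the day's search traffic was spread evenly across 24 hours. The report does not describe the baseline shape, so the multiplier is an illustration, not a measured figure. The function name and parameters are hypothetical.

```python
def burst_multiplier(scrape_share: float, window_hours: float,
                     day_hours: float = 24.0) -> float:
    """Ratio of the request rate inside the burst window to the baseline rate.

    Assumption (not from the report): non-scraping traffic is uniform
    across the day, and the scraping is uniform within its window.
    """
    if not 0.0 <= scrape_share < 1.0:
        raise ValueError("scrape_share must be in [0, 1)")
    if not 0.0 < window_hours <= day_hours:
        raise ValueError("window_hours must be in (0, day_hours]")
    # Both rates are expressed as a share of the day's total per hour.
    baseline_rate = (1.0 - scrape_share) / day_hours
    scrape_rate = scrape_share / window_hours
    return (baseline_rate + scrape_rate) / baseline_rate


# 30% of the day's search traffic inside a four-hour window.
print(round(burst_multiplier(0.30, 4.0), 2))  # 3.57
```

Under that assumption, the request rate inside the window is roughly 3.6 times the baseline, even though the day's total grew by only about 43% (30 added to a baseline of 70).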

The same dependency served many product paths

Search was not only a search box. Issues, pull requests, projects, repositories, Actions, packages, and Dependabot alerts all relied on search data. When one shared tier saturated, users saw failures in workflows that did not necessarily feel like search.

Monitoring saw the drop before it saw the abuse pattern

GitHub detected a drop in search results and declared the incident quickly, but its existing monitoring did not classify the increase in scraping as a risk. The team found the traffic pattern while mitigating the incident, so the first response started from the visible symptom rather than the pressure source.

Load balancer saturation shaped the blast radius

GitHub identified the saturated component as the tier in front of search infrastructure, not the search index itself. That distinction matters during response. Adding more capacity behind a constrained front door does not help unless the front door can pass the traffic through.

lessons

What to take from this incident.

Rate-limit by backend cost, not only by public API surface. Automated traffic will use whichever path lets it reach an expensive backend. Protect every route that can create the same load, including anonymous web paths, internal endpoints, and feature-specific query surfaces. The limit should follow the resource being consumed rather than only the API boundary.
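One way to picture a limit that follows the resource is a token bucket charged by backend cost and shared across every route that reaches the same backend. This is a minimal sketch, not GitHub's implementation: the route names, cost values, and the idea of keying by an actor fingerprint are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical relative costs per route; real values would come from measurement.
ROUTE_COST: Dict[str, float] = {
    "api_search": 5.0,
    "web_search": 5.0,   # same backend work as the API route, so the same cost
    "issue_list": 3.0,
}


class CostLimiter:
    """Token bucket per client key, charged by backend cost, not request count."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        # key -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str, cost: float) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(key, (self.capacity, now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = cost <= tokens
        if allowed:
            tokens -= cost
        self._buckets[key] = (tokens, now)
        return allowed


def allow_request(limiter: CostLimiter, client_key: str, route: str) -> bool:
    """Charge the same bucket no matter which route the client uses."""
    return limiter.allow(client_key, ROUTE_COST[route])
```

Because the bucket is keyed by client and charged by cost, switching from the API route to a web route does not reset the budget. Keying by something other than IP address is a design choice worth considering here, since the traffic in this incident was spread across more than 600,000 addresses.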

Alert on traffic shape before it becomes an availability symptom. A drop in successful search results tells responders that users are already affected. Track earlier signals such as anonymous share, unique source count, connection churn, actor reuse, and concentrated growth rate. Those signals give operators a chance to restrict traffic before error rates climb.
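Three of those signals (anonymous share, unique source count, and actor reuse) can be computed from a window of request records. The sketch below is illustrative only: the `Request` fields, the notion of an actor fingerprint, and the thresholds are assumptions, not details from the report.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Request:
    ip: str
    actor: str           # hypothetical fingerprint, e.g. a hash of client headers
    authenticated: bool


def shape_signals(requests: List[Request]) -> Dict[str, float]:
    """Summarize one time window of requests into traffic-shape signals."""
    total = len(requests)
    if total == 0:
        return {"anonymous_share": 0.0, "unique_sources": 0, "max_ips_per_actor": 0}
    anonymous = [r for r in requests if not r.authenticated]
    # Actor reuse: how many distinct IPs present the same anonymous fingerprint.
    ips_by_actor: Dict[str, Set[str]] = defaultdict(set)
    for r in anonymous:
        ips_by_actor[r.actor].add(r.ip)
    return {
        "anonymous_share": len(anonymous) / total,
        "unique_sources": len({r.ip for r in requests}),
        "max_ips_per_actor": max((len(ips) for ips in ips_by_actor.values()), default=0),
    }


def looks_like_distributed_scraping(signals: Dict[str, float],
                                    share_limit: float = 0.5,
                                    ips_per_actor_limit: int = 100) -> bool:
    """Thresholds are placeholders; real ones would be tuned against history."""
    return (signals["anonymous_share"] > share_limit
            and signals["max_ips_per_actor"] > ips_per_actor_limit)
```

An alert built on a check like this fires on the pattern itself, one actor fingerprint fanned out across many addresses, rather than waiting for search results to drop.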

Map shared dependencies by user workflow, not service name. Search degradation affected issues, pull requests, projects, repositories, Actions, packages, and Dependabot alerts because those workflows depended on search data. Dependency maps should show which user actions break when a shared service or fronting tier saturates, even when the UI does not expose that dependency.
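A workflow-first dependency map can be as simple as a lookup from user-facing workflow to the components it needs, inverted on demand. In the sketch below the workflow names come from the incident; the component labels (`search-lb`, `search-backend`) and the contrasting placeholder entry are hypothetical.

```python
from typing import Dict, List

# Workflow -> components it depends on. Component labels are hypothetical.
WORKFLOW_DEPENDENCIES: Dict[str, List[str]] = {
    "Issues": ["search-lb", "search-backend"],
    "Pull Requests": ["search-lb", "search-backend"],
    "Projects": ["search-lb", "search-backend"],
    "Repositories": ["search-lb", "search-backend"],
    "Actions": ["search-lb", "search-backend"],
    "Package Registry": ["search-lb", "search-backend"],
    "Dependabot Alerts": ["search-lb", "search-backend"],
    # Placeholder entry, included only to show a workflow outside the blast radius.
    "Placeholder workflow": ["other-service"],
}


def blast_radius(component: str) -> List[str]:
    """User-facing workflows that degrade when `component` saturates."""
    return sorted(
        workflow
        for workflow, deps in WORKFLOW_DEPENDENCIES.items()
        if component in deps
    )


print(blast_radius("search-lb"))
```

Asking the map "what breaks if the tier in front of search saturates?" returns seven workflows, none of which has "search" in its name.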

Treat load balancers as stateful capacity systems. A load balancer can run out of usable capacity because of connection patterns, churn, reuse behavior, or uneven distribution before backend services hit their own limits. Test and monitor the fronting tier with bursty and hostile traffic shapes, not only average request volume.

Give incident responders controls that prefer authenticated users during abuse. GitHub added controls to restrict anonymous traffic so it could reduce impact to registered users during future events. That kind of control should exist before an incident. Operators need a way to shed low-trust load without taking down the feature for everyone.
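A control like that can be sketched as tiered admission: anonymous requests are shed at a lower utilization threshold than authenticated ones. This is an illustration of the general idea, not a description of the controls GitHub added; the function name, the utilization input, and both thresholds are assumptions.

```python
def admit(authenticated: bool, utilization: float,
          anonymous_shed_at: float = 0.70,
          everyone_shed_at: float = 0.95) -> bool:
    """Decide whether to admit a request given fronting-tier utilization (0.0 to 1.0).

    Hypothetical policy: shed anonymous traffic first, and shed authenticated
    traffic only when the tier is close to saturation.
    """
    if utilization >= everyone_shed_at:
        return False
    if not authenticated and utilization >= anonymous_shed_at:
        return False
    return True


# At 80% utilization, signed-in users still get through; anonymous traffic does not.
print(admit(authenticated=True, utilization=0.80))   # True
print(admit(authenticated=False, utilization=0.80))  # False
```

Exposing the two thresholds as operator-adjustable settings lets responders tighten the anonymous limit during an event without redeploying.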

sources

Read the original.

GitHub availability report: April 2026 (github.blog)
