The Merge-Base Bug That Silently Rewrote 2,092 Pull Requests
How an incomplete feature-flag gate let an unreleased merge-base path reach production squash groups, why single-pull-request test coverage could not expose a defect that only appeared with two or more pull requests in a group, and why GitHub learned about three and a half hours of silent branch corruption from customer support tickets.
For four hours and 38 minutes, GitHub's merge queue accepted squash merges and reported success. During that window, later pull requests in multi-PR groups silently reversed changes introduced by earlier ones. Availability monitoring stayed green throughout because the issue corrupted merge correctness, not service uptime. GitHub became aware three hours and 33 minutes after deployment through customer support inquiries, with no automated alert having fired.
GitHub's merge queue batches pull requests and merges them in order. Each repository configures a merge method (squash, merge commit, or rebase), and the queue applies it to each group. The queue builds each group from the latest base branch plus all pull requests already ahead in the queue. Group state is sequential, so each item's result depends on every item before it. A merge base is the best common ancestor of two commits, the point from which their histories diverged; a ref update is Git's term for changing the commit that a reference such as a branch points to.
At approximately 16:05 UTC on April 23, GitHub deployed a change that adjusted merge-base computation for merge-queue ref updates. The new code path was intended for an unreleased feature behind a feature flag, but the gating was incomplete, so the path executed immediately on every qualifying squash group. A three-way merge compares each side against the merge base to decide which content is new, retained, or removed, so feeding it the wrong base changes that decision. In a multi-PR group, this caused later pull requests to undo changes introduced by earlier ones. Existing tests primarily exercised single-pull-request groups. The defect appeared only when a later item consumed state from an earlier item, so single-PR tests could not expose it.
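To make the mechanism concrete, here is a minimal sketch of a three-way merge over whole files, showing how the choice of base alone can revert an earlier change. It is an illustration only, not GitHub's implementation: the `three_way_merge` function, the file names, and the particular wrong base used here are all assumptions, and the public record does not say which incorrect base the faulty path actually computed.

```python
from typing import Dict, Optional

Tree = Dict[str, str]  # path -> file content (whole-file granularity, for simplicity)


def three_way_merge(base: Tree, ours: Tree, theirs: Tree) -> Tree:
    """Merge two trees file by file, using `base` to decide which side changed."""
    merged: Tree = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t:
            value = o          # both sides agree
        elif o == b:
            value = t          # only "theirs" differs from base, so take theirs
        elif t == b:
            value = o          # only "ours" differs from base, so keep ours
        else:
            raise ValueError(f"conflict on {path}")
        if value is not None:  # None means the file is deleted in the result
            merged[path] = value
    return merged


# Hypothetical two-item group: both pull requests branch from the same main.
main = {"a.txt": "v1", "b.txt": "v1"}
pr1_head = {"a.txt": "v2", "b.txt": "v1"}   # PR 1 edits a.txt
pr2_head = {"a.txt": "v1", "b.txt": "v2"}   # PR 2 edits b.txt

# Group state after PR 1 has been applied on top of main.
state_after_pr1 = pr1_head

# Correct base: the common ancestor of the group state and PR 2 is main.
correct = three_way_merge(base=main, ours=state_after_pr1, theirs=pr2_head)

# Assumed wrong base: the group state itself is mistaken for the ancestor.
wrong = three_way_merge(base=state_after_pr1, ours=state_after_pr1, theirs=pr2_head)
```

With the correct base, both edits survive (`a.txt` and `b.txt` are both `v2`). With the wrong base, PR 2's untouched copy of `a.txt` looks like a deliberate edit relative to the base, so the merge restores `v1` and silently undoes PR 1 without raising any conflict.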
Both of these control failures — the incomplete gate and the test suite's limited cardinality — had to occur together for the defect to reach customers. Production monitoring stopped short of the boundary where multi-pull-request squash groups produced incorrect Git contents. Customer support inquiries became the first effective end-to-end correctness signal. The path from triage to merge-base diagnosis is not described in the public record.
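The public record does not say how the gate was incomplete. As a purely hypothetical sketch, the following shows one common shape of the problem (a branch that never consults the flag) and a check that enumerates every input combination with the flag off. The function names, the method list, and the group sizes are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, List, Tuple

METHODS = ("squash", "merge", "rebase")
GROUP_SIZES = (1, 2, 3)

Selector = Callable[[bool, str, int], str]


def select_path_incomplete(flag_on: bool, method: str, group_size: int) -> str:
    """Hypothetical gate with a hole: one branch never checks the flag."""
    if method == "squash" and group_size > 1:
        return "new"  # missing `flag_on` check on this branch
    return "new" if flag_on else "legacy"


def select_path_complete(flag_on: bool, method: str, group_size: int) -> str:
    """The flag is checked first, so nothing reaches the new path while it is off."""
    if not flag_on:
        return "legacy"
    return "new"


def leaks_with_flag_off(select: Selector) -> List[Tuple[str, int]]:
    """Return every (method, group_size) that reaches the new path with the flag off."""
    return [
        (method, size)
        for method, size in product(METHODS, GROUP_SIZES)
        if select(False, method, size) == "new"
    ]
```

Calling `leaks_with_flag_off(select_path_incomplete)` reports the squash groups of size two and three, while the complete gate reports nothing. A check of this kind covers the whole input matrix rather than one representative case, which is the property a single-configuration test lacks.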
GitHub became aware of the regression at 19:38 UTC, when customer support inquiries increased. During the incident, 2,092 pull requests across hundreds of repositories produced incorrect merge commits. GitHub mitigated the regression by reverting the change and force-deploying the fix across environments, which stopped new faulty merges. Affected default branches remained incorrect and could not all be repaired automatically. GitHub reported no data loss because all commits remained stored in Git, but retained commits did not by themselves make affected default branches correct. Recovery shifted to targeted, step-by-step guidance for administrators of each affected repository.
Code containment and state recovery were separate tracks. Rollback stopped the faulty execution path, but affected default branches remained incorrect and required per-repository repair. Retained commits were recovery inputs, not proof of correct repository state. Repository-specific constraints bounded centralized recovery: GitHub could identify affected repositories but could not safely apply one automatic repair to all of them, which shifted recovery to targeted administrator guidance. The fix also carried an environment-specific completion boundary. GitHub Enterprise Cloud with Data Residency remained exposed while the same fix rolled out, with no published per-environment completion times.
GitHub's stated follow-through targets the missing validation boundary. It adds regression checks that validate resulting Git contents across supported merge-queue configurations, catching correctness failures before they reach production. The public record does not establish when those checks were implemented, which configuration matrix they cover, or whether the specific multi-PR squash scenario this incident exposed is included.
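GitHub has not published what these checks look like. The sketch below is one way a content-level regression check could be structured, and it also shows why group size matters. Everything in it is hypothetical: the `Tree` model, both merge simulators, and the simplifying assumption that each pull request in a group touches a different file.

```python
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, str]                      # path -> file content
Merger = Callable[[Tree, List[Tree]], Tree]


def correct_group_merge(base: Tree, prs: List[Tree]) -> Tree:
    """Apply each pull request's changed files on top of the running group state."""
    state = dict(base)
    for changes in prs:
        state.update(changes)
    return state


def buggy_group_merge(base: Tree, prs: List[Tree]) -> Tree:
    """Hypothetical defect: each item is rebuilt from the original base,
    discarding what earlier items in the group changed."""
    state = dict(base)
    for changes in prs:
        state = {**base, **changes}
    return state


def reverted_changes(final: Tree, prs: List[Tree]) -> List[Tuple[int, str]]:
    """Return (pr_index, path) pairs whose change is missing from the final tree."""
    return [
        (index, path)
        for index, changes in enumerate(prs)
        for path, content in changes.items()
        if final.get(path) != content
    ]


def check_sizes(merge: Merger, sizes=(1, 2, 3)) -> Dict[int, List[Tuple[int, str]]]:
    """Run the content check for several group sizes and collect any reverted changes."""
    base = {"README": "base"}
    results = {}
    for size in sizes:
        # Assumption: each pull request edits its own file, so no real conflicts exist.
        prs = [{f"file{i}.txt": f"change {i}"} for i in range(size)]
        results[size] = reverted_changes(merge(base, prs), prs)
    return results
```

Under these assumptions, `check_sizes(buggy_group_merge)` reports nothing for a group of one and flags `file0.txt` as reverted for a group of two. The check inspects resulting contents rather than whether the merge reported success, and it only catches the defect once the group size exceeds one.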
From faulty deployment to the end of the impact window in 4h38m.
A change adjusting merge-base computation for merge-queue ref updates completed deployment. The new path immediately began executing for multi-pull-request squash groups because its feature-flag gate was incomplete.
GitHub became aware of the issue after customer support inquiries increased — three hours and 33 minutes after deployment completed, with no automated alert having fired during that window.
GitHub reverted the code change and force-deployed the fix across all environments, stopping new faulty merges.
The behavior remained in GitHub Enterprise Cloud with Data Residency while the same fix was rolling out — a temporary environment-specific gap with no published completion time.
GitHub's reported impact window ended at 20:43 UTC, four hours and 38 minutes after the faulty deployment completed.
GitHub identified affected repositories and sent their administrators targeted step-by-step recovery instructions. Not every repository could be repaired automatically.
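The intervals quoted in this timeline can be checked with a few lines of arithmetic. The three clock times come from the account above; the helper name `utc_clock` is an assumption, and the date is left arbitrary because all three times fall on the same day.

```python
from datetime import datetime, timedelta


def utc_clock(value: str) -> datetime:
    """Parse an HH:MM clock time; the date part is arbitrary and identical for all values."""
    return datetime.strptime(value, "%H:%M")


deployed = utc_clock("16:05")   # faulty deployment completed (approximate)
aware = utc_clock("19:38")      # GitHub became aware via support inquiries
ended = utc_clock("20:43")      # reported end of the impact window

time_to_awareness = aware - deployed    # gap with no automated alert
impact_window = ended - deployed        # full reported impact window
awareness_to_end = ended - aware        # time from awareness to end of impact
```

The results match the stated figures of three hours and 33 minutes to awareness and four hours and 38 minutes of total impact, which leaves one hour and five minutes between awareness and the end of the impact window.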