FM-029 · GitHub · 2026-04-23 · impact 4h38m · SEV-1

The Merge-Base Bug That Silently Rewrote 2,092 Pull Requests

How an incomplete feature-flag gate let an unreleased merge-base path reach production squash groups, why single-pull-request test coverage could not expose a defect that only appeared with two or more pull requests in a group, and why GitHub learned about three and a half hours of silent branch corruption from customer support tickets.

merge-queue feature-flag squash-merge git-correctness silent-failure


case study

For four hours and 38 minutes, GitHub's merge queue accepted squash merges and reported success. During that window, later pull requests in multi-PR groups silently reversed changes introduced by earlier ones. Availability monitoring stayed green throughout because the defect broke merge correctness, not service uptime. GitHub learned of the problem from customer support inquiries three hours and 33 minutes after deployment; no automated alert had fired.

GitHub's merge queue batches pull requests and merges them in order. Each repository configures a merge method (squash, merge commit, or rebase), and the queue applies that method to each group. The queue builds each group from the latest base branch plus all pull requests already ahead in the queue, so group state is sequential: each item's result depends on every item before it. A merge base is the best common ancestor of two branches, the last commit they share before diverging; a ref update moves a branch pointer to a new commit.

At approximately 16:05 UTC on April 23, GitHub deployed a change that adjusted merge-base computation for merge-queue ref updates. The new code path was intended for an unreleased feature behind a feature flag, but the gating was incomplete, so the path executed immediately on every qualifying squash group. A three-way merge decides which content is new, retained, or removed by comparing both sides against the merge base, so feeding it the wrong base changes that classification. In a multi-PR group, this caused later pull requests to undo changes introduced by earlier ones.

Existing tests primarily exercised single-pull-request groups. The defect appeared only when a later item consumed state produced by an earlier item, so single-PR tests could not expose it.
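The effect of a wrong merge base can be reproduced in a toy model. The sketch below assumes a tree is a Python dict of file paths to contents and treats a squash merge as a per-file three-way merge; all names are illustrative. The public record does not say which base the faulty path actually computed, so the faulty variant here, which uses the group's current tip as the base, is a hypothetical chosen only because it produces the reported symptom.

```python
from typing import Dict, Optional

Tree = Dict[str, str]  # toy model: file path -> file contents


def three_way_merge(base: Tree, ours: Tree, theirs: Tree) -> Tree:
    """Per-file three-way merge: a side that differs from `base` counts as a change."""
    result: Tree = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t or t == b:
            merged = o  # sides agree, or only "ours" changed
        elif o == b:
            merged = t  # only "theirs" changed
        else:
            raise ValueError(f"conflict in {path}")
        if merged is not None:  # None means the file is deleted
            result[path] = merged
    return result


main = {"a.txt": "a0", "b.txt": "b0"}
pr1 = {"a.txt": "a1", "b.txt": "b0"}  # PR 1 edits a.txt
pr2 = {"a.txt": "a0", "b.txt": "b1"}  # PR 2 edits b.txt; both PRs branched from main

# Group item 1: apply PR 1 to main (true merge base: main).
tip1 = three_way_merge(base=main, ours=main, theirs=pr1)

# Group item 2, correct: the true merge base of tip1 and PR 2 is still main.
correct = three_way_merge(base=main, ours=tip1, theirs=pr2)

# Group item 2, faulty (hypothetical): the group tip is used as the base, so
# PR 2's stale copy of a.txt looks like a deliberate edit back to "a0".
faulty = three_way_merge(base=tip1, ours=tip1, theirs=pr2)

print(correct)  # {'a.txt': 'a1', 'b.txt': 'b1'}
print(faulty)   # {'a.txt': 'a0', 'b.txt': 'b1'}  (PR 1's edit is gone)
```

The first item of the group is unaffected because its tip and its true base are the same commit, which is why a single-PR group cannot show the reversal.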

Two control failures had to coincide for the defect to reach customers: the incomplete gate and the test suite's limited cardinality. Production monitoring did not extend to the point where multi-pull-request squash groups produced incorrect Git contents, so customer support inquiries became the first effective end-to-end correctness signal. The path from triage to merge-base diagnosis is not described in the public record.

GitHub became aware of the issue at 19:38 UTC, when customer support inquiries increased. Over the impact window, 2,092 pull requests across hundreds of repositories produced incorrect merge commits. GitHub reverted the change and force-deployed the fix across environments, which stopped new faulty merges. GitHub reported no data loss: all commits remained stored in Git. Retained commits did not, however, make the affected default branches correct, and not every branch could be repaired automatically, so recovery shifted to targeted, step-by-step guidance for administrators of each affected repository.

Code containment and state recovery were separate tracks. The rollback stopped the faulty execution path, but it did not repair branches the path had already rewritten; retained commits were inputs to recovery, not proof of correct repository state. Repository-specific constraints limited centralized recovery: GitHub could identify affected repositories but could not safely apply one automatic repair to all of them. Containment also completed at different times in different environments. GitHub Enterprise Cloud with Data Residency remained exposed while the same fix rolled out there, and no per-environment completion times have been published.

GitHub's stated follow-up targets the missing validation boundary: regression checks that validate the resulting Git contents across supported merge-queue configurations, so that correctness failures are caught before they reach production. The public record does not establish when those checks were implemented, which configuration matrix they cover, or whether they include the multi-PR squash scenario this incident exposed.

timeline · UTC

From the faulty deployment to the end of the reported impact window in 4h38m.

16:05 UTC

The change completes deployment

A change adjusting merge-base computation for merge-queue ref updates completed deployment. The new path immediately began executing for multi-pull-request squash groups because its feature-flag gate was incomplete.

19:38 UTC

Support inquiries surface the regression

GitHub became aware of the issue after customer support inquiries increased, three hours and 33 minutes after deployment completed. No automated alert had fired during that window.

After diagnosis; exact time not published

GitHub reverts and force-deploys

GitHub reverted the code change and force-deployed the fix across all environments, stopping new faulty merges.

During rollout; exact time not published

Data Residency remains exposed

The faulty merge behavior persisted in GitHub Enterprise Cloud with Data Residency while the same fix was rolling out there, a temporary environment-specific gap with no published completion time.

20:43 UTC

The reported impact window ends

GitHub's reported impact window ended at 20:43 UTC, four hours and 38 minutes after the faulty deployment completed.

After resolution; exact time not published

GitHub sends administrators recovery guidance

GitHub identified affected repositories and sent their administrators targeted step-by-step recovery instructions. Not every repository could be repaired automatically.

lessons

What to take away.

Monitor state-changing workflows for semantic invariants on their outputs, not only for request success and service availability.

The service remained available while merge results were wrong, leaving customer reports as the first effective end-to-end correctness signal for roughly three and a half hours. For workflows that rewrite durable state, useful invariants can compare expected input effects with resulting state or sample outputs for impossible regressions. Such checks add computation and may require careful false-positive control; they should target high-consequence transformations rather than duplicate every business rule. The sources do not establish whether lower-level telemetry existed.

semantic_correctness_monitoring
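One way to express such an output invariant, as a minimal sketch: a pull request's squash merge should change the target only on paths the pull request itself changed relative to its true merge base, and every such change should land. The toy tree model and the function name `merge_invariant_violations` are hypothetical; a production check would read trees from Git and might sample merges rather than check all of them.

```python
from typing import Dict, List

Tree = Dict[str, str]  # toy model: file path -> contents; a missing key means absent


def merge_invariant_violations(base: Tree, before: Tree, incoming: Tree, after: Tree) -> List[str]:
    """Compare a merge result with the edits the pull request actually made.

    base:     true merge base of the target branch and the pull request
    before:   target branch contents before the merge
    incoming: pull request head contents
    after:    target branch contents after the merge
    Assumes the merge completed without conflicts.
    """
    violations: List[str] = []
    for path in sorted(set(base) | set(before) | set(incoming) | set(after)):
        b, pre, inc, post = (tree.get(path) for tree in (base, before, incoming, after))
        if inc == b and post != pre:
            # The PR never touched this path, yet the target's content changed.
            violations.append(f"{path}: changed without a PR edit ({pre!r} -> {post!r})")
        elif inc != b and post != inc:
            # The PR edited this path, but its edit did not land.
            violations.append(f"{path}: PR edit missing (expected {inc!r}, found {post!r})")
    return violations


main = {"a.txt": "a0", "b.txt": "b0"}
tip = {"a.txt": "a1", "b.txt": "b0"}  # target after an earlier PR edited a.txt
pr = {"a.txt": "a0", "b.txt": "b1"}   # later PR, branched from main, edits b.txt

good = merge_invariant_violations(main, tip, pr, {"a.txt": "a1", "b.txt": "b1"})
bad = merge_invariant_violations(main, tip, pr, {"a.txt": "a0", "b.txt": "b1"})
print(good)  # []
print(bad)   # ["a.txt: changed without a PR edit ('a1' -> 'a0')"]
```

The check needs only the four trees around each merge, so it can run after the ref update without re-deriving the queue's internal logic.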

For stateful batch or queue operations, derive test cases from the cross-product of supported modes and cardinalities, and assert the final state after later items consume earlier items' output.

Single-item merge groups could not expose the faulty base relationship; the defect required multi-item squash groups and appeared as later merges undoing earlier content. Teams should prioritize combinations where cardinality introduces sequential state coupling, while using pairwise or risk-based selection when a full configuration matrix is too expensive. This pattern applies to supported modes whose behavior changes with group size, not to every independent option combination.

configuration_matrix_testing
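The cross-product pattern can be sketched against a toy queue. Everything here is hypothetical: `run_group`, `apply_pr`, and the `buggy_squash` switch, which injects a squash-only merge-base error loosely shaped like this incident (the real faulty computation was not published). In the toy, merge methods produce identical trees for non-conflicting edits, so the mode matters only through the injected bug; real methods also differ in history shape.

```python
import itertools
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # toy model: file path -> contents

MODES = ["squash", "merge", "rebase"]
CARDINALITIES = [1, 2, 3]


def apply_pr(tip: Tree, base: Tree, head: Tree) -> Tree:
    """Toy merge: copy onto `tip` every path where `head` differs from `base`."""
    result = dict(tip)
    for path, content in head.items():
        if base.get(path) != content:
            result[path] = content
    return result


def run_group(mode: str, main: Tree, prs: List[Tree], buggy_squash: bool) -> Tree:
    """Hypothetical queue: merge each PR onto the output of the previous one."""
    tip = dict(main)
    for head in prs:
        # Every PR here branched from main, so main is the true merge base.
        # The injected bug uses the current group tip instead, for squash only.
        base = tip if (buggy_squash and mode == "squash") else main
        tip = apply_pr(tip, base, head)
    return tip


def matrix_failures(buggy_squash: bool) -> List[Tuple[str, int]]:
    failures = []
    for mode, n in itertools.product(MODES, CARDINALITIES):
        main = {f"f{i}.txt": "v0" for i in range(n)}
        prs = [{**main, f"f{i}.txt": "v1"} for i in range(n)]  # PR i edits only f{i}
        final = run_group(mode, main, prs, buggy_squash)
        # Assert on the final state: every PR's edit must survive the whole group.
        if final != {path: "v1" for path in main}:
            failures.append((mode, n))
    return failures


print(matrix_failures(buggy_squash=False))  # []
print(matrix_failures(buggy_squash=True))   # [('squash', 2), ('squash', 3)]
```

A single-PR case passes in every mode, so only the cardinality axis exposes the injected defect.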

Test feature gates at every execution boundary where unreleased behavior can enter a released workflow, including supported configurations that the new feature was not intended to change.

The new merge-base path had a feature flag, but incomplete gating let it run for a released squash-group workflow. Gate verification should therefore prove both activation and non-activation across call paths, rather than only checking the flag's intended entry point. This applies when old and new behavior share execution machinery; it costs additional negative-path coverage and does not replace validation of the gated behavior itself. The record does not identify which gate condition or boundary was missing.

feature_flag_gate_coverage
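A minimal sketch of gate verification across call paths, assuming a shared merge-base helper reached from two entry points. All function names are hypothetical, and the leaking caller is simulated; the record does not say which boundary was actually missing. The check runs every entry point with the flag off (the new path must not run) and the intended entry points with the flag on (the new path must run).

```python
from typing import Callable, Dict, List, Set

calls: List[str] = []  # records which merge-base implementation ran


def legacy_merge_base(group: str) -> str:
    calls.append("legacy")
    return f"legacy-base:{group}"


def new_merge_base(group: str) -> str:
    calls.append("new")
    return f"new-base:{group}"


def gated_merge_base(group: str, flag_on: bool) -> str:
    """The intended single gate: callers should reach the new path only through here."""
    return new_merge_base(group) if flag_on else legacy_merge_base(group)


def update_ref_via_feature(group: str, flag_on: bool) -> str:
    return gated_merge_base(group, flag_on)


def update_ref_via_squash_group(group: str, flag_on: bool) -> str:
    # Simulated incomplete gate: this caller ignores the flag entirely.
    return new_merge_base(group)


ENTRY_POINTS: Dict[str, Callable[[str, bool], str]] = {
    "feature": update_ref_via_feature,
    "squash_group": update_ref_via_squash_group,
}


def gate_report(intended: Set[str]) -> Dict[str, List[str]]:
    """Check non-activation everywhere and activation where intended."""
    report: Dict[str, List[str]] = {"leaks": [], "not_activated": []}
    for name, entry in ENTRY_POINTS.items():
        calls.clear()
        entry("group-1", False)  # flag off: the new path must never run
        if "new" in calls:
            report["leaks"].append(name)
        if name in intended:
            calls.clear()
            entry("group-1", True)  # flag on: the new path must run
            if "new" not in calls:
                report["not_activated"].append(name)
    return report


print(gate_report(intended={"feature"}))
# {'leaks': ['squash_group'], 'not_activated': []}
```

Enumerating entry points explicitly keeps the negative-path check honest: a new caller of the shared helper must be added to the table, which is exactly where a reviewer would notice a missing gate.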

For state-corrupting regressions, run code-path containment and durable-state repair as separate recovery tracks with independent completion criteria.

Reverting and force-deploying the change stopped new faulty merges, but already modified default branches remained incorrect even though all commits were retained. Recovery plans should distinguish stopping mutation, identifying affected objects, and restoring correct logical state. This separation is most important when writes outlive the faulty code; it adds incident coordination and reconciliation work, and retained data is only a repair input rather than proof of correctness.

layered_recovery_planning
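A minimal sketch of independent completion criteria, assuming a hypothetical `RecoveryTracker` with one criterion per track from the lesson: stopping mutation in every environment, finalizing the affected set, and restoring correct state. Environment and repository names are placeholders. The point of the design is that `resolved()` stays false after containment alone.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RecoveryTracker:
    """Hypothetical tracker with a separate completion criterion per recovery track."""
    # Containment: has the faulty code stopped writing in each environment?
    contained: Dict[str, bool] = field(default_factory=dict)
    # Identification: is the set of affected objects final?
    identification_complete: bool = False
    # Repair: which affected objects are back in a correct logical state?
    affected: Set[str] = field(default_factory=set)
    repaired: Set[str] = field(default_factory=set)

    def report(self) -> Dict[str, bool]:
        return {
            "containment": bool(self.contained) and all(self.contained.values()),
            "identification": self.identification_complete,
            # Repair cannot be complete while the affected set may still grow.
            "repair": self.identification_complete and self.affected <= self.repaired,
        }

    def resolved(self) -> bool:
        return all(self.report().values())


tracker = RecoveryTracker(contained={"env-a": True, "env-b": True})
tracker.affected = {"repo-1", "repo-2"}
tracker.identification_complete = True
tracker.repaired = {"repo-1"}
print(tracker.report())    # {'containment': True, 'identification': True, 'repair': False}
print(tracker.resolved())  # False: code is contained, but state is not yet repaired
```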

Classify corrupted objects by repair certainty so deterministic cases can be automated while ambiguous cases receive targeted evidence and operator guidance.

GitHub identified affected repositories but could not safely repair every one automatically, so recovery shifted to instructions for repository administrators. A reusable repair system should separate provably reversible state from histories requiring owner judgment and record why automation is unsafe. The approach trades faster centralized recovery for conservative handling of ambiguity; the public record does not reveal the repair algorithm, ambiguity classes, completion rate, or repair outcomes.

repair_safety_classification
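A minimal sketch of repair-certainty classification. The record fields and the rule itself are hypothetical; GitHub has not published its ambiguity classes. Here a branch is treated as deterministically repairable when no later commits exist or when later commits avoid the damaged paths, and it is routed to its owner, with the reason recorded, when later work overlaps the damage.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class AffectedBranch:
    """Hypothetical per-repository record assembled during identification."""
    repo: str
    reverted_paths: Set[str]     # paths the faulty merges wrongly changed
    later_commit_count: int = 0  # commits pushed after the faulty merges
    later_touched_paths: Set[str] = field(default_factory=set)


def classify(branch: AffectedBranch) -> Tuple[str, str]:
    """Return (repair class, recorded reason)."""
    if branch.later_commit_count == 0:
        return ("deterministic", "no later commits; lost edits can be reapplied as one new commit")
    overlap = branch.reverted_paths & branch.later_touched_paths
    if not overlap:
        return ("deterministic", "later commits do not touch the damaged paths")
    # Later work overlaps the damage: only the owner can say which content is intended.
    return ("owner_review", "later commits touch damaged paths: " + ", ".join(sorted(overlap)))


for b in [
    AffectedBranch("repo-a", {"a.txt"}),
    AffectedBranch("repo-b", {"a.txt"}, 2, {"b.txt"}),
    AffectedBranch("repo-c", {"a.txt"}, 1, {"a.txt", "c.txt"}),
]:
    print(b.repo, classify(b)[0])
# repo-a deterministic
# repo-b deterministic
# repo-c owner_review
```

Keeping the reason string alongside the class gives administrators the evidence behind each decision, which is the part of the lesson about recording why automation is unsafe.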

sources

Read the sources.

GitHub Community: [2026-04-23] Incident Thread · GitHub Community

Incident with Pull Requests · GitHub Status

An update on GitHub availability · The GitHub Blog

Managing a merge queue · GitHub Docs
