FM-029 · GitHub · 2026-04-23 · impact 4h38m · SEV-1

A silent merge queue bug corrupted 658 GitHub repos

GitHub's merge queue is supposed to protect the default branch. An incomplete feature-flag gate let an unreleased merge-base path run inside squash merge groups, producing valid-looking commits that silently reverted prior work while availability monitoring stayed green.

merge-queue feature-flag squash-merge git-correctness silent-failure

summary

The merge queue's job is to protect the last dangerous step before code reaches the default branch. It takes pull requests that have passed review and CI, groups them, retests the combined state, and lands them in order. That contract depends on one quiet guarantee: the commit the queue writes must contain exactly what was validated. If the queue produces a commit that looks valid but has the wrong tree, the protection mechanism becomes the thing corrupting main.

On April 23, 2026, GitHub deployed a change to its Pull Requests service. The change introduced a new code path for merge-base computation during merge queue ref updates. It was supposed to stay dormant behind a feature flag for an unreleased feature, but the gate was incomplete. In the specific case of merge queue groups using the squash method with more than one pull request, the new path ran anyway.

That mattered because squash merge groups depend on choosing the right base for the final three-way merge. With the wrong base, GitHub could create a commit that appeared to advance the default branch while quietly dropping changes that had already been merged into the group. The Git history still existed. The commit object was structurally valid. The damage was semantic: the branch tip no longer represented the code the queue was supposed to land.

For 3.5 hours, the failure stayed invisible to GitHub's automated monitoring. Request rates, latencies, and errors did not describe the problem, because the service was still operating. It was just writing the wrong result for one configuration: multi-PR merge queue groups using squash. Single-PR groups, ordinary merges, rebases, and pull requests merged outside merge queue were not affected.

The first useful signal came from customers. Users noticed that pull requests showed as merged even though the expected changes were missing from HEAD, and some saw restored commits appear while the public status page looked healthy. GitHub became aware at 19:38 UTC after an increase in support inquiries, then traced the issue back to the incomplete feature-flag gate.

GitHub reverted the code change and force-deployed the fix by 20:43 UTC. The final impact was 658 repositories and 2,092 pull requests. No commits were lost, but affected default branches were wrong, and GitHub could not safely repair every repository automatically. After resolution, it identified affected repositories and sent targeted remediation instructions to repository administrators.

The lasting failure mode is the partial gate. A feature flag can create a false sense of containment when it guards the common paths but misses one variant combination. Here, the missed combination was exactly where correctness mattered most: the automated system trusted to serialize merges silently produced commits that undid earlier work, while the availability dashboard had no reason to turn red.

Automated checks did not validate merge correctness for multi-PR squash groups.

// GitHub, Merge Queue Availability, April 2026

timeline · UTC

From deployment to all-clear in 4h38m.

16:05 UTC

Deployment with incomplete flag gating completes

A code change adjusting merge base computation for merge queue ref updates is deployed. The new code path was intended to be gated behind a feature flag for an unreleased feature, but the gating logic was incomplete for the squash merge case.

16:05–19:38 UTC

Merge queue silently corrupts squash commits

For over three hours, any merge queue group containing multiple pull requests using the squash merge method produces an incorrect three-way merge, reverting changes from previously merged PRs. No automated monitoring detects the correctness failure.

19:38 UTC

GitHub becomes aware via customer support

Users reporting unexpected commit contents prompt support inquiries that surface the incident. GitHub has no automated signal for merge correctness — the issue exists for 3.5 hours before a human notices.

~20:30 UTC

Root cause identified: incomplete flag gate

Engineers identify the new merge base computation code path as the cause and confirm it escaped its feature flag gating for squash merge groups.

20:43 UTC

Resolved via code reversion and force deployment

GitHub reverts the deployment and force-deploys the previous version. The merge queue returns to producing correct commits.
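The durations quoted in this write-up can be checked against the timeline with a few lines of Python. This is only an arithmetic sketch: the three timestamps come from the timeline above, while the variable names and the `fmt` helper are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def fmt(delta: timedelta) -> str:
    """Render a timedelta as e.g. '4h38m'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


# Timestamps taken from the timeline (all UTC, 2026-04-23).
deployed = datetime(2026, 4, 23, 16, 5, tzinfo=timezone.utc)
aware = datetime(2026, 4, 23, 19, 38, tzinfo=timezone.utc)
resolved = datetime(2026, 4, 23, 20, 43, tzinfo=timezone.utc)

undetected = fmt(aware - deployed)       # time the bug ran with no signal
response = fmt(resolved - aware)         # time from awareness to resolution
total_impact = fmt(resolved - deployed)  # full impact window

print(undetected, response, total_impact)  # 3h33m 1h05m 4h38m
```

Roughly three quarters of the impact window passed before anyone knew there was an incident.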

root cause

An unreleased code path escaped its feature flag.

The Pull Requests service received a new code path that changed how the merge base was computed for merge queue ref updates. The path was written to sit behind a feature flag for an unreleased feature. The gating logic was incomplete: for squash merge groups, the new computation ran regardless of the flag's state.

When a merge queue group contained multiple pull requests using the squash merge method, the incorrect merge base caused a faulty three-way merge. The result was a commit that appeared to advance the target branch but reverted changes from pull requests that had already been merged into the group. The commit landed in the repository's history looking structurally valid — the corruption was only visible when comparing the resulting tree against what the branch should have contained.
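To see why the choice of base matters, here is a toy three-way merge over trees modeled as `path -> content` dictionaries. It is a sketch under stated assumptions, not GitHub's implementation: the file names, the tree contents, and the particular wrong base (the group tip itself) are all hypothetical, chosen only to show how a bad base turns an earlier change into an apparent revert.

```python
def three_way_merge(base, ours, theirs):
    """File-level three-way merge of trees modeled as {path: content}."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree
        elif o == b:
            result = t          # only "theirs" differs from base: take theirs
        elif t == b:
            result = o          # only "ours" differs from base: take ours
        else:
            raise ValueError(f"conflict in {path}")
        if result is not None:  # None means the file is deleted
            merged[path] = result
    return merged


# Hypothetical scenario: two PRs branch from the same default-branch tip.
main_tip = {"a.txt": "v1", "b.txt": "v1"}
pr1_head = {"a.txt": "v2", "b.txt": "v1"}   # PR 1 edits a.txt
pr2_head = {"a.txt": "v1", "b.txt": "v2"}   # PR 2 edits b.txt

group_tip = pr1_head  # state of the group after PR 1 has been squashed in

# Correct base: the commit both sides actually share.
correct = three_way_merge(base=main_tip, ours=group_tip, theirs=pr2_head)

# Assumed wrong base: the group tip. PR 2's stale a.txt now looks like a
# deliberate change, so the merge "applies" it and reverts PR 1.
wrong = three_way_merge(base=group_tip, ours=group_tip, theirs=pr2_head)

print(correct)  # {'a.txt': 'v2', 'b.txt': 'v2'}
print(wrong)    # {'a.txt': 'v1', 'b.txt': 'v2'}
```

Both results are valid trees with no conflicts, which is why nothing structural flags the second one as wrong.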

Automated monitoring tracked availability signals: request rates, latencies, error counts. It had no signal for whether the contents of a merge commit were correct. The bug was silent from monitoring's perspective for the entire window between deployment and the first customer report.

contributing factors

What let a correctness bug run undetected for hours.

Feature flag gating was partial, not total.

The code path was gated by a flag, but the gating logic did not cover the squash merge case. Partial gating creates exactly the gap that is hardest to notice: most paths behave correctly, but one combination of inputs escapes. A gate that guards nine of ten paths offers false assurance about the tenth.
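The shape of a partial gate can be sketched in a few lines. Everything here is hypothetical: `MergeGroup`, `pick_path_partial`, `pick_path_gated`, and the `"new"`/`"legacy"` labels are illustrative names, and the real control flow inside GitHub's service is not public. The point is only that a flag check placed on some branches, rather than at the single entry to the new behavior, leaves an escape route.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeGroup:
    method: str    # "merge", "squash", or "rebase"
    pr_count: int


def pick_path_partial(group: MergeGroup, flag_enabled: bool) -> str:
    """Hypothetical buggy shape: one branch never consults the flag."""
    if group.method == "squash" and group.pr_count > 1:
        return "new"            # no flag check on this branch
    if flag_enabled:
        return "new"
    return "legacy"


def pick_path_gated(group: MergeGroup, flag_enabled: bool) -> str:
    """Gate first: nothing reaches the new path unless the flag is on."""
    if not flag_enabled:
        return "legacy"
    return "new"


def escapes(pick_path) -> list:
    """List every group shape that reaches non-legacy code with the flag off."""
    groups = [MergeGroup(m, n)
              for m in ("merge", "squash", "rebase")
              for n in (1, 2)]
    return [g for g in groups if pick_path(g, flag_enabled=False) != "legacy"]


partial_escapes = escapes(pick_path_partial)
gated_escapes = escapes(pick_path_gated)
print(partial_escapes)  # [MergeGroup(method='squash', pr_count=2)]
print(gated_escapes)    # []
```

An enumeration like `escapes` is cheap to run in CI and checks the gate against the code rather than against the author's mental model of it.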

Monitoring covered availability, not correctness.

A service can return HTTP 200 while silently writing wrong data. GitHub's automated checks validated throughput and error rates but had no assertion that a merge commit's tree contained the expected changes. No alert existed that could have fired during the 3.5-hour window.

The failure required the intersection of two conditions.

The bug required both the squash merge method and a multi-PR merge queue group. Single-PR groups, rebase groups, and standard merge method groups all behaved correctly. Testing each condition in isolation missed the combination that triggered the wrong code path.

Detection depended on users reading commit contents.

The first signal came from customer support, not from automated monitoring or canary analysis. Correctness failures in write-path operations can run indefinitely when the only verification is a human inspecting the output.

lessons

What to take from this incident.

Audit feature flag gates against every code path they're meant to cover.

A flag that gates one branch but not another is as dangerous as no flag at all for the uncovered case. Flag coverage should be verified at the code path level: enumerate every entry point that should be gated, and assert that none can reach the new behavior without the flag set. Relying on the author's mental model of coverage is the gap this incident fell through.

Add correctness checks to write-path operations, not just availability.

Availability monitoring catches outages. It does not catch semantic failures. Systems that transform or merge data need output validation: sample a result and verify it matches expectations. For a merge queue, checking that a commit's tree equals the union of its constituent PR changes is the kind of assertion that turns a 3.5-hour silent corruption into a minutes-long incident.
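One possible shape for such an output check, sketched with trees as `path -> content` dictionaries. The helper names (`apply_changes`, `expected_tree`, `tree_mismatches`) and the sample data are assumptions for illustration. A real check would read trees from Git, for example by comparing tree object IDs, and would have to handle overlapping edits, which this sketch sidesteps by applying each PR's changes in queue order.

```python
def apply_changes(tree, changes):
    """Apply one PR's changes; a value of None means the file is deleted."""
    result = dict(tree)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def expected_tree(base_tree, pr_changes):
    """Tree the queue should land: base plus every PR's changes, in order."""
    tree = base_tree
    for changes in pr_changes:
        tree = apply_changes(tree, changes)
    return tree


def tree_mismatches(actual, expected):
    """Return {path: (actual, expected)} for every path that differs."""
    paths = sorted(set(actual) | set(expected))
    return {p: (actual.get(p), expected.get(p))
            for p in paths
            if actual.get(p) != expected.get(p)}


# Hypothetical sample: two PRs in one group, each touching a different file.
base = {"a.txt": "v1", "b.txt": "v1"}
group = [{"a.txt": "v2"}, {"b.txt": "v2"}]
want = expected_tree(base, group)

landed_ok = {"a.txt": "v2", "b.txt": "v2"}
landed_bad = {"a.txt": "v1", "b.txt": "v2"}   # PR 1's change went missing

ok_report = tree_mismatches(landed_ok, want)
bad_report = tree_mismatches(landed_bad, want)
print(ok_report)   # {}
print(bad_report)  # {'a.txt': ('v1', 'v2')}
```

A non-empty report on even a sampled fraction of merges is a correctness signal that availability dashboards cannot provide.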

Test the intersection of variant combinations, not each variant alone.

Squash merge and multi-PR groups each worked correctly in isolation. Only the combination failed. Test matrices for configuration options should cover meaningful cross-products, especially when users can construct any combination themselves. The combination that triggers the bug is often the one not on the happy-path checklist.
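The gap between varying one option at a time and testing the cross-product is easy to make concrete. The sketch below assumes two axes, merge method and group size, and a baseline case of a single-PR group using the merge method. Those axis values and the helper names are illustrative, not GitHub's actual test plan.

```python
from itertools import product

METHODS = ("merge", "squash", "rebase")
GROUP_SIZES = (1, 2)          # 2 stands in for "more than one PR"


def one_at_a_time(baseline, axes):
    """Vary a single axis from the baseline while holding the others fixed."""
    cases = {baseline}
    for i, values in enumerate(axes):
        for value in values:
            case = list(baseline)
            case[i] = value
            cases.add(tuple(case))
    return cases


def full_matrix(axes):
    """Every combination of every axis value."""
    return set(product(*axes))


axes = (METHODS, GROUP_SIZES)
isolated = one_at_a_time(("merge", 1), axes)
missed = full_matrix(axes) - isolated

print(sorted(isolated))
# [('merge', 1), ('merge', 2), ('rebase', 1), ('squash', 1)]
print(sorted(missed))
# [('rebase', 2), ('squash', 2)]
```

Under these assumptions the isolated plan exercises squash and multi-PR groups separately, yet never runs `('squash', 2)`, the one case that failed here. With only six combinations in total, the full matrix costs two extra test cases.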

Treat silent correctness bugs as higher severity than visible outages.

An outage is immediately detected and bounded. A correctness bug silently accumulates damage for as long as it goes unnoticed, and the affected state may require manual remediation long after the deployment is rolled back. Invest in automated output validation so the detection window for data-integrity failures matches the detection window for availability failures.

sources

Read the original.

GitHub Merge Queue Availability — April 2026

github.com ↗

An Update on GitHub Availability

github.blog ↗
