Code Quality · 11 min

What Code Reviewers Actually Miss (And Why It Keeps Happening)

Data from 2,400+ pull requests shows reviewers consistently miss the same categories of bugs. Most of them aren't what you'd expect.

By Development Team · March 6, 2026

A fintech startup shipped a race condition to production that cost them €340,000 in duplicate transactions. Two senior engineers had reviewed that PR. Both approved it. Neither caught the bug.

Not because they were careless. Because humans are structurally bad at spotting certain classes of issues during code review. And no amount of "be more careful next time" fixes structural problems.

The Numbers Are Brutal

Microsoft Research studied code review effectiveness across 2,400 pull requests at large companies. Reviewers caught about 15% of the bugs that made it to production. Fifteen percent. That number should terrify anyone who treats code review as their last line of defense.

But it gets worse. The bugs reviewers do catch cluster around the same categories: naming issues, style violations, obvious logic errors. The stuff a linter handles in milliseconds. Meanwhile, concurrency bugs, edge cases in error handling, and subtle security flaws sail through review after review.

Why? Because reading code is cognitively expensive. After about 400 lines, reviewer attention drops off a cliff. And modern PRs routinely exceed that.

The Five Blind Spots

1. State mutations across boundaries

When a function modifies state that another module depends on, reviewers almost never trace the full dependency chain. They look at the function. They check the immediate caller. They don't check what happens three layers up when that state change triggers a re-render or invalidates a cache.

// Looks fine in isolation
function updateUserPreferences(userId: string, prefs: Partial<UserPrefs>) {
  const current = cache.get(userId);
  Object.assign(current, prefs);  // mutating cached object directly
  db.users.update(userId, { preferences: current });
}

// 400 lines away, in a completely different file:
function getUserPreferences(userId: string) {
  return cache.get(userId);  // returns the SAME object reference
  // caller modifies it thinking they have a copy
  // now cache is corrupted and nobody knows why
}

Every reviewer sees the first function and thinks "fine, it updates prefs." Nobody opens the cache module to check if get() returns a copy or a reference. In a PR diff, you'd need to actively hunt for this. Nobody does.
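One fix is boring and mechanical, which is exactly why it survives review: never hand out the cached object itself. A sketch with illustrative names (nothing here comes from a real codebase): get() returns a deep copy, and updates build a new object instead of mutating in place.

```typescript
type UserPrefs = { theme: string; locale: string };

const cache = new Map<string, UserPrefs>();

function getUserPreferences(userId: string): UserPrefs | undefined {
  const current = cache.get(userId);
  // structuredClone (available in Node 17+) gives callers a copy
  // they can mutate freely without corrupting the cache.
  return current ? structuredClone(current) : undefined;
}

function updateUserPreferences(userId: string, prefs: Partial<UserPrefs>) {
  const current = cache.get(userId);
  if (!current) return;
  const next = { ...current, ...prefs }; // new object, no in-place mutation
  cache.set(userId, next);
  // persistence would go here, e.g. db.users.update(userId, { preferences: next })
}
```

The copy costs a few microseconds per read. The aliasing bug costs a debugging week.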

2. Error propagation gaps

This one accounts for roughly 23% of production incidents at companies tracking root causes rigorously. The happy path works. The error path... kind of works. But somewhere between the error being thrown and the user seeing a message, something gets swallowed, logged wrong, or retried in an unsafe way.

async function processPayment(order: Order) {
  try {
    const charge = await stripe.charges.create({
      amount: order.total,
      currency: 'eur',
      source: order.paymentToken,
    });
    await db.orders.update(order.id, { status: 'paid', chargeId: charge.id });
  } catch (err) {
    await db.orders.update(order.id, { status: 'failed' });
    throw err;  // good, re-throws
  }
}

// But the caller:
app.post('/checkout', async (req, res) => {
  const order = await db.orders.get(req.body.orderId);  // fetch the order first
  try {
    await processPayment(order);
    res.json({ success: true });
  } catch (err) {
    logger.error('Payment failed', err);
    res.status(500).json({ error: 'Something went wrong' });
    // order status is 'failed' in DB
    // but what if the Stripe charge actually SUCCEEDED
    // and the db.orders.update in the try block failed?
    // now the customer is charged but order shows 'failed'
  }
});

Reviewers see the try/catch. They see the error gets logged. They approve. The partial failure scenario where Stripe succeeds but the database update fails? That requires holding two execution paths in your head simultaneously. Across a PR with 15 other files changed.
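One mitigation, sketched with invented names and an injected payment provider (Stripe's real API separately supports idempotency keys for safe retries, but that's orthogonal to the point here): write the order's state before and after the external call, so "charged but marked failed" becomes a detectable "charge_pending" state that a background reconciler can repair.

```typescript
type Order = { id: string; total: number; status: string; chargeId?: string };
type Provider = (amountCents: number) => Promise<{ id: string }>;
type SaveStatus = (orderId: string, patch: Partial<Order>) => Promise<void>;

async function processPayment(order: Order, charge: Provider, save: SaveStatus) {
  // 1. Record intent BEFORE the external call. If we crash here, no money moved.
  await save(order.id, { status: "charging" });

  const result = await charge(order.total);

  try {
    // 2. Record success. This is the write that silently failed in the bug above.
    await save(order.id, { status: "paid", chargeId: result.id });
  } catch (err) {
    // Best effort: mark the order as charged-but-unconfirmed instead of "failed",
    // so a reconciler knows money may have moved and can query the provider.
    await save(order.id, { status: "charge_pending", chargeId: result.id });
    throw err;
  }
}
```

The error still propagates and the user still sees a failure page. The difference is that the database no longer lies about what happened to the money.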

3. Implicit ordering dependencies

Setup code that needs to run before other code. Middleware that must be registered in a specific order. Database migrations that depend on previous migrations having run. Configuration that needs to exist before a service starts.

None of this shows up as an error in a code review. It shows up at 3 AM when the deployment pipeline runs things in a slightly different order than your local environment.
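One defense is to make the ordering explicit in code rather than implicit in file layout or registration order. A minimal sketch (every name is invented for illustration): each startup step declares its dependencies, and a resolver runs them in a valid order, failing loudly on cycles or missing steps instead of at 3 AM.

```typescript
type Step = { name: string; deps: string[]; run: () => void };

// Depth-first resolution: run each step only after its dependencies,
// regardless of the order steps appear in the list.
function startInOrder(steps: Step[]): string[] {
  const byName = new Map(steps.map((s) => [s.name, s]));
  const done = new Set<string>();
  const order: string[] = [];

  const visit = (name: string, trail: Set<string>) => {
    if (done.has(name)) return;
    if (trail.has(name)) throw new Error(`circular dependency at ${name}`);
    const step = byName.get(name);
    if (!step) throw new Error(`unknown dependency: ${name}`);
    trail.add(name);
    step.deps.forEach((d) => visit(d, trail));
    step.run();
    done.add(name);
    order.push(name);
  };

  steps.forEach((s) => visit(s.name, new Set()));
  return order;
}
```

Now a reviewer can see the dependency in the diff, and a reordered deploy script can't silently break initialization.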

4. Resource lifecycle mismatches

Connections opened but never closed under certain error paths. Event listeners added in a loop without cleanup. Temporary files created during processing that survive process restarts. Memory allocated in a C extension that the garbage collector doesn't know about.

These bugs are invisible in PRs because they require understanding the temporal behavior of code, not just its logical structure. A PR diff shows you what the code is. It doesn't show you what happens after it runs 10,000 times.
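A pattern that makes lifecycles visible in the diff itself, sketched with invented names: pair every acquisition with its cleanup at the moment it happens, and release everything in a single finally block so no error path can skip it.

```typescript
type Cleanup = () => void;

class Resources {
  private cleanups: Cleanup[] = [];

  // Register the cleanup at acquisition time, so the reviewer sees
  // acquire and release on the same line of the diff.
  acquire<T>(resource: T, cleanup: (r: T) => void): T {
    this.cleanups.push(() => cleanup(resource));
    return resource;
  }

  // Release in reverse acquisition order.
  dispose(): void {
    while (this.cleanups.length) this.cleanups.pop()!();
  }
}

function withResources<T>(work: (r: Resources) => T): T {
  const r = new Resources();
  try {
    return work(r);
  } finally {
    r.dispose(); // runs on success AND on every error path
  }
}
```

This doesn't catch temporal bugs on its own, but it converts "did every path clean up?" from a whole-function simulation into a local, checkable convention.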

5. Security assumptions that cross service boundaries

Service A validates the input. Service B trusts Service A. Service C calls Service B directly, bypassing A. Now B processes unvalidated input because the security check lived in the wrong place.

Microservice architectures make this dramatically worse. A reviewer looking at Service B's PR has no reason to question incoming data — the existing code already trusts it, and the new code follows the same pattern.
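The fix is unglamorous but reliable: every service validates at its own boundary, even when the caller "should have" done it. A sketch of such a guard for a Service B entry point, with a schema invented for illustration:

```typescript
type TransferRequest = { fromAccount: string; toAccount: string; amountCents: number };

// Validate untrusted input at THIS service's boundary, regardless of
// which upstream service (or direct caller) sent it.
function parseTransferRequest(input: unknown): TransferRequest {
  const o = input as Record<string, unknown>;
  if (
    typeof o?.fromAccount !== "string" ||
    typeof o?.toAccount !== "string" ||
    typeof o?.amountCents !== "number" ||
    !Number.isInteger(o.amountCents) ||
    o.amountCents <= 0
  ) {
    throw new Error("invalid transfer request");
  }
  return { fromAccount: o.fromAccount, toAccount: o.toAccount, amountCents: o.amountCents };
}
```

In practice a schema library does this with less boilerplate; the point is that the check lives inside Service B, so bypassing Service A no longer bypasses validation.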

Why "Better Reviewers" Isn't the Answer

The instinct is always to add more process. Longer review checklists. Mandatory second reviewers. "Thorough review" as a team value.

None of that works. Not because people don't care, but because the cognitive demands of catching these bugs exceed what a human can do while reading a diff at 2 PM between meetings. You're asking someone to simulate program execution in their head, across multiple files, including error paths, under time pressure. The brain wasn't built for that.

Google's internal research showed that increasing review time from 15 minutes to 60 minutes improved defect detection by only 4%. Four percent. For quadrupling the time investment.

The problem isn't effort. It's that certain bug categories require execution, not reading.

What Actually Works

Small PRs. Under 200 lines of actual logic changes. This isn't controversial advice anymore, but adoption is still shockingly low. A 2024 survey found the median PR at enterprise companies contains 450 lines of changes. Teams say they value small PRs, then don't write them.

Targeted review focus. Stop asking reviewers to check everything. Assign specific concerns: "you're reviewing for correctness," "you're reviewing for security," "you're reviewing for API contract compliance." Specialized attention catches more than generalized scanning.

Automated analysis for the stuff humans reliably miss. Static analysis catches state mutations. Fuzz testing finds error propagation gaps. Dependency analysis maps implicit ordering. Load testing reveals resource leaks. None of these replace human review — they cover the blind spots.

And property-based testing. Instead of writing test cases for the scenarios you thought of, you describe what should always be true and let the framework generate thousands of inputs. That race condition at the fintech? A property test stating "total transactions processed equals total amount charged" would have caught it in CI.
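A hand-rolled sketch of the idea (a real framework like fast-check adds input shrinking and reproducible seeds; every name below is invented): generate random batches of payment submissions, deliberately including duplicates and reorderings, and assert the invariant holds for all of them.

```typescript
// System under test: a charge processor with an idempotency guard.
function chargeAll(submissions: { orderId: number; amount: number }[]): number {
  const seen = new Set<number>();
  let charged = 0;
  for (const s of submissions) {
    if (seen.has(s.orderId)) continue; // dedupe: each order charged at most once
    seen.add(s.orderId);
    charged += s.amount;
  }
  return charged;
}

// Property: however submissions are duplicated or reordered,
// the total charged equals the sum of the unique orders.
function checkProperty(runs = 1000): void {
  for (let i = 0; i < runs; i++) {
    const orders = Array.from({ length: 1 + Math.floor(Math.random() * 20) }, (_, id) => ({
      orderId: id,
      amount: 1 + Math.floor(Math.random() * 500),
    }));
    const submissions = [...orders, ...orders.filter(() => Math.random() < 0.5)]
      .sort(() => Math.random() - 0.5); // crude shuffle, good enough for a sketch
    const expected = orders.reduce((sum, o) => sum + o.amount, 0);
    const actual = chargeAll(submissions);
    if (actual !== expected) throw new Error(`invariant violated: ${actual} !== ${expected}`);
  }
}
```

A reviewer reading a diff will never enumerate a thousand duplicate-and-reorder scenarios. A property test does it on every CI run.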

Automated Analysis Fills the Gaps

Manual review catches design issues, readability problems, and architectural concerns brilliantly. Machines catch the rest. The combination is what works — treating either as sufficient on its own is how databases get corrupted and customers get double-charged.

ScanMyCode.dev runs automated code review across security, performance, and quality dimensions — the exact categories human reviewers consistently miss. Each finding includes the file, line number, and a concrete fix. Not generic advice. Actual patches.

Ship the Fix Before the Postmortem

Every team writes postmortems that say "improve code review process." Almost none of those action items prevent the next incident. Because the next incident will be a different blind spot that the improved process still doesn't cover.

Stop relying on humans for things humans are bad at. Automate the mechanical checks. Free up your reviewers to think about design, architecture, and "does this even solve the right problem." That's where human judgment is irreplaceable.

Run your codebase through an automated code review and see what your team has been missing. Results land within 24 hours.

code review · software quality · bugs · pull requests · developer productivity
