Performance · 11 min read

Your Node.js App Is Leaking Memory and the Metrics Look Fine

Memory leaks in Node.js are slow killers. Most monitoring dashboards won't flag them until the OOM killer fires. Patterns that cause them and how to actually track them down.

By Performance Team · March 25, 2026

A SaaS platform running Express with about 4,000 daily active users started getting random 502s every Tuesday around 3 PM. The team checked CPU usage, database connections, response times. All green. What they didn't check: heap memory had been climbing 12 MB per hour since the last deploy, and every Tuesday afternoon it crossed the container's 512 MB limit. The OOM killer showed up right on schedule.

Took them three weeks to figure it out.

Memory leaks in Node.js are uniquely nasty because V8's garbage collector is actually pretty good. Good enough that developers trust it blindly. And for 95% of the code you write, that trust is justified. But the other 5% will put your containers in a restart loop at 3 AM on a Friday.

Why Standard Monitoring Misses This

Most APM tools sample memory usage at intervals. 30 seconds, 60 seconds, sometimes 5 minutes. A leak that adds 8 MB per hour looks like noise on a graph that bounces between 180 and 350 MB depending on traffic. You'd need to stare at a 72-hour trend line to spot the upward drift, and nobody does that unless something is already broken.

Kubernetes makes it worse. A pod hits its memory limit, gets killed, restarts fresh. The leak resets. Your uptime looks fine because the restarts happen fast enough. Meanwhile, every restart drops in-flight requests on the floor. Users get errors. But your dashboard shows 99.8% uptime because the health check passes again 4 seconds later.

The Usual Suspects

Event listeners that never get removed

Classic. Everyone knows about this one and everyone still does it.

// this runs on every request
app.get('/stream', (req, res) => {
  const handler = (data) => {
    res.write(JSON.stringify(data));
  };

  eventBus.on('update', handler);

  // connection closes, handler stays attached forever
  // after 10,000 requests you've got 10,000 listeners
  // eating memory and firing into dead sockets
});

The fix is obvious once you see it. Clean up on close.

app.get('/stream', (req, res) => {
  const handler = (data) => {
    res.write(JSON.stringify(data));
  };

  eventBus.on('update', handler);

  req.on('close', () => {
    eventBus.removeListener('update', handler);
  });
});

Node will even warn you when an emitter accumulates more than 10 listeners. Some teams respond by calling setMaxListeners(0) to silence the warning. That warning exists for a reason.

Closures holding references they shouldn't

Trickier to spot. A closure captures a variable. That variable references a large object. The closure lives longer than expected. V8 can't collect the object because the closure still technically has access to it, even if it never reads it again.

function processOrder(order) {
  // 'order' contains the full payload: items, user, payment, history
  // maybe 200KB of data

  const summary = order.items.map(i => i.name).join(', ');

  // this timer fires once, 30 seconds later
  // but until it fires, the entire 'order' object stays in memory
  // because the closure technically has access to it
  setTimeout(() => {
    analytics.track('order_processed', { summary });
    // only uses 'summary', but V8 retains 'order' too
  }, 30000);
}

Under load, 500 orders per minute with a 30-second timer, you're holding a few hundred full order objects in memory at any moment for no reason. Restructure the closure to only capture what it needs, or extract the data before creating the timeout.
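One way to restructure it, keeping the names from the snippet above. The `analytics` client is passed in and `delayMs` is a made-up parameter, purely to keep the sketch self-contained:

```javascript
// Extract the small pieces before scheduling the timer, so the closure
// captures a couple of short strings instead of the ~200 KB payload.
function processOrder(order, analytics, delayMs = 30000) {
  const summary = order.items.map((i) => i.name).join(', ');
  const orderId = order.id;

  setTimeout(() => {
    // references only summary and orderId; 'order' is collectible
    // long before this fires
    analytics.track('order_processed', { orderId, summary });
  }, delayMs);
}
```

V8 builds one context per function scope, but it only captures the variables an inner function actually references, so `order` is free for collection as soon as `processOrder` returns.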

Growing caches with no eviction

The amount of production code that does this is genuinely alarming:

// "in-memory cache" that's really just a Map that grows forever
const cache = new Map();

async function getUser(id) {
  if (cache.has(id)) return cache.get(id);

  const user = await db.fetchUser(id);
  cache.set(id, user);  // never deleted, never expires
  return user;
}

10,000 unique users later, that Map is eating 80 MB and growing. Use an LRU cache. lru-cache on npm has 40 million weekly downloads for exactly this reason. Or just use Redis and stop pretending your process memory is a database.
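In production you'd reach for the lru-cache package, but the core idea fits in a few lines. A minimal sketch exploiting Map's insertion order; `maxEntries` is this sketch's own knob, not the real package's API:

```javascript
// Least-recently-used cache with a hard size cap. A Map iterates keys
// in insertion order, so the first key is always the stalest one.
class LruCache {
  constructor(maxEntries) {
    this.max = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark this entry as most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    // Over capacity: evict the least recently used entry.
    if (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

Unlike the unbounded Map above, memory use is capped at `maxEntries` no matter how many unique users show up.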

Buffers from streams that don't get consumed

Readable streams in Node buffer data internally. If you create a stream and don't consume it, or you consume it slower than it produces, the internal buffer grows. Backpressure handling is one of those things that works automatically until it doesn't, and when it doesn't, it fails silently by eating RAM.

File upload handlers are the worst offenders. A user uploads a 200 MB file, your middleware buffers the whole thing before passing it to the route handler. Multiply by concurrent uploads. Now your 512 MB container is trying to hold three 200 MB files in memory simultaneously.

How to Actually Find the Leak

Forget process.memoryUsage(). Useful for confirming a leak exists, useless for finding it.

Heap snapshots are the real tool

V8 can dump its entire heap to a file. You take one snapshot when the app starts, another after the suspected leak has been running for a while, and compare them. The diff shows you exactly which objects grew.

// add this to your app, behind auth obviously
const v8 = require('v8');
const fs = require('fs');

app.get('/debug/heapdump', (req, res) => {
  const filename = `/tmp/heap-${Date.now()}.heapsnapshot`;
  // blocks the event loop while writing; returns the path it wrote to
  const file = v8.writeHeapSnapshot(filename);
  res.json({ file });
});

Load the snapshots in Chrome DevTools (Memory tab → Load). The "Comparison" view between two snapshots is gold. Sort by "Size Delta" and the leaking objects float to the top.

The --inspect flag in production (carefully)

You can attach Chrome DevTools to a running Node process. Start with --inspect=127.0.0.1:9229 (bind to localhost, never 0.0.0.0), port-forward through kubectl, and you've got a live memory profiler on your production instance. Powerful. Also dangerous. Anyone who can reach that port can execute arbitrary code in your process. Use it, get your data, kill the flag. Don't leave it running.

Allocation tracking for the stubborn ones

Some leaks are tiny. 200 bytes per request. Takes days to matter. Heap snapshots won't show them clearly because the noise-to-signal ratio is too high. For these, start the process with --heap-prof to generate a sampling heap profile. It statistically samples allocation sites over the life of the process, and the profile it writes shows, by call site, which sampled allocations were still live. Call sites whose retained memory keeps growing are your leak candidates.

Patterns That Prevent Leaks in the First Place

WeakRef and WeakMap exist. Use them when you need a cache that references objects without preventing their garbage collection. If the only reference to an object is through a WeakMap, V8 can collect it.

// regular Map: holds strong reference, prevents GC
const strongCache = new Map();

// WeakMap: allows GC when no other references exist
const weakCache = new WeakMap();

// WeakRef: for individual values you want to cache loosely
const ref = new WeakRef(largeObject);
// later...
const obj = ref.deref(); // returns undefined if GC'd

Set memory limits explicitly. Don't rely on container limits being the backstop. Use --max-old-space-size=384 to tell V8 "you have 384 MB, start collecting aggressively when you get close." This makes leaks surface faster in development.

AbortController. Underused. If you have timers, fetch calls, or stream operations tied to a request lifecycle, pass an AbortSignal and clean up when it fires.

app.get('/api/data', async (req, res) => {
  const controller = new AbortController();

  req.on('close', () => controller.abort());

  try {
    // this fetch is cancelled if the client disconnects first
    const response = await fetch('https://slow-api.example.com/data', {
      signal: controller.signal,
    });
    res.json(await response.json());
  } catch (err) {
    if (err.name === 'AbortError') return; // client left, nothing to do
    throw err;
  }
});

The Monitoring Setup That Actually Works

Track process.memoryUsage().heapUsed every 10 seconds. Ship it to your metrics system. Set an alert not on the absolute value, but on the rate of change over 6 hours. If heap usage is consistently increasing by more than 1 MB per hour across multiple pods, you have a leak. Absolute thresholds miss slow leaks entirely because restarts keep resetting the baseline.
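A sketch of that rate-of-change tracking, with the shipping step left out. Here the samples live in process memory and the rate is computed locally; in practice each sample goes to your metrics backend and the alerting rule computes the slope there:

```javascript
// Sample heapUsed every 10 seconds and compute MB/hour growth across
// the retained window. A sustained positive rate is the leak signal;
// the absolute heap value is mostly noise.
const samples = [];

function sampleHeap() {
  samples.push({ t: Date.now(), heap: process.memoryUsage().heapUsed });
  // Keep 6 hours of 10-second samples (2160 entries).
  while (samples.length > 2160) samples.shift();
}

function heapGrowthMbPerHour() {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const hours = (last.t - first.t) / 3.6e6;
  if (hours === 0) return 0;
  return (last.heap - first.heap) / 1048576 / hours;
}

// unref() so the timer never keeps the process alive on its own
setInterval(sampleHeap, 10000).unref();
```

Comparing the first and last samples is deliberately crude; a linear regression over the window is less sensitive to GC timing at the endpoints if you want to get fancier.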

Track event listener counts. process._getActiveHandles().length and process._getActiveRequests().length are unofficial APIs, but they work, and they've headed off production incidents more than once. A steadily increasing handle count is a leak. Period.

Run load tests with clinic.js before deploying. Specifically clinic doctor and clinic heapprofiler. Takes five minutes to set up. Catches 80% of memory issues before they reach production. Most teams skip this because "it works on my machine" and then spend 40 hours debugging in production where the feedback loop is a thousand times slower.

When Automated Scanning Catches What You Miss

Manual code review can spot the obvious patterns. The unbounded Map, the missing removeListener call. But closures that accidentally capture large scopes? Stream handlers that don't implement backpressure correctly? Those require tooling. ScanMyCode.dev runs static analysis that flags these patterns automatically, including the subtle ones that show up as "this variable is captured but never used in the closure body." The performance audit specifically looks for unbounded growth patterns, missing cleanup handlers, and stream backpressure issues. Results include file paths, line numbers, and concrete fixes.

Stop Restarting and Start Fixing

The most common "fix" for memory leaks in production is a cron job that restarts the process every 4 hours. Teams actually do this. It works in the sense that the app stays up. It fails in every other sense: dropped connections, lost in-memory state, wasted compute restarting healthy instances alongside leaky ones.

Find the leak. Fix the leak. Use heap snapshots, track your metrics properly, and stop treating OOM kills as normal operational behavior. If your pods restart more than once a week from memory pressure, something is wrong with the code, not the infrastructure.

Not sure where the leak is? Submit your codebase for a performance audit. You'll get a report with specific files, line numbers, and fixes within 24 hours. Cheaper than another week of guessing.

performance · node-js · memory-leaks · debugging · monitoring
