
Testing Webhooks: From Local Development to Production Validation

Webhooks are hard to test because the sender controls the timing. Strategies for local tunneling, mock servers, integration suites, shadow mode, and production validation.

WebhookVault Team · 8 min read

Why Webhooks Break the Testing Pyramid

The standard testing pyramid puts unit tests at the base, integration tests in the middle, and end-to-end tests at the top. Webhooks invert this. A unit test for a webhook handler tells you almost nothing useful, because the hard part isn't your handler logic. It's the interaction with an external system that controls the payload format, delivery timing, retry behavior, and authentication scheme. I've seen teams with 95% unit test coverage on their webhook handlers still break in production because the provider changed a field name in their payload.

The real challenges are structural. You can't trigger a webhook on demand from most providers during a test run. Payloads arrive asynchronously. Signature verification requires the exact raw body bytes, which test frameworks sometimes mangle. And staging environments often receive different webhook traffic than production; many providers don't even support configuring separate webhook URLs per environment.

This means webhook testing requires a different strategy. You need to work from the outside in: start with realistic payloads, test the full request-to-response cycle, and build confidence incrementally as you move toward production.

Local Development with Tunnels

The fastest way to test against a real webhook provider is to expose your local server to the internet. Tools like ngrok, cloudflared, and localtunnel create a public URL that forwards traffic to localhost.

A typical setup with ngrok looks like this:

# Start your dev server
npm run dev -- --port 3000

# In another terminal:
ngrok http 3000

# ngrok prints a forwarding line:
#   Forwarding https://a1b2c3d4.ngrok-free.app -> http://localhost:3000

# Register the ngrok URL with your webhook provider:
#   https://a1b2c3d4.ngrok-free.app/api/webhooks/stripe

This works well for initial development, but there are real downsides. The tunnel URL changes every time you restart ngrok (unless you pay for a fixed subdomain). You have to remember to update the webhook URL in your provider's dashboard. And there's a security angle that most teams overlook: if you point a production webhook at your tunnel, real customer data is flowing through ngrok's servers to your laptop. For a payment processor like Stripe, that's PCI-relevant traffic hitting a machine that probably also has Slack and Spotify running.

My recommendation: use tunnels only with test-mode credentials and synthetic data. Never route production webhooks to a local tunnel. Cloudflared is a good alternative to ngrok if you already use Cloudflare, since it avoids the third-party relay by tunneling directly through your own Cloudflare account.

Building a Mock Webhook Server

For repeatable testing, build a small service that sends webhook payloads on demand. This gives you full control over timing, payload content, and signatures, which are the three things that make real webhooks hard to test.

import crypto from 'node:crypto'
import express from 'express'

const app = express()

interface WebhookConfig {
  targetUrl: string
  secret: string
  event: string
  payload: Record<string, unknown>
}

app.post('/send-test-webhook', express.json(), async (req, res) => {
  const { targetUrl, secret, event, payload } = req.body as WebhookConfig
  const body = JSON.stringify(payload)
  const timestamp = Math.floor(Date.now() / 1000)
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${body}`)
    .digest('hex')

  const response = await fetch(targetUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-Webhook-Event': event,
      'X-Webhook-Timestamp': String(timestamp),
      'X-Webhook-Signature': `t=${timestamp},v1=${signature}`,
    },
    body,
  })

  res.json({
    status: response.status,
    body: await response.text(),
  })
})

app.listen(4000, () => console.log('Mock webhook server on :4000'))

This mock server lets you fire test webhooks at your handler whenever you want, whether during development, in CI, or as a manual smoke test. The signature generation mirrors what a real provider does, so your verification logic gets exercised too. I keep a library of JSON fixtures for different event types and edge cases (empty arrays, null fields, oversized payloads) and cycle through them during testing.
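Cycling through fixtures is easy to script. Here's a sketch of a small driver that signs and sends each fixture the same way the mock server does; the fixture contents and header names mirror the examples above, and in practice the fixtures would live as JSON files on disk rather than inline:

```typescript
import crypto from 'node:crypto'

interface Fixture {
  event: string
  data: Record<string, unknown>
}

// Hypothetical fixture set covering a few of the edge cases mentioned above
const fixtures: Fixture[] = [
  { event: 'payment.completed', data: { id: 'pay_1', amount: 4999 } },
  { event: 'payment.completed', data: { id: 'pay_2', amount: 0 } }, // zero amount
  { event: 'payment.refunded', data: { id: 'pay_3', items: [] } }, // empty array
]

// Sign a fixture the same way the mock server does
function signFixture(payload: unknown, secret: string) {
  const body = JSON.stringify(payload)
  const timestamp = Math.floor(Date.now() / 1000)
  const signature = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${body}`)
    .digest('hex')
  return {
    body,
    headers: {
      'Content-Type': 'application/json',
      'X-Webhook-Timestamp': String(timestamp),
      'X-Webhook-Signature': `t=${timestamp},v1=${signature}`,
    },
  }
}

// Fire every fixture at the handler under test and report the status codes
async function runFixtures(targetUrl: string, secret: string): Promise<void> {
  for (const fixture of fixtures) {
    const { body, headers } = signFixture(fixture, secret)
    const res = await fetch(targetUrl, { method: 'POST', headers, body })
    console.log(`${fixture.event}: ${res.status}`)
  }
}
```

Running this against localhost after every handler change takes seconds and exercises the same code path a real delivery would.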

Integration Test Suites

A mock server is useful during development, but you also want automated tests that run in CI. The goal is to test the full webhook pipeline: HTTP request arrives, signature is verified, payload is parsed, handler processes the event, and the correct side effects happen.

import crypto from 'node:crypto'
import request from 'supertest'
import { app } from '../src/app'
import { db } from '../src/db'

describe('POST /api/webhooks/payments', () => {
  const secret = process.env.WEBHOOK_SECRET!

  function signPayload(body: string, timestamp: number): string {
    const sig = crypto
      .createHmac('sha256', secret)
      .update(`${timestamp}.${body}`)
      .digest('hex')
    return `t=${timestamp},v1=${sig}`
  }

  it('processes a payment.completed event', async () => {
    const payload = {
      event: 'payment.completed',
      data: { id: 'pay_123', amount: 4999, currency: 'eur' },
    }
    const body = JSON.stringify(payload)
    const timestamp = Math.floor(Date.now() / 1000)

    const res = await request(app)
      .post('/api/webhooks/payments')
      .set('Content-Type', 'application/json')
      .set('X-Webhook-Signature', signPayload(body, timestamp))
      .set('X-Webhook-Timestamp', String(timestamp))
      .send(body)

    expect(res.status).toBe(200)

    const order = await db.orders.findOne({ paymentId: 'pay_123' })
    expect(order?.status).toBe('paid')
    expect(order?.amountCents).toBe(4999)
  })

  it('rejects requests with invalid signatures', async () => {
    const body = JSON.stringify({ event: 'payment.completed', data: {} })

    const res = await request(app)
      .post('/api/webhooks/payments')
      .set('Content-Type', 'application/json')
      .set('X-Webhook-Signature', 't=123,v1=invalidsignature')
      .set('X-Webhook-Timestamp', '123')
      .send(body)

    expect(res.status).toBe(401)
  })
})

Notice how the test constructs the signature the same way the provider would. This is intentional: if your signature verification code has a bug, this test will catch it. I've seen teams skip signature testing because "it's the provider's SDK that handles it." Then they upgrade the SDK, the signature header format changes, and every webhook starts failing at 2 AM.

One pitfall with supertest: it sometimes re-serializes the body, which changes the byte representation and breaks HMAC verification. Pass the body as a pre-serialized string (not an object) to avoid this.
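The flip side of that pitfall lives in the handler: verification has to run over the raw bytes, never over a re-serialized object. Here's a framework-agnostic sketch of the verification side, assuming the same t=...,v1=... header format used in these tests; your provider's scheme may differ:

```typescript
import crypto from 'node:crypto'

// Verify a t=...,v1=... style signature against the raw request body bytes
function verifySignature(
  rawBody: Buffer,
  signatureHeader: string,
  secret: string,
  toleranceSec = 300,
  nowSec = Math.floor(Date.now() / 1000)
): boolean {
  const parts = new Map(
    signatureHeader.split(',').map(kv => kv.split('=') as [string, string])
  )
  const timestamp = Number(parts.get('t'))
  const given = parts.get('v1') ?? ''
  if (!Number.isFinite(timestamp)) return false
  // Reject stale timestamps to limit replay attacks
  if (Math.abs(nowSec - timestamp) > toleranceSec) return false
  // HMAC over the raw bytes, never over a re-serialized object
  const expected = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.`)
    .update(rawBody)
    .digest('hex')
  if (given.length !== expected.length) return false
  // Constant-time comparison to avoid leaking information via timing
  return crypto.timingSafeEqual(Buffer.from(given), Buffer.from(expected))
}
```

Because the function takes a Buffer, the type system itself nudges you toward the raw body rather than the parsed object.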

Shadow Mode in Production

At some point, you need to face real production traffic. Shadow mode is the safest first step: your handler receives real webhooks, logs everything, but skips all side effects. No database writes, no emails, no API calls to downstream services.

import type { Request, Response, NextFunction } from 'express'

interface ShadowOptions {
  flagKey: string
  logger: (event: string, data: unknown) => void
}

function shadowMode({ flagKey, logger }: ShadowOptions) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const isShadow = process.env[flagKey] === 'true'

    if (isShadow) {
      logger('shadow_webhook_received', {
        method: req.method,
        path: req.path,
        headers: req.headers,
        body: req.body,
        timestamp: new Date().toISOString(),
      })
      // Acknowledge immediately so the provider thinks delivery succeeded
      return res.status(200).json({ status: 'shadow' })
    }

    next()
  }
}

// Usage
app.post(
  '/api/webhooks/payments',
  shadowMode({
    flagKey: 'PAYMENT_WEBHOOK_SHADOW',
    logger: (event, data) => structuredLog.info(event, data),
  }),
  paymentWebhookHandler
)

The key detail here: always return 200 in shadow mode. If you return an error code, the provider will retry, and you'll get duplicate deliveries when you eventually turn off shadow mode. I've seen a team return 204 in shadow mode thinking "no content" was appropriate. Their provider interpreted 204 correctly, but a different provider they onboarded later treated anything other than 200 as a failure. Stick with 200.

Run shadow mode for at least 48 hours, ideally a full week. This catches weekday-vs-weekend traffic patterns, batch processing jobs that only run at certain times, and edge-case event types that show up rarely.
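One cheap way to mine the captured shadow logs is to tally event types, which makes those rare events stand out. A sketch, assuming the logged body carries an event field like the payloads in this post:

```typescript
interface ShadowLogEntry {
  body: { event?: string }
}

// Count occurrences of each event type in the captured shadow logs,
// sorted rarest-first, so one-off event types stand out
function tallyEventTypes(logs: ShadowLogEntry[]): [string, number][] {
  const counts = new Map<string, number>()
  for (const entry of logs) {
    const event = entry.body.event ?? 'unknown'
    counts.set(event, (counts.get(event) ?? 0) + 1)
  }
  return [...counts.entries()].sort((a, b) => a[1] - b[1])
}
```

Any event type at the top of that list with a count of one or two is a candidate for a new test fixture.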

Canary Rollouts for Webhook Handlers

After shadow mode gives you confidence, the next step is processing a small slice of real traffic. A canary rollout sends, say, 5% of webhooks to the new handler while 95% continue through the old path.

import crypto from 'node:crypto'

interface CanaryConfig {
  percentage: number // 0-100
  newHandler: (payload: unknown) => Promise<void>
  oldHandler: (payload: unknown) => Promise<void>
}

async function canaryRoute(
  payload: unknown,
  eventId: string,
  config: CanaryConfig
): Promise<void> {
  // Deterministic routing based on event ID, so the same event
  // always goes to the same handler, even on retries
  const hash = crypto.createHash('md5').update(eventId).digest()
  // Read 4 bytes instead of 1 so the modulo-100 bias is negligible
  const bucket = hash.readUInt32BE(0) % 100

  if (bucket < config.percentage) {
    await config.newHandler(payload)
  } else {
    await config.oldHandler(payload)
  }
}

// Route 5% to the new handler
await canaryRoute(payload, event.id, {
  percentage: 5,
  newHandler: processPaymentV2,
  oldHandler: processPaymentV1,
})

The non-obvious detail here is deterministic routing. Using Math.random() means the same event might go to different handlers on retry, which can cause duplicate processing or data inconsistency. Hashing the event ID produces a stable bucket assignment. The same event always routes to the same handler, even if the provider retries delivery three times.

Ramp up gradually: 5% for a day, then 25%, then 50%, then 100%. At each stage, compare error rates and processing latency between the old and new handlers. If the new handler's error rate is more than 2x the old handler's, roll back immediately.
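That rollback rule can be made mechanical. Here's a sketch of a per-step health gate using the 2x ratio above and the 0.1% "healthy handler" baseline from the monitoring discussion; both thresholds are this article's conventions, not a standard:

```typescript
interface HandlerStats {
  requests: number
  errors: number
}

// Gate each ramp-up step: keep going only if the new handler's error
// rate stays under maxRatio times the old handler's. The floor avoids
// false alarms when the baseline error rate is effectively zero.
function canaryHealthy(
  oldStats: HandlerStats,
  newStats: HandlerStats,
  maxRatio = 2,
  floor = 0.001
): boolean {
  if (newStats.requests === 0) return true // no canary traffic yet
  const oldRate = oldStats.errors / Math.max(oldStats.requests, 1)
  const newRate = newStats.errors / newStats.requests
  return newRate <= Math.max(oldRate * maxRatio, floor)
}
```

Run this check on a timer during the rollout; a false result should page someone and flip the canary percentage back to zero.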

Contract Testing with Schemas

Webhook providers change their payload formats. Sometimes they announce it, sometimes they don't. A field gets renamed, a nested object gains a new required property, or a string field starts arriving as a number. Schema validation catches these changes before they cascade into your business logic.

Zod is my preferred tool for this in TypeScript:

import { z } from 'zod'

const PaymentEventSchema = z.object({
  event: z.enum([
    'payment.completed',
    'payment.failed',
    'payment.refunded',
  ]),
  data: z.object({
    id: z.string().min(1),
    amount: z.number().int().nonnegative(),
    currency: z.string().length(3),
    customer: z.object({
      id: z.string(),
      email: z.string().email(),
    }),
    metadata: z.record(z.string()).optional(),
  }),
  created_at: z.string().datetime(),
})

type PaymentEvent = z.infer<typeof PaymentEventSchema>

function validateWebhook(body: unknown): PaymentEvent | null {
  const result = PaymentEventSchema.safeParse(body)

  if (!result.success) {
    metrics.increment('webhook.schema_validation_failure')
    log.warn('Webhook schema mismatch', {
      errors: result.error.issues.map(i => ({
        path: i.path.join('.'),
        message: i.message,
      })),
    })
    return null
  }

  return result.data
}

When validation fails, log the specific field paths and error messages. This data is gold when you need to figure out what changed. A common mistake is to silently drop invalid webhooks. Instead, accept the delivery (return 200) but flag it for review. If you return an error code, the provider retries, and you'll keep failing on the same payload indefinitely.

I run schema validation in two modes. In production, mismatches log a warning but processing continues with defensive fallbacks. In CI, mismatches fail the test immediately. This gives you early warning without blocking production traffic.

Monitoring During Rollout

Deploying a new webhook handler without monitoring is like driving at night with the headlights off. You need real-time visibility into four metrics.

Error Rate

Track the percentage of webhook deliveries that result in a non-200 response or an unhandled exception. Baseline this before your rollout. A healthy webhook handler has an error rate under 0.1%. If your new handler exceeds 1%, something is wrong.

Processing Latency

Measure the time from request received to response sent, at p50, p95, and p99. Webhook providers have timeout windows: Stripe gives you 20 seconds, GitHub gives you 10. If your p99 latency creeps above 5 seconds, you're at risk of timeouts that trigger retries and duplicate processing.
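If you're computing these percentiles yourself from raw duration samples rather than pulling them from a metrics backend, a small nearest-rank helper is enough for a rollout dashboard:

```typescript
// Nearest-rank percentile over per-request durations in milliseconds
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0
  const sorted = [...samples].sort((a, b) => a - b)
  // (p * n) / 100 keeps the arithmetic exact for integer p and n
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[Math.min(sorted.length - 1, Math.max(rank - 1, 0))]
}
```

Compare p99 between the old and new handler at every canary step; averages hide exactly the tail behavior that causes timeouts.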

Duplicate Detection

Track how often you see the same event ID more than once. Some duplication is normal, since providers retry on timeouts and network failures. But a spike in duplicates after a deploy usually means your handler is slow or returning errors, causing the provider to retry aggressively.
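Tracking duplicates requires remembering event IDs for a while. Here's an in-memory sketch with a TTL; a real deployment would usually use Redis or a database unique constraint instead, since in-process state disappears on restart:

```typescript
// Track recently seen event IDs so duplicate deliveries can be
// counted (and optionally skipped)
class DuplicateTracker {
  private seen = new Map<string, number>() // event ID -> first-seen ms

  constructor(private ttlMs = 24 * 60 * 60 * 1000) {}

  // Returns true if this event ID was already seen within the TTL
  isDuplicate(eventId: string, now = Date.now()): boolean {
    // Evict expired entries; Map iterates oldest-first (insertion order),
    // so we can stop at the first entry that is still fresh
    for (const [id, firstSeen] of this.seen) {
      if (now - firstSeen > this.ttlMs) this.seen.delete(id)
      else break
    }
    if (this.seen.has(eventId)) return true
    this.seen.set(eventId, now)
    return false
  }
}
```

Increment a duplicate-rate metric whenever isDuplicate returns true; the absolute number matters less than a sudden change after a deploy.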

Queue Depth

If your webhook handler offloads work to a queue, monitor the queue depth. A growing backlog means you're receiving webhooks faster than you can process them. This often happens when a new handler introduces a slow database query or an extra API call in the processing path.

Set up automated alerts before you start the rollout, not after. I've made the mistake of deploying first and "planning to add monitoring later." By the time I noticed the problem, 6 hours of webhooks had failed silently and we had to replay them all manually.

Putting It All Together

A solid webhook testing strategy moves through distinct phases. During development, use tunnels with test-mode credentials and a mock webhook server for fast iteration. In CI, run integration tests with signed payloads and realistic fixtures against your full handler pipeline. Before production, turn on shadow mode for a week to observe real traffic patterns. Then canary the new handler at 5% with schema validation and monitoring in place. Ramp up to 100% only when your error rate, latency, and duplicate metrics stay within baseline thresholds.

Each phase catches different classes of bugs. Unit tests catch logic errors. Integration tests catch wiring problems. Shadow mode catches payload format surprises. Canary rollouts catch performance regressions. Schema validation catches provider-side breaking changes. No single phase is sufficient on its own, but together they give you the confidence to ship webhook handler changes without 2 AM pages.