Best practices

Your Webhook Logs Are Useless (And How to Fix Them)

Most webhook logging captures the wrong data at the wrong level. Stop logging everything and start logging what matters for debugging production incidents.

WebhookVault Team · 7 min read

You're logging everything and seeing nothing

Someone on your team decided to log every incoming webhook payload. Full JSON body, headers, timestamps, the works. "For debugging," they said. Six months later you've got 400GB of logs in CloudWatch, your monthly bill looks like a car payment, and when something actually breaks? You still can't figure out what happened.

Sound familiar?

The problem isn't that you're not logging enough. You're logging too much of the wrong stuff and not enough of what actually matters when things go sideways at 3 AM on a Saturday. I've watched teams grep through millions of log lines trying to correlate a failed Stripe payment webhook with its retry attempts, because nobody thought to include a correlation ID. Brutal.

What most teams get wrong about webhook logs

The default instinct is to dump the entire request body into your logs. Every field, every nested object, every bit of metadata the sender includes. And yeah, it feels thorough. But you're creating three problems at once: storage costs explode, log search becomes painfully slow, and sensitive data (API keys, email addresses, payment tokens) ends up sitting in your log aggregator with zero encryption.

Stripe sends webhook payloads that can be 5-10KB each. GitHub's can hit 20KB+ for push events on active repos. Multiply that by thousands of events per day. Your logging pipeline wasn't built for this.

The other common mistake: logging at a single level. Everything is info. Successful delivery? Info. Signature validation failed? Info. Retry attempt #4? Also info. When you're filtering logs during an incident, you need severity levels that actually mean something.

Structured logging changes the game

Forget free-text log messages. If your webhook logs look like Received webhook from Stripe at 2026-03-29T07:00:00Z, you're making life harder for yourself. Structured logs give you fields you can query, filter, and aggregate without regex gymnastics.

// good: structured, queryable, minimal
const logWebhookReceived = (req) => {
  logger.info({
    event: 'webhook.received',
    provider: extractProvider(req),
    webhookId: req.headers['x-webhook-id'],
    correlationId: req.headers['x-correlation-id'] || generateId(),
    contentLength: req.headers['content-length'],
    eventType: req.body?.type || req.body?.event || 'unknown',
    timestamp: Date.now()
  });
};

// bad: unstructured, ungrepable mess
console.log('Got webhook: ' + JSON.stringify(req.body));

See the difference? The structured version gives you seven filterable dimensions. The unstructured version gives you a string you'll have to parse later when you're panicking.

The five fields you need on every webhook log entry

After debugging webhook failures across maybe 30 different integrations, these are non-negotiable:

1. Correlation ID. If the sender provides one (every Stripe event carries a unique id, GitHub sends an X-GitHub-Delivery header), use it. If they don't, generate one on first receipt and propagate it through retries. Without this, correlating a webhook with its processing outcome is just... guessing.

2. Event type. Not the full payload. Just the type. payment_intent.succeeded, push, order.completed. You need to know what happened before you care about the details.

3. Processing duration. How long did your handler take? 50ms is normal. 30 seconds means your database query is doing a full table scan and the sender is about to time out and retry, which creates duplicates, which creates a whole other class of problems.

4. Response code you sent back. Did you return 200? 202? 500? The sender's retry logic depends on this. If you're returning 200 before actually processing the webhook (fire-and-forget pattern), log that distinction explicitly.

5. Outcome. Not just "processed" but what happened. webhook.processed.order_created or webhook.processed.duplicate_skipped or webhook.failed.invalid_signature. Specific outcomes let you build dashboards that actually tell you something.
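Stitched together, an entry carrying all five fields might look like this. A sketch; the field names here are illustrative conventions, not something any provider mandates:

```javascript
// One envelope entry with the five non-negotiable fields.
// Field names are illustrative, not a standard.
function buildEnvelope({ correlationId, eventType, processingMs, statusCode, outcome }) {
  return {
    correlationId, // 1. ties this event to its retries and downstream effects
    eventType,     // 2. what happened, not the full payload
    processingMs,  // 3. how long your handler took
    statusCode,    // 4. what you told the sender
    outcome        // 5. the specific result
  };
}

const entry = buildEnvelope({
  correlationId: 'whk_8f3a2c',
  eventType: 'payment_intent.succeeded',
  processingMs: 42,
  statusCode: 200,
  outcome: 'processed.order_created'
});
```

Five fields, maybe 200 bytes. That's the entire log cost of a webhook.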

Log the envelope, not the letter

This is the single most useful mental model for webhook logging. Think of it like mail: you want to track every envelope that arrives, when it came, who sent it, what type of document is inside. But you don't photocopy every page and file it in your tracking system.

Log metadata. Store payloads separately, if you need them at all.

async function handleWebhook(req, res) {
  const envelope = {
    id: crypto.randomUUID(),
    provider: 'stripe',
    eventType: req.body.type,
    receivedAt: new Date().toISOString(),
    signatureValid: null,
    processingMs: null,
    outcome: null
  };

  const start = performance.now();

  try {
    envelope.signatureValid = verifySignature(req);
    if (!envelope.signatureValid) {
      envelope.outcome = 'rejected.invalid_signature';
      logger.warn(envelope);
      return res.status(401).send();
    }

    // store the full payload in your database, not in logs
    await storeWebhookPayload(envelope.id, req.body);

    const result = await processEvent(req.body);
    envelope.outcome = result.action; // 'order_created', 'duplicate_skipped', etc.
    envelope.processingMs = Math.round(performance.now() - start);

    logger.info(envelope);
    res.status(200).send();

  } catch (err) {
    envelope.outcome = 'failed.' + (err.code || 'unknown');
    envelope.processingMs = Math.round(performance.now() - start);
    // warn, not error. save error for when YOUR system is broken
    logger.warn(envelope);
    res.status(500).send();
  }
}

Notice the payload goes to the database. The logs get a lightweight envelope. When you need to debug, you query your logs to find the webhook ID, then pull the full payload from storage. Two lookups, but your log pipeline stays fast and cheap.
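The two-lookup flow is easy to sketch with in-memory stand-ins for the two stores (both hypothetical; substitute your real log query and payload table):

```javascript
// Two-lookup debugging sketch. `logStore` and `payloadStore` are
// in-memory stand-ins for your log aggregator and payload database.
const logStore = [
  { id: 'wh_123', eventType: 'push', outcome: 'failed.timeout' },
  { id: 'wh_124', eventType: 'push', outcome: 'processed.build_triggered' }
];
const payloadStore = new Map([
  ['wh_123', { ref: 'refs/heads/main', commits: 3 }]
]);

// Lookup 1: search the lightweight envelopes in your logs
const failed = logStore.filter(e => e.outcome.startsWith('failed.'));

// Lookup 2: pull the full payload from storage by webhook ID
const payload = payloadStore.get(failed[0].id);
```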

Severity levels that aren't just decoration

Most teams use three levels in practice: debug (never enabled in prod), info (everything), and error (panics). That's not enough for webhooks. You need a real hierarchy.

DEBUG: Full payload dumps. Only enable this per-provider when actively debugging. Never in production by default. Seriously.

INFO: Successful receipt and processing. The happy path. Your envelope log with a positive outcome.

WARN: Signature failures, duplicate deliveries, slow processing (>5s), unexpected event types. Things that aren't broken but need attention.

ERROR: Your handler crashed. Database is down. Queue is full. Something in YOUR infrastructure failed, not the webhook sender's.

The key distinction: a webhook arriving with an invalid signature is a WARN. Your webhook endpoint returning 500 because Postgres is unreachable is an ERROR. One is expected noise. The other is an incident.
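One way to keep that distinction honest is to derive the level from the outcome instead of choosing it ad hoc. A sketch, where `INFRA_CODES` and the outcome strings are illustrative:

```javascript
// Derive log level from outcome. Rule of thumb: sender-side noise is
// warn, failures in YOUR infrastructure are error. INFRA_CODES and
// the outcome strings are illustrative, not a standard.
const INFRA_CODES = new Set(['db_unreachable', 'queue_full']);

function levelFor(outcome, processingMs = 0) {
  const code = outcome.split('.').pop();
  if (INFRA_CODES.has(code)) return 'error';          // our side is broken
  if (outcome.startsWith('rejected.')) return 'warn'; // invalid signature etc.
  if (code === 'duplicate_skipped') return 'warn';    // duplicate delivery
  if (processingMs > 5000) return 'warn';             // slow processing
  return 'info';                                      // happy path
}
```

A mapper like this also makes the rule reviewable: the severity policy lives in one function instead of scattered across every handler.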

Retention: stop paying to store logs you'll never read

Quick math. 10,000 webhooks per day at 500 bytes per structured log entry is about 5MB daily. Tiny. Keep those for 90 days, no problem. But if you're dumping full payloads into your logs? 10,000 webhooks at 10KB each is 100MB per day, 9GB per quarter. And that's a small integration.

Tier your retention. Hot logs (last 7 days) in your fast query engine. Warm logs (7-30 days) in cheaper storage. Cold logs (30-90 days) in S3 or equivalent. Beyond 90 days, you probably don't need them, and if you do, your webhook payload database has the data anyway.
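The tiering rule is simple enough to encode directly. A sketch with tier names matching the scheme above; the `store` values are placeholders for whatever your stack actually uses:

```javascript
// Tiered retention sketch: pick where a log entry lives by its age.
// Cutoffs mirror the scheme above; `store` values are placeholders.
const TIERS = [
  { name: 'hot',  maxAgeDays: 7,  store: 'fast-query-engine' },
  { name: 'warm', maxAgeDays: 30, store: 'cheaper-storage' },
  { name: 'cold', maxAgeDays: 90, store: 's3' }
];

function tierFor(ageDays) {
  const tier = TIERS.find(t => ageDays <= t.maxAgeDays);
  return tier ? tier.name : 'expired'; // past 90 days, let it go
}
```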

One team I worked with was spending $2,400/month on Datadog purely from webhook payload logging. They switched to envelope-only logging and dropped to $180. Same debugging capability, because the payloads lived in Postgres where they belonged.

The correlation trick that saves incidents

When a webhook triggers a chain of events (payment received, then order created, then inventory updated, then email sent), you need to trace that entire chain back to the original webhook. Pass the webhook's correlation ID through every downstream operation.

// propagate the webhook ID through your entire processing chain
async function processPaymentWebhook(webhookId, event) {
  const order = await createOrder({
    ...event.data,
    sourceWebhookId: webhookId  // now you can trace back
  });

  await updateInventory(order.id, { traceId: webhookId });
  await sendConfirmationEmail(order.id, { traceId: webhookId });

  // every log line in every service includes this ID
  // grep once, see everything
}

When the customer complains their confirmation email never arrived, you grep for the webhook ID. One search. You see it was received, processed, order created, inventory updated, email service returned 429 because SendGrid was rate limiting you. Three minutes to diagnosis instead of thirty.

Don't forget the non-events

Log when webhooks DON'T arrive. This sounds weird, but it's saved me multiple times. If you expect a webhook from Shopify every time an order is placed, and your order volume suddenly drops to zero webhooks for 2 hours on a Tuesday afternoon? Something is wrong. Maybe Shopify changed their webhook URL config. Maybe someone deleted the endpoint in the dashboard. Maybe DNS is flaking out.

Set up a dead man's switch. If you don't receive at least N webhooks from provider X within Y minutes, fire an alert. It won't be in your webhook logs because there's nothing to log, which is exactly why you need a separate mechanism for it.
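A minimal version of that switch is just a count over a sliding window. A sketch, assuming you keep receipt timestamps somewhere queryable (the timestamp list and the alerting wired around it are stand-ins):

```javascript
// Dead man's switch sketch: ok when at least `minCount` webhooks
// arrived within the last `windowMinutes`. The timestamp list and
// the alerting around this are stand-ins for your real monitoring.
function heartbeatOk(receivedAt, { minCount, windowMinutes, now = Date.now() }) {
  const cutoff = now - windowMinutes * 60 * 1000;
  const recent = receivedAt.filter(ts => ts >= cutoff).length;
  return { ok: recent >= minCount, recent }; // caller fires the alert on !ok
}
```

Run it on a schedule, from a cron job or your monitor's synthetic check. It deliberately lives outside the webhook handler, because the whole point is that the handler never ran.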

Logging what happened is easy. Noticing what didn't happen is where monitoring gets interesting, and where most teams have a blind spot the size of a parking garage.