The Invoice That Charged Twice
A payments team I worked with had a fun Friday afternoon once. Their Stripe webhook handler processed 340 duplicate invoice.paid events in about twelve minutes. Customers got double-charged. Support queue exploded. The postmortem was brutal.
Root cause? Stripe retried a batch of webhooks after a brief network blip. The handler was fast, stateless, and completely unaware it had already seen those events. No idempotency checks anywhere.
This happens more than people admit.
Why Duplicates Are Inevitable
Webhook providers retry. That's the deal. If your endpoint returns a timeout, a 5xx, or the connection drops mid-response, the provider assumes failure and sends it again. Sometimes your server did process it successfully but the ACK got lost on the way back. The provider doesn't know that. It retries.
Stripe, GitHub, Shopify, Twilio: all of them. Every major provider documents this behavior, usually buried three pages deep. And some are aggressive about it. Shopify will retry up to 19 times over 48 hours.
But there's a subtler issue too. Load balancers can replay requests. Proxies can buffer and resend. Your own retry logic upstream can cause duplicates if you're chaining webhooks between services. The network is not your friend here.
What an Idempotency Key Actually Is
Simple concept. Every webhook event gets a unique identifier. Before you process it, you check: have I seen this ID before? If yes, skip. If no, process and record the ID.
Most providers give you one for free. Stripe sends an id field on every event (evt_1234abc). GitHub includes an X-GitHub-Delivery header with a UUID. Shopify has X-Shopify-Webhook-Id.
If your provider doesn't send one, you can derive it. Hash the payload body, or combine the event type with a timestamp and resource ID. Not perfect, but workable.
The Naive Implementation (and Why It Breaks)
First instinct is usually something like this:
app.post('/webhooks/stripe', async (req, res) => {
  const eventId = req.body.id;

  // Check if we've seen this event
  const exists = await db.query(
    'SELECT 1 FROM processed_events WHERE event_id = $1',
    [eventId]
  );

  if (exists.rows.length > 0) {
    return res.status(200).send('Already processed');
  }

  // Process the event
  await handleStripeEvent(req.body);

  // Record it
  await db.query(
    'INSERT INTO processed_events (event_id) VALUES ($1)',
    [eventId]
  );

  res.status(200).send('OK');
});
Looks reasonable. It'll work in development and pass every test you write. Production will eat it alive.
The race condition is obvious once you think about it. Two identical webhooks arrive 50ms apart. Both hit the SELECT. Neither finds a record. Both proceed to process. Both insert. You've handled the event twice, which is the exact thing you were trying to prevent.
Making It Actually Work
You need atomicity. The check and the record need to happen as one operation. In PostgreSQL, INSERT ... ON CONFLICT is your best friend:
app.post('/webhooks/stripe', async (req, res) => {
  const eventId = req.body.id;

  // Atomic insert-or-skip
  const result = await db.query(
    `INSERT INTO processed_events (event_id, received_at)
     VALUES ($1, NOW())
     ON CONFLICT (event_id) DO NOTHING
     RETURNING event_id`,
    [eventId]
  );

  // If nothing was returned, we already processed this
  if (result.rows.length === 0) {
    return res.status(200).send('Duplicate');
  }

  try {
    await handleStripeEvent(req.body);
  } catch (err) {
    // Roll back so retries can reprocess
    await db.query(
      'DELETE FROM processed_events WHERE event_id = $1',
      [eventId]
    );
    throw err;
  }

  res.status(200).send('OK');
});
The ON CONFLICT DO NOTHING clause makes the check-and-insert atomic. No race window. If two requests hit simultaneously, the database handles the contention. One wins, one gets zero rows back.
Notice the catch block. If processing fails, you remove the record so the next retry can try again. Without this, a failed processing attempt permanently blocks that event. Seen that bug in production too.
Redis for Speed, Postgres for Truth
At high volume, hitting Postgres for every single webhook gets expensive. A common pattern is to layer Redis in front as a fast check:
async function isDuplicate(eventId) {
  // SET with NX (only set if not exists) and EX (expire after 72h)
  const result = await redis.set(
    `webhook:seen:${eventId}`,
    '1',
    'EX', 259200, // 72 hours
    'NX'
  );
  // If result is null, the key already existed
  return result === null;
}
app.post('/webhooks', async (req, res) => {
  const eventId = extractEventId(req);

  // Fast path: Redis has seen this ID recently
  if (await isDuplicate(eventId)) {
    return res.status(200).send('Duplicate');
  }

  // Slow path: the atomic Postgres insert gates processing, so a
  // Redis restart can't let a duplicate through
  const result = await db.query(
    `INSERT INTO processed_events (event_id) VALUES ($1)
     ON CONFLICT DO NOTHING RETURNING event_id`,
    [eventId]
  );
  if (result.rows.length === 0) {
    return res.status(200).send('Duplicate');
  }

  await handleEvent(req.body);
  res.status(200).send('OK');
});
Redis SET NX is atomic and fast. Sub-millisecond. The 72-hour TTL keeps the keyspace from growing forever; most providers stop retrying well before that window closes.
You still write to Postgres for the permanent record and for cases where Redis restarts and loses its in-memory state. Belt and suspenders.
The Table Schema Matters More Than You Think
Don't just throw event IDs into a table and call it done. Think about what you'll need later:
CREATE TABLE processed_events (
  event_id     VARCHAR(255) PRIMARY KEY,
  source       VARCHAR(50) NOT NULL,  -- 'stripe', 'github', etc.
  event_type   VARCHAR(100),
  received_at  TIMESTAMPTZ DEFAULT NOW(),
  processed    BOOLEAN DEFAULT FALSE,
  payload      JSONB
);

-- Partition by month if volume is high
-- Clean up events older than 90 days via pg_cron
The processed boolean flag is useful. Set it to false on insert, true after successful handling. That way you can query for events that were received but never completed, which tells you about silent failures in your processing pipeline.
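Concretely, that's two statements against the schema above (the ten-minute grace window is an arbitrary choice, not a standard):

```sql
-- After handling succeeds:
UPDATE processed_events SET processed = TRUE WHERE event_id = $1;

-- Events that arrived but never finished (likely silent failures):
SELECT event_id, event_type, received_at
FROM processed_events
WHERE processed = FALSE
  AND received_at < NOW() - INTERVAL '10 minutes';
```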
Partitioning is worth setting up early if you're handling more than a few thousand events per day. Deleting old partitions is dramatically faster than running bulk DELETEs on a single table.
When Hash-Based Keys Go Wrong
Sometimes you can't use the provider's event ID. Maybe you're receiving webhooks from a legacy system that doesn't include one. Maybe you're aggregating events from multiple sources and need your own dedup layer.
Hashing the payload seems obvious. SHA-256 the body, use that as your key. Works until it doesn't.
The problem: some providers include timestamps or sequence numbers in the payload that change on every retry even though it's semantically the same event. Stripe doesn't do this, but I've seen smaller payment processors that regenerate the created_at field on each delivery attempt. Your hash changes every time.
The fix is to hash only the stable fields. Event type plus resource ID plus the meaningful data, skip anything time-dependent. It takes more work but it's the only way to get reliable deduplication from unstable payloads.
Cleanup and Retention
Your processed_events table will grow. Forever. You need a retention policy.
Most webhook providers have a retry window of 24 to 72 hours. After that, they give up. So technically you only need to keep event IDs for that window to prevent duplicates. In practice, keep them for 90 days. It costs almost nothing in storage and gives you an audit trail when debugging issues weeks later.
A simple cron job works fine:
-- Run daily
DELETE FROM processed_events
WHERE received_at < NOW() - INTERVAL '90 days';
Or use table partitioning and drop old partitions. Cleaner at scale.
Testing Idempotency
Write a test that sends the same webhook twice in parallel. Not sequentially. Parallel. Use Promise.all or whatever your test framework gives you for concurrent requests. If your handler processes both, your idempotency is broken, regardless of what the unit tests say.
it('handles concurrent duplicate webhooks', async () => {
  const payload = buildWebhookPayload({ id: 'evt_test_123' });

  const [res1, res2] = await Promise.all([
    sendWebhook(payload),
    sendWebhook(payload),
  ]);

  // Both should return 200
  expect(res1.status).toBe(200);
  expect(res2.status).toBe(200);

  // But the side effect should only happen once
  const invoices = await db.query(
    "SELECT * FROM invoices WHERE event_id = 'evt_test_123'"
  );
  expect(invoices.rows.length).toBe(1);
});
This is the test that catches the naive implementation. Run it. If it fails, you've got work to do.
What WebhookVault Tracks for You
Monitoring duplicate deliveries manually is tedious. WebhookVault flags duplicate event IDs automatically, tracks retry patterns per provider, and alerts you when duplicate processing rates spike. Because a sudden increase in duplicates usually means something upstream broke, and you want to know about that before your customers do.