Stripe webhook delays
Started 2026-05-14 09:42 NZST · Resolved 11:18 NZST · Total 1h 36m · Affected: webhooks (delayed)
What happened
For 1h 36m, webhook deliveries from Stripe Connect arrived up to 14 minutes late.
Our four-layer payment confirmation pipeline absorbed the delays. Bookings were still confirmed via the success-page hook and our internal scheduler — what slipped was the confirmation email arrival time, by 4–14 minutes for a subset of paid checkouts. Zero double-charges. Zero data loss.
Impact
Who was affected, and how much.
Services affected
- Stripe Connect webhooks · Layer 3 of payment pipeline, delayed up to 14 min
- Booking confirmation emails · Triggered downstream of webhook receipt
- Web app · No customer-visible degradation
- Booking engine · Slot reservation + success-page hook unaffected
- API · No elevated error rates observed
Customer impact
362 paid bookings received their confirmation email 4–14 minutes late. No bookings were missed. No double-charges. We have not issued any SLA credits because we did not breach the 99.95% SLA for this calendar month.
Read our SLA policy
Timeline
What we did, in order.
Every public update we posted during the incident, plus the internal pager that opened it. Times are 2026-05-14 NZST (UTC+12).
- 11:18 NZST
Incident resolved · webhook backlog cleared
All queued webhook events were delivered. Booking confirmation emails for paid checkouts caught up within 4 minutes of resolution. Synthetic webhook replay confirms p95 webhook age is back inside our SLA. We will publish a full post-mortem within 7 days; the root cause and action items below are the preliminary summary.
- 11:02 NZST
Backlog draining · 92% caught up
Stripe restored normal delivery rates upstream. Our scheduler self-heal pipeline (Layer 4 of payment confirmation) is processing the backlog faster than new events are arriving. Estimated full catch-up: 15 minutes. No member-facing degradation reported in this interval.
- 10:24 NZST
Cause identified · upstream Stripe delivery delays
We have confirmed with Stripe support that the delays are originating on their Connect webhook delivery pipeline. Status page link: status.stripe.com/incidents/0190fa-…. Our four-layer payment confirmation pipeline is doing exactly what it was designed for — paid bookings are still being confirmed via the success-page hook and the platform webhook (Layer 1 and Layer 2), so the impact is limited to confirmation email timing.
- 09:58 NZST
Investigating · 4-layer payment pipeline absorbing impact
We are shortening our scheduler self-heal pipeline (the hourly cron that reconciles any missed webhooks) to a 5-minute cadence for the duration of this incident. We have paged the Stripe partner on-call. No data loss is possible: the booking row is created before the Stripe checkout begins, and idempotency on payment_intent_id is enforced at every layer.
- 09:48 NZST
Investigating · increased webhook latency
We are seeing p95 webhook age climb above 4 minutes (normal: < 30 seconds). Stripe Connect webhook delivery is delayed across our fleet. We are checking whether this is platform-wide or scoped to a subset of accounts.
- 09:42 NZST
Monitoring alert fired
Sentry alert webhook-age-sla-breach triggered. PagerDuty paged the platform on-call (Anya). Initial triage in progress.
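For readers unfamiliar with the metric in the 09:48 update: webhook age is the gap between when an event was created upstream and when we received it, and the alert watches the 95th percentile of that gap. The following is a minimal sketch of such a check, not our production alert. It assumes a nearest-rank percentile and a hypothetical 30-second threshold (the update gives under 30 seconds as normal; the actual threshold behind webhook-age-sla-breach is not stated here). The field names `created_at` and `received_at` are illustrative.

```python
def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def webhook_ages(events):
    """Age in seconds: when we received the event minus when it was created."""
    return [e["received_at"] - e["created_at"] for e in events]


def is_breaching(events, threshold_seconds=30.0):
    """True when p95 webhook age exceeds the (assumed) threshold."""
    return p95(webhook_ages(events)) > threshold_seconds
```

Using a percentile rather than a mean keeps a handful of slow events visible without letting one extreme outlier in a large sample trigger a page on its own.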
Root cause · post-mortem
Why this happened, in three paragraphs.
1. The trigger
At 09:42 NZST, Stripe's Connect webhook delivery pipeline began queuing events instead of delivering them in real time. This was a platform-wide issue inside Stripe — confirmed on status.stripe.com and via partner support. Our infrastructure did not cause the delay; we were a downstream consumer.
2. Why the impact was limited
LiteHQ confirms paid bookings via four independent layers: success-page hook, platform webhook, Connect webhook, and an hourly scheduler self-heal. With Layer 3 (Connect) delayed, Layers 1 and 2 still confirmed the bookings end-to-end. The only thing that depended on Layer 3 reaching us in real time was the confirmation email, which arrived 4–14 minutes late for the 362 affected bookings.
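The reason several layers can race to confirm the same booking without double-processing is idempotency on payment_intent_id. The sketch below illustrates the idea only. It assumes an in-memory store in place of a database, and the class names, layer labels, and the `pi_123` identifier are hypothetical. In a real system the same effect would typically come from a conditional update that only succeeds while the row is still pending.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Booking:
    payment_intent_id: str
    status: str = "pending"          # row is created before checkout begins
    confirmed_by: Optional[str] = None


class BookingStore:
    """In-memory stand-in for the bookings table (illustrative only)."""

    def __init__(self) -> None:
        self._by_intent: Dict[str, Booking] = {}

    def create_pending(self, payment_intent_id: str) -> Booking:
        booking = Booking(payment_intent_id)
        self._by_intent[payment_intent_id] = booking
        return booking

    def confirm(self, payment_intent_id: str, layer: str) -> bool:
        """Confirm once. Returns True only for the call that changed state."""
        booking = self._by_intent[payment_intent_id]
        if booking.status == "confirmed":
            return False             # a faster layer already did the work
        booking.status = "confirmed"
        booking.confirmed_by = layer
        return True


store = BookingStore()
booking = store.create_pending("pi_123")
first = store.confirm("pi_123", "success_page_hook")    # first layer to arrive
late = store.confirm("pi_123", "connect_webhook")       # delayed layer, no-op
```

Because the late call is a no-op, a delayed Connect webhook arriving minutes after the success-page hook cannot confirm or charge a booking a second time.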
3. What we're changing
We tightened the default scheduler self-heal cadence from hourly to every 15 minutes (already shipped). We added a synthetic webhook replay to our weekly DR drill so we exercise this code path even when upstream is healthy. We are also building an operator-facing inline banner that surfaces when upstream Stripe is degraded; until now, operators have relied on us watching the partner status page.
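As a rough illustration of what one pass of a scheduler self-heal does, the sketch below walks bookings that are still pending, asks whether the payment actually succeeded, and confirms any that did. It is a sketch under stated assumptions, not our scheduler: `is_paid` and `confirm` are hypothetical callables supplied by the caller (standing in for a payment-status lookup and an idempotent confirm step), and the identifiers are made up.

```python
from typing import Callable, Iterable, List


def reconcile(
    pending_intents: Iterable[str],
    is_paid: Callable[[str], bool],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Run one self-heal pass and return the intent IDs it confirmed.

    is_paid: hypothetical lookup of whether the payment succeeded upstream.
    confirm: hypothetical idempotent confirm; returns True only when it
             changed state, so re-running the pass is harmless.
    """
    healed = []
    for intent_id in pending_intents:
        if is_paid(intent_id) and confirm(intent_id):
            healed.append(intent_id)
    return healed


# Usage with in-memory stand-ins for the two callables.
paid_upstream = {"pi_a"}
confirmed = set()


def confirm_once(intent_id: str) -> bool:
    if intent_id in confirmed:
        return False
    confirmed.add(intent_id)
    return True


first_pass = reconcile(["pi_a", "pi_b"], paid_upstream.__contains__, confirm_once)
second_pass = reconcile(["pi_a", "pi_b"], paid_upstream.__contains__, confirm_once)
```

The cadence change only affects how often a pass like this runs. Because each pass is safe to repeat, running it every 15 minutes instead of hourly shortens the worst-case wait without adding risk.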
Action items
Concrete changes, with owners and dates.
Two of these are already done. The other three are on the engineering calendar and will be linked back here as they ship.
- Synthetic webhook replay added to weekly DR drill · Owner: Platform / Anya · Target: 2026-05-25
- Scheduler self-heal cadence shortened from hourly to 15-min by default · Owner: Platform / Theo · Target: 2026-05-21
- Public Stripe-status sub-status page on litehq.com/status (auto-mirrored) · Owner: Web / Sara · Target: 2026-06-04
- Add operator-facing inline banner when upstream Stripe is degraded · Owner: Frontend / Priya · Target: 2026-06-10
- Document the 4-layer confirmation pipeline in CUSTOMER-FAQ.md · Owner: Docs / Marcus · Target: 2026-06-15