Calendar booking creation errors

Started 2026-03-05 11:30 NZST · Resolved 12:15 NZST · Total 45m · Affected: new booking creation

Resolved · Major · 45m · Booking engine

What happened

New booking creation failed for ~340 attempts over 45 minutes due to a race in the slot-reservation pipeline.

Booking rows are created at pending_payment before Stripe checkout begins (the slot-reservation model — slot is held while the user is in checkout). The uniqueness contract on (resource_id, slot_window) was enforced at the application layer, not the database. Two concurrent requests for the same slot raced, both inserted, and the second checkout confirmation 5xx'd. No member was charged for a failed booking — the 4-layer payment confirmation pipeline's idempotency on payment_intent_id held throughout.

Impact

Who was affected, and how much.

Failed attempts: ~340 (POST /api/bookings 5xx)
Workspaces affected: 8 (of 88 active that morning)
Median 5xx window: 45 min (11:30 to 12:15 NZST)
Double-charges: 0 (payment idempotency held)

Services affected

  • Booking engine
    New booking creation 5xx rate elevated 11:30 — 12:15
    degraded
  • Booking-manager edge fn
    Race condition on concurrent slot-reservation inserts
    degraded
  • Web app
    Other routes unaffected; only the booking POST failed
    operational
  • Stripe Connect
    Payment capture + idempotency held throughout
    operational
  • Magic-link emails
    Auth + email delivery unaffected
    operational
  • API
    Read endpoints (list, search, member portal) at baseline
    operational

Customer impact

~340 members saw a 5xx when trying to book. All have been emailed an apology with a one-click rebook link and a 10% credit applied. We breached SLA for this calendar month on the booking-creation surface; affected operators have received SLA credits.

Read our SLA policy

Timeline

What we did, in order.

Every public update we posted during the incident, plus the internal pager alert that opened it. Times are 2026-03-05 NZST (UTC+13). Updates are listed newest first, with the resolution at the top.

7 updates
  1. 12:15 NZST
    Resolved

    Incident resolved · booking creation restored

    All members who attempted to book during the incident window have been emailed with an apology and a one-click rebook link with a 10% credit applied. We have rolled the booking-manager edge function forward to a patched version that holds an advisory lock on (resource_id, slot_window) for the duration of the slot-reservation insert, eliminating the race. We will publish a full post-mortem within 7 days; the root cause and action items below are the preliminary summary.

  2. 12:02 NZST
    Monitoring

    Patch deployed · monitoring for regressions

    The patched booking-manager edge function is live across all regions. New booking attempts are succeeding at baseline rates. No 5xx responses observed since 11:58. We are leaving the page open for 15 minutes to confirm there is no regression on the next traffic peak.

  3. 11:47 NZST
    Identified

    Cause identified · race in slot-reservation pipeline

    Booking rows are created at pending_payment state before the Stripe checkout begins (this is by design — it holds the slot while the user is in Stripe). Two concurrent requests for the same slot were both passing the slot-availability check and both succeeding at the insert. The slot-reservation contract was only enforced by an application-level uniqueness check, not by a database constraint. Two members ended up with overlapping pending_payment rows for the same room+window; the second checkout completion attempted to confirm a slot already paid for, which surfaced as a 5xx to the second member.

  4. 11:42 NZST
    Identified

    Workaround in place · single-flight on slot reservation

    We have shipped a hotfix that serialises slot-reservation inserts via a Postgres advisory lock on (resource_id, slot_window). New booking attempts are now succeeding. We are leaving the previous logic in the codebase under a feature flag while we land the proper fix (database-level exclusion constraint).

  5. 11:51 NZST
    Investigating

    Investigating · scoping the affected window

    We have identified ~340 failed booking attempts during the window. We are correlating each against payment intent status to determine whether any member was charged for a failed booking (none expected — the row is created before the Stripe charge, but we want to verify). The 4-layer payment confirmation pipeline ensures idempotency on payment_intent_id, so any double-charge would already have been refused.

  6. 11:36 NZST
    Investigating

    Investigating · booking creation 5xx spike

    We are seeing a sustained spike in 5xx responses from the booking-manager edge function. Affected route: POST /api/bookings (and the slug-route equivalents). Other endpoints unaffected. We have paged the booking-platform on-call (Anya) and are diagnosing.

  7. 11:30 NZST
    Update

    Monitoring alert fired

    Sentry alert booking-creation-error-rate triggered. PagerDuty paged the booking-platform on-call (Anya). The trigger is sustained >2% 5xx on POST /api/bookings; current rate is ~7%.
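The alert rule in the first timeline entry (sustained >2% 5xx on POST /api/bookings) can be illustrated with a small sliding-window sketch. This is not the Sentry rule itself: the class name, the 60-second window, and the minimum-request floor are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorRateAlert:
    """Fires when the 5xx share of requests in a sliding window exceeds a threshold."""

    threshold: float              # e.g. 0.02 for a 2% trigger
    window_seconds: float         # assumed evaluation window
    min_requests: int = 20        # assumed floor so one stray failure cannot fire the alert
    _events: deque = field(default_factory=deque)  # (timestamp, is_5xx) pairs

    def record(self, timestamp: float, status: int) -> None:
        """Add one response and drop everything older than the window."""
        self._events.append((timestamp, 500 <= status <= 599))
        while self._events and self._events[0][0] <= timestamp - self.window_seconds:
            self._events.popleft()

    def error_rate(self) -> float:
        if not self._events:
            return 0.0
        return sum(1 for _, is_5xx in self._events if is_5xx) / len(self._events)

    def firing(self) -> bool:
        return len(self._events) >= self.min_requests and self.error_rate() > self.threshold


# 100 requests over 50 seconds, 7 of them 5xx: roughly the ~7% rate seen at 11:30.
alert = ErrorRateAlert(threshold=0.02, window_seconds=60)
for i in range(100):
    alert.record(timestamp=i * 0.5, status=500 if i % 15 == 0 else 201)
```

With these inputs the window holds all 100 responses, so the rate is 7 in 100 and the alert is firing.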

Root cause · post-mortem

Why this happened, in three paragraphs.

1. The trigger

Our booking model is slot-reservation: the booking row is created at pending_payment before Stripe checkout begins, so the slot is held while the member is in checkout. The uniqueness contract on (resource_id, slot_window) was enforced at the application layer with an availability probe before the insert. Under load — a popular room releasing simultaneously across multiple members' calendars — two concurrent requests both passed the probe and both inserted.
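The probe-then-insert race described above can be reproduced in a few lines. The following is a simulation only, assuming an in-memory store: `BookingStore`, `reserve_unsafe`, and `reserve_serialised` are hypothetical names, the `asyncio.sleep(0)` calls stand in for database round-trips, and the in-process `asyncio.Lock` stands in for a per-slot lock such as a Postgres advisory lock.

```python
import asyncio

Slot = tuple[str, str]  # (resource_id, slot_window); string encoding is an assumption


class BookingStore:
    """In-memory stand-in for the bookings table (pending_payment rows only)."""

    def __init__(self) -> None:
        self.rows: list[tuple[Slot, str]] = []       # (slot, member_id)
        self.locks: dict[Slot, asyncio.Lock] = {}

    async def is_available(self, slot: Slot) -> bool:
        await asyncio.sleep(0)   # simulated round-trip for the availability probe
        return all(existing != slot for existing, _ in self.rows)

    async def insert(self, slot: Slot, member: str) -> None:
        await asyncio.sleep(0)   # simulated round-trip for the insert
        self.rows.append((slot, member))

    async def reserve_unsafe(self, slot: Slot, member: str) -> bool:
        # Application-level check, then insert: nothing stops a second request
        # from probing between these two steps.
        if not await self.is_available(slot):
            return False
        await self.insert(slot, member)
        return True

    async def reserve_serialised(self, slot: Slot, member: str) -> bool:
        # One lock per slot, held across both the probe and the insert.
        lock = self.locks.setdefault(slot, asyncio.Lock())
        async with lock:
            return await self.reserve_unsafe(slot, member)


async def race(method_name: str) -> list[bool]:
    """Run two concurrent reservations for the same slot on a fresh store."""
    store = BookingStore()
    reserve = getattr(store, method_name)
    slot = ("room-1", "2026-03-05T14:00/15:00")
    results = await asyncio.gather(reserve(slot, "member-a"), reserve(slot, "member-b"))
    return list(results)


unsafe = asyncio.run(race("reserve_unsafe"))        # both requests succeed
safe = asyncio.run(race("reserve_serialised"))      # only the first succeeds
```

In the unsafe run both probes complete before either insert, so both members end up holding the slot. Holding the lock across the probe and the insert is what removes that interleaving.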

2. Why no double-charges happened

The 4-layer payment confirmation pipeline (success-page hook, platform webhook, Connect webhook, scheduler self-heal) enforces idempotency on payment_intent_id. When the second member completed checkout, the verify_payment action saw a row that was already paid for and returned a structured 409 conflict to the booking-manager. That conflict surfaced as a 5xx to the second member, but no duplicate charge was ever committed to Stripe. Our retros confirmed zero double-charges across all ~340 affected booking attempts.
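The behaviour described above has two parts: repeated deliveries of the same payment_intent_id must resolve to the same outcome, and a second intent for an already-paid slot must be refused with a conflict. A minimal in-memory sketch follows. The class, its fields, and the result shape are assumptions; only the name verify_payment and the 409 outcome come from the incident description, and the real action is not shown here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConfirmResult:
    status: int     # HTTP-style status code
    detail: str


@dataclass
class PaymentConfirmer:
    """Hypothetical model of an idempotent confirmation step."""

    paid_slots: dict = field(default_factory=dict)     # slot -> payment_intent_id
    seen_intents: dict = field(default_factory=dict)   # payment_intent_id -> ConfirmResult

    def verify_payment(self, slot: tuple, payment_intent_id: str) -> ConfirmResult:
        # A replay of an intent we already processed returns the stored outcome
        # and changes nothing, whichever layer delivered it.
        if payment_intent_id in self.seen_intents:
            return self.seen_intents[payment_intent_id]

        if slot in self.paid_slots:
            # A different intent for a slot that is already paid for: refuse.
            result = ConfirmResult(409, "slot already paid for")
        else:
            self.paid_slots[slot] = payment_intent_id
            result = ConfirmResult(200, "confirmed")

        self.seen_intents[payment_intent_id] = result
        return result


confirmer = PaymentConfirmer()
slot = ("room-1", "2026-03-05T14:00/15:00")

# The same intent arriving through four delivery paths is confirmed exactly once.
first_member = [confirmer.verify_payment(slot, "pi_member_a") for _ in range(4)]

# The second member's intent for the same slot is refused with a conflict.
second_member = confirmer.verify_payment(slot, "pi_member_b")
```

In this sketch the four deliveries for the first member all return the same 200 result, and the second member's intent returns 409 without replacing the recorded payment.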

3. What we're changing

The race is fixed two ways: (1) an immediate hotfix using a Postgres advisory lock on (resource_id, slot_window) for the slot-reservation insert; (2) a permanent fix one week later, in the form of an exclusion constraint on bookings(resource_id, tstzrange(start_at, end_at)) that applies only to rows whose status is pending_payment or confirmed. Load testing now covers concurrent slot-reservation attempts on the same slot at 25 RPS. The booking-creation 5xx alert will be tightened from 2% to 1% with a 60-second window; that change is scheduled and not yet live.
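The rule the exclusion constraint enforces can be modelled in memory: no two active rows for the same resource may have overlapping time ranges. The sketch below is an illustration, not the migration itself. The class names and the "cancelled" status are assumptions, and the half-open interval comparison mirrors the default `[)` bounds of `tstzrange`, so back-to-back slots do not conflict.

```python
from dataclasses import dataclass
from datetime import datetime

ACTIVE_STATUSES = {"pending_payment", "confirmed"}


class ExclusionViolation(Exception):
    """Stands in for the error Postgres raises on an exclusion-constraint conflict."""


@dataclass(frozen=True)
class Booking:
    resource_id: str
    start_at: datetime
    end_at: datetime
    status: str


class BookingsTable:
    def __init__(self) -> None:
        self.rows: list[Booking] = []

    def insert(self, new: Booking) -> None:
        # Only active rows take part in the check, like a constraint with a WHERE clause.
        if new.status in ACTIVE_STATUSES:
            for row in self.rows:
                overlaps = new.start_at < row.end_at and row.start_at < new.end_at
                if (row.status in ACTIVE_STATUSES
                        and row.resource_id == new.resource_id
                        and overlaps):
                    raise ExclusionViolation(f"{new.resource_id} already held for this window")
        self.rows.append(new)


def at(hour: int) -> datetime:
    return datetime(2026, 3, 5, hour, 0)


table = BookingsTable()
table.insert(Booking("room-1", at(14), at(15), "pending_payment"))

try:
    table.insert(Booking("room-1", at(14), at(15), "pending_payment"))
    second_insert_rejected = False
except ExclusionViolation:
    second_insert_rejected = True

table.insert(Booking("room-1", at(15), at(16), "pending_payment"))   # adjacent slot: allowed
table.insert(Booking("room-2", at(14), at(15), "confirmed"))         # other resource: allowed
table.insert(Booking("room-1", at(14), at(15), "cancelled"))         # inactive row: allowed
```

The point of moving this rule into the database is that the second insert fails regardless of how the application-level probe interleaves.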

Action items

Concrete changes, with owners and dates.

Four of these are already done. The other two are on the engineering calendar and will be linked back here as they ship.

6 items · 4 done
  • Postgres advisory lock on (resource_id, slot_window) for slot-reservation inserts
    Owner · Booking platform / Anya · Target · 2026-03-05
    Done
  • Database-level exclusion constraint on bookings(resource_id, tstzrange(start_at, end_at)) WHERE status IN ('confirmed','pending_payment')
    Owner · Booking platform / Theo · Target · 2026-03-12
    Done
  • All 340 affected members emailed an apology with one-click rebook + 10% credit
    Owner · Customer success / Marcus · Target · 2026-03-05
    Done
  • Load test now covers concurrent slot-reservation attempts on the same slot at 25 RPS
    Owner · QA / Sara · Target · 2026-03-19
    Done
  • Runbook: how to identify and refund any charged-but-failed bookings in a future regression
    Owner · Booking platform / Anya · Target · 2026-04-02
    In progress
  • Sentry alert for booking-creation 5xx rate tightened to fire at 1% (was 2%) with 60s window
    Owner · Platform / Priya · Target · 2026-04-15
    Scheduled
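
For the advisory-lock item at the top of the list above: Postgres advisory locks are keyed by integers, so a (resource_id, slot_window) pair has to be mapped to a key first. One possible mapping is sketched below. The function name, the separator, the choice of SHA-256, and the psycopg-style `%s` placeholder are assumptions; `pg_advisory_xact_lock(bigint)` is the standard Postgres function that holds the lock until the transaction ends.

```python
import hashlib
import struct


def advisory_lock_key(resource_id: str, slot_start: str, slot_end: str) -> int:
    """Map a (resource_id, slot_window) pair to a signed 64-bit advisory-lock key."""
    # The unit separator keeps ("ab", "c") and ("a", "bc") from hashing the same bytes.
    payload = "\x1f".join((resource_id, slot_start, slot_end)).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Postgres bigint is a signed 64-bit integer, so unpack the first 8 bytes as one.
    (key,) = struct.unpack(">q", digest[:8])
    return key


# Statement to run inside the same transaction as the probe and the insert.
LOCK_SQL = "SELECT pg_advisory_xact_lock(%s)"

key = advisory_lock_key("room-1", "2026-03-05T14:00+13:00", "2026-03-05T15:00+13:00")
```

A hash collision between two different slots would only make those two slots wait on each other; it would not let a double booking through.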