Magic-link email delivery delays
Started 2026-04-12 08:14 NZST · Resolved 09:42 NZST · Total 1h 28m · Affected: Resend API
What happened
For 1h 28m, magic-link emails were delivered late, with delays of up to 90 minutes.
Our primary email provider (Resend) had a regional outage. New magic-link requests queued safely rather than failing, so no emails were lost. Failover to our secondary SMTP relay took 24 minutes from the first 503; the remaining ~480 emails were delivered over the next hour. Tenants with password fallback configured saw zero failed logins.
Impact
Who was affected, and how much.
Services affected
- Magic-link emails · Resend primary endpoint degraded; failed over to secondary relay
- Visitor invite emails · Same outbound queue; delays in line with magic-link
- Web app · No customer-visible degradation
- Booking engine · Bookings + invoicing unaffected
- API · No elevated error rates observed
- Stripe Connect · Payment capture and webhook delivery unaffected
Customer impact
About 210 logins were delayed while members waited for their magic link to arrive. Tenants with password fallback saw zero failed logins. We have not issued SLA credits because the calendar-month SLA was not breached.
Read our SLA policy
Timeline
What we did, in order.
Every public update we posted during the incident, plus the internal pager that opened it. Times are 2026-04-12 NZST (UTC+12).
- 09:42 NZST
Incident resolved · queue drained to baseline
Resend reports their regional outage is fully resolved. Our outbound email queue has drained to within baseline (~6 seconds median age). The remaining queued magic-link emails have all been delivered. We will publish a full post-mortem within 7 days; the root cause and action items below are the preliminary summary.
- 09:18 NZST
Backlog draining · 80% caught up
Failover to the secondary SMTP relay is processing queued magic-link emails at full throughput. New magic-link requests are now delivered in <8 seconds median. Approximately 95 emails still queued from the affected window; expected to clear within 10 minutes.
- 08:58 NZST
Cause identified · Resend regional outage; failing over
Resend has acknowledged a regional outage affecting their ap-southeast-2 endpoint. Status link: status.resend.com/incidents/01HZ-…. We are routing outbound to our secondary SMTP relay. New magic-link requests will queue but be delivered as the failover completes (estimated 10 minutes for the queued backlog).
- 08:34 NZST
Investigating · Resend API returning 503s
Resend is returning sustained 503 errors on our primary endpoint. Our retry-with-backoff is honouring the 503 responses, and retried sends are not being counted twice against our send budget. Magic-link emails are queueing rather than being lost. We are checking whether failover to the secondary relay is warranted.
- 08:22 NZST
Investigating · elevated magic-link email delivery latency
We are seeing p95 magic-link email delivery time climb above 6 minutes (normal: <30 seconds). The Resend API is responding slower than usual but not yet erroring. We have paged the email-platform on-call (Sara) and are diagnosing.
- 08:14 NZST
Monitoring alert fired
Sentry alert magic-link-delivery-sla-breach triggered. PagerDuty paged the email-platform on-call (Sara). Initial triage in progress.
Root cause · post-mortem
Why this happened, in three paragraphs.
1. The trigger
At 08:14 NZST, our monitoring flagged rising magic-link delivery latency through Resend's ap-southeast-2 region; by 08:34 the region was returning sustained 503 errors. This was a regional outage at our primary email provider, confirmed on status.resend.com. Our infrastructure did not cause the delay; we were a downstream consumer.
2. Why the impact wasn't worse
Our outbound email pipeline retries with backoff and queues on persistent failure rather than dropping. Once we triggered failover to the secondary SMTP relay (Postmark), the queued backlog drained. Tenants with password fallback configured had a non-magic-link path through auth. The honest critique: failover took 24 minutes from the first 503 because the threshold was set at 5 minutes of consecutive errors.
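As an illustration only, the retry-then-queue behaviour described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not our production code: the names `send_or_queue` and `ProviderUnavailable`, the attempt count, and the delay values are all hypothetical.

```python
import time
from collections import deque


class ProviderUnavailable(Exception):
    """Hypothetical error standing in for a 5xx response from the email provider."""


def send_or_queue(message, send, retry_queue, max_attempts=4, base_delay=0.5,
                  sleep=time.sleep):
    """Try to send with exponential backoff; queue the message if every attempt fails.

    Returns True if the message was sent, False if it was queued for later.
    All parameter names and defaults here are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except ProviderUnavailable:
            # Back off before the next attempt: 0.5s, 1s, 2s, ...
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Persistent failure: keep the message instead of dropping it.
    retry_queue.append(message)
    return False


if __name__ == "__main__":
    def failing_send(_message):
        raise ProviderUnavailable()

    backlog = deque()
    send_or_queue("magic-link for member 1", failing_send, backlog, sleep=lambda s: None)
    print(len(backlog))  # the unsent message is held in the queue
```

The point of the design is that a provider outage turns into a delay rather than a loss: anything that cannot be sent stays in the queue until a working route is available.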
3. What we're changing
The failover threshold has been lowered from 5 minutes to 90 seconds of consecutive 5xx responses (shipped). The login page now shows a banner when the delivery SLA is breached so members understand the delay (shipped). Secondary relay capacity has been doubled to handle a primary-down scenario at full load (shipped). We are also moving the email-platform DR drill from quarterly to monthly.
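To make the threshold change concrete, here is a rough sketch of a "consecutive 5xx for N seconds" trigger. It is an assumption-laden illustration, not the shipped implementation: the class name `FailoverMonitor`, its fields, and the way timestamps are passed in are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverMonitor:
    """Hypothetical tracker: signals failover after a sustained run of 5xx responses."""

    threshold_seconds: float = 90.0          # new threshold; previously 300.0 (5 minutes)
    first_error_at: Optional[float] = None   # start of the current unbroken 5xx run

    def record(self, status_code: int, now: float) -> bool:
        """Record one provider response; return True when failover should trigger."""
        if 500 <= status_code < 600:
            if self.first_error_at is None:
                self.first_error_at = now
            return now - self.first_error_at >= self.threshold_seconds
        # Any non-5xx response breaks the run and resets the clock.
        self.first_error_at = None
        return False


if __name__ == "__main__":
    monitor = FailoverMonitor()
    monitor.record(503, now=0.0)
    print(monitor.record(503, now=90.0))  # a 90-second unbroken run reaches the threshold
```

With the same run of errors, a monitor configured with `threshold_seconds=300.0` would wait a further three and a half minutes before signalling failover.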
Action items
Concrete changes, with owners and dates.
Three of these are already done. The other two are on the engineering calendar and will be linked back here as they ship.
- Automatic failover threshold lowered from 5 minutes to 90 seconds of consecutive 5xx
Owner · Email platform / Sara
Target · 2026-04-13
- Status banner on the login page when magic-link delivery SLA is breached
Owner · Frontend / Priya
Target · 2026-04-20
- Secondary SMTP relay capacity doubled to handle primary-down scenarios at full load
Owner · Platform / Theo
Target · 2026-04-30
- Email-platform DR drill added to monthly schedule (was quarterly)
Owner · Platform / Anya
Target · 2026-05-15
- Investigate password-reset and 2FA delivery alternatives during email outages
Owner · Security / Marcus
Target · 2026-06-10