
Magic-link email delivery delays

Started 2026-04-12 08:14 NZST · Resolved 09:42 NZST · Total 1h 28m · Affected: Resend API

Resolved · Moderate · 1h 28m · Resend API

What happened

During the 1h 28m incident, magic-link emails were delayed by up to 90 minutes.

Our primary email provider (Resend) had a regional outage. New magic-link requests queued safely rather than failing, so no emails were lost. Failover to our secondary SMTP relay completed in 24 minutes, and the ~480 queued emails were delivered over the following hour. Tenants with password fallback configured saw zero failed logins.


Impact

Who was affected, and how much.

Emails delayed: 480 (all eventually delivered)
Median delay: 32 min (p95: 88 min)
Logins affected: ~210 (of 2,140 in the window)
Failed logins: zero auth failures

Services affected

Magic-link emails · delayed · Resend primary endpoint degraded; failed over to secondary relay
Visitor invite emails · delayed · Same outbound queue; delays in line with magic-link
Web app · operational · No customer-visible degradation
Booking engine · operational · Bookings + invoicing unaffected
API · operational · No elevated error rates observed
Stripe Connect · operational · Payment capture and webhook delivery unaffected

Customer impact

About 210 logins were delayed while members waited for their magic link to arrive. Tenants with password fallback saw zero failed logins. We have not issued SLA credits because the calendar-month SLA was not breached.

Read our SLA policy

Timeline

What we did, in order.

Every public update we posted during the incident, plus the internal pager alert that opened it. All times are on 2026-04-12, NZST (UTC+12).

6 updates

09:42 NZST
Resolved
Incident resolved · queue drained to baseline
Resend reports their regional outage is fully resolved. Our outbound email queue has drained to within baseline (~6 seconds median age). The remaining queued magic-link emails have all been delivered. We will publish a full post-mortem within 7 days; the root cause and action items below are the preliminary summary.
09:18 NZST
Monitoring
Backlog draining · 80% caught up
Failover to the secondary SMTP relay is processing queued magic-link emails at full throughput. New magic-link requests are now delivered in <8 seconds median. Approximately 95 emails still queued from the affected window; expected to clear within 10 minutes.
08:58 NZST
Identified
Cause identified · Resend regional outage; failing over
Resend has acknowledged a regional outage affecting their ap-southeast-2 endpoint. Status link: status.resend.com/incidents/01HZ-…. We are routing outbound to our secondary SMTP relay. New magic-link requests will queue but be delivered as the failover completes (estimated 10 minutes for the queued backlog).
08:34 NZST
Investigating
Investigating · Resend API returning 503s
Resend is returning sustained 503 errors on our primary endpoint. Our retry-with-backoff is honouring those responses, so retries are not double-billing the budget. Magic-link emails are queueing rather than being lost. We are checking whether failover to the secondary relay is warranted.
08:22 NZST
Investigating
Investigating · elevated magic-link email delivery latency
We are seeing p95 magic-link email delivery time climb above 6 minutes (normal: <30 seconds). The Resend API is responding slower than usual but not yet erroring. We have paged the email-platform on-call (Sara) and are diagnosing.
08:14 NZST
Update
Monitoring alert fired
Sentry alert magic-link-delivery-sla-breach triggered. PagerDuty paged the email-platform on-call (Sara). Initial triage in progress.
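
The timeline above refers to p95 delivery time and a delivery SLA alert. As a rough illustration only, a p95 check over recent delivery times could look like the sketch below. The function names, the nearest-rank percentile method and the 30-second limit (taken from the "normal: <30 seconds" figure) are assumptions made for the example, not the actual alert configuration.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def sla_breached(delivery_seconds: Sequence[float], limit_seconds: float = 30.0) -> bool:
    """True when p95 delivery time exceeds the limit (the limit is an assumed value)."""
    return percentile(delivery_seconds, 95) > limit_seconds


# One slow email out of twenty does not move the p95; two do.
print(sla_breached([5.0] * 19 + [400.0]))      # False
print(sla_breached([5.0] * 18 + [400.0] * 2))  # True
```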

Root cause · post-mortem

Why this happened, in three paragraphs.

1. The trigger

At 08:14 NZST our delivery alert fired as Resend's ap-southeast-2 region began to degrade; by 08:34 the region was returning sustained 503 errors. This was a regional outage at our primary email provider, confirmed on status.resend.com. Nothing in our infrastructure caused the delay; we were a downstream consumer.

2. Why the impact wasn't worse

Our outbound email pipeline retries with backoff and queues on persistent failure rather than dropping. Once we triggered failover to the secondary SMTP relay (Postmark), the queued backlog drained. Tenants with password fallback configured had a non-magic-link path through auth. The honest critique: failover took 24 minutes from the first 503 because the threshold was set at 5 minutes of consecutive errors.
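
The retry-then-queue behaviour described above can be sketched roughly as follows. This is a minimal illustration, not our production pipeline: `send_with_backoff`, `ProviderUnavailable`, the attempt count and the delays are all hypothetical names and values chosen for the example.

```python
import time
from collections import deque
from typing import Callable, Deque


class ProviderUnavailable(Exception):
    """Hypothetical error a sender raises when the provider answers with a 5xx."""


def send_with_backoff(
    send: Callable[[str], None],
    message: str,
    pending: Deque[str],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send; back off exponentially between attempts; queue on persistent failure."""
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except ProviderUnavailable:
            # Wait 0.5s, 1s, 2s, ... before the next attempt (no wait after the last one).
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Never drop the message: keep it for delivery once a relay is available.
    pending.append(message)
    return False
```

The point of the design is the last two lines: a message that cannot be sent is kept, so an outage shows up as delay rather than loss.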

3. What we're changing

We have lowered the failover threshold from 5 minutes to 90 seconds of consecutive 5xx responses (shipped). The login page now shows a banner when the delivery SLA is breached, so members understand the delay (shipped). We have doubled secondary relay capacity so it can handle a primary-down scenario at full load (shipped). We are also moving the email-platform DR drill from quarterly to monthly.
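
To make the "90 seconds of consecutive 5xx" rule concrete, here is a small sketch of how such a threshold might be tracked. The `FailoverMonitor` class and its fields are hypothetical and exist only for illustration; the 90-second default is the one figure taken from this report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverMonitor:
    """Tracks how long the provider has been returning only 5xx responses."""

    threshold_seconds: float = 90.0
    first_error_at: Optional[float] = None  # start of the current unbroken error streak

    def record(self, status_code: int, now: float) -> bool:
        """Record one provider response; return True when failover should trigger."""
        if 500 <= status_code <= 599:
            if self.first_error_at is None:
                self.first_error_at = now
            return now - self.first_error_at >= self.threshold_seconds
        # Any non-5xx response breaks the streak.
        self.first_error_at = None
        return False


monitor = FailoverMonitor()
print(monitor.record(503, now=0.0))    # False: streak just started
print(monitor.record(503, now=60.0))   # False: 60s of errors so far
print(monitor.record(200, now=70.0))   # False: success resets the streak
print(monitor.record(503, now=80.0))   # False: new streak starts at 80s
print(monitor.record(503, now=170.0))  # True: 90s of consecutive errors
```

Because a single successful response resets the streak, a flapping provider would not trip this rule; whether that is the right trade-off depends on how the real threshold is defined.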

Action items

Concrete changes, with owners and dates.

Three of these are already done. The other two are on the engineering calendar and will be linked back here as they ship.

5 items · 3 done

Automatic failover threshold lowered from 5 minutes to 90 seconds of consecutive 5xx
Owner: Email platform / Sara · Target: 2026-04-13 · Status: Done
Status banner on the login page when magic-link delivery SLA is breached
Owner: Frontend / Priya · Target: 2026-04-20 · Status: Done
Secondary SMTP relay capacity doubled to handle primary-down scenarios at full load
Owner: Platform / Theo · Target: 2026-04-30 · Status: Done
Email-platform DR drill added to monthly schedule (was quarterly)
Owner: Platform / Anya · Target: 2026-05-15 · Status: In progress
Investigate password-reset and 2FA delivery alternatives during email outages
Owner: Security / Marcus · Target: 2026-06-10 · Status: Scheduled
