Magic-link email delivery delays

Started 2026-04-12 08:14 NZST · Resolved 09:42 NZST · Total 1h 28m · Affected: Resend API

Resolved · Moderate · 1h 28m · Resend API

What happened

For 1h 28m, magic-link emails arrived up to 90 minutes late.

Our primary email provider (Resend) had a regional outage. New magic-link requests queued safely rather than failing, so no emails were lost. We failed over to our secondary SMTP relay 24 minutes after the first 503, and the ~480 queued emails were delivered over the next hour. Tenants with password fallback configured saw zero failed logins.

Impact

Who was affected, and how much.

Emails delayed · 480 · all eventually delivered
Median delay · 32 min · p95: 88 min
Logins affected · ~210 · of 2,140 in the window
Failed logins · 0 · zero auth failures

Services affected

  • Magic-link emails
    Resend primary endpoint degraded; failed over to secondary relay
    delayed
  • Visitor invite emails
    Same outbound queue; delays in line with magic-link
    delayed
  • Web app
    No customer-visible degradation
    operational
  • Booking engine
    Bookings + invoicing unaffected
    operational
  • API
    No elevated error rates observed
    operational
  • Stripe Connect
    Payment capture and webhook delivery unaffected
    operational

Customer impact

About 210 logins were delayed while members waited for their magic link to arrive. Tenants with password fallback saw zero failed logins. We have not issued SLA credits because the calendar-month SLA was not breached.

Read our SLA policy

Timeline

What we did, in order.

Every public update we posted during the incident, plus the internal pager that opened it. Times are 2026-04-12 NZST (UTC+12).

6 updates
  1. 09:42 NZST
    Resolved

    Incident resolved · queue drained to baseline

    Resend reports their regional outage is fully resolved. Our outbound email queue has drained to within baseline (~6 seconds median age). The remaining queued magic-link emails have all been delivered. We will publish a full post-mortem within 7 days; the root cause and action items below are the preliminary summary.

  2. 09:18 NZST
    Monitoring

    Backlog draining · 80% caught up

    Failover to the secondary SMTP relay is processing queued magic-link emails at full throughput. New magic-link requests are now delivered in <8 seconds median. Approximately 95 emails still queued from the affected window; expected to clear within 10 minutes.

  3. 08:58 NZST
    Identified

    Cause identified · Resend regional outage; failing over

    Resend has acknowledged a regional outage affecting their ap-southeast-2 endpoint. Status link: status.resend.com/incidents/01HZ-…. We are routing outbound to our secondary SMTP relay. New magic-link requests will queue but be delivered as the failover completes (estimated 10 minutes for the queued backlog).

  4. 08:34 NZST
    Investigating

    Investigating · Resend API returning 503s

    Resend is returning sustained 503 errors on our primary endpoint. Our retry-with-backoff is honouring the responses and not double-billing the budget. Magic-link emails are queueing rather than being lost. We are checking whether failover to the secondary relay is warranted.

  5. 08:22 NZST
    Investigating

    Investigating · elevated magic-link email delivery latency

    We are seeing p95 magic-link email delivery time climb above 6 minutes (normal: <30 seconds). The Resend API is responding slower than usual but not yet erroring. We have paged the email-platform on-call (Sara) and are diagnosing.

  6. 08:14 NZST
    Update

    Monitoring alert fired

    Sentry alert magic-link-delivery-sla-breach triggered. PagerDuty paged the email-platform on-call (Sara). Initial triage in progress.

Root cause · post-mortem

Why this happened, in three paragraphs.

1. The trigger

At 08:14 NZST, delivery through Resend's ap-southeast-2 region began to slow, and by 08:34 the endpoint was returning sustained 503 errors. This was a regional outage at our primary email provider, confirmed on status.resend.com. Our infrastructure did not cause the delay; we were a downstream consumer.

2. Why the impact wasn't worse

Our outbound email pipeline retries with backoff and queues on persistent failure rather than dropping. Once we triggered failover to the secondary SMTP relay (Postmark), the queued backlog drained. Tenants with password fallback configured had a non-magic-link path through auth. The honest critique: failover took 24 minutes from the first 503 because the threshold was set at 5 minutes of consecutive errors.
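The retry-then-queue behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not our production code: the names `send_with_backoff`, `drain`, and `TransientSendError`, the attempt count, and the base delay are all hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class TransientSendError(Exception):
    """Hypothetical error a sender raises when the provider answers with a 5xx."""


def send_with_backoff(
    send: Callable[[str], None],
    message: str,
    pending: Deque[str],
    max_attempts: int = 4,        # assumed value, for illustration only
    base_delay: float = 0.5,      # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to send; after repeated 5xx, park the message instead of dropping it."""
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except TransientSendError:
            # Exponential backoff between attempts; no sleep after the last one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    pending.append(message)  # persistent failure: queue for later delivery
    return False


def drain(send: Callable[[str], None], pending: Deque[str]) -> int:
    """Deliver queued messages in arrival order; stop at the first failure."""
    delivered = 0
    while pending:
        try:
            send(pending[0])
        except TransientSendError:
            break
        pending.popleft()  # only remove a message once it has been sent
        delivered += 1
    return delivered
```

In this sketch, failover amounts to calling `drain` with the secondary relay's sender. A message leaves the queue only after a successful send, which is the property that kept emails from being lost.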

3. What we're changing

The failover threshold has been lowered from 5 minutes to 90 seconds of consecutive 5xx responses (shipped). The login page now shows a banner when the delivery SLA is breached, so members understand the delay (shipped). Secondary relay capacity has been doubled to handle a primary-down scenario at full load (shipped). We are also moving the email-platform DR drill from quarterly to monthly (in progress).
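One way to read "90 seconds of consecutive 5xx" is as a timer that starts at the first error and resets on any non-5xx response. The sketch below shows that reading only. The class name `FailoverTrigger` and its interface are hypothetical, and the real trigger may differ in detail.

```python
from typing import Optional


class FailoverTrigger:
    """Fires once the primary has returned only 5xx responses for a full window."""

    def __init__(self, threshold_seconds: float = 90.0) -> None:
        self.threshold_seconds = threshold_seconds
        self._first_error_at: Optional[float] = None  # start of the current 5xx streak

    def record(self, status_code: int, now: float) -> bool:
        """Record one primary response at time `now` (seconds); True means fail over."""
        if 500 <= status_code <= 599:
            if self._first_error_at is None:
                self._first_error_at = now
            return now - self._first_error_at >= self.threshold_seconds
        # Any non-5xx response breaks the streak and resets the timer.
        self._first_error_at = None
        return False
```

For example, with the 90-second default, a streak of 503s beginning at t=80s fires at t=170s, while a single 200 in between restarts the count. A shorter window trades faster failover for a higher chance of switching relays on a brief blip.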

Action items

Concrete changes, with owners and dates.

Three of these are already done. The other two are on the engineering calendar and will be linked back here as they ship.

5 items · 3 done
  • Automatic failover threshold lowered from 5 minutes to 90 seconds of consecutive 5xx
    Owner · Email platform / Sara
    Target · 2026-04-13
    Done
  • Status banner on the login page when magic-link delivery SLA is breached
    Owner · Frontend / Priya
    Target · 2026-04-20
    Done
  • Secondary SMTP relay capacity doubled to handle primary-down scenarios at full load
    Owner · Platform / Theo
    Target · 2026-04-30
    Done
  • Email-platform DR drill added to monthly schedule (was quarterly)
    Owner · Platform / Anya
    Target · 2026-05-15
    In progress
  • Investigate password-reset and 2FA delivery alternatives during email outages
    Owner · Security / Marcus
    Target · 2026-06-10
    Scheduled