Xero invoice sync delays
Started 2026-04-30 14:22 NZST · Resolved 16:08 NZST · Total 1h 46m · Affected: Xero sync (delayed, no data loss)
What happened
For 1h 46m, invoices took up to 60 minutes to sync to Xero. No data was lost.
A reference-back-fill job ran concurrently with the normal hourly sync, and together they exceeded Xero's per-organisation rate-limit. Local invoice state was accurate the whole time; what lagged was the visible-on-Xero state. All 2,400 in-flight invoices eventually synced, with idempotency enforced on the invoice id.
Impact
Who was affected, and how much.
Services affected
- Xero sync: cron-xero-sync honouring 429s, backlog growing
- Web app: No customer-visible degradation
- Booking engine: Invoices created and locally accurate throughout
- Stripe Connect: Payment capture and webhook delivery unaffected
- API: No elevated error rates observed
Customer impact
Zero member-visible impact. The delay was operator-visible only: Xero showed stale invoice state for up to an hour for 12 workspaces. We have not issued any SLA credits because the incident did not breach the 99.95% SLA for this calendar month.
Read our SLA policy
Timeline
What we did, in order.
Every public update we posted during the incident, plus the internal pager that opened it. Times are 2026-04-30 NZST (UTC+12).
- 16:08 NZST
Incident resolved · Xero sync caught up
All 2,400 in-flight invoices have synced to Xero. The back-fill job that triggered the rate-limit has been re-throttled to half its previous batch size and scheduled overnight. We will publish a full post-mortem within 7 days; the root cause and action items below are the preliminary summary.
- 15:38 NZST
Backlog clearing · ~70% caught up
Xero is accepting requests at normal throughput. Our cron-xero-sync function is draining the backlog and is currently processing approximately 480 invoices per 5-minute window. Estimated full catch-up: 30 minutes. No data loss possible — sync direction for in-flight invoices is local→Xero with idempotency on the invoice id.
- 15:02 NZST
Cause identified · rate-limit triggered by overlapping back-fill
A KARO-440 reference back-fill job (rewriting legacy LiteHQ #N references to the canonical LiteHQ.com #N — <user-ref> shape) ran concurrently with the normal hourly sync. Together they exceeded the Xero per-organisation rate-limit of 5,000 calls/day. Xero began returning 429s with Retry-After headers; our cron honoured them but the backlog grew faster than the budget could replenish.
- 14:38 NZST
Investigating · Xero 429 rate-limit responses elevated
We are seeing a sustained spike in Xero 429 responses across 12 operator workspaces. Hourly cron-xero-sync is honouring Retry-After headers and not double-billing the rate-limit budget. Invoices are queued safely — they will sync once the budget recovers — but the visible-on-Xero state is lagging local state.
- 14:22 NZST
Monitoring alert fired
Sentry alert xero-sync-lag-sla-breach triggered. On-call paged (Theo). Sync lag is currently ~12 minutes against an SLA of <5 minutes; we are diagnosing.
Root cause · post-mortem
Why this happened, in three paragraphs.
1. The trigger
A KARO-440 reference back-fill job (rewriting legacy LiteHQ #N references to the canonical LiteHQ.com #N shape) was queued to run during the day. It ran concurrently with the normal hourly cron-xero-sync, and the two together pushed us over the Xero per-organisation rate-limit of 5,000 calls/day. Xero began returning 429 responses with Retry-After headers.
2. Why no data was lost
cron-xero-sync honoured Retry-After headers correctly and never double-billed the rate-limit budget. Local invoice state remained the source of truth. Every queued invoice was idempotent on its invoice id, so when the budget recovered, the cron drained the queue without duplicates. The sync direction for in-flight invoices is local → Xero only; Xero never overwrote a local row mid-incident.
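The drain behaviour described above can be sketched as follows. This is a minimal illustration, not the actual cron-xero-sync code: `InvoiceSyncer`, `SyncResult`, the `send` callable and the in-memory `synced_ids` set are all hypothetical names, and the sketch assumes idempotency is tracked locally by invoice id and that a missing Retry-After falls back to a 60-second wait.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SyncResult:
    status: int                         # HTTP status from the (hypothetical) Xero client
    retry_after: Optional[int] = None   # seconds, taken from Retry-After on a 429


@dataclass
class InvoiceSyncer:
    send: Callable[[str], SyncResult]   # hypothetical: pushes one invoice, local -> Xero
    sleep: Callable[[float], None]      # injected so callers and tests control waiting
    synced_ids: set = field(default_factory=set)

    def drain(self, queue: list, max_waits: int = 10) -> list:
        """Push queued invoice ids to Xero; return the ids still pending."""
        pending = list(queue)
        waits = 0
        while pending and waits < max_waits:
            invoice_id = pending[0]
            if invoice_id in self.synced_ids:
                pending.pop(0)          # idempotent on invoice id: never send twice
                continue
            result = self.send(invoice_id)
            if result.status == 429:
                waits += 1
                # Honour Retry-After instead of retrying immediately (assumed 60s default).
                self.sleep(result.retry_after or 60)
                continue
            if 200 <= result.status < 300:
                self.synced_ids.add(invoice_id)
                pending.pop(0)
                continue
            break                       # other errors: leave the rest queued for the next run
        return pending
```

Because the invoice stays at the head of the queue until it succeeds, a 429 only delays it; a duplicate entry for an already-synced id is dropped without a second call.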
3. What we're changing
Back-fill jobs now run overnight only and are explicitly mutex'd against the hourly cron (already shipped). We are adding a per-tenant Xero rate-limit counter so we don't have to infer remaining budget from 429s (in progress). And we are building an operator-facing inline banner that fires when Xero sync lag breaches SLA; until now, operators relied on us catching it on the Sentry dashboard.
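To illustrate the per-tenant counter idea, here is a rough in-memory sketch. The real counter is planned for Postgres; `TenantBudget`, `try_spend`, and the `sync_reserve` headroom held back for the hourly sync are hypothetical, and the sketch assumes the budget resets per calendar day, which is a simplification.

```python
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 5000  # Xero per-organisation limit cited in this report (calls/day)


class TenantBudget:
    """In-memory stand-in for a per-tenant, per-day call counter."""

    def __init__(self, daily_limit: int = DAILY_LIMIT, sync_reserve: int = 1000):
        self.daily_limit = daily_limit
        self.sync_reserve = sync_reserve   # hypothetical headroom kept for the hourly sync
        self._used = defaultdict(int)      # (tenant_id, day) -> calls already made

    def remaining(self, tenant_id: str, day: date) -> int:
        return self.daily_limit - self._used[(tenant_id, day)]

    def try_spend(self, tenant_id: str, day: date, calls: int,
                  is_backfill: bool = False) -> bool:
        """Reserve `calls` if the budget allows; back-fills may not eat the reserve."""
        floor = self.sync_reserve if is_backfill else 0
        if self.remaining(tenant_id, day) - calls < floor:
            return False
        self._used[(tenant_id, day)] += calls
        return True


budget = TenantBudget()
today = date(2026, 4, 30)
budget.try_spend("ws-1", today, 3900, is_backfill=True)   # allowed: leaves 1,100
budget.try_spend("ws-1", today, 200, is_backfill=True)    # refused: would dip into the reserve
budget.try_spend("ws-1", today, 200)                      # hourly sync may still proceed
```

Checking the counter before each batch lets a job stop itself ahead of the limit rather than discovering it through 429 responses.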
Action items
Concrete changes, with owners and dates.
Two of these are already done. The other two are on the engineering calendar and will be linked back here as they ship.
- Back-fill jobs now run overnight only, never overlapping the hourly cron. Owner: Platform / Theo. Target: 2026-04-30
- Per-tenant Xero rate-limit budget tracked in a Postgres counter (not just inferred from 429s). Owner: Platform / Anya. Target: 2026-05-07
- Operator-facing banner when Xero sync lag breaches SLA. Owner: Frontend / Priya. Target: 2026-05-18
- Rate-limit-aware retry scheduler with circuit breaker on sustained 429s. Owner: Platform / Theo. Target: 2026-05-30
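As a rough picture of what a circuit breaker on sustained 429s could look like, consider the sketch below. It is not the planned scheduler: `RateLimitBreaker`, the threshold of three consecutive 429s and the 300-second cooldown are all assumptions chosen for illustration.

```python
from typing import Optional


class RateLimitBreaker:
    """Stops sending after several consecutive 429s, then allows a probe after a cooldown."""

    def __init__(self, threshold: int = 3, cooldown: float = 300.0):
        self.threshold = threshold      # assumed: consecutive 429s before the breaker opens
        self.cooldown = cooldown        # assumed: minimum seconds to stay open
        self.consecutive_429s = 0
        self.open_until = 0.0

    def allow(self, now: float) -> bool:
        """True when a request may be sent at time `now` (seconds)."""
        return now >= self.open_until

    def record(self, status: int, now: float, retry_after: Optional[float] = None) -> None:
        """Feed every response status back in so the breaker can track the streak."""
        if status == 429:
            self.consecutive_429s += 1
            if self.consecutive_429s >= self.threshold:
                # Stay open for the cooldown, or longer if Retry-After asks for more.
                self.open_until = now + max(self.cooldown, retry_after or 0)
        else:
            self.consecutive_429s = 0   # any non-429 response ends the streak
```

A scheduler would call `allow()` before each batch and `record()` after each response. Once the cooldown passes, the next request acts as a probe: another 429 reopens the breaker immediately, while a success resets the streak.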