Issue with database connections
Resolved
Oct 01 at 02:30am HDT
Incident Report: US Database Cluster Outage
Summary
At approximately 04:10 AEST on 1 October 2025, Relevance experienced a platform outage when our US database cluster entered a failure state. Although the cluster appeared to be online, it could not process operations correctly, and as a result the global API became unavailable by 04:30 AEST.
Our on-call team was automatically paged. They began investigating and determined that the issue lay in our database infrastructure rather than in the application itself. Initial remediation steps did not restore service, so we performed a full reboot of the database cluster and redeployed the dependent services. This returned the platform to a healthy state by 07:00 AEST.
Timeline (AEST)
- ~04:10 – US database cluster entered failure state.
- ~04:17 – On-call engineer paged and began investigating.
- ~04:30 – Global API became unavailable for users in all three regions.
- ~06:00 – AU and EU regions stabilised after user migration.
- ~07:00 – US region database connections terminated, dependent services redeployed, platform restored.
Impact
- Platform outage lasted for ~3 hours.
- AU and EU users recovered by ~06:00 AEST; US fully restored at ~07:00 AEST.
Next Steps / Mitigation
- We have implemented immediate changes to reduce the likelihood of similar failures.
- Additional redundancy has been provisioned in our database cluster.
- Monitoring has been expanded so that conditions which may lead to this type of failure are detected earlier.
- Further hardening of the system is planned to improve resilience against provider-level issues.
Updated
Sep 30 at 02:25pm HDT
All services are now fully operational across all regions. The incident has been resolved, and our team will share a detailed postmortem once it is ready. Thank you for your patience and understanding during this disruption.
Updated
Sep 30 at 12:31pm HDT
We’re still investigating the database connection issue. The global API is currently available in the AU and EU regions, while the US region continues to be affected. Our team is actively working on restoring full service in the US and we’ll provide another update soon.
Created
Sep 30 at 09:15am HDT
We’re investigating an issue with our database connections that is preventing jobs and triggers from synchronising. Our team is actively working on identifying the cause and restoring normal service. We’ll share another update shortly.