Write-up published
Resolved
Starting at 10:40 AM PST on December 26, 2024, multiple OpenAI products saw degraded availability. ChatGPT, Sora video creation, and many APIs (agents, realtime speech, batch, DALL-E) saw error rates above 90% during the incident. The text completions API was unaffected. All systems had fully recovered by 3:11 PM except ChatGPT, which fully recovered by 6:20 PM.
The root cause was a power failure in a cloud provider's data center, which affected critical services, such as databases, in that region for an extended period.
OpenAI’s databases are globally replicated, but region-wide failover currently requires manual intervention from the hosting cloud provider. We worked with the cloud provider to fail over some databases to other regions, but the scale of our deployment lengthened the mitigation time. We started several workstreams to explore workarounds; final recovery came only when the cloud provider fully restored the region.
In the coming weeks, we will begin a major infrastructure initiative to make our systems resilient to an extended outage in any region of any of our cloud providers. We will do this by adding a layer of indirection, under our control, between our applications and our cloud databases. This will allow significantly faster failover.
We know that extended outages can impact your products, businesses, and lives. We will prioritize the preventative measures outlined above to continue improving our reliability.
Resolved
Starting at 10:40 AM PST, we experienced high error rates on ChatGPT, Sora, and a subset of APIs. Sora began to recover at approximately 2:58 PM PST and API traffic at approximately 3:05 PM PST; ChatGPT fully recovered around 8:16 PM PST. We are currently investigating a separate incident affecting Sora and will post updates on the status page.
OpenAI will run a full root-cause analysis of this outage and will share details on this page when complete.
Identified
ChatGPT is mostly recovered and we are continuing to work on an overall fix.
Identified
ChatGPT is recovering and we are continuing to work on an overall fix.
Identified
APIs are now operational. ChatGPT is recovering and we are continuing to work on an overall fix.
Identified
Sora is now fully operational and we are continuing to monitor. APIs are starting to recover. We are continuing to work on an overall fix for ChatGPT and APIs.
Identified
ChatGPT has partially recovered, but chat history is still not loading. We are continuing to work on a fix for this issue.
Identified
We are continuing to work on a fix for this issue.
Identified
We are continuing to work on a fix for this issue.
Identified
This issue is caused by an upstream provider, and we are currently monitoring it.
Investigating
We are experiencing high error rates on ChatGPT, the API, and Sora. We are investigating and will post an update as soon as we can.