High error rates for ChatGPT, APIs, and Sora
Incident Report for OpenAI
Postmortem

Impact

Starting at 10:40 AM on December 26th, 2024, multiple OpenAI products saw degraded availability. ChatGPT, Sora video creation, and many APIs (agents, realtime speech, batch, DALL-E) saw error rates above 90% during the incident. The text completions API was unaffected. All systems fully recovered by 3:11 PM except ChatGPT, which fully recovered by 6:20 PM.

Root Cause & Remediation

The root cause was a power failure in a cloud provider data center. The failure impacted critical services in that region, such as databases, for an extended period.

OpenAI’s databases are globally replicated, but region-wide failover currently requires manual intervention from the hosting cloud provider. We worked with the cloud provider to fail over some databases to other regions, but our scale lengthened the mitigation time. We kicked off several workstreams to explore workarounds, but final recovery came only when the cloud provider fully recovered the region.

Prevention

In the coming weeks, we will embark on a major infrastructure initiative to make our systems resilient to an extended outage in any region of any of our cloud providers. We will do this by adding a layer of indirection, under our control, between our applications and our cloud databases. This will allow significantly faster failover.
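
As a rough illustration of what a layer of indirection between applications and databases can look like, the sketch below routes every connection request through a small application-owned router instead of a hard-coded regional endpoint. It is not OpenAI's design, which this report does not describe. The names `RegionalEndpoint`, `DatabaseRouter`, and `NoHealthyRegionError`, the region labels, and the example connection strings are all hypothetical, and the sketch ignores replication lag, write promotion, and consistency concerns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegionalEndpoint:
    """A hypothetical database replica in one cloud region."""
    region: str
    dsn: str
    healthy: bool = True


class NoHealthyRegionError(RuntimeError):
    """Raised when every configured region is marked unhealthy."""


@dataclass
class DatabaseRouter:
    """Application-owned indirection layer (illustrative only).

    Callers ask the router where to connect, so redirecting traffic is a
    change to the router's state rather than to every application.
    """
    endpoints: List[RegionalEndpoint] = field(default_factory=list)

    def _set_health(self, region: str, healthy: bool) -> None:
        for endpoint in self.endpoints:
            if endpoint.region == region:
                endpoint.healthy = healthy

    def mark_unhealthy(self, region: str) -> None:
        """Take a region out of rotation, for example after failed health checks."""
        self._set_health(region, False)

    def mark_healthy(self, region: str) -> None:
        """Return a region to rotation once it has recovered."""
        self._set_health(region, True)

    def resolve(self) -> RegionalEndpoint:
        """Return the first healthy endpoint in priority order."""
        for endpoint in self.endpoints:
            if endpoint.healthy:
                return endpoint
        raise NoHealthyRegionError("no healthy database region available")


if __name__ == "__main__":
    # Hypothetical regions and connection strings, for illustration only.
    router = DatabaseRouter([
        RegionalEndpoint("region-a", "postgres://db.region-a.example.internal/app"),
        RegionalEndpoint("region-b", "postgres://db.region-b.example.internal/app"),
    ])
    print(router.resolve().region)
    router.mark_unhealthy("region-a")  # simulate a regional outage
    print(router.resolve().region)
```

In this sketch, failing over is a state change inside a component the application operator controls, so it does not depend on manual intervention from the cloud provider. A production system would also need health checking, replica promotion, and safeguards against split-brain writes.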

We know that extended outages can impact your products, businesses, and lives. We will prioritize the preventative measures outlined above to continue improving our reliability.

Posted Jan 02, 2025 - 15:31 PST

Resolved
Starting at 10:40 AM PST, we experienced high error rates on ChatGPT, Sora, and a subset of APIs. We began to see recovery for Sora at approximately 2:58 PM PST and for API traffic at approximately 3:05 PM PST, with full recovery for ChatGPT around 8:16 PM PST. We are currently investigating a separate incident regarding Sora and will be updating the status page.

OpenAI will run a full root-cause analysis of this outage and will share details on this page when complete.
Posted Dec 26, 2024 - 20:38 PST
Update
ChatGPT is mostly recovered and we are continuing to work on an overall fix.
Posted Dec 26, 2024 - 18:04 PST
Update
ChatGPT is recovering and we are continuing to work on an overall fix.
Posted Dec 26, 2024 - 17:11 PST
Update
APIs are now operational.
ChatGPT is recovering and we are continuing to work on an overall fix.
Posted Dec 26, 2024 - 16:05 PST
Update
Sora is now fully operational and we are continuing to monitor.
APIs are starting to recover.
We are continuing to work on an overall fix for ChatGPT and APIs.
Posted Dec 26, 2024 - 15:16 PST
Update
ChatGPT is partially recovered while chat history is still not loading.
We are continuing to work on a fix for this issue.
Posted Dec 26, 2024 - 14:05 PST
Update
We are continuing to work on a fix for this issue.
Posted Dec 26, 2024 - 13:05 PST
Update
We are continuing to work on a fix for this issue.
Posted Dec 26, 2024 - 12:06 PST
Identified
This issue is caused by an upstream provider and we are currently monitoring.
Posted Dec 26, 2024 - 11:18 PST
Investigating
We are currently experiencing high error rates on ChatGPT, the API, and Sora. We are investigating and will post an update as soon as we are able.
Posted Dec 26, 2024 - 11:00 PST
This incident affected: API, ChatGPT, and Sora.