High error rates for ChatGPT, APIs, and Sora

Resolved · Full outage

We’ve published a write-up of this incident.



Dec 26, 2024, 07:00 PM to Dec 27, 2024, 04:38 AM

Updates

Write-up published


Resolved

Impact

Starting at 10:40 AM PST on December 26, 2024, multiple OpenAI products saw degraded availability. ChatGPT, Sora video creation, and many APIs (agents, realtime speech, batch, DALL-E) saw error rates above 90% during the incident. The text completions API was unaffected. All systems had fully recovered by 3:11 PM PST except ChatGPT, which fully recovered by 6:20 PM PST.

Root Cause & Remediation

The root cause was a power failure in a cloud provider's data center, which impacted critical services, such as databases, in that region for an extended period.

OpenAI’s databases are globally replicated, but region-wide failover currently requires manual intervention from the hosting cloud provider. We worked with the cloud provider to fail over some databases to other regions, but the scale of our systems lengthened the mitigation time. We started several workstreams to explore workarounds; final recovery came only when the cloud provider fully restored the region.

Prevention

In the coming weeks, we will begin a major infrastructure initiative to make our systems resilient to an extended outage in any region of any of our cloud providers. The initiative adds a layer of indirection, under our control, between our applications and our cloud databases, which will allow significantly faster failover.
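The indirection layer described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not OpenAI's design: the `DatabaseRouter` class, the region names, and the endpoint hostnames are all invented for the example. It shows the core idea that applications resolve a logical database name through a mapping the operator controls, so taking a region out of rotation becomes a local change rather than a request to the cloud provider.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatabaseRouter:
    """Hypothetical indirection layer between applications and regional databases."""

    # Logical database name -> ordered (region, endpoint) pairs, preferred region first.
    endpoints: dict[str, list[tuple[str, str]]]
    # Regions the operator has taken out of rotation.
    unhealthy_regions: set[str] = field(default_factory=set)

    def resolve(self, logical_name: str) -> str:
        """Return the endpoint in the first region that is not marked unhealthy."""
        for region, endpoint in self.endpoints[logical_name]:
            if region not in self.unhealthy_regions:
                return endpoint
        raise RuntimeError(f"no healthy region for {logical_name!r}")

    def fail_over_region(self, region: str) -> None:
        """Take a region out of rotation; later resolve() calls skip it."""
        self.unhealthy_regions.add(region)

    def restore_region(self, region: str) -> None:
        """Put a recovered region back into rotation."""
        self.unhealthy_regions.discard(region)


if __name__ == "__main__":
    # Hypothetical logical database with a primary and a secondary region.
    router = DatabaseRouter(
        {
            "chat_history": [
                ("region-a", "db-a.example.internal"),
                ("region-b", "db-b.example.internal"),
            ]
        }
    )
    print(router.resolve("chat_history"))  # db-a.example.internal
    router.fail_over_region("region-a")    # operator marks region-a as down
    print(router.resolve("chat_history"))  # db-b.example.internal
```

The sketch covers only routing. A real failover also has to promote a replica to accept writes and account for replication lag, which this example deliberately ignores.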

We know that extended outages can impact your products, businesses, and lives. We will prioritize the preventative measures outlined above to continue improving our reliability.

Thu, Jan 2, 2025, 11:28 PM

Resolved

Starting at 10:40 AM PST, we experienced high error rates on ChatGPT, Sora, and a subset of APIs. We began to see recovery for Sora at approximately 2:58 PM PST and for API traffic at approximately 3:05 PM PST, with full recovery for ChatGPT around 8:16 PM PST. We are currently investigating a separate incident affecting Sora and will post updates on the status page.

OpenAI will run a full root-cause analysis of this outage and will share details on this page when complete.

Fri, Dec 27, 2024, 04:38 AM (6 days earlier)

Identified

ChatGPT is mostly recovered and we are continuing to work on an overall fix.

Fri, Dec 27, 2024, 02:04 AM (2 hours earlier)

Identified

ChatGPT is recovering and we are continuing to work on an overall fix.

Fri, Dec 27, 2024, 01:11 AM (52 minutes earlier)

Identified

APIs are now operational. ChatGPT is recovering, and we are continuing to work on an overall fix.

Fri, Dec 27, 2024, 12:05 AM (1 hour earlier)

Identified

Sora is now fully operational, and we are continuing to monitor. APIs are starting to recover. We are continuing to work on an overall fix for ChatGPT and APIs.

Thu, Dec 26, 2024, 11:16 PM (49 minutes earlier)

Identified

ChatGPT has partially recovered, but chat history is still not loading. We are continuing to work on a fix for this issue.

Thu, Dec 26, 2024, 10:05 PM (1 hour earlier)

Identified

We are continuing to work on a fix for this issue.

Thu, Dec 26, 2024, 09:05 PM (1 hour earlier)

Identified

We are continuing to work on a fix for this issue.

Thu, Dec 26, 2024, 08:06 PM (59 minutes earlier)

Identified

This issue is caused by an upstream provider, and we are currently monitoring.

Thu, Dec 26, 2024, 07:18 PM (48 minutes earlier)

Investigating

We are experiencing high error rates on ChatGPT, the API, and Sora. We are investigating and will post an update as soon as we can.

Thu, Dec 26, 2024, 07:00 PM (17 minutes earlier)

Availability metrics are reported at an aggregate level across all tiers, models and error types. Individual customer availability may vary depending on their subscription tier as well as the specific model and API features in use.
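To see why aggregate and individual availability can diverge, the toy calculation below pools request counts from two segments. The segment names and counts are invented for illustration and are not OpenAI data, and treating availability as successful requests divided by total requests is an assumption of this sketch. Because the aggregate divides total successes by total requests, a small segment with poor availability barely moves the overall figure.

```python
from __future__ import annotations


def availability(successes: int, total: int) -> float:
    """Fraction of requests that succeeded; a segment with no traffic counts as fully available."""
    return successes / total if total else 1.0


def aggregate_availability(segments: dict[str, tuple[int, int]]) -> float:
    """Pool (successes, total) counts across all segments, then divide once."""
    successes = sum(s for s, _ in segments.values())
    total = sum(t for _, t in segments.values())
    return availability(successes, total)


if __name__ == "__main__":
    # Hypothetical (successes, total requests) per tier/model segment.
    segments = {
        "tier-1/model-x": (9_900, 10_000),
        "tier-2/model-y": (50, 100),
    }
    print(f"aggregate: {aggregate_availability(segments):.4f}")  # aggregate: 0.9851
    for name, (ok, total) in segments.items():
        print(f"{name}: {availability(ok, total):.2f}")
    # tier-1/model-x: 0.99
    # tier-2/model-y: 0.50
```

Here the hypothetical tier-2 segment sees only 50% availability, yet the pooled figure stays above 98% because that segment carries about 1% of the traffic.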