Increased errors for ChatGPT

Resolved·Full outage

We’ve published a write-up of this incident.

Jan 23, 2025, 01:12 PM

04:05 PM

Updates

Write-up published

Read it here

Resolved

Impact

Between 3:30am PT and 7:08am PT, chatgpt.com and a small number of APIs experienced elevated error rates.

Root Cause & Remediation

The root cause of the initial issue was a failure with Cosmos DB in ChatGPT. Our service provider recovered the affected DBs at 4:25am PT.

The second issue was caused by web service pods crash-looping because their health checks were failing. The failing checks marked pods as unhealthy, which left too few resources available to service all the requests to the web layer. At the time, we believed this was due to users retrying requests. These issues recovered after we implemented a fix to the web server’s health checking logic.
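The write-up does not say what the health-check fix was. As a general illustration only, one common way to keep a dependency failure from restarting otherwise healthy pods is to separate a liveness check (is the process itself broken?) from a readiness check (should this pod receive traffic right now?). The sketch below assumes that pattern; `HealthState`, `liveness`, `readiness`, and the field names are hypothetical and are not taken from the incident.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Hypothetical snapshot of what a web pod knows about itself."""
    process_ok: bool = True      # the server process can still handle work
    database_ok: bool = True     # the downstream database is reachable
    in_flight: int = 0           # requests currently being served
    max_in_flight: int = 100     # assumed capacity limit for this pod


def liveness(state: HealthState) -> int:
    """Fail only when the process itself is broken and a restart would help."""
    return 200 if state.process_ok else 500


def readiness(state: HealthState) -> int:
    """Fail when the pod should stop receiving traffic, without restarting it."""
    if not state.process_ok:
        return 503
    if not state.database_ok or state.in_flight >= state.max_in_flight:
        return 503
    return 200
```

Under this assumption, a database outage makes `readiness` return 503 while `liveness` still returns 200, so an orchestrator would stop routing traffic to the pod but would not restart it.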

Prevention

We will make a series of changes to the web systems so that they degrade more gracefully when they fail, including concurrency control and better load shedding.
A larger investment in making our DB layer more resilient to service provider outages is already underway.
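
To illustrate what concurrency control with load shedding can look like, here is a minimal sketch. It is not the planned implementation: `LoadShedder`, `handle_request`, and the 503 response text are hypothetical names chosen for the example. The idea is to cap the number of requests in flight and reject the excess quickly instead of queueing it.

```python
import threading
from typing import Callable, Tuple


class LoadShedder:
    """Admit at most `max_in_flight` concurrent requests; reject the rest."""

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Claim a slot if one is free; never blocks waiting for capacity."""
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        """Give a slot back once the request has finished."""
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


def handle_request(shedder: LoadShedder, work: Callable[[], str]) -> Tuple[int, str]:
    """Run `work` if capacity allows; otherwise shed the request with a 503."""
    if not shedder.try_acquire():
        return 503, "overloaded, retry later"
    try:
        return 200, work()
    finally:
        shedder.release()
```

Rejecting early keeps the admitted requests fast while the system is saturated, at the cost of returning errors to the requests that are shed.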

We know that extended outages can impact your products, businesses, and lives. We will prioritize the preventative measures outlined above to continue improving our reliability.

Thu, Jan 30, 2025, 03:57 PM

Resolved

This issue has now been resolved. Between 4:23am and 7:10am PST, customers experienced elevated errors on ChatGPT.

Thu, Jan 23, 2025, 04:05 PM (6 days earlier)

Monitoring

We are continuing to monitor for any further issues.

Thu, Jan 23, 2025, 03:11 PM (54 minutes earlier)

Monitoring

A fix has been implemented and we are monitoring the results.

Thu, Jan 23, 2025, 03:09 PM

Identified

We are continuing to work on a fix for this issue.

Thu, Jan 23, 2025, 02:34 PM (34 minutes earlier)

Identified

We have identified the root cause of this issue, and are currently working to implement a fix.

Thu, Jan 23, 2025, 01:43 PM (51 minutes earlier)

Investigating

We are currently experiencing elevated error rates for ChatGPT and are investigating.

Thu, Jan 23, 2025, 01:12 PM (30 minutes earlier)