ChatGPT Outage

Resolved·Full outage

We’ve published a write-up of this incidentRead the write-up

Read it here

Affected components

Mar 15, 2023, 06:14 PM

06:20 PM

Updates

Write-up published

Read it here

Resolved

At 10:55am on March 15th, a deploy of the ChatGPT service cut over to a new Redis cluster for additional capacity. This new Redis cluster had a misconfigured maximum connections setting. This was missed on staging, canary, and production dual writing, because we had not yet hit connection thresholds. A rollback to redirect traffic to the original Redis cluster restored service.

An internal audit concluded that this particular setting was the only relevant difference in config between the two Redis clusters. Moving forward, we’re increasing our telemetry on Redis, and auditing all relevant configuration templates to ensure no other updates are needed.

Fri, Mar 17, 2023, 01:46 AM

Resolved

This incident has been resolved.

Wed, Mar 15, 2023, 06:18 PM(1 day earlier)

Monitoring

The fix has been deployed. We are no longer seeing elevated error rates.

Wed, Mar 15, 2023, 06:14 PM

Investigating

We are currently investigating this issue.

Wed, Mar 15, 2023, 05:55 PM(19 minutes earlier)