ChatGPT is unavailable for some users.
Incident Report for OpenAI
Postmortem

On June 3, 2024, at 11:49 PM PDT, ChatGPT experienced a significant outage affecting all user tiers (paid, enterprise, free, anonymous).

By 4:10 AM PDT, service was fully restored.

A second phase of the outage began a few hours later, at 7:14 AM PDT on June 4, again impacting the same user cohorts.

Service was restored for a second time at 10:07 AM PDT.

The outage occurred because a database that ChatGPT depends on became unavailable. The cause was a series of traffic surges initiated by the connection pooling service, combined with the way that service was configured.

The team initially attempted several mitigations, including restarting the primary server and assessing failover options to other replicas. Despite these recovery attempts, the primary database remained unreachable. We eventually blocked all traffic to ChatGPT to remove all load from the DB, which allowed us to promote a secondary target to be the new primary and begin redirecting traffic to it. Re-ramping incoming traffic concluded at 10:07 AM PDT, at which time all services were recovered.

As part of the incident response, we have already implemented the following measures:

  • Tuned the number of connections the pooling service makes to the DB backend.
  • Increased timeouts on connections made to the DB to avoid deadlocks.
  • Implemented exponential backoff, gradually increasing the wait time between subsequent retry attempts for DB connection failures.
  • Modified our load shedding tooling to make it easier to degrade more gracefully.
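
For illustration, the exponential backoff measure listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code deployed during the incident: the function name `connect_with_backoff`, its parameters and default values, and the random jitter applied to each wait are all hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    *,
    attempts: int = 5,        # total tries, including the first (assumed value)
    base: float = 0.1,        # wait ceiling after the first failure, in seconds (assumed)
    factor: float = 2.0,      # growth factor between consecutive ceilings (assumed)
    cap: float = 10.0,        # upper bound on any single wait (assumed)
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Call `connect` until it succeeds, waiting longer after each failure."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last failure to the caller
            # The ceiling grows as base, base*factor, base*factor^2, ... up to cap.
            ceiling = min(cap, base * factor ** attempt)
            # Jitter (an addition in this sketch) spreads clients out so they
            # do not all retry at the same instant.
            sleep(ceiling * jitter())
    raise ValueError("attempts must be at least 1")
```

With the default values, a client that keeps failing waits at most 0.1 s, 0.2 s, 0.4 s, and 0.8 s between its five attempts, rather than reconnecting immediately and adding to the load on the DB.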

Additionally, we will be implementing the following changes to prevent future incidents of this type altogether:

  • Re-architect the DB design to increase its redundancy.
  • Improve our ability to load shed at the DB layer (in addition to the clients).
  • Expand the load testing and benchmarking we do for the backend layer.
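
For illustration, load shedding in front of a database can be sketched as an admission gate that rejects work once a fixed number of requests are in flight, instead of queueing it. This is a minimal Python sketch under stated assumptions, not the planned design: the `LoadShedder` class, the `Overloaded` exception, and the in-flight limit are hypothetical.

```python
import threading
from contextlib import contextmanager


class Overloaded(Exception):
    """Raised when a request is shed instead of being queued."""


class LoadShedder:
    """Admit at most `max_in_flight` concurrent requests; reject the rest."""

    def __init__(self, max_in_flight: int) -> None:
        self._slots = threading.BoundedSemaphore(max_in_flight)

    @contextmanager
    def admit(self):
        # Non-blocking acquire: when no slot is free, fail fast rather than wait.
        if not self._slots.acquire(blocking=False):
            raise Overloaded("backend at capacity; request shed")
        try:
            yield
        finally:
            self._slots.release()
```

A caller would wrap each DB query in `with shedder.admit():` and turn `Overloaded` into a fast error response, so that excess load is dropped early rather than accumulating against the DB.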
Posted Jun 11, 2024 - 12:57 PDT

Resolved
We experienced a major outage impacting all users on all plans of ChatGPT. The impact included all ChatGPT-related services but did not include platform.openai.com or the API. This incident started June 4th at 2:15p GMT and was resolved June 4th at 5:01p GMT.

UPDATE (5:59p GMT)
A 'hard refresh' may be necessary for users of ChatGPT on web at chatgpt.com. This should not be necessary for anyone using ChatGPT on the Mac app or our mobile (iOS/Android) apps. See below for how to perform a 'hard refresh' by browser.

Mac:
Chrome and/or Firefox = Press Cmd + Shift + R
Safari = Press Cmd + Option + R

PC:
Chrome, Firefox, Microsoft Edge = Press Ctrl + F5

Mobile devices:
To hard refresh in your browser on a mobile device, you will need to manually clear the cache before reloading the page.
Posted Jun 04, 2024 - 10:17 PDT
Investigating
We are currently investigating this issue.
Posted Jun 04, 2024 - 07:33 PDT
This incident affected: ChatGPT.