ChatGPT Availability Impacted

Write-up

Summary

On February 4, 2026, between 9:06 AM and 9:59 AM PT, some users experienced 403 errors when accessing ChatGPT. A brief reduction in database capacity during a cloud infrastructure maintenance event increased load on the remaining capacity, and unexpected retries from internal services amplified that traffic further. We restored availability by scaling capacity and redistributing traffic, and we are implementing additional safeguards to prevent similar incidents in the future.

Root Cause

The incident was triggered when our cloud provider took a database read replica in one of our regions offline for routine scheduled maintenance. ChatGPT relies on multiple database read replicas to serve authentication and session-related requests. When one replica became unavailable, traffic shifted to the remaining replicas, which experienced a sudden increase in load.

This surge led to elevated latency and connection pressure, resulting in request failures in upstream services responsible for handling authentication and user session validation. In some cases, retry behavior in dependent services further increased traffic, amplifying the impact.
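
The two effects described above, load shifting onto fewer replicas and retries multiplying traffic, can be illustrated with a rough model. This is a minimal sketch under simplifying assumptions (perfectly even load balancing, immediate retries, independent failures). The function names and all figures are hypothetical and are not taken from the incident data.

```python
def per_replica_load(total_rps: float, replicas: int, offline: int = 0) -> float:
    """Requests per second seen by each healthy replica, assuming even balancing."""
    healthy = replicas - offline
    if healthy <= 0:
        raise ValueError("no healthy replicas remain")
    return total_rps / healthy


def offered_load(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Total attempts per second when every failed attempt is retried immediately.

    Assumes each attempt fails independently with probability failure_rate
    and that callers give up after max_retries retries.
    """
    # Attempt k (k = 0 is the original request) is only sent if the
    # previous k attempts all failed, which happens with probability
    # failure_rate ** k.
    return base_rps * sum(failure_rate ** k for k in range(max_retries + 1))


if __name__ == "__main__":
    # Hypothetical figures chosen only to make the arithmetic easy to follow.
    print(per_replica_load(12_000, replicas=4))             # all replicas healthy
    print(per_replica_load(12_000, replicas=4, offline=1))  # one replica offline
    print(offered_load(1_000, failure_rate=0.5, max_retries=3))
```

With these hypothetical inputs, losing one of four replicas raises per-replica load from 3,000 to 4,000 requests per second, and a 50% failure rate with up to three immediate retries raises offered load from 1,000 to 1,875 attempts per second. The second number is why unbounded or immediate retries can deepen a partial outage.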

Mitigation

We restored service by scaling up database connections and diverting traffic away from the most heavily impacted clusters. As core services recovered, downstream services briefly required additional time to scale back up before stabilizing. The incident was fully mitigated at 9:59 AM PT.

Impact

Elevated errors for ChatGPT users for approximately 53 minutes
Peak global error rate of approximately 18%
Web experienced the highest impact
API services were not affected

Prevention and Next Steps

We are taking the following actions to reduce the likelihood and impact of similar incidents:

Increasing database replica capacity in critical regions and improving load distribution across replicas
Continuing work to reduce ChatGPT’s dependency on this specific database path within authentication flows
Improving automatic traffic routing across regions during localized infrastructure degradation
Strengthening monitoring and early detection, including detecting database connection saturation sooner
Improving retry and backoff behavior in upstream services to reduce amplification during partial outages
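
As an illustration of the retry and backoff item above, the following is a minimal sketch of capped exponential backoff with full jitter, a common technique for limiting retry amplification. It is not the implementation used in our services. The names `backoff_delay` and `call_with_backoff` and the default values are assumptions chosen for the example.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base: float = 0.1,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call fn, retrying failures with capped exponential backoff and jitter.

    The retry count is bounded so that a struggling dependency sees at most
    max_retries + 1 attempts per caller, spread out over time.
    """
    rng = rng or random.Random()
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            # A real client would retry only errors known to be transient
            # (timeouts, connection resets); catching Exception keeps the
            # sketch short.
            if attempt == max_retries:
                raise
            sleep(backoff_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable")
```

Randomizing each delay spreads retries from many callers over time rather than sending them in synchronized waves, and the cap on both the delay and the retry count bounds the extra load a single caller can add during a partial outage.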

We regret the disruption and appreciate your patience as we continue to improve system resilience.