Users unable to load ChatGPT, Codex and API Platform

Write-up

Summary

On April 20, 2026, from approximately 7:18 AM to 9:00 AM PDT, OpenAI experienced a service disruption that affected ChatGPT availability for some users. The most significant impact was observed in South America, India, and Europe.

During the incident, ChatGPT availability dropped, and some users were unable to log in or sign up on other surfaces, including platform.openai.com and Codex.

Impact

Between approximately 7:18 AM and 9:00 AM PDT:

Some users experienced errors using ChatGPT.
Some users were prompted to log in again or encountered login and signup failures.
Some users experienced login or signup errors in other products, including the Codex app and platform.openai.com.

Availability had substantially recovered by 8:40 AM PDT, and overall ChatGPT metrics had fully recovered by 9:00 AM PDT.

Root Cause

The incident was caused by an overload in an internal authentication edge dependency responsible for authorizing ChatGPT requests. A temporary backend disruption exposed insufficient headroom in a critical authentication path, and the system's retry, caching, and routing behavior amplified the overload, preventing the service from recovering gracefully.
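To illustrate how retry behavior can amplify an overload, the sketch below computes a worst-case bound on backend attempts per user request when several layers each retry failed calls. The layer count and retry limits are hypothetical values chosen for illustration; the write-up does not state the actual figures.

```python
def worst_case_amplification(retries_per_layer: int, layers: int) -> int:
    """Upper bound on backend attempts caused by one user request.

    Assumes every layer makes 1 initial attempt plus `retries_per_layer`
    retries, and that each attempt triggers the full retry sequence in the
    layer below it. Both parameters are illustrative assumptions.
    """
    if retries_per_layer < 0 or layers < 0:
        raise ValueError("retries_per_layer and layers must be non-negative")
    return (1 + retries_per_layer) ** layers


def worst_case_backend_rps(user_rps: float, retries_per_layer: int, layers: int) -> float:
    """Backend request rate if every call fails and every layer retries fully."""
    return user_rps * worst_case_amplification(retries_per_layer, layers)
```

Under these assumptions, three layers that each retry twice can turn one failing user request into up to 27 backend attempts, which is why a brief disruption can keep a dependency overloaded after the original fault has cleared.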

Mitigation and Recovery

We took several actions to stabilize the system and restore service:

Reduced load on affected backend services by disabling or reverting traffic-increasing changes.
Gradually shifted traffic from affected edge services to healthier capacity.
Evacuated impacted clusters where appropriate.
Increased cache durations in the authentication edge path to reduce backend pressure.
Restored overloaded backend dependencies to a healthy state, including scaling and failover actions where needed.
Reduced load on login-related backend systems to restore authentication flows.
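As a rough illustration of why longer cache durations reduce backend pressure, the following sketch simulates a time-to-live cache in front of an authorization lookup. The `TTLCache` class, the `session-123` key, and the TTL values are all hypothetical and are not taken from the actual system.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Minimal TTL cache in front of a backend lookup (illustrative only)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.backend_calls = 0  # how many times the backend was actually hit

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: no backend call
        self.backend_calls += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


def backend_calls_for(ttl_seconds: float, duration_seconds: int = 600) -> int:
    """Simulate one request per second for a single key and count backend fetches."""
    now = [0.0]
    cache = TTLCache(lambda key: "allow", ttl_seconds, clock=lambda: now[0])
    for second in range(duration_seconds):
        now[0] = float(second)
        cache.get("session-123")
    return cache.backend_calls
```

In this simulation, raising the TTL from 30 to 300 seconds cuts backend fetches over ten minutes from 20 to 2. The trade-off is that cached decisions, including ones for sessions that have since been revoked, remain in effect for longer.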

Authentication systems had fully recovered by approximately 8:20 AM PDT. ChatGPT edge availability had recovered by approximately 8:40 AM PDT, and overall ChatGPT service metrics had fully recovered by approximately 9:00 AM PDT.

Prevention and Follow-up Actions

We are implementing several improvements to reduce the likelihood and impact of similar incidents:

Increase resilience in the authentication edge path: Expand capacity headroom for critical authentication edge services and their dependencies.
Prevent overload amplification: Improve load shedding, circuit breakers, and retry behavior so temporary backend failures do not cascade into broader service degradation.
Improve failure handling: Surface backend availability errors as availability errors so that they do not unnecessarily trigger login flows.
Enhance validation and detection: Expand continuous validation, observability, and alerting for critical authentication and edge paths.
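The circuit-breaker and failure-handling items above can be sketched together as follows. This is a minimal, single-threaded illustration under stated assumptions, not a description of the actual implementation: `AuthResult`, `BackendUnavailable`, `CircuitBreaker`, and the threshold values are hypothetical names and numbers introduced here.

```python
import time
from enum import Enum
from typing import Callable, Optional


class AuthResult(Enum):
    AUTHORIZED = "authorized"
    UNAUTHENTICATED = "unauthenticated"  # credentials invalid: a login prompt is appropriate
    UNAVAILABLE = "unavailable"          # backend trouble: show an error, keep the session


class BackendUnavailable(Exception):
    """Raised by the (hypothetical) backend client on timeouts or server errors."""


class CircuitBreaker:
    """Fails fast after repeated backend failures instead of adding more load."""

    def __init__(self, check: Callable[[str], bool], failure_threshold: int = 3,
                 reset_after_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._check = check
        self._failure_threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def authorize(self, token: str) -> AuthResult:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                # Open: shed load by answering without calling the backend.
                return AuthResult.UNAVAILABLE
            # Half-open: allow one trial call; a failure re-opens immediately
            # because the failure count has not been reset.
            self._opened_at = None
        try:
            valid = self._check(token)
        except BackendUnavailable:
            self._consecutive_failures += 1
            if self._consecutive_failures >= self._failure_threshold:
                self._opened_at = self._clock()
            # A backend failure is not evidence of bad credentials.
            return AuthResult.UNAVAILABLE
        self._consecutive_failures = 0
        return AuthResult.AUTHORIZED if valid else AuthResult.UNAUTHENTICATED
```

The key design choice is that a backend failure maps to `UNAVAILABLE` rather than `UNAUTHENTICATED`, so a caller can show a temporary error instead of sending the user through a login flow that would add further load to the authentication path.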

We are continuing to harden these systems to improve ChatGPT availability during periods of high load or partial infrastructure downtime.

Availability metrics are reported at an aggregate level across all tiers, models and error types. Individual customer availability may vary depending on their subscription tier as well as the specific model and API features in use.
