On February 12, 2026, between 1:20 PM and 1:40 PM PST, ChatGPT experienced elevated conversation error rates due to an issue introduced during a routine service deployment. The deployment caused a subset of production clusters, serving the majority of ChatGPT traffic at the time, to reject requests.
The issue was automatically mitigated as the deployment completed and new service instances came online. Full recovery was observed by 1:40 PM PST.
Successful conversation request rates dropped by approximately 35% during the peak impact window (1:20 PM–1:25 PM PST).
Affected clusters experienced elevated error rates, and request retries temporarily increased traffic volume (a simplified illustration of this amplification appears below).
Some Codex Cloud operations, including creating new tasks, creating pull requests, and listing environments, returned elevated error rates.
Services outside the affected clusters continued operating normally.
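As a rough, purely illustrative sketch of how retries can amplify load on a failing backend (the numbers and retry policy below are examples, not measurements of our clients):

```python
# Purely illustrative: how per-request retries amplify traffic to a
# failing backend. Numbers are examples, not measured values.

def amplified_request_rate(nominal_rps, failure_fraction, max_attempts):
    """Upper-bound request rate when each failed request is retried up to
    max_attempts times and failures persist for the whole window."""
    retried = nominal_rps * failure_fraction * (max_attempts - 1)
    return nominal_rps + retried

# Example: 1000 rps nominal, 35% failing, up to 3 attempts per request
# -> up to 1000 + 1000 * 0.35 * 2 = 1700 rps hitting the affected clusters.
print(amplified_request_rate(1000, 0.35, 3))
```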
During a periodic deployment of an internal service used to support ChatGPT conversations and certain Codex operations, a configuration resource required for request routing was unintentionally removed at the start of the rollout.
As a result:
Existing service instances in the affected clusters were unable to process requests.
Requests routed to those clusters failed until new service instances were started with the correct configuration.
Because the rollout targeted clusters serving a large portion of traffic, the impact was significant but brief.
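As an illustration of the failure mode (the names and structure below are hypothetical, not our actual service code): an already-running instance that resolves its routing targets from a shared configuration resource fails every request once that resource disappears, even though the instance itself is otherwise healthy.

```python
# Hypothetical sketch of the failure mode, not actual service code.
# An instance resolves routing targets from a shared configuration
# resource; if that resource is deleted mid-rollout, every request
# handled by an already-running instance fails immediately.

class RoutingConfigMissing(Exception):
    """Raised when the shared routing resource cannot be found."""


class ConversationService:
    def __init__(self, config_store):
        self.config_store = config_store  # e.g. a cluster-local config API

    def handle_request(self, request):
        routes = self.config_store.get("conversation-routing")  # shared resource
        if routes is None:
            # The resource was removed at the start of the rollout, so
            # existing instances cannot determine where to send the request.
            raise RoutingConfigMissing("required routing resource not found")
        backend = routes.select_backend(request)
        return backend.forward(request)
```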
The issue resolved automatically as new service instances replaced the impacted ones.
The deployment completed, bringing replacement instances online with the correct configuration.
Traffic was evacuated from one slower-updating cluster to accelerate recovery.
Once the behavior was understood, the rollout was paused to prevent the issue from affecting additional clusters (a simplified sketch of this kind of rollout gating appears below).
Full recovery was confirmed by 1:40 PM PST, with conversation error rates returning to normal.
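The sketch below shows the kind of wave-based rollout gating referenced above; it is illustrative only and does not represent our deployment tooling.

```python
# Illustrative sketch (hypothetical names): pause a wave-based rollout
# when error rates in the clusters just updated exceed a threshold.

ERROR_RATE_THRESHOLD = 0.05  # pause if more than 5% of requests fail

def roll_out(waves, deploy, error_rate):
    """waves: ordered groups of clusters; deploy and error_rate are
    injected callables standing in for real deployment and metrics APIs."""
    for wave in waves:
        for cluster in wave:
            deploy(cluster)
        # Gate progress on the health of the clusters just updated.
        if any(error_rate(cluster) > ERROR_RATE_THRESHOLD for cluster in wave):
            return f"rollout paused: elevated errors in {wave}"
    return "rollout complete"
```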
We have implemented the following changes:
Updated the deployment configuration to prevent required routing resources from being deleted during upgrades (a simplified sketch of this kind of guard appears below).
Initiated work to allow certain service dependencies to degrade gracefully, so that a single dependency failure does not cause entire conversation requests to fail (also sketched below).
Improved monitoring and safeguards around service configuration changes during deployments.
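As a simplified sketch of the first change (hypothetical names, not our deployment tooling): a pre-apply check that rejects any deployment plan that would delete a resource marked as required for routing.

```python
# Hypothetical pre-apply check: refuse deployment plans that delete
# resources marked as required (e.g. routing configuration).

REQUIRED_RESOURCES = {"conversation-routing"}  # illustrative resource name

def validate_plan(plan):
    """plan: iterable of (action, resource) pairs produced by the deploy tool."""
    for action, resource in plan:
        if action == "delete" and resource in REQUIRED_RESOURCES:
            raise ValueError(
                f"plan deletes required resource {resource!r}; blocking rollout"
            )
    return plan
```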
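And a minimal sketch of the graceful-degradation direction (again hypothetical): wrapping a non-critical dependency so that its failure degrades a feature rather than failing the entire conversation request.

```python
# Hypothetical sketch: serve a degraded result when an optional
# dependency fails, instead of failing the whole conversation request.

def call_with_fallback(dependency_call, fallback):
    """Return the dependency's result, or a degraded fallback on failure."""
    try:
        return dependency_call()
    except Exception:
        # Degrade the feature rather than failing the request.
        return fallback
```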
We apologize for the disruption and appreciate your patience. We are committed to continuing to improve the resilience of our systems.