On February 12, 2026, between 1:20 PM and 1:40 PM PST, ChatGPT experienced elevated conversation error rates due to an issue introduced during a routine service deployment. The deployment caused a subset of production clusters, serving the majority of ChatGPT traffic at the time, to reject requests.
The issue was automatically mitigated as the deployment completed and new service instances came online. Full recovery was observed by 1:40 PM PST.
Successful conversation request rates dropped by approximately 35% during the peak impact window (1:20 PM–1:25 PM PST).
Affected clusters experienced elevated error rates, and request retries temporarily increased traffic volume (a simplified illustration of this amplification appears below).
Some Codex Cloud operations, including creating new tasks, creating pull requests, and listing environments, returned elevated error rates.
Services outside the affected clusters continued operating normally.
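As a rough, purely illustrative sketch of how retries can amplify load on a failing backend (the numbers and retry policy below are examples, not measurements of our clients):

```python
# Purely illustrative: how per-request retries amplify traffic to a
# failing backend. Numbers are examples, not measured values.

def amplified_request_rate(nominal_rps, failure_fraction, max_attempts):
    """Upper-bound request rate when each failed request is retried up to
    max_attempts times and failures persist for the whole window."""
    retried = nominal_rps * failure_fraction * (max_attempts - 1)
    return nominal_rps + retried

# Example: 1000 rps nominal, 35% failing, up to 3 attempts per request
# -> up to 1000 + 1000 * 0.35 * 2 = 1700 rps hitting the affected clusters.
print(amplified_request_rate(1000, 0.35, 3))
```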
During a periodic deployment of an internal service used to support ChatGPT conversations and certain Codex operations, a configuration resource required for request routing was unintentionally removed at the start of the rollout.
As a result:
Existing service instances in the affected clusters were unable to process requests.
Requests routed to those clusters failed until new service instances were started with the correct configuration.
Because the rollout targeted clusters serving a large portion of traffic, the impact was significant but brief.
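As an illustration of the failure mode (the names and structure below are hypothetical, not our actual service code): an already-running instance that resolves its routing targets from a shared configuration resource fails every request once that resource disappears, even though the instance itself is otherwise healthy.

```python
# Hypothetical sketch of the failure mode, not actual service code.
# An instance resolves routing targets from a shared configuration
# resource; if that resource is deleted mid-rollout, every request
# handled by an already-running instance fails immediately.

class RoutingConfigMissing(Exception):
    """Raised when the shared routing resource cannot be found."""


class ConversationService:
    def __init__(self, config_store):
        self.config_store = config_store  # e.g. a cluster-local config API

    def handle_request(self, request):
        routes = self.config_store.get("conversation-routing")  # shared resource
        if routes is None:
            # The resource was removed at the start of the rollout, so
            # existing instances cannot determine where to send the request.
            raise RoutingConfigMissing("required routing resource not found")
        backend = routes.select_backend(request)
        return backend.forward(request)
```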
The issue resolved automatically as new service instances replaced the impacted ones.
The deployment completed, bringing replacement instances online with the correct configuration.
Traffic was evacuated from one slower-updating cluster to accelerate recovery.
Once the behavior was understood, the rollout was paused to prevent the issue from affecting additional clusters (a simplified sketch of this kind of rollout gating appears below).
Full recovery was confirmed by 1:40 PM PST, with conversation error rates returning to normal.
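The sketch below shows the kind of wave-based rollout gating referenced above; it is illustrative only and does not represent our deployment tooling.

```python
# Illustrative sketch (hypothetical names): pause a wave-based rollout
# when error rates in the clusters just updated exceed a threshold.

ERROR_RATE_THRESHOLD = 0.05  # pause if more than 5% of requests fail

def roll_out(waves, deploy, error_rate):
    """waves: ordered groups of clusters; deploy and error_rate are
    injected callables standing in for real deployment and metrics APIs."""
    for wave in waves:
        for cluster in wave:
            deploy(cluster)
        # Gate progress on the health of the clusters just updated.
        if any(error_rate(cluster) > ERROR_RATE_THRESHOLD for cluster in wave):
            return f"rollout paused: elevated errors in {wave}"
    return "rollout complete"
```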
We have implemented the following changes:
Updated the deployment configuration to prevent required routing resources from being deleted during upgrades (a simplified sketch of this kind of guard appears below).
Initiated work to allow certain service dependencies to degrade gracefully, so that a single dependency failure does not cause entire conversation requests to fail (also sketched below).
Improved monitoring and safeguards around service configuration changes during deployments.
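As a simplified sketch of the first change (hypothetical names, not our deployment tooling): a pre-apply check that rejects any deployment plan that would delete a resource marked as required for routing.

```python
# Hypothetical pre-apply check: refuse deployment plans that delete
# resources marked as required (e.g. routing configuration).

REQUIRED_RESOURCES = {"conversation-routing"}  # illustrative resource name

def validate_plan(plan):
    """plan: iterable of (action, resource) pairs produced by the deploy tool."""
    for action, resource in plan:
        if action == "delete" and resource in REQUIRED_RESOURCES:
            raise ValueError(
                f"plan deletes required resource {resource!r}; blocking rollout"
            )
    return plan
```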
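And a minimal sketch of the graceful-degradation direction (again hypothetical): wrapping a non-critical dependency so that its failure degrades a feature rather than failing the entire conversation request.

```python
# Hypothetical sketch: serve a degraded result when an optional
# dependency fails, instead of failing the whole conversation request.

def call_with_fallback(dependency_call, fallback):
    """Return the dependency's result, or a degraded fallback on failure."""
    try:
        return dependency_call()
    except Exception:
        # Degrade the feature rather than failing the request.
        return fallback
```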
We apologize for the disruption and appreciate your patience. We are committed to continuing to improve the resilience of our systems.