On July 15, 2025, ChatGPT experienced reduced availability between 7:43 PM and 8:38 PM PDT due to an invalid configuration value used by a core service. This issue caused new server instances to fail when restarting, resulting in elevated error rates for a small portion of ChatGPT users. We resolved the issue by restoring the correct configuration and forcing an immediate resynchronization of affected services. We are implementing stronger safeguards around critical configuration updates, splitting configuration shared between services, and improving deployment safeguards to prevent similar issues.
Increased error rates for ChatGPT conversations for a small percentage of users for approximately 55 minutes from 7:43 PM and 8:38 PM PDT.
An internal engineer applied a configuration change with an invalid value. The configuration value was read by multiple services which expanded the blast radius. The invalid value quickly propagated, causing pods to enter crash loops and leading to elevated error rates for a subset of ChatGPT users.
The misconfiguration was identified and reverted to the correct value at 7:12 PM PDT.
A rolling restart of affected pods was initiated to accelerate recovery, with full availability restored by 8:38 PM PDT.
We are continuing work to prevent this from happening in the future.
Future critical configuration changes for the service will roll out to a single cluster first with automated health guards that halt further rollout if availability degrades.
Migrate to service-specific configuration value to minimize cross-service dependency risks.
We sincerely apologize for the disruption and are improving our safeguards to prevent this from happening in the future.